We define the reliability score S of an interaction between proteins pi and pj as a weighted sum of three subscores: domain (Sd), function (Sf), and sub-cellular localization (Sl). This scoring method is more general than the typical approach of selecting only co-functional or co-localized protein pairs, since some pairs of proteins interact frequently even when they are not co-functional or co-localized in a cell.

 

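$$S = w_d S_d + w_f S_f + w_l S_l \qquad (1)$$

where w_d, w_f, and w_l are the weights of the domain, function, and localization subscores, respectively.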
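The weighted sum can be illustrated with a minimal sketch, assuming the three subscores have already been computed for an interaction. The function name `reliability_score` and the weight and subscore values below are hypothetical and chosen only for illustration; they are not taken from this work.

```python
def reliability_score(s_d, s_f, s_l, weights):
    """Combine the three subscores into one score as a weighted sum.

    s_d, s_f, s_l: domain, function, and localization subscores.
    weights: (w_d, w_f, w_l); the values are an assumption of this sketch.
    """
    w_d, w_f, w_l = weights
    return w_d * s_d + w_f * s_f + w_l * s_l


# Hypothetical subscores for one interaction. Sl = 0 mimics a pair for which
# localization information is missing.
score = reliability_score(0.5, 0.25, 0.0, weights=(0.5, 0.25, 0.25))
print(score)
```

Because the subscores are summed rather than used as filters, a pair with no localization evidence can still receive a nonzero score from its domain and function subscores.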

The domain score (Sd) is defined by equation 2. Information on domain-domain pairs can be extracted from InterPro, with the pairs themselves determined from the data in BIND, DIP, and HPRD.

 

(2)

where #I(di, dj) is the number of interactions between domains di and dj, and #di and #dj are the numbers of domains di and dj, respectively, participating in the interactions.

For the function score (Sf), we first assign Gene Ontology (http://www.geneontology.org/GO.doc.html) molecular functions to all human proteins in HPRD, and then compute the matrix MGO (equation 3), where each entry MGO(fi, fj) is the ratio of the number of interactions between GO functions fi and fj to the number of proteins with function fi or fj, or both.

$$M_{GO}(f_i, f_j) = \frac{\text{number of interactions between } f_i \text{ and } f_j}{\text{number of proteins with function } f_i \text{ or } f_j} \qquad (3)$$

Suppose that protein pa has functions f1, f2, and f3, protein pb has functions f1 and f4, and protein pc has function f4 (see Figure 1). If there are interactions (pa, pb) and (pb, pc), then MGO(f1, f4) = 1/3, since there is a single interaction between functional groups f1 and f4 and there are three proteins with function f1 or f4.
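The arithmetic of this example can be checked with a short sketch. The helper names `proteins_with_either` and `mgo_entry` are hypothetical. The number of interactions between the two functional groups is passed in as a given value (1, as stated above), because the sketch does not assume a particular rule for counting interactions between functional groups.

```python
from fractions import Fraction

# Function annotations of the three proteins in the example.
annotations = {
    "pa": {"f1", "f2", "f3"},
    "pb": {"f1", "f4"},
    "pc": {"f4"},
}


def proteins_with_either(annotations, fi, fj):
    """Count proteins annotated with fi, fj, or both."""
    return sum(1 for funcs in annotations.values() if fi in funcs or fj in funcs)


def mgo_entry(n_interactions, annotations, fi, fj):
    """Ratio of a given interaction count to the number of proteins with fi or fj."""
    denom = proteins_with_either(annotations, fi, fj)
    return Fraction(n_interactions, denom) if denom else Fraction(0)


# One interaction between f1 and f4 (taken from the text), three annotated proteins.
print(mgo_entry(1, annotations, "f1", "f4"))  # 1/3
```

Using `Fraction` keeps the entry exact, which makes it easy to compare against the value quoted in the example.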

The function score (Sf) is computed using equation 4, in which k is the number of MGO entries with nonzero values.

(4)

The sub-cellular localization score (Sl) is computed using equation 5, in which SIl(pi, pj) is the number of interactions between proteins pi and pj in the same compartment l, and N is the number of proteins that participate in protein-protein interactions. Interactions with Sl = 0 are cases in which either the source or the target protein has no sub-cellular localization information.

(5)

Figure 1. Example of interpreting protein interactions as molecular function interactions