Exploring the Role of Pfam Families in Protein Function.
Meyer, Daniel J.
- Many computational functional inference methods use the Gene Ontology (GO) for their set of functional labels. While informative motif information leveraging structure is already captured in libraries of Hidden Markov Models (HMMs), such as Pfam, creating a useful Pfam-to-GO mapping remains a difficult endeavor. This is because it is a many-to-many mapping, where different Pfam annotations within a protein structure, either individually or as a set, might yield different amounts of specificity with regard to the set of possible GO labels that are appropriate. Estimating the amount of specificity that a single Pfam-derived domain, or a set of them, gives with regard to GO labeling is confounded by the unequal representation and/or the lack of coverage of annotation in both domains across the protein universe. We revisit issues of coverage, diversity, and representation in the light of all the new data in current sequence databases. We have developed a suite of parsers and an Object-Relational Mapping using Python and SQLAlchemy to represent selected information of proteins and families from the UniProt and Pfam databases, respectively. We use this framework to compare dcGO (Fang and Gough, 2013) and GODM (Alborzi et al., 2017), which are designed to optimize different tradeoffs for coverage versus false positives.
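The many-to-many character of the Pfam-to-GO mapping can be illustrated with a minimal sketch in plain Python. It is not the dcGO or GODM method, nor the SQLAlchemy framework described above: the protein IDs, the family labels (`PF_A`, `PF_B`), the GO labels (`GO_x`, `GO_y`, `GO_z`), and the function `go_frequencies` are all hypothetical placeholders. For each single family or pair of families, the sketch reports the fraction of proteins carrying that family set that are also annotated with each GO label, as a rough proxy for how specific the set is.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy, made-up annotations (assumption): protein -> (Pfam families, GO labels).
PROTEINS = {
    "P1": ({"PF_A"}, {"GO_x"}),
    "P2": ({"PF_A", "PF_B"}, {"GO_x", "GO_y"}),
    "P3": ({"PF_A", "PF_B"}, {"GO_y"}),
    "P4": ({"PF_B"}, {"GO_z"}),
}


def go_frequencies(proteins, max_set_size=2):
    """Map each Pfam family set (up to max_set_size members) to the fraction
    of proteins containing that set which carry each GO label."""
    support = Counter()            # family set -> number of proteins containing it
    cooccur = defaultdict(Counter)  # family set -> GO label -> co-occurrence count
    for families, go_terms in proteins.values():
        for size in range(1, max_set_size + 1):
            for combo in combinations(sorted(families), size):
                key = frozenset(combo)
                support[key] += 1
                cooccur[key].update(go_terms)
    return {
        key: {go: n / support[key] for go, n in counts.items()}
        for key, counts in cooccur.items()
    }


freqs = go_frequencies(PROTEINS)
for family_set, go_fracs in freqs.items():
    print(sorted(family_set), dict(sorted(go_fracs.items())))
```

In this toy data, the pair of families narrows the candidate GO labels more than either family alone, which is the kind of set-level specificity the abstract refers to. With real data, the uneven annotation coverage noted above would bias these fractions.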