SFLD Projects under Development

Extended SFLD (XSFLD)

To address the limits of manual curation in achieving broader coverage of the superfamily universe and to keep pace with the increasing amount of data from genomic and community sequencing projects, we have instituted the Extended SFLD (XSFLD). This effort is aimed at structure-function mapping via automated creation of sequence similarity networks for a large proportion of the known enzyme superfamilies, regardless of whether they are functionally diverse. This project will enable fast, automated, and systematic extension of the coverage of enzyme superfamilies, albeit at a shallower level of curation than provided for the Core SFLD.
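As a rough illustration of what a sequence similarity network is, the sketch below connects sequences whose pairwise similarity passes a cutoff and then groups them into connected clusters. It is not the XSFLD pipeline: the function names, the E-value cutoff, and the toy sequence identifiers are all assumptions made for the example.

```python
from collections import defaultdict

def build_ssn(pairwise_hits, max_evalue=1e-20):
    """Build an undirected network from (seq_a, seq_b, evalue) tuples.

    An edge is kept only when the E-value is at or below max_evalue.
    The cutoff is an illustrative assumption, not an SFLD setting.
    """
    network = defaultdict(set)
    for seq_a, seq_b, evalue in pairwise_hits:
        # Register both sequences as nodes, even if they end up isolated.
        network[seq_a]
        network[seq_b]
        if seq_a != seq_b and evalue <= max_evalue:
            network[seq_a].add(seq_b)
            network[seq_b].add(seq_a)
    return dict(network)

def clusters(network):
    """Return the connected components of the network as sorted lists."""
    seen = set()
    result = []
    for start in sorted(network):
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(network[node] - component)
        seen |= component
        result.append(sorted(component))
    return result

# Toy pairwise scores (hypothetical identifiers and E-values).
hits = [
    ("seqA", "seqB", 1e-50),
    ("seqB", "seqC", 1e-30),
    ("seqC", "seqD", 1e-5),   # too weak at the default cutoff, so no edge
    ("seqD", "seqE", 1e-40),
]
ssn = build_ssn(hits)
groups = clusters(ssn)
```

With the default cutoff the toy data separate into two clusters; relaxing the cutoff merges them, which is why the choice of threshold matters when such networks are built automatically.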

The current XSFLD superfamilies are tabulated on the Browse by Superfamily page. Very few of them have detailed functional annotation.

As our list of XSFLD superfamilies expands, we invite groups working in those areas to contact us about possible collaborations to improve the annotation of these superfamilies.

Identifying Potential Misannotations

Due to the rapid release of new data from genome sequencing projects, the majority of protein sequences in public databases have not been experimentally characterized; rather, sequences are annotated using computational analysis.

Using the methodology described in Schnoes et al., PLoS Comput Biol (2009), we have investigated the levels of molecular function misannotation in four public protein sequence databases (UniProtKB/Swiss-Prot, GenBank NR, UniProtKB/TrEMBL, and KEGG) for a model set of 37 enzyme families for which extensive experimental information is available. The results of this analysis are available on the SFLD functional domain (FD) pages.
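One simple way to read a "misannotation level" is as the fraction of a database's sequences annotated to a family that are flagged as suspect. The sketch below computes that fraction per database. It is only an assumed summary statistic for illustration; the actual criteria are those of Schnoes et al. (2009), and the database labels and flags here are made-up placeholders, not results.

```python
from collections import Counter

def misannotation_rates(records):
    """Return {database: fraction flagged} from (database, is_misannotated) pairs."""
    total = Counter()
    flagged = Counter()
    for database, is_misannotated in records:
        total[database] += 1
        if is_misannotated:
            flagged[database] += 1
    return {db: flagged[db] / total[db] for db in total}

# Placeholder records: (database label, flagged as potentially misannotated?)
records = [
    ("db_one", False),
    ("db_one", False),
    ("db_one", True),
    ("db_one", False),
    ("db_two", True),
    ("db_two", False),
]
rates = misannotation_rates(records)
```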

To see if a protein is potentially misannotated, use any of the Search by Enzyme options. If the protein is in the SFLD, the result will be an individual FD page or a list of FDs with links to their pages. If an FD has associated predicted misannotations, they will be shown in a table at the bottom of its page.

For example, searching for UniProtKB identifier O28181 gives an FD page with the following table of suspect annotations:

We are actively re-developing this functionality. Future updates will include:

  • a summary page for predicted misannotated proteins at the superfamily level, and then at the lower levels of the SFLD hierarchy
  • a search option
  • protocols to keep these data current