SFLD Detailed Description

The Structure-Function Linkage Database (SFLD) is a tool for the investigation of protein sequence, structure, and function. In particular, it aims to provide explicit information concerning how a given protein, or family of proteins, delivers chemical functionality. It was developed by the Babbitt Laboratory in collaboration with the UCSF Resource for Biocomputing, Visualization, and Informatics.

It is organized around the concept of mechanistically diverse superfamilies. Members of such superfamilies are evolutionarily related and, in addition to structural similarities, retain a conserved aspect of function. For example, all members of a superfamily could catalyze the same partial reaction or stabilize the same type of intermediate. (For more information, see reviews [Babbitt, 2003] and [Gerlt and Babbitt, 2001].) Superfamilies in the database are divided into families consisting of enzymes that perform the same overall reaction. Some superfamilies include an intermediate classification of families into subgroups, whose definitions are superfamily-specific.

One of the best-characterized examples of a mechanistically diverse enzyme superfamily is the enolase superfamily (ES). As of October 2004, the ES contained over 700 different sequences representing 11 experimentally characterized functions (eight published and three not yet published). Analyses of the available sequences and structures suggest that perhaps dozens of additional functions remain to be characterized. Several of the experimentally characterized ES functions are shown in Figure 1. Remarkably, all of these different reactions are mediated by highly similar overall structures and active sites.

The similarities in these active sites are associated with a partial reaction common to all members of the superfamily, namely abstraction of a proton from a carbon alpha to a carboxylate group. The active-site machinery associated with this proton abstraction step (and with the metal-assisted stabilization of the resulting enolate anion intermediate) is conserved across all structurally characterized members of the superfamily [Babbitt et al., 1996].
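
The conserved moiety can be made concrete with a small substructure check. The sketch below is a toy: it assumes a hypothetical minimal molecule representation (element symbols, explicit hydrogen counts, and bonds with orders) rather than any real cheminformatics library or SFLD data format. It flags carbons that carry at least one hydrogen and are bonded to a carboxyl carbon, that is, candidate sites for the shared proton abstraction step.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Molecule:
    """Hypothetical minimal molecular graph used only for this illustration."""
    elements: list[str]   # element symbol for each atom index
    h_counts: list[int]   # number of attached hydrogens for each atom index
    bonds: list[tuple[int, int, int]] = field(default_factory=list)  # (atom_i, atom_j, bond_order)

    def neighbors(self, i: int) -> list[tuple[int, int]]:
        """Return (neighbor_index, bond_order) pairs for atom i."""
        out = []
        for a, b, order in self.bonds:
            if a == i:
                out.append((b, order))
            elif b == i:
                out.append((a, order))
        return out


def is_carboxyl_carbon(mol: Molecule, i: int) -> bool:
    """Carbon with exactly two oxygen neighbors: one C=O and one C-O single bond.

    This toy test does not distinguish carboxylic acids, carboxylates, and esters.
    """
    if mol.elements[i] != "C":
        return False
    oxygen_orders = sorted(order for j, order in mol.neighbors(i) if mol.elements[j] == "O")
    return oxygen_orders == [1, 2]


def alpha_carbons_with_h(mol: Molecule) -> list[int]:
    """Indices of carbons bearing at least one H and bonded to a carboxyl carbon."""
    hits = []
    for i, element in enumerate(mol.elements):
        if element != "C" or mol.h_counts[i] == 0:
            continue
        if any(is_carboxyl_carbon(mol, j) for j, _ in mol.neighbors(i)):
            hits.append(i)
    return hits


# Propionate, CH3-CH2-COO-: only the CH2 carbon (index 1) is alpha to the carboxylate.
propionate = Molecule(
    elements=["C", "C", "C", "O", "O"],
    h_counts=[3, 2, 0, 0, 0],
    bonds=[(0, 1, 1), (1, 2, 1), (2, 3, 2), (2, 4, 1)],
)
print(alpha_carbons_with_h(propionate))  # [1]
```

A substrate whose alpha carbon is quaternary (no hydrogens) would return an empty list, since it offers no proton to abstract.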

Although their pairwise sequence identities can be as low as 10%, all sequences assigned to the superfamily show conservation of the proton abstraction machinery and the metal-binding ligands. Based on superfamily analysis, approximately half of enolase superfamily sequences can be assigned a specific function that proceeds from the common type of intermediate to produce one of a range of different products. Even for the hundreds of ES sequences to which we cannot assign a specific function, we can confidently predict that their overall reactions will proceed through an enolate anion intermediate and that their substrates will contain the moiety associated with the proton abstraction step. Thus, the superfamily context provides a rules-based approach for inferring function for all of its members.
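
This rules-based inference follows from the classification hierarchy: a sequence inherits every conserved annotation of the node it is assigned to and of all that node's ancestors. The following is a minimal sketch under that assumption; the `Node` class, the level names, and the `inferred_annotations` function are hypothetical and do not reflect the SFLD's actual schema. Because subgroups exist only in some superfamilies, a family node may point directly at its superfamily.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One level of a hypothetical classification tree."""
    name: str
    level: str                        # "superfamily", "subgroup", or "family"
    conserved: list[str]              # annotations assumed to hold for every member
    parent: Optional[Node] = None     # None for a superfamily


def inferred_annotations(assigned: Node) -> dict[str, list[str]]:
    """Collect conserved annotations from the assigned node and all its ancestors,
    ordered from superfamily down to the assigned level."""
    chain = []
    node: Optional[Node] = assigned
    while node is not None:
        chain.append(node)
        node = node.parent
    return {n.level: list(n.conserved) for n in reversed(chain)}


superfamily = Node("example superfamily", "superfamily",
                   ["proton abstraction alpha to a carboxylate",
                    "enolate anion intermediate"])
subgroup = Node("example subgroup", "subgroup",
                ["subgroup-specific residues (placeholder)"], parent=superfamily)
family = Node("example family", "family",
              ["specific overall reaction (placeholder)"], parent=subgroup)

print(list(inferred_annotations(superfamily)))  # ['superfamily']
print(list(inferred_annotations(family)))       # ['superfamily', 'subgroup', 'family']
```

A sequence that can only be placed at the superfamily level inherits just the superfamily rules (such as the enolate anion intermediate), while one placed in a family also inherits that family's overall reaction.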

The SFLD stores information not only about the overall reactions catalyzed but also about the mechanistic steps conserved by families and superfamilies, such as the enolase superfamily, and about the conserved sequence and structural features that carry out those steps. The SFLD includes curated sequence alignments for superfamilies, subgroups, and families, along with the corresponding hidden Markov models (HMMs).

The SFLD can be used in different ways:

  • One can search the database with a sequence of unknown function to determine if it contains features conserved within a set of proteins of known function. Searching with a sequence entails comparison to HMMs rather than exact matching. Because the SFLD stores information about the conserved reactions and partial reaction mechanisms of families and superfamilies, being able to place a sequence into a family or superfamily can go a long way toward determining its function.
  • One can search the database by reaction to find proteins, families, and superfamilies that perform a given overall or partial reaction. This is particularly useful in enzyme engineering, where one has a desired chemical functionality in mind and seeks an appropriate structural template. The list of reactions can be browsed directly or searched with an Enzyme Commission (EC) number or a chemical structure.
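
Searching with a sequence amounts to placing the query at the most specific level whose model it matches confidently. The sketch below assumes the query has already been scored against each family, subgroup, and superfamily HMM by an external tool (for example, HMMER), so it works only with a precomputed dictionary of bit scores; the function name, model names, and per-level thresholds are hypothetical, not SFLD values. It falls back from family to subgroup to superfamily, reflecting the point that even a superfamily-level placement says a good deal about function.

```python
from __future__ import annotations

from typing import Optional

# Specific-to-general order used for the fallback.
LEVELS = ("family", "subgroup", "superfamily")


def place_sequence(
    scores: dict[str, float],      # model name -> bit score of the query (precomputed elsewhere)
    levels: dict[str, str],        # model name -> "family", "subgroup", or "superfamily"
    thresholds: dict[str, float],  # level -> minimum accepted score (hypothetical values)
) -> Optional[tuple[str, str]]:
    """Return (level, model) for the most specific confident match, or None.

    Simplification: a family hit is not checked for consistency with the
    superfamily hit, which a real pipeline would need to do.
    """
    for level in LEVELS:
        cutoff = thresholds.get(level, float("inf"))
        candidates = [
            (score, model)
            for model, score in scores.items()
            if levels.get(model) == level and score >= cutoff
        ]
        if candidates:
            _, best_model = max(candidates)
            return level, best_model
    return None


scores = {"superfamily_hmm": 250.0, "subgroup_hmm": 90.0, "family_hmm": 40.0}
levels = {"superfamily_hmm": "superfamily", "subgroup_hmm": "subgroup", "family_hmm": "family"}
thresholds = {"family": 100.0, "subgroup": 80.0, "superfamily": 50.0}
print(place_sequence(scores, levels, thresholds))  # ('subgroup', 'subgroup_hmm')
```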

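Searching the reaction list by EC number usually calls for partial queries, since EC numbers have four dot-separated fields (class, subclass, sub-subclass, serial number) and "-" conventionally marks an unspecified field. The following is a minimal sketch of wildcard matching, assuming reactions are held in a simple name-to-EC dictionary (a hypothetical stand-in for the SFLD's reaction tables); the two entries are illustrative only.

```python
from __future__ import annotations


def ec_matches(query: str, ec: str) -> bool:
    """True if `ec` matches `query`, where '-' in the query matches any field.

    '4.2.1.-' matches every entry in sub-subclass 4.2.1; '4.-.-.-' matches class 4 (lyases).
    """
    q_fields = query.split(".")
    e_fields = ec.split(".")
    if len(q_fields) != 4 or len(e_fields) != 4:
        raise ValueError("EC numbers have exactly four dot-separated fields")
    return all(q == "-" or q == e for q, e in zip(q_fields, e_fields))


def search_reactions(query: str, reactions: dict[str, str]) -> list[str]:
    """Names of reactions whose EC number matches the query, sorted alphabetically."""
    return sorted(name for name, ec in reactions.items() if ec_matches(query, ec))


# Illustrative input, not SFLD data.
reactions = {"enolase": "4.2.1.11", "mandelate racemase": "5.1.2.2"}
print(search_reactions("4.2.1.-", reactions))  # ['enolase']
print(search_reactions("5.-.-.-", reactions))  # ['mandelate racemase']
```

Comparing fields as strings rather than integers keeps the matcher simple and tolerates non-numeric serial numbers.
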
The SFLD also allows users to examine the basis of its data. Nearly every important assignment of function, structure, or family classification comes with a detailed evidence code that lets users understand how each decision was made. Links to literature references are included when available. In addition, nearly every table in the SFLD includes metadata fields that allow curators to enter additional information about specific elements.

The SFLD will continue to evolve, both in the addition of content and in the development of new methods to explore and display the data. Users should be aware of some current limitations. While there will undoubtedly be growing pains, the goal is to create a resource of use to a wide community. Comments and suggestions are welcome (send e-mail to sfld-help@cgl.ucsf.edu).

Users wishing to cite the SFLD should use the reference:

Pegg SC, Brown SD, Ojha S, Seffernick J, Meng EC, Morris JH, Chang PJ, Huang CC, Ferrin TE, Babbitt PC. "Leveraging enzyme structure-function relationships for functional inference and experimental design: the structure-function linkage database." Biochemistry. 2006 Feb 28;45(8):2545-55.

The SFLD is developed by the Babbitt Laboratory in collaboration with the UCSF Resource for Biocomputing, Visualization, and Informatics. Funding is provided by NIH grant R01-GM60595 (Babbitt), NSF grant DBI-0234768 (Babbitt), and NIH grant P41-RR01081 (Ferrin).


Contact us at sfld-help@cgl.ucsf.edu.