About the SFLD

The Structure-Function Linkage Database (SFLD) is a hierarchical classification of enzymes that relates specific sequence-structure features to specific chemical capabilities. It is developed by the Babbitt Laboratory in collaboration with the UCSF Resource for Biocomputing, Visualization, and Informatics.

Questions and suggestions can be sent to sfld-help@cgl.ucsf.edu or to the discussion forum sfld-users@cgl.ucsf.edu. The forum is open to all interested parties (subscribe).

Organization and Terminology

The SFLD classifies evolutionarily related enzymes according to shared chemical functions and maps these shared functions to conserved active site features. The classification is hierarchical: broader levels encompass more distantly related proteins that share fewer features.

Shared chemical functions can include:

  • catalyzing a specific partial reaction
  • stabilizing a specific type of reaction intermediate
  • binding a metal ion or other cofactor

Levels of classification:

  • A family is a set of evolutionarily related enzymes that catalyze the same overall reaction.
  • A superfamily is a broader set of evolutionarily related enzymes with a shared chemical function that maps to a conserved set of active site features. Superfamilies in which members can be highly divergent and catalyze many different overall reactions are termed functionally diverse or mechanistically diverse. Such superfamilies tend to exhibit complicated structure-function relationships and pose challenges to protein annotation and design.

For example, the figure below shows striking conservation of active site residues among diverse members of the enolase superfamily. The conserved residues (left) participate in a common partial reaction, proton abstraction, within very different overall reactions (right):

[Figure: enolase superfamily active sites (left) and enolase superfamily reactions (right)]
Left: Superposition of active sites of diverse members of the enolase superfamily. Sidechain and ligand carbon atoms are shown in a different color for each structure, divalent metal ions in yellow, oxygens in red, and nitrogens in blue. Right: Some of the chemical reactions catalyzed by enolase superfamily members. The proton abstracted to initiate each reaction is shown in red.

Additional levels in the hierarchy:

  • A functional domain (or enzyme functional domain) is a single member of a family, either a whole protein or the domain(s) responsible for the enzymatic activity.
  • A subgroup is a set of evolutionarily related enzymes that have more shared features than the superfamily as a whole, but may still catalyze different overall reactions (narrower than a superfamily but possibly including more than one family).
  • A suprafamily is a set of evolutionarily related enzymes with shared active site features that are used in substantially different ways, so that they no longer map to a shared chemical function (broader than a superfamily).
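
The nesting of these levels can be sketched as a small tree structure. This is only an illustrative model, not the SFLD's actual schema: the Node class, the LEVELS ordering, and the "example ..." names are hypothetical, and only the enolase superfamily and its shared partial reaction (proton abstraction) come from the text above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

# Levels ordered from broadest to narrowest, following the definitions above.
LEVELS = ["suprafamily", "superfamily", "subgroup", "family", "functional domain"]


@dataclass
class Node:
    """One entry in the hierarchy (hypothetical model, not the SFLD schema)."""
    name: str
    level: str
    shared_function: Optional[str] = None  # e.g. a shared partial reaction
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        # A child must sit at a strictly narrower level than its parent.
        if LEVELS.index(child.level) <= LEVELS.index(self.level):
            raise ValueError(f"{child.level} cannot be nested under {self.level}")
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Node"]]:
        # Depth-first traversal, yielding each node with its nesting depth.
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Placeholder names below are illustrative, not real SFLD entries.
superfamily = Node("enolase superfamily", "superfamily", "proton abstraction")
subgroup = superfamily.add(Node("example subgroup", "subgroup"))
family = subgroup.add(Node("example family", "family"))
family.add(Node("example enzyme domain", "functional domain"))

for depth, node in superfamily.walk():
    print("  " * depth + f"{node.level}: {node.name}")
```

Because the check only requires a strictly narrower level, a family can also be attached directly to a superfamily when no subgroup has been defined.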

See also the SFLD glossary.

Philosophy and Scope

The Core SFLD is mainly concerned with superfamilies that are functionally diverse, whereas the Extended SFLD may contain superfamilies that are not functionally diverse, or that are functionally diverse but not highly curated. These two sections of the SFLD are shown in separate tables on the Browse by Superfamily page.

Among enzyme resources, the Core SFLD is unique in its emphasis on how conserved residues map to catalysis of partial reactions or other shared functions at a finer level of detail than overall reactions. The Core SFLD is highly curated; coverage of the enzyme universe is currently limited because deciphering sequence-structure-function relationships in functionally diverse superfamilies includes steps that are difficult to automate. However, superfamilies will be added to both sections of the SFLD and updated as analyses are performed. The SFLD provides evidence codes to clarify the source of a given piece of information and to provide a sense of its reliability. See also the SFLD caveats.

The SFLD will continue to evolve, both in content and in the development of new methods to explore the data.

Types of Data Available

  • step-by-step reaction mechanisms
  • a mapping of conserved functions to conserved sequence-structure features at different levels of classification
  • alignments of representative sequences at different levels of classification
  • Hidden Markov Models for comparison to query sequences
  • 3D protein structures
  • active site images and corresponding UCSF Chimera session files for many families with known 3D structures
  • sequence similarity networks for display and analysis with Cytoscape
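
As a rough illustration of what a sequence similarity network is, the sketch below treats sequences as nodes and keeps an edge only when a pairwise similarity score meets a threshold, then groups the nodes into connected clusters. The scores, sequence names, thresholds, and function names are all hypothetical, and this is not a description of how the SFLD networks are generated. The edge list is written in Cytoscape's simple interaction format (SIF), one "source relation target" line per edge.

```python
# Hypothetical pairwise scores (higher = more similar); not real SFLD data.
scores = {
    ("seqA", "seqB"): 80.0,
    ("seqA", "seqC"): 75.0,
    ("seqB", "seqC"): 90.0,
    ("seqC", "seqD"): 12.0,
    ("seqD", "seqE"): 60.0,
}


def build_network(pair_scores, threshold):
    """Keep an edge only when the pairwise score meets the threshold."""
    graph = {}
    for (a, b), score in pair_scores.items():
        # Register both nodes so isolated sequences still appear.
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clusters(graph):
    """Return connected components as sorted lists of node names."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


def to_sif(graph, relation="similar_to"):
    """Emit each undirected edge once as a tab-delimited SIF line."""
    return [
        f"{a}\t{relation}\t{b}"
        for a in sorted(graph)
        for b in sorted(graph[a])
        if a < b
    ]


network = build_network(scores, threshold=50.0)
print(clusters(network))
print("\n".join(to_sif(network)))
```

Raising the threshold removes weaker edges, so a single large cluster tends to split into smaller, more closely related groups.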

Uses of the SFLD

  • exploring hierarchical sequence-structure-function relationships
  • classifying “unknown” sequences to annotate them or to identify misannotations
  • comparing a new sequence or structure with those in the database
  • identifying residues required to perform a specific partial reaction or other chemical function
  • finding promising starting points for engineering a new function
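
One way to picture the residue-based uses above is a check of whether a query sequence, once aligned to a family or superfamily, carries the residues expected at conserved active site positions. The sketch below is a toy under stated assumptions: the signature columns, accepted residues, and query sequences are invented for illustration, and the function is not part of the SFLD or any of its tools.

```python
# Hypothetical signature: alignment column (0-based) -> residues accepted there.
SIGNATURE = {2: {"K"}, 5: {"D", "E"}, 9: {"D"}}


def check_signature(aligned_seq, signature):
    """Return {column: observed residue} for columns lacking an accepted residue."""
    missing = {}
    for column, allowed in sorted(signature.items()):
        # Treat columns beyond the end of the sequence as gaps.
        residue = aligned_seq[column] if column < len(aligned_seq) else "-"
        if residue not in allowed:
            missing[column] = residue
    return missing


# Invented aligned queries ("-" marks an alignment gap).
candidates = {
    "query1": "MAKLTDGSRDLV",
    "query2": "MAKLTAGSRDLV",
    "query3": "MA-LTEGSRNLV",
}

for name, seq in candidates.items():
    problems = check_signature(seq, SIGNATURE)
    status = "matches signature" if not problems else f"mismatches at {problems}"
    print(f"{name}: {status}")
```

A query that lacks the expected residues is not necessarily misannotated, but a mismatch of this kind is a reasonable prompt for closer inspection.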

Selected References

The Structure-Function Linkage Database. Akiva E, Brown S, Almonacid DE, Barber AE 2nd, Custer AF, Hicks MA, Huang CC, Lauck F, Mashiyama ST, Meng EC, Mischel D, Morris JH, Ojha S, Schnoes AM, Stryke D, Yunes JM, Ferrin TE, Holliday GL, Babbitt PC. Nucleic Acids Res. 2014 Jan 1;42(1):D521-30.

Divergent evolution in enolase superfamily: strategies for assigning functions. Gerlt JA, Babbitt PC, Jacobson MP, Almo SC. J Biol Chem. 2012 Jan 2;287(1):29-34.

Inference of functional properties from large-scale analysis of enzyme superfamilies. Brown SD, Babbitt PC. J Biol Chem. 2012 Jan 2;287(1):35-42.

Toward mechanistic classification of enzyme functions. Almonacid DE, Babbitt PC. Curr Opin Chem Biol. 2011 Jun;15(3):435-42.

Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. Schnoes AM, Brown SD, Dodevski I, Babbitt PC. PLoS Comput Biol. 2009 Dec;5(12):e1000605.

Using sequence similarity networks for visualization of relationships across diverse protein superfamilies. Atkinson HJ, Morris JH, Ferrin TE, Babbitt PC. PLoS One. 2009;4(2):e4345.

Using the Structure-Function Linkage Database to characterize functional domains in enzymes. Brown S, Babbitt P. Curr Protoc Bioinformatics. 2006 Mar;Chapter 2:Unit 2.10.

Leveraging enzyme structure-function relationships for functional inference and experimental design: the Structure-Function Linkage Database. Pegg SC, Brown SD, Ojha S, Seffernick J, Meng EC, Morris JH, Chang PJ, Huang CC, Ferrin TE, Babbitt PC. Biochemistry. 2006 Feb 28;45(8):2545-55.

Definitions of enzyme function for the structural genomics era. Babbitt PC. Curr Opin Chem Biol. 2003 Apr;7(2):230-7.

Divergent evolution of enzymatic function: mechanistically diverse superfamilies and functionally distinct suprafamilies. Gerlt JA, Babbitt PC. Annu Rev Biochem. 2001;70:209-46.