SFLD Archive Caveats

Superfamily content

  • Due to the deep level of manual curation, the SFLD contains a limited set of superfamilies.
  • The SFLD is primarily concerned with mechanistically diverse superfamilies, not superfamilies whose members all catalyze essentially the same chemical transformation (e.g., phosphorylation).

Sequence and structure content

  • The SFLD is intended to include representative sets of sequences rather than all known sequences belonging to a family, subgroup, or superfamily.
  • The SFLD may include multiple sequences that actually describe the same protein, but are considered unique because of differences in length or altered residues (for example, a phosphorylated residue in a structure may be represented by an X in the corresponding sequence file).
  • The SFLD includes mutant sequences, but if a family functional residue is mutated, that sequence will only be annotated to the superfamily or subgroup level (not to the family).
  • Though the SFLD includes NCBI gi numbers among the available sequence identifiers that link to external databases, the NCBI has phased out the use of sequence gi numbers. These numbers are still included in the SFLD for historical compatibility, but should no longer be used as primary sequence identifiers.
  • As of April 2019, data in the SFLD is no longer being updated. Thus, newly sequenced proteins and crystal structures will not be found in the database. Further, annotation information, such as the reaction(s) catalyzed by a given enzyme, may not be up to date.
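
The redundancy caveat above can matter when counting distinct proteins in a downloaded sequence set. The following is a minimal sketch, not an SFLD tool: it assumes sequences are already loaded into a dictionary keyed by identifier, and it treats two records as the same protein when the shorter one matches an ungapped window of the longer one, with X accepted as a match for any residue. The function names and the matching rule are illustrative assumptions; very short fragments will match too easily, and real data may call for a proper alignment.

```python
def same_protein(seq_a: str, seq_b: str, wildcard: str = "X") -> bool:
    """Heuristic: True if the shorter sequence matches an ungapped window
    of the longer one, with `wildcard` matching any residue."""
    short, long_ = sorted((seq_a.upper(), seq_b.upper()), key=len)
    if not short:
        return False
    for offset in range(len(long_) - len(short) + 1):
        window = long_[offset:offset + len(short)]
        if all(a == b or wildcard in (a, b) for a, b in zip(short, window)):
            return True
    return False


def collapse_redundant(records: dict[str, str]) -> list[list[str]]:
    """Group identifiers whose sequences look like the same protein.

    Each new record is compared only to the first member of each group,
    so the grouping depends on input order (an assumption of this sketch).
    """
    groups: list[list[str]] = []
    for ident, seq in records.items():
        for group in groups:
            if same_protein(records[group[0]], seq):
                group.append(ident)
                break
        else:
            groups.append([ident])
    return groups


# Hypothetical records: "b" is a truncated copy of "a" with a modified
# residue written as X; "c" differs from "a" by a real substitution.
records = {"a": "MKTAYIAKQR", "b": "KTAXIAK", "c": "MKTAYIAKQW"}
groups = collapse_redundant(records)
```

Here `groups` places "a" and "b" together and keeps "c" separate, since a substitution that is not marked with X is treated as a genuine difference (for example, a mutant).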

Reaction information

  • Not all reactions have been assigned an EC number.
  • For very promiscuous enzymes, not all reactions catalyzed may be included, or even known.

Links to other databases

  • Links to other databases have been maintained in the SFLD archive where possible. However, because outside databases may change URL structure at any time, such links may not be functional.

Hidden Markov Models (HMMs)

  • HMMs were not built for families with insufficient sequence data (those with only one member or two highly similar members).
  • The statistical significance of a match to an HMM is not the same as biological significance; other information such as conservation of residues important for function, operon context, etc., should also be considered when available.
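
One way to act on the point about statistical versus biological significance is to check a hit's E-value and the conservation of functionally important residues together. The sketch below is illustrative only: the `HmmHit` structure, the position labels, the allowed residues, and the E-value cutoff are all hypothetical assumptions, not values or formats taken from the SFLD or from any HMM search tool.

```python
from dataclasses import dataclass


@dataclass
class HmmHit:
    family: str
    evalue: float
    # Residue observed in the query at each functionally important
    # position of the model, keyed by a position label (hypothetical).
    aligned_residues: dict[str, str]


def assess_hit(hit: HmmHit,
               required_residues: dict[str, set[str]],
               evalue_cutoff: float = 1e-10) -> str:
    """Combine statistical significance with a residue-conservation check.

    `evalue_cutoff` is an arbitrary example threshold, not a recommendation.
    """
    if hit.evalue > evalue_cutoff:
        return "no significant match"
    # A position counts as missing if it is unaligned or holds a residue
    # outside the allowed set.
    missing = [pos for pos, allowed in required_residues.items()
               if hit.aligned_residues.get(pos) not in allowed]
    if missing:
        return ("significant match, functional residues not conserved: "
                + ", ".join(missing))
    return "significant match, functional residues conserved"


# Hypothetical family with two functionally important positions.
required = {"K164": {"K"}, "D195": {"D", "E"}}
good = HmmHit("example family", 1e-30, {"K164": "K", "D195": "E"})
mutant = HmmHit("example family", 1e-30, {"K164": "A", "D195": "D"})
```

With these inputs, `assess_hit(good, required)` reports conserved residues, while `assess_hit(mutant, required)` flags position K164 even though the E-value is identical. A flagged hit of this kind would be a candidate for annotation at the superfamily or subgroup level only, and other evidence such as operon context would still need to be weighed separately.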

Sequence similarity networks