Superfamily content:
- because every superfamily is curated intensively,
the SFLD currently contains a limited set of superfamilies;
more superfamilies will be added as analyses are completed
- the scope of the SFLD is to include
mechanistically diverse superfamilies,
not superfamilies whose members catalyze essentially
the same chemical transformation (e.g., phosphorylation)
Sequence and structure content:
- the SFLD is intended to include representative sets of sequences
rather than all known sequences belonging to a family, subgroup,
or superfamily
- the SFLD may include multiple sequences that describe
the same protein but are treated as distinct because they
differ in length or in individual residues (for example,
a phosphorylated residue in a structure may be represented by an X
in the corresponding sequence file)
- the SFLD includes mutant sequences, but none with mutations in
residues identified as catalytic
- the SFLD includes mutant structures regardless of which residues
are mutated; structures with mutated catalytic residues are linked
to the corresponding wild-type sequences
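The sequence and structure rules above can be sketched as a few small predicates. This is a minimal illustration, not SFLD's curation pipeline: every function name is hypothetical, 'X' is assumed to act as a wildcard for an unknown or modified residue, and mutants are assumed to differ from the wild type by substitutions only, so residue positions line up.

```python
from typing import Iterable, Tuple


def residues_match(a: str, b: str) -> bool:
    # Assumption: 'X' marks an unknown or modified residue and matches anything.
    return a == b or a == "X" or b == "X"


def same_protein(seq_a: str, seq_b: str) -> bool:
    """True if the shorter sequence fits ungapped somewhere inside the longer one."""
    short, long_ = sorted((seq_a, seq_b), key=len)
    for start in range(len(long_) - len(short) + 1):
        window = long_[start:start + len(short)]
        if all(residues_match(s, w) for s, w in zip(short, window)):
            return True
    return False


def catalytic_residue_mutated(wild_type: str, mutant: str,
                              catalytic_positions: Iterable[int]) -> bool:
    # Positions are 1-based; assumes substitutions only (no insertions/deletions).
    return any(wild_type[p - 1] != mutant[p - 1] for p in catalytic_positions)


def handle_mutant(wild_type: str, mutant: str,
                  catalytic_positions: Iterable[int],
                  is_structure: bool) -> Tuple[bool, bool]:
    """Return (include, link_to_wild_type) for a mutant sequence or structure."""
    hits_catalytic = catalytic_residue_mutated(wild_type, mutant, catalytic_positions)
    if is_structure:
        # Structures are always included; catalytic mutants are linked to the wild type.
        return True, hits_catalytic
    # Sequences with a mutated catalytic residue are left out altogether.
    return not hits_catalytic, False
```

For example, with a hypothetical wild type `"MKHDE"` whose catalytic residue is His3, the mutant `"MKADE"` would be excluded as a sequence but kept, and linked to the wild type, as a structure.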
Reaction information:
- SMILES/SMARTS were not designed to represent enzymatic reactions
and are insufficient for describing active-site features such as metal ions
or interactions that stabilize an intermediate or transition state
- not all reactions have been assigned EC numbers
- for very promiscuous enzymes, not all reactions catalyzed may be
included, or even known
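To make the first two points concrete, here is a minimal sketch using a hypothetical `Reaction` record (not an SFLD schema). It splits a reaction SMILES string of the form `reactants>agents>products` for enolase (EC 4.2.1.11), and it models the EC number as optional because not every reaction has one. The naive split on `>` and `.` is a simplification that assumes well-formed input.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Reaction:
    """Hypothetical record for one enzymatic reaction."""
    name: str
    reaction_smiles: str               # "reactants>agents>products"
    ec_number: Optional[str] = None    # None when no EC number has been assigned


def _molecules(part: str) -> List[str]:
    # '.' separates disconnected molecules within one part of a reaction SMILES.
    return [m for m in part.split(".") if m]


def split_reaction_smiles(rxn: str) -> Tuple[List[str], List[str], List[str]]:
    """Split a reaction SMILES into reactant, agent and product SMILES lists."""
    reactants, agents, products = rxn.split(">")
    return _molecules(reactants), _molecules(agents), _molecules(products)


enolase = Reaction(
    name="2-phosphoglycerate -> phosphoenolpyruvate + water",
    reaction_smiles="OCC(OP(=O)(O)O)C(=O)O>[Mg+2]>C=C(OP(=O)(O)O)C(=O)O.O",
    ec_number="4.2.1.11",
)
reactants, agents, products = split_reaction_smiles(enolase.reaction_smiles)
# The metal appears only as a bare agent, "[Mg+2]": nothing in the string records
# how it interacts with the substrate or stabilizes the enolate intermediate.
```

The string captures which molecules go in and come out, but it has no place for the active-site interactions that distinguish one mechanism from another.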
Hidden Markov Models:
- HMMs are not built for families
with insufficient sequence data
(those with only one member or two highly similar members)
- the statistical significance of a match to an HMM is not the same as
biological significance; other information such as conservation of
residues important for function, operon context, etc.,
should also be considered when available
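Both HMM caveats can be expressed as simple checks. The sketch below is illustrative only: the 90% identity threshold for "highly similar", the E-value cutoff of 1e-10, and all function names are assumptions rather than SFLD settings. Input sequences are assumed to be rows of one multiple alignment, with '-' for gaps and 0-based column indices.

```python
from typing import Dict, Sequence


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be rows of the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def can_build_hmm(aligned_members: Sequence[str], max_identity: float = 90.0) -> bool:
    """Skip families with one member, or with two members at least max_identity % identical."""
    if len(aligned_members) < 2:
        return False
    if len(aligned_members) == 2 and percent_identity(*aligned_members) >= max_identity:
        return False
    return True


def hit_supports_function(evalue: float, aligned_hit: str,
                          catalytic: Dict[int, str],
                          evalue_cutoff: float = 1e-10) -> bool:
    """Require a significant E-value AND the expected residue at each catalytic column."""
    if evalue > evalue_cutoff:
        return False
    return all(col < len(aligned_hit) and aligned_hit[col] == residue
               for col, residue in catalytic.items())
```

A hit with a very small E-value that has lost a catalytic residue fails the second check, which is the distinction between statistical and biological significance drawn above; operon context and other evidence would still need to be weighed separately.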