FunTree uses the CATH classification to annotate and analyse super families of enzymes. In particular, it highlights the relationships between known three-dimensional structures, their associated sequences and the enzyme chemistry. This places structures and sequences in the context of their evolution and of the range of, and the similarities and differences in, their reaction chemistry (both bond order changes and small molecule substructure). It can also be used to infer the possible range of reactions for enzyme structures and sequences that have no known function, and to indicate other reactions or substrates available to an enzyme that have not yet been observed (enzyme promiscuity).
Domain super families are often highly diverse, and this diversity prevents confident atomic superimposition of all the structures in a super family. The super family is therefore sub-clustered into distinct structural clusters in which every pair of structures has a SIMAX score of less than 9 Angstroms (where SIMAX is the RMSD between two domains multiplied by the number of residues in the larger domain and divided by the number of aligned residues).
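The SIMAX definition and the 9 Angstrom criterion can be illustrated with a minimal Python sketch. The function names, the dictionary layout for pairwise scores and the example numbers are hypothetical and are not part of FunTree; only the formula and the cutoff come from the description above.

```python
from itertools import combinations

SIMAX_CUTOFF = 9.0  # Angstroms, the threshold quoted in the text


def simax(rmsd, n_residues_a, n_residues_b, n_aligned):
    """SIMAX = RMSD * (residues in the larger domain) / (aligned residues)."""
    if n_aligned <= 0:
        raise ValueError("n_aligned must be positive")
    return rmsd * max(n_residues_a, n_residues_b) / n_aligned


def is_valid_cluster(members, pairwise_simax, cutoff=SIMAX_CUTOFF):
    """Return True if every pair of members scores below the cutoff.

    pairwise_simax maps frozenset({a, b}) to the SIMAX score of that pair
    (a hypothetical layout chosen for this sketch).
    """
    return all(
        pairwise_simax[frozenset(pair)] < cutoff
        for pair in combinations(members, 2)
    )


# Hypothetical example: 2.5 A RMSD, domains of 200 and 150 residues, 120 aligned.
example_score = simax(2.5, 200, 150, 120)
print(round(example_score, 2))
```

Because the score scales the RMSD by the fraction of the larger domain that could not be aligned, two domains with a low RMSD over a short aligned region can still fail the cutoff.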
The structural superimposition is used to generate a structure-guided sequence alignment based on a novel agglomerative clustering technique. The resulting alignment is then used to build a phylogenetic tree using the methods implemented in TreeBest. The species tree that TreeBest requires to guide the phylogenetic tree is derived from the species relationships defined in the NCBI Taxonomy.
For each domain structure or sequence, the Enzyme Commission (E.C.) code is collected (if one has been assigned) either from UniProtKB or from the annotation made by the depositing crystallographer. The reactions associated with those E.C. codes are compared with each other using EC-BLAST, and the substrate and product small molecule substructure similarities are compared using SMSD.
Please see the following papers
Various views of the collected and processed data are shown (described below) for the entire super family and for each of the sub-clusters.
A blue number: this is the confidence score provided by TreeBest for the bifurcation at the node. Because the trees are generated automatically, some of them contain nodes with low confidence scores. Such trees should be treated with caution as they may not be reliable.
The first number/text section is the node name (internal to FunTree), made up of a number and the taxonomic code. The next three circles represent the similarity between overall reactions (if the branch represents an enzyme). Colouring is based on the degree of similarity as calculated by EC-BLAST and clustered using PVClust in R. The three circles represent the bond order, reaction centre and sub-structure similarity scores respectively. These are followed by the primary E.C. number, which is hyperlinked to the IntEnz database, and the UniProtKB identifier (which links to the UniProtKB record). If the sequence represents a known structural domain then the PDB identifier (linked to the PDBe entry) and the CATH domain (linked to the CATH domain pages) are shown.
Finally, on the far right, the multi-domain architecture of the protein at each leaf is depicted. Each domain is given a unique colour. Domain annotation is derived from Gene3D. The domains are shown as bars along a line, with the position and length of each bar proportional to the domain's position and length in the total sequence. Hovering over a bar shows the CATH code of the domain.
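The proportional layout of the domain bars can be sketched as follows. This is an illustration only: the function name, the 1-based inclusive residue numbering, the track width and the domain boundaries are all assumptions made for the example and do not describe how FunTree draws the bars.

```python
def domain_bars(sequence_length, domains, track_width=100.0):
    """Scale domain boundaries onto a track of fixed width.

    domains is an iterable of (label, start, end) using 1-based, inclusive
    residue numbers (an assumption of this sketch). Returns a list of
    (label, offset, width) tuples in track units.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    scale = track_width / sequence_length
    bars = []
    for label, start, end in domains:
        offset = (start - 1) * scale       # where the bar begins on the track
        width = (end - start + 1) * scale  # bar length, proportional to domain length
        bars.append((label, offset, width))
    return bars


# Hypothetical protein of 400 residues with two domains; the CATH codes are
# labels for illustration and the boundaries are invented.
example = domain_bars(400, [("3.40.50.720", 1, 200), ("3.90.180.10", 201, 400)])
for label, offset, width in example:
    print(label, offset, width)
```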
It is important to note that both GO and E.C. annotations are assigned to entire gene products and not to domains. Because FunTree is a domain-centric resource, some annotated functions might not be performed solely by that particular domain. Most functions can be ascribed to a single domain, but many are the product of domain combinations or of multiple gene products, i.e. molecular machines. We and others are working on methods to overcome this annotation problem.
This shows a reduced version of the phylogenetic tree, drawn as a circular tree, containing only the nodes that have an E.C. annotation (at the third level of the E.C. hierarchy). At the end of each branch the E.C. code is shown together with the FunTree leaf identifier. Hovering over a leaf shows the contribution that its function makes at each internal node in the lineage. Conversely, hovering over an internal node shows the most probable function at that node.
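Reducing E.C. codes to their third level can be sketched in a few lines of Python. The function names and leaf identifiers below are hypothetical, and treating an incomplete code (one with a dash in the first three positions) as having no third-level annotation is an assumption of this sketch rather than documented FunTree behaviour.

```python
from collections import defaultdict


def ec_third_level(ec_code):
    """Truncate an E.C. code such as '1.1.1.1' to its first three levels.

    Returns None when the code is not defined down to the third level,
    for example '3.4.-.-' (an assumption made for this sketch).
    """
    parts = ec_code.strip().split(".")
    if len(parts) < 3 or any(part in ("-", "") for part in parts[:3]):
        return None
    return ".".join(parts[:3])


def group_leaves_by_ec(leaves):
    """Group (leaf_id, ec_code) pairs by third-level E.C. code."""
    groups = defaultdict(list)
    for leaf_id, ec_code in leaves:
        level3 = ec_third_level(ec_code)
        if level3 is not None:
            groups[level3].append(leaf_id)
    return dict(groups)


# Hypothetical leaves; the identifiers are invented for illustration.
leaves = [("leaf_1", "1.1.1.1"), ("leaf_2", "1.1.1.2"), ("leaf_3", "3.4.-.-")]
print(group_leaves_by_ec(leaves))
```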
This page shows a similarity tree of all the small molecules found in all the reactions in the cluster / super family. The similarities are calculated using SMSD and the clustering is performed using the PVClust methods as implemented in R.
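The way a table of pairwise similarities becomes a tree can be illustrated with a small pure-Python sketch. This is not PVClust: it omits the bootstrap resampling that PVClust uses to attach confidence values to clusters. It also assumes that similarity scores lie between 0 and 1, that distance is taken as 1 minus similarity, and that average linkage is used. The molecule names and scores are invented.

```python
def average_linkage(names, similarity):
    """Agglomerative clustering with average linkage.

    similarity maps frozenset({a, b}) to a score in [0, 1]; the distance
    between two items is taken as 1 - similarity (an assumption of this
    sketch). Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    clusters = [(name,) for name in names]
    merges = []

    def distance(c1, c2):
        # Mean pairwise distance between the members of the two clusters.
        total = sum(1.0 - similarity[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical similarity scores between three small molecules.
scores = {
    frozenset(("mol_a", "mol_b")): 0.9,
    frozenset(("mol_a", "mol_c")): 0.2,
    frozenset(("mol_b", "mol_c")): 0.3,
}
for left, right, dist in average_linkage(["mol_a", "mol_b", "mol_c"], scores):
    print(left, right, round(dist, 3))
```

The order of the merges defines the branching of the tree: the most similar molecules are joined first and so sit closest together at the leaves.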
This page shows three similarity trees of all the reactions found in the cluster / super family. The similarities are calculated using EC-BLAST and the clustering is performed using the PVClust methods as implemented in R. Any one of the three trees can be selected. Each tree is based on a different measure of reaction similarity: bond order changes only, overall small molecule substructure similarity, or a combination of the two. Each leaf shows a schematic of the reaction. The bond order similarity tree is shown by default.
This page shows the E.C. hierarchy as an unrooted tree. The branches corresponding to the E.C. codes found in the cluster / super family are highlighted, and the E.C. code is shown in blue at the leaf and internal nodes.
The alignment page shows a JalView applet of the alignment used to build the phylogenetic tree. Sequences in the alignment that have a known structure are annotated with secondary structure and with catalytic site residues as catalogued by the CSA. An active site residue found in the curated section of the CSA is coloured bright red; one that comes from the predicted section is coloured light red.
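The colouring rule for catalytic residues amounts to a simple lookup, sketched below. The evidence labels "curated" and "predicted" and the function name are assumptions made for the example; the colour names are taken from the description above, and no specific colour codes are implied.

```python
# Colour names as described in the text; the dictionary keys are
# hypothetical labels for the two sections of the CSA.
CSA_COLOURS = {
    "curated": "bright red",
    "predicted": "light red",
}


def residue_colour(csa_evidence):
    """Return the highlight colour for a catalytic residue, or None.

    csa_evidence is "curated", "predicted", or anything else for a residue
    that is not catalogued in the CSA.
    """
    return CSA_COLOURS.get(csa_evidence)


for evidence in ("curated", "predicted", "none"):
    print(evidence, residue_colour(evidence))
```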
This page shows an interactive force-directed graph of the multi-domain architectures (MDAs) as generated by ArchSchema. In addition to the MDAs, the E.C. codes annotated by UniProtKB for the sequences with each MDA are shown. The key to both the MDAs and the E.C. codes is shown in the table to the left of the graph.