Overview of FunTree

FunTree uses the CATH classification to annotate and analyse superfamilies of enzymes. In particular, it highlights the relationships between known three-dimensional structures, their associated sequences and the enzyme chemistry. This can be used to place structures and sequences in the context of their evolution, and of the range of, and the similarities and differences in, the reaction chemistry (both bond order changes and small molecule substructure). In addition, it can be used to infer the possible range of reactions for enzyme structures and sequences that have no known function, and to indicate other reactions or substrates that may be available to an enzyme but have not yet been observed (enzyme promiscuity).

Domain superfamilies are often highly diverse, and this diversity prevents confident atomic superimposition of all the structures in a superfamily. The superfamily is therefore sub-clustered into distinct structural clusters in which all structures have a SIMAX score of less than 9 Å with each other (SIMAX is the RMSD between two domains multiplied by the number of residues in the larger domain, divided by the number of aligned residues).
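
The SIMAX definition translates directly into arithmetic. The sketch below is a minimal illustration that assumes the RMSD and residue counts for each domain pair are already available from some structural superposition; the function names and the domain identifiers (domA, domB, domC) are hypothetical and not part of FunTree. It only checks whether a given set of pairs satisfies the 9 Å condition, not how FunTree actually partitions a superfamily.

```python
def simax(rmsd: float, len_a: int, len_b: int, n_aligned: int) -> float:
    """SIMAX = RMSD x (residues in the larger domain) / (number of aligned residues)."""
    if n_aligned <= 0:
        raise ValueError("n_aligned must be positive")
    return rmsd * max(len_a, len_b) / n_aligned


def is_structurally_coherent(pairs: dict, threshold: float = 9.0) -> bool:
    """True if the SIMAX score of every supplied pair is below `threshold` (Angstroms).

    `pairs` maps (domain_a, domain_b) -> (rmsd, len_a, len_b, n_aligned).
    """
    return all(simax(*params) < threshold for params in pairs.values())


# Hypothetical domain pairs and superposition statistics.
pairs = {
    ("domA", "domB"): (2.0, 150, 120, 100),  # SIMAX = 2.0 * 150 / 100 = 3.0
    ("domA", "domC"): (4.0, 200, 180, 80),   # SIMAX = 4.0 * 200 / 80 = 10.0
}
print(is_structurally_coherent(pairs))  # False: domA and domC exceed 9 Angstroms
```

Dividing by the number of aligned residues penalises superpositions that cover only part of the larger domain, so a low RMSD over a short alignment does not by itself yield a low SIMAX.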

The structural superimposition is used to generate a structure-guided sequence alignment based on a novel agglomerative clustering technique. The resulting alignment is then used to build a phylogenetic tree using the methods implemented in TreeBest. The species tree that TreeBest requires to guide the phylogenetic tree is derived from the species relationships defined by the NCBI Taxonomy.
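
A species tree of this kind can be assembled by merging root-to-species lineages into one hierarchy. The sketch below is a minimal illustration, assuming each species comes with an ordered lineage list such as one might extract from the NCBI Taxonomy; the lineages shown are abbreviated and illustrative, the keys are species codes in the style of UniProt mnemonics, and the function names are hypothetical. It does not reproduce TreeBest's input requirements (for example, it keeps single-child nodes rather than collapsing them).

```python
from typing import Dict, List


def build_species_tree(lineages: Dict[str, List[str]]) -> dict:
    """Merge root-to-species lineages into a nested dict (taxon -> children)."""
    tree: dict = {}
    for species, lineage in lineages.items():
        node = tree
        for taxon in lineage + [species]:
            node = node.setdefault(taxon, {})
    return tree


def to_newick(tree: dict) -> str:
    """Render the nested dict as a Newick string with labelled internal nodes."""
    def render(name: str, children: dict) -> str:
        if not children:
            return name
        inner = ",".join(render(child, sub) for child, sub in children.items())
        return f"({inner}){name}"
    return ",".join(render(name, sub) for name, sub in tree.items()) + ";"


# Abbreviated, illustrative lineages (not full NCBI Taxonomy lineages).
lineages = {
    "ECOLI": ["cellular_organisms", "Bacteria", "Proteobacteria"],
    "BACSU": ["cellular_organisms", "Bacteria", "Firmicutes"],
    "HUMAN": ["cellular_organisms", "Eukaryota", "Metazoa"],
}
print(to_newick(build_species_tree(lineages)))
# (((ECOLI)Proteobacteria,(BACSU)Firmicutes)Bacteria,((HUMAN)Metazoa)Eukaryota)cellular_organisms;
```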

For each domain structure or sequence, the Enzyme Commission (E.C.) code is collected (where one has been assigned), either from UniProtKB or from the annotation made by the depositing crystallographer. The reactions associated with these E.C. codes are compared with each other using ECBlast, and the substructure similarities of the substrate and product small molecules are compared using SMSD.

Please see the following papers:

  • Furnham N, Sillitoe I, Holliday GL, Cuff AL, Rahman SA, Laskowski RA, Orengo CA, Thornton JM. FunTree: a resource for exploring the functional evolution of structurally defined enzyme superfamilies. Nucleic Acids Res. 2012 Jan;40(Database issue):D776-82.
  • Furnham N, Sillitoe I, Holliday GL, Cuff AL, Laskowski RA, Orengo CA, Thornton JM. Exploring the evolution of novel enzyme functions within structurally defined protein superfamilies. PLoS Comput Biol. 2012;8(3):e1002403.
Various views of the collected and processed data (described below) are shown for the entire superfamily and for each of the sub-clusters.

Explanations of the various data views:

  • Rooted Phylogenetic Tree (with Clusters)

    For each cluster, the alignments can be used to generate phylogenetic trees (see Overview). Because these trees can become large and information-dense, they can be difficult to navigate. To mitigate this, we have developed a tree viewer based on the Google Maps API, so each tree can be navigated in the same way as a geographic map in Google Maps. The mouse wheel zooms in and out, and pressing and holding the left mouse button drags the tree in the direction of the mouse. The view can be re-centred by clicking the middle of the direction buttons above the zoom control at the top left of the tree window. An overview map, in which a blue box outlines the current view, can also be used to navigate the tree.

    At each node of the tree the following information can be found:

    SUP - this is a hyperlink to the 'superimposed structures' in the clade rooted at that node. Following the link opens a new window that launches the JMol applet to show all the structures found in the clade headed by that node. The structures are superimposed on each other using the superimposition that was used to generate the alignment. Each structure is shown as a cartoon and coloured by the colour given to its E.C. code at the tree leaf (see below). In addition, the known active site residues (as catalogued by the CSA) are highlighted as space-filled atoms.
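
    Gathering the structures for the SUP view amounts to collecting every leaf below the chosen node that has a known structure. The sketch below is a minimal illustration, assuming a simple in-memory tree in which leaves optionally carry a PDB identifier; the `Node` class, the tree and the identifiers are hypothetical and do not represent FunTree's data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    pdb_id: Optional[str] = None  # set only for leaves with a known structure
    children: List["Node"] = field(default_factory=list)


def structures_in_clade(node: Node) -> List[str]:
    """Collect the PDB identifiers of all leaves in the clade rooted at `node`."""
    if not node.children:
        return [node.pdb_id] if node.pdb_id else []
    found: List[str] = []
    for child in node.children:
        found.extend(structures_in_clade(child))
    return found


# Hypothetical tree ((a, b), c) in which only a and c have structures.
a = Node("a", pdb_id="1abc")
b = Node("b")
c = Node("c", pdb_id="2xyz")
inner = Node("inner", children=[a, b])
root = Node("root", children=[inner, c])

print(structures_in_clade(inner))  # ['1abc']
print(structures_in_clade(root))   # ['1abc', '2xyz']
```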

    A blue number - this is the confidence score provided by TreeBest for the bifurcation at the node. Please note that because these trees are generated automatically, some nodes may have low confidence scores; such trees should be treated with caution, as they may not be reliable.
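
    When inspecting a tree, it can help to list the nodes whose support falls below some cutoff. The sketch below is a minimal illustration, assuming confidence values on a 0 to 100 scale stored on internal nodes; the `Node` class, the example tree and the cutoff of 50 are hypothetical choices, not a FunTree or TreeBest threshold.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    confidence: Optional[float] = None  # support at internal nodes; None for leaves
    children: List["Node"] = field(default_factory=list)


def low_confidence_nodes(node: Node, cutoff: float = 50.0) -> List[str]:
    """Return names of internal nodes whose confidence is below `cutoff` (pre-order)."""
    flagged: List[str] = []
    if node.children and node.confidence is not None and node.confidence < cutoff:
        flagged.append(node.name)
    for child in node.children:
        flagged.extend(low_confidence_nodes(child, cutoff))
    return flagged


# Hypothetical tree with one weakly supported bifurcation.
tree = Node("root", 100, [
    Node("n1", 35, [Node("leaf1"), Node("leaf2")]),
    Node("n2", 88, [Node("leaf3"), Node("leaf4")]),
])
print(low_confidence_nodes(tree))  # ['n1']
```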

    At the end of each branch the following is shown:

    The first number/text section is the node name (internal to FunTree), made up of a number and the taxonomic code. The next three circles represent the similarity between overall reactions (if the branch represents an enzyme). Their colouring is based on the degree of similarity as calculated by EC-Blast and clustered using PVClust in R. The three circles represent the bond order, reaction centre and substructure similarity scores, respectively. These are followed by the primary E.C. number, which is hyperlinked to the IntEnz database, and the UniProtKB identifier (linked to the UniProtKB record). If the sequence represents a known structural domain, the PDB identifier (linked to the PDBe entry) and the CATH domain (linked to the CATH domain pages) are also shown.

    Finally, on the far right, the multi-domain architecture of the protein at each leaf is depicted. Domain annotation is derived from Gene3D, and each domain is given a unique colour. The domains are shown as bars along a line representing the whole sequence, with their positions and lengths drawn in proportion to the total sequence length. Hovering over a bar shows the CATH code of that domain.
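
    Drawing the domains in proportion to the sequence is a linear scaling from residue numbers to pixels. The sketch below is a minimal illustration, assuming 1-based inclusive domain boundaries and a hypothetical 200-pixel track; the function name and the example protein are illustrative, and the CATH codes are used only as sample labels.

```python
from typing import List, Tuple


def domain_bars(seq_length: int, domains: List[Tuple[str, int, int]],
                width_px: int = 200) -> List[Tuple[str, float, float]]:
    """Scale (cath_code, start, end) residue ranges to (cath_code, x, bar_width) in pixels.

    Residue numbering is assumed to be 1-based and inclusive.
    """
    scale = width_px / seq_length
    bars = []
    for code, start, end in domains:
        x = (start - 1) * scale            # left edge of the bar
        w = (end - start + 1) * scale      # bar width covers the whole domain
        bars.append((code, x, w))
    return bars


# Hypothetical 400-residue protein with two domains.
print(domain_bars(400, [("3.20.20.70", 1, 200), ("3.40.50.720", 201, 400)]))
# [('3.20.20.70', 0.0, 100.0), ('3.40.50.720', 100.0, 100.0)]
```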

    It is important to note that both GO and E.C. annotations are assigned to entire gene products, not to domains. Because FunTree is a domain-centric resource, some annotated functions may not be performed solely by that particular domain. Most functions can be ascribed to a single domain, but many are the product of domain combinations or of multiple gene products, i.e. molecular machines. We and others are working on methods to overcome this annotation problem.

  • Ancestral Character Estimation (ACE) Tree

    This shows a reduced version of the phylogenetic tree, in which all the nodes that have an E.C. annotation (at the third E.C. level) are shown as a circular tree. The E.C. code is shown at the end of each branch, together with the FunTree leaf identifier. Hovering over a leaf shows the contribution of that function at each internal node in its lineage. Conversely, hovering over an internal node shows the most probable function at that node.
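
    The ACE view reports probabilities from an ancestral character estimation, which the sketch below does not reproduce. As a much cruder stand-in, it truncates each leaf's E.C. code to the third level and reports the fraction of leaves beneath a node carrying each code. The nested-list tree encoding, the function names and the example clade are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Union

# A leaf is an E.C. code string; an internal node is a list of subtrees.
Tree = Union[str, List["Tree"]]


def ec_level3(ec: str) -> str:
    """Truncate an E.C. code such as '1.1.1.1' to its third level ('1.1.1')."""
    return ".".join(ec.split(".")[:3])


def leaf_codes(tree: Tree) -> List[str]:
    """Third-level E.C. codes of all leaves below `tree`."""
    if isinstance(tree, str):
        return [ec_level3(tree)]
    return [code for sub in tree for code in leaf_codes(sub)]


def node_composition(tree: Tree) -> Dict[str, float]:
    """Fraction of leaves below `tree` carrying each third-level E.C. code."""
    counts = Counter(leaf_codes(tree))
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


# Hypothetical clade: ((1.1.1.1, 1.1.1.2), 2.7.1.1)
clade = [["1.1.1.1", "1.1.1.2"], "2.7.1.1"]
comp = node_composition(clade)
print(max(comp, key=comp.get), comp)  # 1.1.1 is the most frequent third-level code here
```

    Unlike a likelihood-based estimation, this summary ignores branch lengths and tree topology below the node, so it should be read only as a quick way to see which functions a clade contains.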

  • Ligand Similarity Tree

    This page shows a similarity tree of all the small molecules found in all the reactions in the cluster / superfamily. The similarities are calculated using SMSD, and the clustering is performed with the PVClust methods as implemented in R.
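
    PVClust performs hierarchical clustering and adds support values from multiscale bootstrap resampling. The sketch below shows only a plain agglomerative step, using average linkage on distances taken as one minus similarity; that distance transform is an assumption, the bootstrap is omitted, and the ligand names and similarity values are hypothetical rather than SMSD output.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def average_linkage(labels: List[str],
                    sim: Dict[FrozenSet[str], float]) -> List[Tuple[str, float]]:
    """Agglomerative clustering with average linkage on distances d = 1 - similarity.

    `sim` maps frozenset({a, b}) to a similarity in [0, 1] for every pair of labels.
    Returns the merge history as (nested label, merge distance) tuples.
    """
    def dist(a: str, b: str) -> float:
        return 1.0 - sim[frozenset((a, b))]

    # Each cluster is (display label, member leaves).
    clusters = [(name, [name]) for name in labels]
    merges: List[Tuple[str, float]] = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            ma, mb = clusters[i][1], clusters[j][1]
            # Average of all leaf-to-leaf distances between the two clusters.
            d = sum(dist(a, b) for a in ma for b in mb) / (len(ma) * len(mb))
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        (la, ma), (lb, mb) = clusters[i], clusters[j]
        merged = (f"({la},{lb})", ma + mb)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        merges.append((merged[0], d))
    return merges


# Hypothetical ligand similarities (not SMSD output).
sim = {
    frozenset(("ATP", "ADP")): 0.9,
    frozenset(("ATP", "glucose")): 0.2,
    frozenset(("ADP", "glucose")): 0.3,
}
for label, d in average_linkage(["ATP", "ADP", "glucose"], sim):
    print(label, round(d, 2))
# (ATP,ADP) 0.1
# (glucose,(ATP,ADP)) 0.75
```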

  • Reaction Similarity Tree

    This page shows similarity trees of all the reactions found in the cluster / superfamily. The similarities are calculated using ECBlast, and the clustering is performed with the PVClust methods as implemented in R. Three different trees can be selected, each based on a different measure of reaction similarity: bond order changes only, overall small molecule substructure similarity, or a combination of the two. Each leaf shows a schematic of the reaction. The tree shown by default is based on bond order similarity.

  • EC Hierarchy As Unrooted Tree

    This page shows the E.C. hierarchy as an unrooted tree. The branches corresponding to the E.C. codes found in the cluster / superfamily are highlighted, and the E.C. codes are shown in blue at the leaf and internal nodes.
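
    Highlighting the branches for a set of E.C. codes amounts to collecting, for each code, the path from the top of the hierarchy through its first, second and third levels down to the full code. The sketch below is a minimal illustration, assuming full four-level codes given as plain strings; the function name, the placeholder 'root' label and the example codes are hypothetical.

```python
from typing import Iterable, Set, Tuple


def ec_hierarchy_edges(codes: Iterable[str]) -> Set[Tuple[str, str]]:
    """Parent -> child edges linking each E.C. code to all of its ancestors.

    '1.1.1.1' yields ('root', '1'), ('1', '1.1'), ('1.1', '1.1.1'), ('1.1.1', '1.1.1.1').
    """
    edges = set()
    for code in codes:
        parts = code.split(".")
        parent = "root"  # placeholder label for the top of the hierarchy
        for depth in range(1, len(parts) + 1):
            node = ".".join(parts[:depth])
            edges.add((parent, node))
            parent = node
    return edges


# Edges to highlight for the E.C. codes found in a hypothetical cluster.
highlight = ec_hierarchy_edges(["1.1.1.1", "1.1.1.2", "2.7.1.1"])
print(len(highlight))  # 9
```

    Shared ancestors such as '1.1.1' are stored once because the edges are kept in a set, which mirrors how sibling codes share their upper branches in the drawn hierarchy.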

  • Alignment Page

    The alignment page shows a JalView applet of the alignment used to build the phylogenetic tree. Sequences in the alignment that have a known structure are annotated with secondary structure and with catalytic site residues as catalogued in the CSA. Active site residues found in the curated section of the CSA are coloured bright red; otherwise, those from the predicted section are coloured light red.
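
    The colouring rule gives curated CSA entries precedence over predicted ones. The sketch below is a minimal illustration of that precedence, assuming residues are identified by integer positions; the function name, the positions and the hex colour values are hypothetical and not taken from FunTree or JalView.

```python
from typing import Dict, Iterable

BRIGHT_RED = "#FF0000"  # curated CSA residues (illustrative hex value)
LIGHT_RED = "#FF9999"   # predicted CSA residues (illustrative hex value)


def active_site_colours(curated: Iterable[int], predicted: Iterable[int]) -> Dict[int, str]:
    """Map residue positions to a colour; curated entries override predicted ones."""
    colours = {pos: LIGHT_RED for pos in predicted}
    colours.update({pos: BRIGHT_RED for pos in curated})
    return colours


# Hypothetical positions: residue 112 appears in both sections, so it is bright red.
print(active_site_colours(curated=[45, 112], predicted=[112, 200]))
```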

  • Multi Domain Architectures

    This page shows an interactive force-directed graph of the multi-domain architectures (MDAs) as generated by ArchSchema. In addition to the MDAs, the E.C. codes annotated by UniProtKB for the sequences with each MDA are shown. The key to both the MDAs and the E.C. codes is shown in the table to the left of the graph.
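
    Attaching E.C. codes to each architecture is essentially a group-by over sequences. The sketch below is a minimal illustration, assuming each sequence is already annotated with an ordered tuple of CATH domain codes (its MDA) and a list of UniProtKB E.C. codes; the function name, sequence identifiers and codes are illustrative, and the graph layout produced by ArchSchema is not reproduced.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

MDA = Tuple[str, ...]  # ordered CATH domain codes along the sequence


def ec_codes_per_mda(sequences: Dict[str, Tuple[MDA, List[str]]]) -> Dict[MDA, Set[str]]:
    """Group sequences by multi-domain architecture and pool their E.C. codes."""
    by_mda: Dict[MDA, Set[str]] = defaultdict(set)
    for mda, ec_codes in sequences.values():
        by_mda[mda].update(ec_codes)
    return dict(by_mda)


# Illustrative sequences, not taken from FunTree.
seqs = {
    "seqA": (("3.20.20.70",), ["4.1.2.13"]),
    "seqB": (("3.20.20.70", "3.40.50.720"), ["1.1.1.1"]),
    "seqC": (("3.20.20.70",), ["4.1.2.13", "4.1.2.4"]),
}
for mda, codes in ec_codes_per_mda(seqs).items():
    print(" + ".join(mda), sorted(codes))
```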