PHYLODIGM – PHYLOgenetic tree DIGitalisation Manager

by Witold Januszewski

Summary

The article presents an original application for automated phylogenetic tree digitalisation. PHYLODIGM – PHYLOgenetic tree DIGitalisation Manager uses a set of image processing and recognition methods to build an acyclic graph together with its description in the Newick format, a punctuation-based standard for describing phylogenies.

Introduction

Phylogenetic trees have proved to be well-suited branching diagrams for presenting the findings of molecular evolution. Formally, a phylogenetic tree is an acyclic graph. Each leaf node corresponds to one species (or other taxon), internal nodes correspond to their inferred common ancestors, and the length of the path between two nodes represents the evolutionary distance between them.

These properties of phylogenetic trees were used in 1986 to create the Newick format, a punctuation-based format for describing graph-theoretical trees. The format describes parent nodes, leaf nodes and the distances between them as plain text, thus reducing the size of digital phylogram repositories.
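As an illustration of the format, a tree in which taxa A and B share a common ancestor, and that ancestor and taxon C descend from the root, can be written as ((A:0.1,B:0.2):0.3,C:0.4); where parentheses group children, commas separate siblings, a colon introduces a branch length and the semicolon terminates the tree. The Java sketch below shows how such a string can be produced recursively from a tree structure. The class NewickNode is hypothetical, not taken from PHYLODIGM, and it assumes that labels need no quoting.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal rooted tree node; names and branch lengths are optional. */
final class NewickNode {
    final String name;     // may be empty, e.g. for unnamed internal nodes
    final Double length;   // branch length to the parent, or null if absent
    final List<NewickNode> children = new ArrayList<>();

    NewickNode(String name, Double length) {
        this.name = name;
        this.length = length;
    }

    /** Adds a child and returns this node, so trees can be built fluently. */
    NewickNode add(NewickNode child) {
        children.add(child);
        return this;
    }

    /** Serialises the subtree rooted here, without the terminating semicolon. */
    void write(StringBuilder out) {
        if (!children.isEmpty()) {
            out.append('(');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) out.append(',');   // commas separate siblings
                children.get(i).write(out);
            }
            out.append(')');
        }
        out.append(name);
        if (length != null) out.append(':').append(length);
    }

    /** Full Newick string for the tree rooted at this node. */
    String toNewick() {
        StringBuilder sb = new StringBuilder();
        write(sb);
        return sb.append(';').toString();
    }
}
```

Building the tree above as new NewickNode("", null).add(new NewickNode("", 0.3).add(new NewickNode("A", 0.1)).add(new NewickNode("B", 0.2))).add(new NewickNode("C", 0.4)) and calling toNewick() returns the string shown. Labels containing Newick delimiters such as spaces, commas or colons would need quoting, which this sketch omits.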

Recently, programs handling the Newick format have been introduced, such as the tree recognition tools TreeSnatcher [3] and TreeRipper [1] and the tree viewer Dendroscope [2]. Hence, a comprehensive and fast method for digitalising phylogenetic trees has become a significant issue in bioinformatics.

Methods

PHYLODIGM – PHYLOgenetic tree DIGitalisation Manager was conceived as a Java application. This choice, firstly, follows the principles of cross-platform object-oriented programming; secondly, allows mobile versions of the programme to be produced for portable camera-equipped or image-reading devices such as mobile phones and certain palmtops; thirdly, facilitates testing and the provision of extension modules such as reasoners or additional image processing methods. These properties extend the use cases of PHYLODIGM considerably.

Preprocessing

After the user has uploaded the source tree image (see Fig. 1a and 1b) via the ‘Acquire Image’ command button, the initial stage of phylogenetic tree digitalisation, preprocessing, is ready to begin (see the GUI snapshot in Fig. A). On the ‘Execute Preprocessing’ command, PHYLODIGM converts the source image to PNG format and trims its borders (where present) for processing efficiency.
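The preprocessing step can be sketched with the standard javax.imageio API. The sketch rests on an assumption about what trimming borders means: it treats the colour of the top-left pixel as the background and crops the image to the bounding box of all pixels that differ from it by more than a per-channel tolerance, then writes the result as PNG. The class Preprocess and its methods are hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical preprocessing helpers: border trimming and PNG conversion. */
final class Preprocess {
    /**
     * Crops away margins whose pixels all lie within `tolerance` (per RGB channel)
     * of the top-left pixel colour, which is assumed to be the background.
     */
    static BufferedImage trimBorders(BufferedImage img, int tolerance) {
        int bg = img.getRGB(0, 0);
        int w = img.getWidth(), h = img.getHeight();
        int minX = w, minY = h, maxX = -1, maxY = -1;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (differs(img.getRGB(x, y), bg, tolerance)) {
                    minX = Math.min(minX, x); maxX = Math.max(maxX, x);
                    minY = Math.min(minY, y); maxY = Math.max(maxY, y);
                }
            }
        }
        if (maxX < 0) return img; // blank image: nothing to trim
        return img.getSubimage(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }

    /** True if any of the R, G, B channels differ by more than tol (alpha is ignored). */
    private static boolean differs(int a, int b, int tol) {
        for (int shift = 0; shift <= 16; shift += 8) {
            int ca = (a >> shift) & 0xFF, cb = (b >> shift) & 0xFF;
            if (Math.abs(ca - cb) > tol) return true;
        }
        return false;
    }

    /** Reads any format ImageIO supports, trims the borders and writes a PNG. */
    static void toTrimmedPng(File in, File out, int tolerance) throws IOException {
        BufferedImage src = ImageIO.read(in);
        if (src == null) throw new IOException("Unsupported image format: " + in);
        ImageIO.write(trimBorders(src, tolerance), "png", out);
    }
}
```

A call such as Preprocess.toTrimmedPng(new File("tree.jpg"), new File("tree.png"), 16) would read the image, crop it and save it as PNG; the tolerance of 16 is an arbitrary example value meant to absorb slight compression noise in the margins.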

Fig.A: Preprocessing (GUI Snapshot)

Junction detection and edge extraction

Prior to junction detection and edge extraction, the source image undergoes binary segmentation and the appropriate morphological operations, erosion and skeletonisation (see Fig. 2-3). PHYLODIGM then applies the Hit-and-Miss Transform (HMT) for morphological pattern matching. The method locates and returns all pixel positions in the image matrix that match the provided kernel pattern. PHYLODIGM runs the HMT twice: first with a kernel pattern of three meeting lines to retrieve the junctions (Fig. 4), then with a free line-end kernel pattern to retrieve the pixels at the edge endings (Fig. 5). The user can either accept the automated processing or use the drawing GUI tools (Fig. B) to add missing tips and edges or to delete misrecognised ones.
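The segmentation and morphological steps can be sketched on a boolean pixel matrix. The article does not specify the threshold, the structuring element or the skeletonisation algorithm, so this sketch assumes a fixed luminance threshold, a 3×3 square structuring element for erosion, and the Zhang-Suen thinning algorithm for skeletonisation. The class name Morphology is hypothetical.

```java
import java.awt.image.BufferedImage;

/** Hypothetical morphology helpers; foreground (ink) pixels are true. */
final class Morphology {
    /** Binary segmentation: pixels whose luma is below `threshold` become foreground. */
    static boolean[][] binarise(BufferedImage img, int threshold) {
        int w = img.getWidth(), h = img.getHeight();
        boolean[][] out = new boolean[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                int luma = (299 * r + 587 * g + 114 * b) / 1000; // Rec. 601 weights
                out[y][x] = luma < threshold;
            }
        return out;
    }

    /** Erosion with a 3x3 square structuring element; image borders become background. */
    static boolean[][] erode(boolean[][] in) {
        int h = in.length, w = in[0].length;
        boolean[][] out = new boolean[h][w];
        for (int y = 1; y < h - 1; y++)
            for (int x = 1; x < w - 1; x++) {
                boolean all = true;
                for (int dy = -1; dy <= 1 && all; dy++)
                    for (int dx = -1; dx <= 1 && all; dx++)
                        all = in[y + dy][x + dx];
                out[y][x] = all;
            }
        return out;
    }

    /** Zhang-Suen thinning to a one-pixel-wide skeleton; modifies and returns `img`. */
    static boolean[][] skeletonise(boolean[][] img) {
        int h = img.length, w = img[0].length;
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int pass = 0; pass < 2; pass++) {
                java.util.List<int[]> toClear = new java.util.ArrayList<>();
                for (int y = 1; y < h - 1; y++)
                    for (int x = 1; x < w - 1; x++) {
                        if (!img[y][x]) continue;
                        // Neighbours P2..P9, clockwise starting north.
                        boolean[] p = {
                            img[y - 1][x], img[y - 1][x + 1], img[y][x + 1], img[y + 1][x + 1],
                            img[y + 1][x], img[y + 1][x - 1], img[y][x - 1], img[y - 1][x - 1]};
                        int b = 0, a = 0; // b: foreground neighbours, a: 0->1 transitions
                        for (int i = 0; i < 8; i++) {
                            if (p[i]) b++;
                            if (!p[i] && p[(i + 1) % 8]) a++;
                        }
                        boolean c = pass == 0 ? !(p[0] && p[2] && p[4]) : !(p[0] && p[2] && p[6]);
                        boolean d = pass == 0 ? !(p[2] && p[4] && p[6]) : !(p[0] && p[4] && p[6]);
                        if (b >= 2 && b <= 6 && a == 1 && c && d) toClear.add(new int[]{y, x});
                    }
                for (int[] q : toClear) img[q[0]][q[1]] = false; // delete after the scan
                if (!toClear.isEmpty()) changed = true;
            }
        }
        return img;
    }
}
```

Erosion with a 3×3 square removes any stroke narrower than three pixels, so for thin line drawings the erosion step would need to be skipped or used with a smaller element before thinning.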

Fig.2

Fig.3

Fig.4

Fig.5

Fig.B: Junction detection and edge extraction (GUI Snapshot)
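The two HMT passes can be sketched on a binary skeleton. Each 3×3 kernel marks pixels that must be foreground (1), must be background (0) or are ignored (-1), and a pixel is reported when its neighbourhood fits any of the four 90° rotations of a kernel. The particular kernels are assumptions rather than PHYLODIGM's own: a T-shaped kernel for three meeting lines, which covers the right-angle junctions of rectangular trees but not diagonal (Y-shaped) ones, and two line-end kernels that together match every pixel with exactly one 8-connected neighbour. The class name HitOrMiss is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hit-and-miss (hit-or-miss) transform on a binary skeleton. */
final class HitOrMiss {
    // 1 = must be foreground, 0 = must be background, -1 = don't care.
    static final int[][] JUNCTION_T = {{-1, 1, -1}, {1, 1, 1}, {-1, 0, -1}};
    static final int[][] END_STRAIGHT = {{0, 0, 0}, {0, 1, 0}, {0, 1, 0}};
    static final int[][] END_DIAGONAL = {{0, 0, 0}, {0, 1, 0}, {0, 0, 1}};

    /** Rotates a square kernel 90 degrees clockwise. */
    static int[][] rotate(int[][] k) {
        int n = k.length;
        int[][] r = new int[n][n];
        for (int y = 0; y < n; y++)
            for (int x = 0; x < n; x++)
                r[x][n - 1 - y] = k[y][x];
        return r;
    }

    /** True if the 3x3 neighbourhood centred on (y, x) fits the kernel; outside counts as background. */
    static boolean fits(boolean[][] img, int y, int x, int[][] k) {
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++) {
                int want = k[dy + 1][dx + 1];
                if (want < 0) continue;
                int yy = y + dy, xx = x + dx;
                boolean fg = yy >= 0 && yy < img.length && xx >= 0 && xx < img[0].length && img[yy][xx];
                if (fg != (want == 1)) return false;
            }
        return true;
    }

    /** Returns {y, x} of every pixel matching any rotation of any kernel. */
    static List<int[]> match(boolean[][] img, int[][]... kernels) {
        List<int[][]> all = new ArrayList<>();
        for (int[][] k : kernels) {
            int[][] r = k;
            for (int i = 0; i < 4; i++) { all.add(r); r = rotate(r); }
        }
        List<int[]> hits = new ArrayList<>();
        for (int y = 0; y < img.length; y++)
            for (int x = 0; x < img[0].length; x++)
                for (int[][] k : all)
                    if (fits(img, y, x, k)) { hits.add(new int[]{y, x}); break; }
        return hits;
    }

    static List<int[]> junctions(boolean[][] skel) { return match(skel, JUNCTION_T); }

    static List<int[]> endpoints(boolean[][] skel) { return match(skel, END_STRAIGHT, END_DIAGONAL); }
}
```

On a skeleton shaped like the letter T, junctions() returns the single pixel where the stem meets the bar and endpoints() returns the three free ends.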

Node linkage

The detected junctions become the nodes of the acyclic graph and undergo a linkage process based on linear interpolation, according to the formulas shown in Fig. C. The value of a connected pixel P0 is calculated from the neighbour pixel values P31 and P42, with respect to the distances:

- dx: between P0 and the centre of the line linking P1 and P3

- dy: between P0 and the centre of the line linking P2 and P4
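Because the formulas themselves appear only in Fig. C, the following is not a reconstruction of them but one hypothetical reading of the description, meant only to make the roles of dx and dy concrete. It assumes that P31 and P42 are the values at the centres of the lines P1-P3 and P2-P4, obtained by linear interpolation at the midpoint, and that P0 blends them with inverse-distance weights, so the line whose centre lies closer to P0 dominates. All names are hypothetical.

```java
/**
 * Hypothetical reading of the node-linkage interpolation (not the Fig. C formulas):
 * P31 and P42 are assumed to be midpoint values of the lines P1-P3 and P2-P4,
 * blended with inverse-distance weights based on dx and dy.
 */
final class LinkageInterpolation {
    /** Linear interpolation between a and b at parameter t in [0, 1]. */
    static double lerp(double a, double b, double t) {
        return a + t * (b - a);
    }

    /** Estimated value of P0 from the four neighbour values and the distances dx, dy. */
    static double p0(double p1, double p2, double p3, double p4, double dx, double dy) {
        double p31 = lerp(p3, p1, 0.5); // assumed value at the centre of line P1-P3
        double p42 = lerp(p4, p2, 0.5); // assumed value at the centre of line P2-P4
        if (dx + dy == 0) return (p31 + p42) / 2; // P0 coincides with both centres
        // Inverse-distance weighting: the smaller dx, the larger the weight of P31.
        return (dy * p31 + dx * p42) / (dx + dy);
    }
}
```

With binary pixel values (1 for foreground, 0 for background), dx = dy gives the plain average of P31 and P42, and dx = 0 returns P31; thresholding the result at 0.5 would be one way to decide whether P0 becomes a connecting foreground pixel, again an assumption.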

Fig.C: Node linkage method by interpolation

Results

After the leaf node trajectories have been measured, the interpolated image data form the resulting digital phylogram according to the grammar rules of the Newick format. PHYLODIGM provides both the Newick description and the corresponding graph visualisation as output (Fig. D). The results may be exported as a raw image, a PDF file or a textual Newick description. The same GUI panel also enables the user to import a custom Newick tree description into PHYLODIGM. Tests on both rectangular and freeform trees have produced the expected digitalisation results.
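Importing a custom tree description requires parsing the Newick grammar. The recursive-descent sketch below handles nested parentheses, unquoted labels and optional branch lengths; it removes all whitespace before parsing and deliberately omits quoted labels and bracketed comments, which the full format allows. The class NewickParser is hypothetical and is not PHYLODIGM's importer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical recursive-descent Newick reader: unquoted labels and optional lengths only. */
final class NewickParser {
    static final class Node {
        String name = "";
        Double length;                          // null when no ":length" is given
        final List<Node> children = new ArrayList<>();
    }

    private final String s;
    private int pos;

    private NewickParser(String s) { this.s = s.replaceAll("\\s+", ""); }

    /** Parses a complete tree terminated by ';'. */
    static Node parse(String newick) {
        NewickParser p = new NewickParser(newick);
        Node root = p.subtree();
        if (p.peek() != ';') throw new IllegalArgumentException("expected ';' at position " + p.pos);
        return root;
    }

    // subtree := [ '(' subtree { ',' subtree } ')' ] [label] [':' number]
    private Node subtree() {
        Node n = new Node();
        if (peek() == '(') {
            pos++;
            n.children.add(subtree());
            while (peek() == ',') { pos++; n.children.add(subtree()); }
            expect(')');
        }
        n.name = token();
        if (peek() == ':') { pos++; n.length = Double.parseDouble(token()); }
        return n;
    }

    /** Reads characters up to the next Newick delimiter. */
    private String token() {
        int start = pos;
        while (pos < s.length() && "(),:;".indexOf(s.charAt(pos)) < 0) pos++;
        return s.substring(start, pos);
    }

    private char peek() { return pos < s.length() ? s.charAt(pos) : '\0'; }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("expected '" + c + "' at position " + pos);
        pos++;
    }

    /** Counts leaves, e.g. to check an imported tree against the expected number of taxa. */
    static int leafCount(Node n) {
        if (n.children.isEmpty()) return 1;
        int c = 0;
        for (Node ch : n.children) c += leafCount(ch);
        return c;
    }
}
```

For instance, NewickParser.parse("((A:0.1,B:0.2):0.3,C:0.4);") yields a root with two children and three leaves, while a string missing the terminating semicolon raises an IllegalArgumentException.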

Fig.D: Exporting the constructed graph (GUI Snapshot)

Conclusion

The author believes that PHYLODIGM provides an innovative method for automated phylogenetic tree digitalisation. The output phylograms should satisfy the requirements of phylogenetic analysis in a broad sense. In the next development phase, PHYLODIGM will be extended with a PDF tree search engine, a reasoning module based on a neural network or another machine learning method, and the capability to digitalise phylogenetic networks.

Rapid phylogenetic tree digitalisation using PHYLODIGM versions for PCs and mobile devices would support knowledge discovery from archival documents containing printed or hand-drawn phylogenetic trees. Libraries storing documents on phylogenetics in any language could serve as data sources. This could lead to the creation of a widely used and respected global database, thus spurring the development of phylogenetic analysis in bioinformatics and other sciences.

References

1. Hughes, J.; TreeRipper: towards a fully automated optical tree recognition software; Nature Precedings; Nature Publishing Group; 2010

2. Huson, D.; Richter, D.; Rausch, C.; Dezulian, T.; Franz, M. & Rupp, R.; Dendroscope: An interactive viewer for large phylogenetic trees; BMC Bioinformatics 8/6; BioMed Central Ltd; 2007

3. Laubach, T. & von Haeseler, A.; TreeSnatcher: coding trees from images; Bioinformatics 23/6; Oxford University Press; 2007

Warsaw University of Technology, Faculty of Electronics and Information Technology, Institute of Radioelectronics, Nuclear and Medical Electronics Division, Laboratory of Detection and Spectrometry

Address: Nowowiejska 15/19, 00-665 Warsaw, Poland

E-mail: W.Januszewski@stud.elka.pw.edu.pl
