The last stage of the proposed pipeline is naturally to exploit
the produced core and pan genomes in biological studies. As
this key stage is not directly related to the methodology for discovering
core and pan genomes, we only outline a few tasks that
can be performed on the produced data.
%\includegraphics[scale=0.215]{tree}
%\caption{Part of a core genomes evolutionary tree (NCBI gene names)}
The obtained results can be visualized by building an evolutionary tree of core genomes.
% All core genes generated represent an important information in the tree,
% because they provide ancestor information of two or more
Each node in this tree represents either a chloroplast genome or
a predicted core. %, as depicted in Figure~\ref{coreTree}. In this figure, nodes labels are of the form \textit{(Genes number:Family name\_Scientific name\_Accession number)}, while an edge is labeled with the number of gene loss when compared to its parents (a leaf genome or an intermediate core genome). Such numbers can answer questions like: how many genes are different between two species? Which functionality has been lost between an ancestor and its children? For complete core trees based either on NCBI names or on DOGMA ones, see supplementary data.
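
The structure of such a tree can be illustrated with a minimal Python sketch. It assumes that a predicted core is simply the set of gene names shared by its two children and that an edge is labeled with the number of genes lost between a node and its parent core. The class \texttt{CoreNode}, the helpers \texttt{merge} and \texttt{gene\_loss}, and the toy gene sets are hypothetical names introduced for illustration only; this is not the implementation used in the pipeline.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class CoreNode:
    """A node of the core tree: a leaf genome or a predicted core."""
    label: str
    genes: FrozenSet[str]
    children: List["CoreNode"] = field(default_factory=list)


def merge(label: str, left: CoreNode, right: CoreNode) -> CoreNode:
    """Build a predicted-core node holding the genes shared by both children.

    Assumption: the core is the plain intersection of gene names.
    """
    return CoreNode(label, left.genes & right.genes, [left, right])


def gene_loss(parent: CoreNode, child: CoreNode) -> int:
    """Number of genes present in `child` but absent from its parent core."""
    return len(child.genes - parent.genes)


# Toy gene sets (illustrative only, not taken from the studied genomes).
genome_a = CoreNode("genome_A", frozenset({"rbcL", "psbA", "atpB", "ndhF"}))
genome_b = CoreNode("genome_B", frozenset({"rbcL", "psbA", "atpB"}))
core_ab = merge("core_AB", genome_a, genome_b)

for child in core_ab.children:
    print(child.label, "->", core_ab.label, "loss:", gene_loss(core_ab, child))
```

Under these assumptions, the number printed for each edge is the count of genes that the child carries and that its parent core does not.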

A second application of these data is to build accurate phylogenetic
trees, using tools like
PHYML~\cite{guindon2005phyml} or
RAxML~\cite{stamatakis2008raxml,stamatakis2005raxml}.
Given a set of species, the last common core genome in the core tree
contains all the genes shared by these species. These genes can be
multiply aligned to serve as input to the phylogenetic tools mentioned above.
An example of such a phylogenetic tree on core 58 (NCBI cores tree, see
supplementary data) is provided in Appendix~\ref{philoTree}. Note that, in
order to constitute a relevant outgroup, we simply blasted each gene
of this core against a chosen \emph{Cyanobacteria}.
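
The preparation of the alignment input described above can be sketched as follows. The sketch assumes that the sequence of each core gene is already available for every species, including the outgroup, and it writes one FASTA record set per gene. The function \texttt{fasta\_per\_gene}, the species labels, and the short sequences are hypothetical and serve only as an illustration. The multiple alignment and the tree inference themselves are left to the external tools cited above.

```python
from typing import Dict, List


def fasta_per_gene(core_genes: List[str],
                   sequences: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Return, for each core gene, a FASTA string with one record per species.

    `sequences` maps a species label to a {gene name: sequence} dictionary.
    Every species is assumed to carry every gene of `core_genes`.
    """
    fasta = {}
    for gene in core_genes:
        records = []
        for species in sorted(sequences):  # fixed order for reproducibility
            records.append(f">{species}\n{sequences[species][gene]}")
        fasta[gene] = "\n".join(records) + "\n"
    return fasta


# Toy input (illustrative sequences, not real data).
core_genes = ["psbA", "rbcL"]
sequences = {
    "species_A": {"rbcL": "ATGTCACCA", "psbA": "ATGACTGCA", "ndhF": "ATGGAACAG"},
    "species_B": {"rbcL": "ATGTCACCT", "psbA": "ATGACTGCT"},
    "outgroup": {"rbcL": "ATGTCTCCA", "psbA": "ATGACAGCA"},
}

fasta = fasta_per_gene(core_genes, sequences)
print(fasta["rbcL"])
```

Each resulting string can be saved to a file and passed to a multiple alignment tool. Genes outside the core, such as the extra gene of \texttt{species\_A} in this toy input, are ignored.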
%TNT\cite{goloboff2008tnt}}.
% In this work, we chose to use
% RAxML\cite{stamatakis2008raxml,stamatakis2005raxml} because it is
% fast, accurate, and can build large trees when dealing with a large
% number of genomic sequences.
% The procedure used to built a phylogenetic tree is as follows:
% \item For each gene in a core gene, extract its sequence and store it in the database.
% \item Use multiple alignment tools such as (****to be write after see christophe****)
% to align these sequences with each others.
% \item Use an outer-group genome from cyanobacteria to calculate distances.
% \item Submit the resulting aligned sequences to RAxML program to compute
% the distances and finally draw the phylogenetic tree.