lost genes from a leaf genome or an intermediate core gene. Such
numbers are informative because they indicate how evolution proceeded:
how many genes were lost between two species, whether
-they belong to the same family or not. By the principle of
-classification, a small number of genes lost among species indicates
-that those species are close to each other and belong to same family,
-while a large lost means that we have an evolutionary relationship
-between species from different families. To depict the links between
+they belong to the same lineage or not. Phylogenetic relationships are mainly inferred by comparing sets of coding and non-coding sequences. Phylogenies of photosynthetic plants are important for assessing the origin of chloroplasts (REF) and the modalities of gene loss among lineages. These phylogenies are usually built from fewer than ten chloroplast genes (REF), some of which may not have been conserved during evolution in every taxon. Because phylogenetic relationships are more reliably inferred from data matrices that are complete for every included species and share the same evolutionary history, we selected core genomes for a new investigation of the phylogeny of photosynthetic plants. To depict the links between
species clearly, we built a phylogenetic tree showing the
relationships based on the distances among gene sequences. Many tools
are available to obtain such a tree, for example:
\end{figure}
\section{Implementation}
-We implemented the three algorithms using dell laptop model latitude E6430 with 4 GB of memory and Intel core i5 processor of 2.6 Ghz and 3 MB of cash. We built the code using python version 2.7 under ubuntu 12.04 LTS. We also used python packages such as os, Biopython, memory\_profile, re, numpy, time, shutil, and xlsxwriter to extract core genes from large amount of chloroplast genomes. Table \ref{Etime}, show the annotation type, execution time, and the number of core genes for each method:
+We implemented the three algorithms on a Dell Latitude E6430 laptop with 6 GB of memory and an Intel Core i5 processor (2.5 GHz$\times 4$) with 3 MB of CPU cache. We wrote the code in Python 2.7 under Ubuntu 12.04 LTS. We also used Python packages such as os, Biopython, memory\_profile, re, numpy, time, shutil, and xlsxwriter to extract core genes from a large number of chloroplast genomes. Table \ref{Etime} shows the annotation type, execution time, and number of core genes for each method:
\begin{center}
\begin{tiny}
& \multicolumn{2}{c}{Annotation} & \multicolumn{2}{c}{Features} & \multicolumn{2}{c}{E. Time} & \multicolumn{2}{c}{C. genes} & \multicolumn{2}{c}{Bad Gen.} \\
~ & N & D & Name & Seq & N & D & N & D & N & D \\
\hline
-Gene prediction & $\surd$ & - & - & $\surd$ & ? & - & ? & - & 0 & -\\[0.5ex]
+Gene prediction & $\surd$ & - & - & $\surd$ & 1.7 & - & ? & - & 0 & -\\[0.5ex]
Gene Features & $\surd$ & $\surd$ & $\surd$ & - & 4.98 & 1.52 & 28 & 10 & 1 & 0\\[0.5ex]
Gene Quality & $\surd$ & $\surd$ & $\surd$ & $\surd$ & \multicolumn{2}{c}{$\simeq$3 days + 1.29} & \multicolumn{2}{c}{4} & \multicolumn{2}{c}{1}\\[1ex]
\hline
\caption{Memory usage (in MB) for each method}\label{mem}
\begin{tabular}{p{2.5cm}p{1.5cm}p{1cm}p{1cm}p{1cm}p{1cm}p{1cm}p{1cm}}
\hline\hline
-Method& & Load Gen. & Conv. gV & Read gV & ICM & Gen. tree & Core Seq. \\
+Method& & Load Gen. & Conv. gV & Read gV & ICM & Core tree & Core Seq. \\
\hline
-Gene prediction & ~ & ~ & ~ & ~ & ~ & ~ & ~\\
+Gene prediction & NCBI & 100 & - & - & - & 108 & -\\
\multirow{2}{*}{Gene Features} & NCBI & 15.4 & 18.9 & 17.5 & 18 & 18 & 28.1\\
& DOGMA& 15.3 & 15.3 & 16.8 & 17.8 & 17.9 & 31.2\\
-Gene Quality & ~ & 15.3 & $\le$200 & 16.1 & 17 & 17.1 & 24.4\\
+Gene Quality & ~ & 15.3 & $\le$3\,GB & 16.1 & 17 & 17.1 & 24.4\\
\hline
\end{tabular}
\end{table}