\end{figure}
\section{Implementation}
- We implemented the three algorithms using dell laptop model latitude E6430 with 6 GB of memory, and Intel core i5 processor of 2.5 Ghz$\times 4$ with 3 MB of CPU cash. We built the code using python version 2.7 under ubuntu 12.04 LTS. We also used python packages such as os, Biopython, memory\_profile, re, numpy, time, shutil, and xlsxwriter to extract core genes from large amount of chloroplast genomes. Table \ref{Etime}, show the annotation type, execution time, and the number of core genes for each method:
+
+ The different algorithms have been implemented using Python version
+ 2.7, on a laptop running Ubuntu~12.04~LTS. More precisely, the
+ computer is a Dell Latitude E6430 laptop with 6~GiB of memory and
+ a quad-core Intel Core~i5 processor with an operating frequency of
+ 2.5~GHz. Several Python packages, such as os, Biopython, memory\_profile,
+ re, numpy, time, shutil, and xlsxwriter, were used to extract core
+ genes from a large number of chloroplast genomes.
\begin{center}
- \begin{tiny}
- \begin{table}[H]
- \caption{Type of Annotation, Execution Time, and core genes for each method}\label{Etime}
- \begin{tabular}{p{2.5cm}p{0.5cm}p{0.5cm}p{0.5cm}p{0.5cm}p{0.5cm}p{0.5cm}p{0.5cm}p{0.5cm}p{0.5cm}p{0.2cm}}
+ \begin{table}[b]
+ \caption{Type of annotation, execution time, and core genes
+ for each method}\label{Etime}
+ {\scriptsize
+ \begin{tabular}{p{2cm}p{0.5cm}p{0.25cm}p{0.5cm}p{0.25cm}p{0.5cm}p{0.25cm}p{0.5cm}p{0.25cm}p{0.5cm}p{0.2cm}}
\hline\hline
- & \multicolumn{2}{c}{Annotation} & \multicolumn{2}{c}{Features} & \multicolumn{2}{c}{E. Time} & \multicolumn{2}{c}{C. genes} & \multicolumn{2}{c}{Bad Gen.} \\
+ Method & \multicolumn{2}{c}{Annotation} & \multicolumn{2}{c}{Features} & \multicolumn{2}{c}{Exec. time (min.)} & \multicolumn{2}{c}{Core genes} & \multicolumn{2}{c}{Bad genomes} \\
~ & N & D & Name & Seq & N & D & N & D & N & D \\
\hline
-Gene prediction & $\surd$ & - & - & $\surd$ & ? & - & ? & - & 0 & -\\[0.5ex]
-Gene features & $\surd$ & $\surd$ & $\surd$ & - & 4.98 & 1.52 & 28 & 10 & 1 & 0\\[0.5ex]
-Gene quality & $\surd$ & $\surd$ & $\surd$ & $\surd$ & \multicolumn{2}{c}{$\simeq$3 days + 1.29} & \multicolumn{2}{c}{4} & \multicolumn{2}{c}{1}\\[1ex]
+Gene prediction & $\surd$ & - & - & $\surd$ & 1.7 & - & ? & - & 0 & -\\[0.5ex]
+Gene Features & $\surd$ & $\surd$ & $\surd$ & - & 4.98 & 1.52 & 28 & 10 & 1 & 0\\[0.5ex]
+Gene Quality & $\surd$ & $\surd$ & $\surd$ & $\surd$ & \multicolumn{2}{c}{$\simeq$3 days + 1.29} & \multicolumn{2}{c}{4} & \multicolumn{2}{c}{1}\\[1ex]
\hline
\end{tabular}
+ }
\end{table}
- \end{tiny}
\end{center}
- In table \ref{Etime}, we show that all methods need low execution time to finish extracting core genes from large chloroplast genomes except in gene quality method where we need about 3-4 days for sequence comparisons to construct quality genomes then it takes just 1.29 minute to extract core genes. This low execution time give us a privilage to use these methods to extract core genes on a personal comuters rather than main frames or parallel computers. In the table, \textbf{N} means NCBI, \textbf{D} means DOGMA, and \textbf{Seq} means Sequence. Annotation is represent the type of algorithm used to annotate chloroplast genome. We can see that the two last methods used the same annotation sources. Features means the type of gene feature used to extract core genes, and this is done by extracting gene name, gene sequence, or both of them. The execution time is represented the whole time needed to extract core genes in minutes. We can see in the table that the second method specially with DOGMA annotation has the lowest execution time of 1.52 minute. In last method We needs approxemetly three days (this period is depend on the amount of genomes) to finish the operation of extracting quality genomes only, while the execution time will be 1.29 minute if we have quality genomes. The number of core genes is represents the amount of genes in the last core genome. The main goal is to find the maximum core genes that simulate biological background of chloroplasts. With NCBI we have 28 genes for 96 genomes instead of 10 genes with DOGMA for 97 genomes. But the biological distribution of genomes with NCBI in core tree did not reflect good biological perspective. While in the core tree with DOGMA, the distribution of genomes are biologically good. Bad genomes are the number of genomes that destroy core genes because of the low number of gene intersection. 
\textit{NC\_012568.1 Micromonas pusilla} is the only genome observed to destroy the core genome with NCBI annotations, both with the gene features method and with the gene quality method. \\
-
- The second important factor is the amount of memory usage in each methodology. Table \ref{mem} show the amounts of memory consumption by each method.
+ \vspace{-1cm}
+
+ Table~\ref{Etime} presents, for each method, the annotation type,
+ the execution time, and the number of core genes. We use the following
+ notations: \textbf{N} denotes NCBI, \textbf{D} denotes DOGMA,
+ and \textbf{Seq} stands for sequence. The {\it Annotation} columns
+ indicate the tool used to annotate the chloroplast genomes, while the {\it
+ Features} columns indicate the kind of gene feature used to extract core
+ genes: gene name, gene sequence, or both. It can be seen that
+ almost all methods need a low {\it Execution time} to extract core genes
+ from a large number of chloroplast genomes. Only the gene quality method
+ requires several days of computation (about 3-4 days) for sequence
+ comparisons; once the quality genomes are constructed, it takes just
+ 1.29~minutes to extract the core genes. Thanks to these low execution
+ times, we can use these methods to extract core genes on a personal
+ computer rather than on mainframes or parallel computers. The lowest
+ execution time, 1.52~minutes, is obtained with the second method
+ (gene features) using DOGMA annotations. The number
+ of {\it Core genes} is the number of genes in the last core
+ genome. The main goal is to find the largest set of core genes that
+ reflects the biological background of chloroplasts. With NCBI we have
+ 28 genes for 96 genomes, instead of 10 genes for 97 genomes with
+ DOGMA. Unfortunately, the distribution of genomes in the core tree
+ obtained with NCBI is not biologically meaningful, whereas with
+ DOGMA the distribution of genomes is biologically relevant. {\it Bad
+ genomes} gives the number of genomes that destroy core genes because
+ they share too few genes with the other genomes. \textit{NC\_012568.1 Micromonas
+ pusilla} is the only genome that destroyed the core genome with NCBI
+ annotations, for both the gene features and the gene quality methods.
+
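The notions of core genes and bad genomes used above can be illustrated with a minimal sketch. It assumes that each genome is reduced to a list of gene names and that the core is the plain intersection of these lists; the genome identifiers, the toy gene lists, and the helper names \texttt{core\_genes} and \texttt{worst\_genome} are hypothetical, and the sketch is not the implementation evaluated in this section.

```python
def core_genes(genomes):
    """Return the set of gene names shared by every genome."""
    gene_sets = [set(genes) for genes in genomes.values()]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


def worst_genome(genomes):
    """Return (genome_id, core_size) for the genome whose removal
    leaves the largest core among the remaining genomes."""
    best_id, best_size = None, -1
    for gid in sorted(genomes):
        rest = {k: v for k, v in genomes.items() if k != gid}
        size = len(core_genes(rest))
        if size > best_size:  # strict test: the first id wins on ties
            best_id, best_size = gid, size
    return best_id, best_size


# Toy data (hypothetical): genome_D shares a single gene with the others.
genomes = {
    "genome_A": ["rbcL", "psbA", "atpB", "matK"],
    "genome_B": ["rbcL", "psbA", "atpB"],
    "genome_C": ["rbcL", "psbA", "atpB", "ndhF"],
    "genome_D": ["rbcL"],
}

print(sorted(core_genes(genomes)))
print(worst_genome(genomes))
```

In this toy example a single poorly annotated genome shrinks the core to one gene, which is the effect reported in the {\it Bad genomes} columns.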
+ The second important factor is the amount of memory used by each
+ methodology. Table~\ref{mem} shows the memory usage of each
+ method. We used a package from PyPI~(\textit{the Python Package
+ Index}) named \textit{Memory\_profile} (located at~{\tt
+ https://pypi.python.org/pypi}) to measure all the values in
+ Table~\ref{mem}. In this table, the values are given in megabytes
+ and \textit{gV} stands for the genevision~file~format. We can notice that
+ the amount of memory used is relatively low for all methods
+ and is available on any personal computer. The different values also
+ show that the gene features method based on DOGMA annotations has the
+ most reasonable memory usage, except when extracting core
+ sequences. The third method gives the lowest values if we already have
+ the quality genomes; otherwise it consumes far more
+ memory. Moreover, the amount of memory used by the third method also
+ depends on the size of each genome.
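
As a rough illustration of how the time and memory of a single stage can be recorded, the following sketch uses only the Python~3 standard library (\texttt{time.perf\_counter} and \texttt{tracemalloc}). This is an assumption made for self-containment: the values reported in this section were obtained with the package named above under Python~2.7, and \texttt{tracemalloc} only tracks allocations made by the Python interpreter, so its figures are not directly comparable. The helper \texttt{measure} and the stand-in stage \texttt{load\_genomes} are hypothetical.

```python
import time
import tracemalloc


def measure(step, *args, **kwargs):
    """Run step(*args, **kwargs) and return (result, seconds, peak_mib)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = step(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak / (1024.0 * 1024.0)


def load_genomes(n):
    # Hypothetical stand-in for one stage of the pipeline.
    return [("gene_%d" % i) * 10 for i in range(n)]


result, seconds, peak_mib = measure(load_genomes, 10000)
print("%.3f s, peak %.2f MiB" % (seconds, peak_mib))
```

Wrapping each stage (loading genomes, building the core tree, extracting core sequences) in such a helper gives one time and one peak value per stage, in the spirit of the columns of the memory table.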
\begin{center}
- \begin{tiny}
\begin{table}[H]
\caption{Memory usage (in MB) for each methodology}\label{mem}
+ {\scriptsize
\begin{tabular}{p{2.5cm}p{1.5cm}p{1cm}p{1cm}p{1cm}p{1cm}p{1cm}p{1cm}}
\hline\hline
Method& & Load Gen. & Conv. gV & Read gV & ICM & Core tree & Core Seq. \\
\hline
- Gene prediction & NCBI & 100 & - & - & - & 108 & -\\
-Gene prediction & ~ & ~ & ~ & ~ & ~ & ~ & ~\\
++Gene prediction & NCBI & 108 & - & - & - & - & -\\
\multirow{2}{*}{Gene Features} & NCBI & 15.4 & 18.9 & 17.5 & 18 & 18 & 28.1\\
& DOGMA& 15.3 & 15.3 & 16.8 & 17.8 & 17.9 & 31.2\\
Gene Quality & ~ & 15.3 & $\le$3G & 16.1 & 17 & 17.1 & 24.4\\
\usepackage{algorithm}
\usepackage{algorithmic}
\usepackage{pdflscape}
++\usepackage{authblk}
++\usepackage[T1]{fontenc}
\usepackage{multirow,longtable}
\usepackage{amsmath,mathtools}
\usepackage{amssymb}
% correct bad hyphenation here
\hyphenation{op-tical net-works semi-conduc-tor}
--
--\begin{document}
- \title{Finding the core-genes of Plant Species Chloroplast}
-\title{Finding the core-genes of Chloroplast Species}
--\author{
- Bassam AlKindy\footnote{email: bassam.al-kindy@univ-fcomt\'{e}.fr} \and Jean-Fran\c{c}ois Couchot \and Christophe Guyeux \and Arnaud Mouly \and Michel Salomon \and Jacques Bahi \\
- FEMTO-ST Institute, UMR 6174 CNRS,\\
- Computer Science Department DISC, \and Lab. Chrono-Environnement, UMR 6174 CNRS,\\
-Bassam AlKindy\footnote{email: bassam.al-kindy@univ-fcomt\'{e}.fr} \and Jacques Bahi
-\and Jean-Fran\c{c}ois Couchot \and Christophe Guyeux \and Arnaud Mouly \and
-Michel Salomon \and\\
-FEMTO-ST Institute, UMR 6174 CNRS, \\
-Computer Science Department DISC, \\
--Universit\'{e} de Franche-Comt\'{e}, France \\
--{\small \it Authors in alphabetic order}
--}
++
++\title{Finding the Core-Genes of Plant Species Chloroplast}
++\author[1]{Bassam AlKindy} %\footnote{email: bassam.al-kindy@univ-fcomt\'{e}.fr}
++\author[1]{Jacques Bahi}
++\author[1]{Jean-Fran\c{c}ois Couchot}
++\author[1]{Christophe Guyeux}
++\author[2]{Arnaud Mouly}
++\author[1]{Michel Salomon}
++\affil[1]{FEMTO-ST Institute, UMR 6174 CNRS, Computer Science Department DISC, Universit\'{e} de Franche-Comt\'{e}, France}
++\affil[2]{Lab. Chrono-Environnement, UMR 6174 CNRS, Universit\'{e} de Franche-Comt\'{e}, France}
++%{\small \it Authors in alphabetic order}
++
++\renewcommand\Authands{ and }
++\begin{document}
\newcommand{\JFC}[1]{\begin{color}{green}\textit{}\end{color}}
\newcommand{\CG}[1]{\begin{color}{blue}\textit{}\end{color}}
% make the title area
++
\maketitle
%IEEEtran, journal, \LaTeX, paper, template.