+In recent years, the cost of genome sequencing has dropped sharply,
+and more and more genomes are being sequenced. Automatic annotation
+tools are therefore required to cope with this continuously
+increasing amount of genomic data. Moreover, a reliable and accurate
+genome annotation process is needed to provide strong indicators for
+the study of life~\cite{Eisen2007}.
+
+Following the advent of cost-effective sequencing
+methods~\cite{Bakke2009}, different annotation centers have designed
+various annotation tools that produce genomic annotations at many
+levels of detail. Among the major annotation centers and tools are
+NCBI~\cite{Sayers01012011}, DOGMA~\cite{RDogma},
+cpBase~\cite{de2002comparative}, CpGAVAS~\cite{liu2012cpgavas}, and
+CEGMA~\cite{parra2007cegma}. Previous studies usually relied on one of
+three approaches to find genes in genomes using data from these
+centers: \textit{alignment-based}, \textit{composition-based}, or a
+combination of both~\cite{parra2007cegma}. The alignment-based
+approach predicts a coding gene (\emph{i.e.}, a gene that produces a
+protein) by aligning a genomic DNA sequence with a cDNA sequence
+encoding a homologous protein~\cite{parra2007cegma};
+GeneWise~\cite{birney2004genewise} also follows this approach. The
+alternative, composition-based approach (also known as
+\textit{ab initio}) relies on a probabilistic model of gene structure
+and identifies genes according to the probability that a sequence
+region is a gene, as in GeneID~\cite{parra2000geneid}. We use such
+annotated genomic data to overcome the limitation of the first method
+described in the previous section: the second method we propose finds
+core genes in a large number of chloroplast genomes through genomic
+feature extraction.
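+
+As a rough illustration of the composition-based idea, the Python
+sketch below scores a DNA fragment with a codon-usage log-odds ratio.
+Codon frequencies are estimated, with pseudocounts, from known coding
+sequences and compared against a uniform background of $1/64$ per
+codon. A positive score therefore means that the fragment looks more
+coding-like than random sequence in reading frame 0. This is a
+deliberately simplified assumption of ours. GeneID and similar
+\textit{ab initio} predictors combine richer signals, such as splice
+sites, start/stop codons and Markov models. The function names
+\texttt{codon\_model} and \texttt{log\_odds} are hypothetical.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, Iterable

# All 64 possible codons over the DNA alphabet.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = frozenset(CODONS)


def codon_model(coding_seqs: Iterable[str], pseudocount: float = 1.0) -> Dict[str, float]:
    """Estimate in-frame codon frequencies from known coding sequences.

    Codons containing characters other than A, C, G, T are skipped.
    The pseudocount must be > 0 so that unseen codons keep a non-zero
    probability (otherwise log_odds would take log(0)).
    """
    counts: Counter = Counter()
    for seq in coding_seqs:
        seq = seq.upper()
        # Walk the sequence codon by codon in frame 0.
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_SET:
                counts[codon] += 1
    total = sum(counts.values()) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def log_odds(seq: str, model: Dict[str, float]) -> float:
    """Sum of log(P_coding(codon) / P_background(codon)) over in-frame codons.

    The background model is uniform (1/64 per codon), and codons with
    non-ACGT characters are ignored.
    """
    seq = seq.upper()
    background = 1.0 / len(CODONS)
    score = 0.0
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in model:
            score += math.log(model[codon] / background)
    return score
```

+For instance, if the model is trained on a single toy coding sequence
+made mostly of \texttt{GCT} codons, \texttt{GCTGCTGCT} receives a
+positive score and \texttt{CCCCCCCCC} a negative one.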
+
+Figure~\ref{Fig1} presents an overview of the entire pipeline. More
+precisely, the second method consists of three stages:
+\textit{Genome annotation}, \textit{Core extraction}, and
+\textit{Features Visualization}, the last of which highlights the
+relationships. To give an overall picture of the core extraction
+process, we briefly describe each stage below. More details are given
+in the following subsections. The method starts from a sequence
+database chosen among the many international databases storing
+nucleotide sequences, such as GenBank at NCBI~\cite{Sayers01012011},
+\textit{EMBL-Bank}~\cite{apweiler1985swiss} in Europe, or
+\textit{DDBJ}~\cite{sugawara2008ddbj} in Japan. Various bioinformatics
+tools analyze and annotate genomes by querying these databases,
+aligning and extracting sequences to predict genes. In our method, the
+input database may be any reliable data source that stores annotated
+and/or unannotated chloroplast genomes. We used the NCBI GenBank
+database~\cite{Sayers01012011} as the sequence database and retrieved
+99~chloroplast genomes. These genomes belong to eleven chloroplast
+families, and Table~\ref{Tab2} summarizes their distribution in our
+dataset.
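+
+Under simplifying assumptions, the \textit{Core extraction} stage can
+be pictured as a set intersection. If each genome is reduced to the
+set of gene names found in its annotation, the core genes are those
+present in every genome. The Python sketch below assumes exactly that
+representation: a hypothetical dictionary mapping a genome identifier
+to its gene names. The actual stage may rely on other genomic
+features, such as gene sequences, to decide that two genes are the
+same. The sketch also counts how many genomes contain each gene,
+which allows a relaxed core defined by a presence threshold. The
+functions \texttt{core\_genes}, \texttt{gene\_presence} and
+\texttt{soft\_core} are illustrative only.

```python
from collections import Counter
from typing import Dict, Set

# Hypothetical input: genome identifier -> set of annotated gene names.
Annotations = Dict[str, Set[str]]


def core_genes(annotations: Annotations) -> Set[str]:
    """Return the genes present in every genome (strict core)."""
    gene_sets = list(annotations.values())
    if not gene_sets:
        return set()
    # set.intersection accepts any number of sets and returns a new set,
    # so the input annotations are left unchanged.
    return set.intersection(*gene_sets)


def gene_presence(annotations: Annotations) -> Counter:
    """Count, for each gene, the number of genomes that contain it."""
    counts: Counter = Counter()
    for genes in annotations.values():
        counts.update(genes)
    return counts


def soft_core(annotations: Annotations, min_fraction: float) -> Set[str]:
    """Genes present in at least min_fraction of the genomes (relaxed core)."""
    n = len(annotations)
    counts = gene_presence(annotations)
    return {gene for gene, c in counts.items() if c >= min_fraction * n}
```

+Consider toy annotations in which three genomes share only
+\texttt{psbA} and \texttt{rbcL}. On that input, \texttt{core\_genes}
+returns exactly these two genes. \texttt{soft\_core} with
+\texttt{min\_fraction=1.0} gives the same result, and lower thresholds
+also admit genes missing from a few genomes.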
+
+\begin{figure}[h]