+The first method, described below, relies on NCBI annotations and uses
+a distance-based similarity measure. We start with the following
+preliminary definition.
+
+\begin{definition}
+\label{def1}
+Let $\mathcal{A}=\{A,T,C,G\}$ be the nucleotide alphabet, and let
+$\mathcal{A}^\ast$ be the set of finite words over $\mathcal{A}$
+(\emph{i.e.}, of DNA sequences). Let
+$d:\mathcal{A}^{\ast}\times\mathcal{A}^{\ast}\rightarrow[0,1]$ be a
+distance on $\mathcal{A}^{\ast}$, and let $T\in[0,1]$ be a given value
+called the threshold. For all $x,y\in\mathcal{A}^{\ast}$, we write
+$x\sim_{d,T}y$ if $d(x,y)\leqslant T$.
+\end{definition}
+
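+\noindent The relation $\sim_{d,T}$ is reflexive and symmetric, but it is
+not transitive in general: $x\sim_{d,T}y$ and $y\sim_{d,T}z$ do not
+imply $x\sim_{d,T}z$, since the triangle inequality only gives
+$d(x,z)\leqslant 2T$. Its transitive closure, however, is an equivalence
+relation. When $d=1-\Delta$, where $\Delta$ is the similarity scoring
+function of the Needleman--Wunsch global alignment embedded in the
+EMBOSS package (released by EMBL), we simply denote by $\sim$ the
+transitive closure of $\sim_{d,0.1}$.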
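+
+As a worked instance of this notation, and assuming $\Delta$ takes its
+values in $[0,1]$ (as required for $d=1-\Delta$ to map into $[0,1]$),
+the threshold $T=0.1$ amounts to a lower bound on the alignment
+similarity:
+\[
+  x\sim_{1-\Delta,0.1}y
+  \iff 1-\Delta(x,y)\leqslant 0.1
+  \iff \Delta(x,y)\geqslant 0.9 .
+\]
+For instance, two sequences with $\Delta(x,y)=0.93$ are related, whereas
+$\Delta(x,y)=0.85$ gives $d(x,y)=0.15>0.1$, so they are not.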
+
+The method begins by building an undirected graph from the similarity
+rates $r_{ij}=\Delta\left(g_{i},g_{j}\right)$ between the DNA~sequences
+$g_{i}$ and $g_{j}$. The nodes of this graph are all the coding
+sequences of the genomes under consideration, and there is an edge
+between $g_{i}$ and $g_{j}$ whenever the similarity rate $r_{ij}$ is at
+least a given similarity threshold $1-T$, that is, whenever
+$g_{i}\sim_{d,T}g_{j}$ with $d=1-\Delta$. The Connected Components (CC)
+of this ``similarity'' graph are then computed.
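+
+Formally, writing $V$ for the set of all coding sequences of the genomes
+under consideration (a notation introduced here only for illustration)
+and taking the similarity threshold to be $1-T$ so that edges agree with
+Definition~\ref{def1}, the similarity graph is $(V,E)$ with
+\[
+  E=\left\{\{g_{i},g_{j}\}\subseteq V \;:\; i\neq j,\; r_{ij}\geqslant 1-T\right\},
+\]
+so that an edge links $g_{i}$ and $g_{j}$ exactly when
+$g_{i}\sim_{1-\Delta,T}g_{j}$.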
+
+Declaring two sequences equivalent when they belong to the same CC
+defines an equivalence relation, namely the transitive closure of the
+relation of Definition~\ref{def1}. Each class of this relation is called
+a ``gene'' here, and its members (DNA~sequences) are the ``alleles'' of
+this gene. For each genome $G$, viewed as a set
+$\left\{g_{1}^G,\ldots,g_{m_G}^G\right\}$ of $m_{G}$ DNA coding
+sequences, this first method thus computes the projection of each
+sequence by $\pi$, where $\pi$ maps each sequence to its gene (class)
+with respect to $\sim$. In other words, a genome $G$ is mapped to
+$\left\{\pi(g_{1}^G),\ldots,\pi(g_{m_G}^G)\right\}$. Note that a
+projected genome contains no duplicated gene, since it is a set: two
+alleles of the same gene in $G$ yield a single element.
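+
+The classes and the projection can be summarized as follows, writing
+$\pi(G)$ for the projected genome of $G$; the path condition simply
+restates that $g$ and $g'$ lie in the same CC:
+\[
+  g\sim g' \iff \exists\, h_{0}=g,h_{1},\ldots,h_{k}=g' \ \mbox{with}\ h_{l-1}\sim_{1-\Delta,0.1}h_{l} \ \mbox{for}\ 1\leqslant l\leqslant k,
+\]
+\[
+  \pi(G)=\left\{\pi(g_{1}^G),\ldots,\pi(g_{m_G}^G)\right\},
+  \qquad \left|\pi(G)\right|\leqslant m_{G}.
+\]
+For example, if $g_{1}^G\sim g_{2}^G$, then
+$\pi(g_{1}^G)=\pi(g_{2}^G)$, so two of the $m_{G}$ distinct sequences of
+$G$ share a gene and $\left|\pi(G)\right|<m_{G}$.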
+
+Accordingly, the core genome (resp. the pan genome) of two genomes
+$G_{1}$ and $G_{2}$ is defined as the intersection (resp. the union) of
+their projected genomes. For a larger collection of genomes, the core
+genome is the intersection of all the projected genomes, that is, the
+set of all the genes $\dot{x}$ such that each genome has at least one
+allele in $\dot{x}$; the pan genome is similarly the union of all the
+projected genomes. However, whatever the chosen similarity threshold,
+this approach produces core genomes that are too small compared to what
+biologists usually expect for these chloroplasts. We are thus left with
+the following questions: how can we improve the confidence in the
+produced core genome? Can it then help us infer the evolutionary
+scenario of these genomes?
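+
+For genomes $G_{1},\ldots,G_{n}$, writing $\pi(G_{k})$ for the projected
+genome of $G_{k}$ (the names $\mathrm{Core}$ and $\mathrm{Pan}$ below are
+chosen here for illustration), the core and pan genomes described above
+read
+\[
+  \mathrm{Core}=\bigcap_{k=1}^{n}\pi(G_{k}),
+  \qquad
+  \mathrm{Pan}=\bigcup_{k=1}^{n}\pi(G_{k}),
+\]
+and $\dot{x}\in\mathrm{Core}$ if and only if, for every $k$, some
+$g\in G_{k}$ satisfies $\pi(g)=\dot{x}$.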