+\subsubsection{Intersection Core Matrix (\textit{ICM})}
+
+To extract core genes, we iteratively collect the largest set of genes
+shared between genomes. To this end, an \textit{Intersection Core
+Matrix} (ICM) is built. The ICM is a two-dimensional symmetric matrix
+in which each row and each column corresponds to one genome. Each
+element of the matrix stores the \textit{Intersection Score} (IS): the
+cardinality of the core gene set obtained by intersecting one genome
+with another. At each step, the two genomes with the maximum score are
+selected. Formally, if the local database contains $n$ genomes, the
+ICM is an $n \times n$ matrix whose elements satisfy:
+\begin{equation}
+score_{ij}=\vert g_i \cap g_j\vert
+\label{Eq1}
+\end{equation}
+\noindent where $1 \leq i \leq n$, $1 \leq j \leq n$, and $g_i, g_j$ are
+genomes. The generation of a new core genome depends on the
+intersection scores $score_{ij}$. More precisely, we select the pair
+of genomes whose score is the largest element of the ICM. These two
+genomes are then removed from the matrix and the resulting core genome
+is added for the next iteration. The ICM is then updated to account
+for the new core genome: new IS values are computed for it. This
+process is repeated until no new core genome can be obtained.
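+
+As a small illustration of this process (the gene labels $a,\dots,f$
+and the three genomes below are hypothetical and serve only as an
+example), assume $g_1=\{a,b,c,d\}$, $g_2=\{b,c,d,e\}$ and
+$g_3=\{c,d,f\}$. Applying Equation~\ref{Eq1} gives
+\begin{displaymath}
+ICM=\left(\begin{array}{ccc}
+4 & 3 & 2\\
+3 & 4 & 2\\
+2 & 2 & 3
+\end{array}\right)
+\end{displaymath}
+where the diagonal holds $\vert g_i\vert$. Assuming that only pairs
+with $i \neq j$ are candidates, the largest score is $score_{12}=3$,
+so $g_1$ and $g_2$ are replaced by the core
+$c_1=g_1\cap g_2=\{b,c,d\}$. In the next iteration the only remaining
+pair is $(c_1,g_3)$, with
+$\vert c_1\cap g_3\vert=\vert\{c,d\}\vert=2$, which yields the core
+$c_2=\{c,d\}$. A single element then remains, so under this reading of
+the stopping rule no further pair can be formed and the process stops.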
+
+The ICM can be very large because of the amount of data. As a
+consequence, computing the intersection scores is both time and memory
+consuming. However, since the ICM is a symmetric matrix, we can reduce
+the computation overhead by considering only its upper triangular
+part. The number of intersection scores to compute is thus
+$\frac{n(n-1)}{2}$, which remains $O(n^2)$. Algorithm~\ref{Alg1:ICM}
+illustrates the construction of the ICM and the extraction of the
+core genes, where \textit{GenomeList} represents the database storing
+all genome data. At each iteration, it computes the largest core
+together with its two parent genomes.
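+
+To make the saving concrete, the upper triangular part contains one
+entry per unordered pair of distinct genomes, that is, per pair
+$(i,j)$ with $i<j$:
+\begin{displaymath}
+\sum_{i=1}^{n-1}(n-i)=\frac{n(n-1)}{2}.
+\end{displaymath}
+For instance, with $n=100$ genomes (an illustrative value chosen only
+for this example), this amounts to $\frac{100 \times 99}{2}=4950$
+intersections instead of the $100^2=10000$ entries of the full matrix.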
+
+% ALGORITHM HAS BEEN REWRITTEN
+
+\begin{algorithm}[H]
+\caption{Extract Maximum Intersection Score}
+\label{Alg1:ICM}
+\begin{algorithmic}
+\REQUIRE $L \leftarrow \text{genome sets}$
+\ENSURE $B1 \leftarrow \text{maximum core set}$
+\FOR{$i \leftarrow 0:len(L)-1$}
+ \STATE $score \leftarrow 0$
+ \STATE $core1 \leftarrow set(GenomeList[L[i]])$
+ \STATE $g1 \leftarrow L[i]$
+ \FOR{$j \leftarrow i+1:len(L)$}
+ \STATE $core2 \leftarrow set(GenomeList[L[j]])$
+ \STATE $Core \leftarrow core1 \cap core2$
+ \IF{$len(Core) > score$}
+ \STATE $score \leftarrow len(Core)$
+ \STATE $g2 \leftarrow L[j]$
+ \ENDIF
+ \ENDFOR
+ \STATE $B1[score] \leftarrow (g1,g2)$
+\ENDFOR
+\RETURN $max(B1)$
+\end{algorithmic}
+\end{algorithm}
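+
+A short trace may help to read Algorithm~\ref{Alg1:ICM}. The data are
+hypothetical, and we assume that loop ranges are half-open, so that
+$i$ stops before $len(L)-1$ and $j$ stops before $len(L)$. Let
+$L=[A,B,C]$ with $\vert A\cap B\vert=3$ and
+$\vert A\cap C\vert=\vert B\cap C\vert=2$.
+\begin{center}
+\begin{tabular}{cccc}
+\hline
+$i$ & $g1$ & best $g2$ & entry stored in $B1$\\
+\hline
+0 & $A$ & $B$ & $B1[3]=(A,B)$\\
+1 & $B$ & $C$ & $B1[2]=(B,C)$\\
+\hline
+\end{tabular}
+\end{center}
+The largest key of $B1$ is $3$, which corresponds to the pair $(A,B)$.
+Note that, if $B1$ is read as an associative array indexed by score,
+two pairs with equal scores would share one entry and only the last
+one stored would be kept.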
+
+\subsection{Features visualization}
+
+The goal is to visualize the results by building an evolutionary
+tree. All generated core genomes carry important information in the
+tree, because they describe the ancestors of two or more
+genomes. Each node in the tree represents one chloroplast genome or
+one predicted core and is labeled \textit{(Genes count:Family
+name\_Scientific names\_Accession number)}, while each edge is labeled
+with the number of genes lost from a leaf genome or an intermediate
+core genome. These numbers are informative about the evolution: they
+indicate how many genes were lost between two species, whether they
+belong to the same family or not. Following the usual classification
+principle, a small number of lost genes indicates that the species are
+close to each other and belong to the same family, while a large loss
+indicates an evolutionary relationship between species from different
+families. To depict the links between species clearly, we built a
+phylogenetic tree showing the relationships based on the distances
+among gene sequences. Many tools are available to obtain such a tree,
+for example PHYML~\cite{guindon2005phyml},
+RAxML~\cite{stamatakis2008raxml,stamatakis2005raxml}, BioNJ, and
+TNT~\cite{goloboff2008tnt}. In this work, we chose
+RAxML~\cite{stamatakis2008raxml,stamatakis2005raxml} because it is
+fast, accurate, and can build large trees when dealing with a large
+number of genomic sequences.
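+
+As a hypothetical example of the edge labels described above, assume
+that the number of lost genes on an edge is the difference between the
+gene count of a genome $g$ and that of the core $c$ obtained from it,
+which is well defined because a core is a subset of each genome it was
+computed from (the notation $lost(g,c)$ is introduced here only for
+illustration):
+\begin{displaymath}
+lost(g,c)=\vert g\vert-\vert c\vert, \qquad c\subseteq g.
+\end{displaymath}
+For a genome with $\vert g\vert=4$ genes and a core with
+$\vert c\vert=3$ genes, the edge between them would be labeled $1$.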
+
+The procedure used to build a phylogenetic tree is as follows:
+\begin{enumerate}
+\item For each gene in a core genome, extract its sequence and store it in the database.
+\item Use multiple alignment tools such as (****to be write after see christophe****)
+to align these sequences with one another.
+\item Submit the resulting aligned sequences to the RAxML program to compute the distances and draw the phylogenetic tree.
+\end{enumerate}
+
+\begin{figure}[H]
+ \centering \includegraphics[width=0.75\textwidth]{Whole_system}
+ \caption{Overview of the pipeline}\label{wholesystem}
+\end{figure}