considered the GenBank-NCBI \cite{Sayers01012011} database as sequence
database: 99~genomes of chloroplasts were retrieved. These genomes
lie in the eleven type of chloroplast families and Table \ref{Tab2}
- summarizes their distribution in our dataset.
+ summarizes their distribution in our dataset.\\
\begin{figure}[h]
\centering
- \includegraphics[width=0.75\textwidth]{generalView}
+ \includegraphics[width=0.8\textwidth]{generalView}
\caption{A general overview of the annotation-based approach}\label{Fig1}
\end{figure}
\label{Eq1}
\end{equation}
\noindent where $1 \leq i \leq n$, $1 \leq j \leq n$, and $g_i, g_j$ are
-genomes. The generation of a new core gene depends obviously on the
-value of intersection scores $score_{ij}$:
+genomes. The generation of a new core gene clearly depends on the
+value of the intersection scores $score_{ij}$. More precisely, the
+idea is to select the pair of genomes whose score is the largest
+element in the ICM. These two genomes are then removed from the matrix
+and the resulting new core genome is added for the next iteration.
+The ICM is then updated to take the new core genome into account: new
+IS values are computed for it. This process is repeated until no new
+core gene can be obtained.
-% TO BE CONTINUED
-
-$$
-\text{new Core} =
-\begin{cases}
-\text{Ignored} & \text{if $\textit{score}=0$;} \\
-\text{new Core id} & \text{if $\textit{Score}>0$.}
-\end{cases}
-$$
-
-if $\textit{Score}=0$ then we have \textit{disjoint
-relation} \emph{i.e.}, no common genes between two genomes. In this
-case the system ignores the genome that annul the core gene
-size. Otherwise, The system removes these two genomes from ICM and add
-new core genome with a \textit{coreID} of them to ICM for the
-calculation in next iteration. This process reduces the size of ICM
-and repeats until all genomes are treated \emph{i.e.} ICM has no more
-genomes. We observe that ICM is very large because of the amount of
-data that it stores. This results to be time and memory consuming for
-calculating the intersection scores. To increase the speed of
-calculations, it is sufficient to only calculate the upper triangle
-scores. The time complexity for this process after enhancement is thus
-$O(\frac{n.(n-1)}{2})$. Algorithm \ref{Alg1:ICM} illustrates the
-construction of the ICM matrix and the extraction of the core genes
-where \textit{GenomeList}, represents the database where all genomes
-data are stored. At each iteration, it computes the maximum core genes
-with its two genomes parents.
+We can observe that the ICM is very large due to the amount of
+data. As a consequence, the computation of the intersection scores is
+both time and memory consuming. However, since the ICM is a symmetric
+matrix, we can reduce the computation overhead by considering only its
+upper triangular part. The time complexity for this process after
+enhancement is thus $O(\frac{n(n-1)}{2})$. Algorithm~\ref{Alg1:ICM}
+illustrates the construction of the ICM matrix and the extraction of
+the core genes, where \textit{GenomeList} represents the database
+storing all genome data. At each iteration, it computes the maximum
+core genes together with their two parent genomes.
% ALGORITHM HAS BEEN REWRITTEN
\end{enumerate}
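+As a worked illustration of the saving obtained by considering only
+the upper triangular part of the ICM, and assuming (for illustration
+only, this is not a measured figure) that the initial ICM is built
+over all $n = 99$ retrieved genomes, the number of intersection
+scores to compute in the first iteration is
+\[
+  \frac{n(n-1)}{2} = \frac{99 \times 98}{2} = 4851,
+\]
+instead of the $n^2 = 9801$ entries of the full matrix.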
\begin{figure}[H]
- \centering \includegraphics[width=0.75\textwidth]{Whole_system}
+ \centering \includegraphics[width=0.8\textwidth]{Whole_system}
\caption{Overview of the pipeline}\label{wholesystem}
\end{figure}