From: Michel Salomon
Date: Mon, 2 Dec 2013 13:04:01 +0000 (+0100)
Subject: Merge branch 'master' of ssh://bilbo/chloroplast13
X-Git-Url: https://bilbo.iut-bm.univ-fcomte.fr/and/gitweb/chloroplast13.git/commitdiff_plain/38c1dcfbefcbed8161d2e34081f977f7e1222f64?ds=sidebyside;hp=-c

Merge branch 'master' of ssh://bilbo/chloroplast13
---

38c1dcfbefcbed8161d2e34081f977f7e1222f64
diff --combined annotated.tex
index e9e3049,3a97542..819cc33
--- a/annotated.tex
+++ b/annotated.tex
@@@ -47,11 -47,11 +47,11 @@@ that stores annotated and/or unannotate
  considered the GenBank-NCBI \cite{Sayers01012011} database as sequence
  database: 99~genomes of chloroplasts were retrieved. These genomes
  lie in the eleven type of chloroplast families and Table \ref{Tab2}
- summarizes their distribution in our dataset.
+ summarizes their distribution in our dataset.\\
  \begin{figure}[h]
  \centering
- \includegraphics[width=0.75\textwidth]{generalView}
+ \includegraphics[width=0.8\textwidth]{generalView}
  \caption{A general overview of the annotation-based approach}\label{Fig1}
  \end{figure}
@@@ -240,25 -240,36 +240,25 @@@ score_{ij}=\vert g_i \cap g_j\ver
  \label{Eq1}
  \end{equation}
  \noindent where $1 \leq i \leq n$, $1 \leq j \leq n$, and $g_i, g_j$ are
 -genomes. The generation of a new core gene depends obviously on the
 -value of intersection scores $score_{ij}$:
 +genomes. The generation of a new core gene depends obviously on the
 +value of the intersection scores $score_{ij}$. More precisely, the
 +idea is to consider a pair of genomes such that their score is the
 +largest element in ICM. These two genomes are then removed from matrix
 +and the resulting new core genome is added for the next iteration.
 +The ICM is then updated to take into account the new core gene: new IS
 +values are computed for it. This process is repeated until no new core
 +gene can be obtained.
 -% TO BE CONTINUED
 -
 -$$
 -\text{new Core} =
 -\begin{cases}
 -\text{Ignored} & \text{if $\textit{score}=0$;} \\
 -\text{new Core id} & \text{if $\textit{Score}>0$.}
 -\end{cases}
 -$$
 -
 -if $\textit{Score}=0$ then we have \textit{disjoint
 -relation} \emph{i.e.}, no common genes between two genomes. In this
 -case the system ignores the genome that annul the core gene
 -size. Otherwise, The system removes these two genomes from ICM and add
 -new core genome with a \textit{coreID} of them to ICM for the
 -calculation in next iteration. This process reduces the size of ICM
 -and repeats until all genomes are treated \emph{i.e.} ICM has no more
 -genomes. We observe that ICM is very large because of the amount of
 -data that it stores. This results to be time and memory consuming for
 -calculating the intersection scores. To increase the speed of
 -calculations, it is sufficient to only calculate the upper triangle
 -scores. The time complexity for this process after enhancement is thus
 -$O(\frac{n.(n-1)}{2})$. Algorithm \ref{Alg1:ICM} illustrates the
 -construction of the ICM matrix and the extraction of the core genes
 -where \textit{GenomeList}, represents the database where all genomes
 -data are stored. At each iteration, it computes the maximum core genes
 -with its two genomes parents.
 +We can observe that the ICM is very large due to the amount of
 +data. As a consequence, the computation of the intersection scores is
 +both time and memory consuming. However, since ICM is a symetric
 +matrix we can reduce the computation overhead by considering only its
 +triangular upper part. The time complexity for this process after
 +enhancement is thus $O(\frac{n.(n-1)}{2})$. Algorithm ~\ref{Alg1:ICM}
 +illustrates the construction of the ICM matrix and the extraction of
 +the core genes, where \textit{GenomeList} represents the database
 +storing all genomes data. At each iteration, it computes the maximum
 +core genes with its two genomes parents.
  % ALGORITHM HAS BEEN REWRITTEN
@@@ -321,7 -332,7 +321,7 @@@ to align these sequences with each othe
  \end{enumerate}
  \begin{figure}[H]
- \centering \includegraphics[width=0.75\textwidth]{Whole_system}
+ \centering \includegraphics[width=0.8\textwidth]{Whole_system}
  \caption{Overview of the pipeline}\label{wholesystem}
  \end{figure}
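
Note on the rewritten paragraph in the second hunk: the iterative core extraction it describes (take the pair of genomes with the largest intersection score, replace the pair by their common genes, repeat until no new core can be obtained) could be sketched roughly as follows. This is only an illustrative sketch, not the algorithm from annotated.tex: the function name `extract_cores`, the `core_N` identifiers and the dict-of-sets input are assumptions made for this example. It also recomputes the scores at every iteration instead of maintaining an ICM matrix, to keep the example short.

```python
from itertools import combinations


def extract_cores(genomes):
    """Repeatedly merge the two gene sets that share the most genes.

    `genomes` maps a genome name to its set of gene names (hypothetical
    input format). Returns a list of (core_id, parent_a, parent_b, genes).
    """
    pool = {name: frozenset(genes) for name, genes in genomes.items()}
    history = []
    while len(pool) >= 2:
        # combinations() visits each unordered pair once, i.e. only the
        # upper triangle of the symmetric score matrix: n(n-1)/2 scores.
        a, b = max(
            combinations(sorted(pool), 2),
            key=lambda pair: len(pool[pair[0]] & pool[pair[1]]),
        )
        core = pool[a] & pool[b]
        if not core:
            break  # largest score is 0: no new core can be obtained
        core_id = f"core_{len(history) + 1}"  # assumed naming scheme
        del pool[a], pool[b]  # the two parents leave the pool
        pool[core_id] = core  # the new core competes in the next round
        history.append((core_id, a, b, core))
    return history
```

For three toy genomes A = {g1, g2, g3}, B = {g2, g3, g4} and C = {g3, g5}, the first round merges A and B (score 2) and the second merges that core with C, leaving the single gene g3.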