Merge branch 'master' of ssh://bilbo/chloroplast13

author Michel Salomon <salomon@caseb.iut-bm.univ-fcomte.fr>

Mon, 2 Dec 2013 13:04:01 +0000 (14:04 +0100)

committer Michel Salomon <salomon@caseb.iut-bm.univ-fcomte.fr>

Mon, 2 Dec 2013 13:04:01 +0000 (14:04 +0100)
diff --combined annotated.tex

index e9e30496d4504bd9018a1ea27125c400919fd288,3a97542e12805305f0e0939e760ea57f80a3626b..819cc33ba537a959ec6d35317e2a90a058dfcfe8
--- 1/annotated.tex
--- 2/annotated.tex
+++ b/annotated.tex
@@@ -47,11 -47,11 +47,11 @@@ that stores annotated and/or unannotate
   considered the GenBank-NCBI \cite{Sayers01012011} database as sequence
   database:  99~genomes of chloroplasts  were retrieved.   These genomes
  lie in  the eleven types of  chloroplast families and  Table \ref{Tab2}
- summarizes their distribution in our dataset.
+ summarizes their distribution in our dataset.\\
   
   \begin{figure}[h]  
     \centering
-     \includegraphics[width=0.75\textwidth]{generalView}
+     \includegraphics[width=0.8\textwidth]{generalView}
   \caption{A general overview of the annotation-based approach}\label{Fig1}
   \end{figure}
   
@@@ -240,25 -240,36 +240,25 @@@ score_{ij}=\vert g_i \cap g_j\ver
   \label{Eq1}
   \end{equation}
   \noindent where $1 \leq i \leq n$, $1 \leq j \leq n$, and $g_i, g_j$ are 
- -genomes. The  generation of a new  core gene depends obviously on the
- -value of intersection scores $score_{ij}$:
+ +genomes. The  generation of a new  core gene  obviously depends on the
+ +value  of the  intersection scores  $score_{ij}$. More  precisely, the
+ +idea is  to select the pair  of genomes  whose score  is the  largest
+ +element in the ICM. These two genomes are then removed from the matrix
+ +and the  resulting new  core genome is  added for the  next iteration.
+ +The ICM is then updated to take into account the new core gene: new IS
+ +values are computed for it. This process is repeated until no new core
+ +gene can be obtained.
   
- -% TO BE CONTINUED
- -
- -$$
- -\text{new Core} = 
- -\begin{cases}
- -\text{Ignored} & \text{if $\textit{score}=0$;} \\
- -\text{new Core id} & \text{if $\textit{Score}>0$.}
- -\end{cases}
- -$$
- -
- -if     $\textit{Score}=0$     then     we    have     \textit{disjoint
- -relation} \emph{i.e.},  no common genes between two  genomes.  In this
- -case  the  system  ignores  the   genome  that  annul  the  core  gene
- -size. Otherwise, The system removes these two genomes from ICM and add
- -new  core  genome  with a  \textit{coreID}  of  them  to ICM  for  the
- -calculation in  next iteration. This  process reduces the size  of ICM
- -and repeats until all genomes  are treated \emph{i.e.} ICM has no more
- -genomes.  We observe  that ICM is very large because  of the amount of
- -data that it stores. This results  to be time and memory consuming for
- -calculating  the  intersection  scores.   To  increase  the  speed  of
- -calculations, it  is sufficient to  only calculate the  upper triangle
- -scores. The time complexity for this process after enhancement is thus
- -$O(\frac{n.(n-1)}{2})$.   Algorithm   \ref{Alg1:ICM}  illustrates  the
- -construction of  the ICM matrix and  the extraction of  the core genes
- -where \textit{GenomeList},  represents the database  where all genomes
- -data are stored. At each iteration, it computes the maximum core genes
- -with its two genomes parents.
+ +We  can observe  that  the ICM  is very  large  due to  the amount  of
+ +data. As a consequence, the  computation of the intersection scores is
+ +both  time and  memory consuming.  However, since  the ICM  is a  symmetric
+ +matrix, we can reduce the computation overhead by considering only its
+ +upper triangular  part.  The  time complexity  for this  process after
+ +enhancement is  thus $O(\frac{n(n-1)}{2})$.  Algorithm~\ref{Alg1:ICM}
+ +illustrates the construction  of the ICM matrix and  the extraction of
+ +the  core  genes, where  \textit{GenomeList}  represents the  database
+ +storing all genome  data. At  each iteration, it computes  the largest
+ +core genome together with its two parent genomes.
   
   % ALGORITHM HAS BEEN REWRITTEN
   
@@@ -321,7 -332,7 +321,7 @@@ to align these sequences with each othe
   \end{enumerate} 
   
   \begin{figure}[H]
-   \centering \includegraphics[width=0.75\textwidth]{Whole_system}
+   \centering \includegraphics[width=0.8\textwidth]{Whole_system}
     \caption{Overview of the pipeline}\label{wholesystem}
   \end{figure}
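
The iterative procedure described in the second hunk (score every pair of genomes once, merge the pair with the largest intersection score, replace the two parents by their core, update the scores, repeat) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the paper's Algorithm Alg1:ICM: each genome is assumed to be a plain set of gene names, the names `intersection_scores`, `extract_cores` and the `core_N` identifiers are hypothetical, and ties between equal scores are broken by insertion order.

```python
from itertools import combinations


def intersection_scores(pool):
    """Score each unordered pair once, i.e. the upper triangle of the ICM.

    For n genomes this yields n(n-1)/2 entries.
    """
    return {
        (a, b): len(pool[a] & pool[b])
        for a, b in combinations(pool, 2)
    }


def extract_cores(genomes):
    """Greedily merge the pair with the largest intersection score.

    `genomes` maps a genome name to a set of gene names (an assumption
    made for this sketch). Returns a list of tuples
    (core_id, parent_a, parent_b, core_genes) in creation order.
    """
    pool = {name: frozenset(genes) for name, genes in genomes.items()}
    scores = intersection_scores(pool)
    cores = []
    while scores:
        (a, b), best = max(scores.items(), key=lambda item: item[1])
        if best == 0:
            break  # no pair shares a gene: no new core can be obtained
        core_id = f"core_{len(cores) + 1}"  # hypothetical naming scheme
        core_genes = pool[a] & pool[b]
        cores.append((core_id, a, b, core_genes))
        # Remove both parents from the pool and from the score table.
        del pool[a], pool[b]
        scores = {
            pair: s for pair, s in scores.items()
            if a not in pair and b not in pair
        }
        # Compute new scores only for the freshly created core.
        for other in pool:
            scores[(other, core_id)] = len(pool[other] & core_genes)
        pool[core_id] = core_genes
    return cores


if __name__ == "__main__":
    toy = {
        "g1": {"a", "b", "c", "d"},
        "g2": {"a", "b", "c", "e"},
        "g3": {"a", "b", "f"},
        "g4": {"x"},
    }
    for core_id, left, right, genes in extract_cores(toy):
        print(core_id, left, right, sorted(genes))
```

Storing one entry per unordered pair is what keeps the initial table at n(n-1)/2 scores; after each merge only the scores involving the new core are recomputed, which mirrors the update step described in the text.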