a few typos

author couchot <couchot@couchot-Latitude-E6320.(none)>

Thu, 21 Mar 2013 07:54:19 +0000 (08:54 +0100)

committer couchot <couchot@couchot-Latitude-E6320.(none)>

Thu, 21 Mar 2013 07:54:19 +0000 (08:54 +0100)
diff --git a/art.tex b/art.tex

index 1bcaf15c9e24f35c37bb94b9534d9c8673f5de65..92a8c4838e950619c0280a5e8d64540401fdcbcc 100644 (file)
--- a/art.tex
+++ b/art.tex
@@ -13,7 +13,7 @@ To achieve this, the authors
    SNPs data.
  \end{enumerate}
  
-In~\cite{10.1371/journal.pone.0052841} the authors analyse 435 Mycobacterium Tuberculosis complex isolates of the same clade. By focusing on the H37Rv genome, 
+In~\cite{10.1371/journal.pone.0052841} the authors analyze 435 Mycobacterium Tuberculosis complex isolates of the same clade. By focusing on the H37Rv genome, 
  they produce 13382 SNPs. Later, they compare 44 genomes to this one 
  regarding these SNPs. The way they extract this phylogenetic tree is 
  not detailed. They then focus on Percy256 and Percy556 since both these 
diff --git a/classEquiv.tex b/classEquiv.tex

index cdc1280d2bf86f1628f3160b8853d08da090e0b7..b77c51ac3bddcd20373d15cbd79facfd35360925 100644 (file)
--- a/classEquiv.tex
+++ b/classEquiv.tex
@@ -6,10 +6,10 @@ is a pair that gives the similarity rate $r_{ij}$ between the two genes
  $g_{i}$ and $g_{j}$.
  
  The first step of this stage consists in building the following non-oriented
-graph furthere denoted as to \emph{similarity graphe}.
+graph, hereafter referred to as the \emph{similarity graph}.
  In this one, the vertices are the genes. There is an edge between 
  $g_{i}$ and $g_{j}$ if the rate $r_{ij}$ is greater than a given similarity 
-treeshold $t$.
+threshold $t$.
  
  We then define the relation $\sim$  such that
  $x \sim y$ if $x$ and $y$ belong to the same connected component.
@@ -21,21 +21,21 @@ All the genes which are  equivalent to each other
  are also elements of the same equivalence class.
  Let us then consider the set of all equivalence classes of the set of genes 
  by $\sim$, denoted $X/\sim = \{\dot{x} | x \textrm{ is a gene}\}$. 
-defined by \pi(x) = \dot{x} 
-which maps each gene  into it respective equivalence classe by $\sim$.
+defined by $\pi(x) = \dot{x}$
+which maps each gene to its respective equivalence class under $\sim$.
  
  
  
  
-For each genome $[g_l,\ldot,g{l+m}]$, the second step computes 
+For each genome $[g_l,\ldots,g_{l+m}]$, the second step computes 
  the projection of each gene according to $\pi$. 
  The resulting genome  which is 
  $$
-[\pi(g_l),\ldot,\pi(g{l+m})]
+[\pi(g_l),\ldots,\pi(g_{l+m})]
  $$ 
  is again of size $m$.
  
-Intuitivelly speaking, for two genes $g_i$ and $g_j$ 
+Intuitively speaking, for two genes $g_i$ and $g_j$ 
  in the same equivalence class, there is a path from $g_i$ to $g_j$.
  It signifies that  each evolution step 
  (represented by an edge in the similarity graph) 
@@ -48,8 +48,8 @@ We compute the core genome as follow.
  Each genome is projected according to $\pi$. We then consider the 
  intersection of all the projected genomes which are considered as sets of genes
  and not as sequences of genes.
-This results as the set of all the class representents $\dot{x}$
-such that each geneome has an gene $x$ in  $\dot{x}$.
+This results in the set of all the classes $\dot{x}$
+such that each genome has a gene $x$ in $\dot{x}$.
  The pan genome is computed similarly: the union of all the 
  projected genomes is computed here.
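
The construction described in classEquiv.tex (similarity graph, connected components, projection $\pi$, then core and pan genome) can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the function names, the `rate` callback, and the toy gene labels are all hypothetical, and the connected components are computed with a simple union-find, which is one possible choice among others.

```python
from itertools import combinations


def equivalence_classes(genes, rate, t):
    """Return a dict pi mapping each gene to a representative of its class.

    Two genes are linked when rate(a, b) > t (an edge of the similarity
    graph); classes are the connected components of that graph.
    """
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links up to the root, shortening the path on the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genes, 2):
        if rate(a, b) > t:
            parent[find(a)] = find(b)  # merge the two components
    return {g: find(g) for g in genes}


def core_and_pan(genomes, pi):
    """Project each genome through pi and treat it as a set of classes."""
    projected = [{pi[g] for g in genome} for genome in genomes]
    core = set.intersection(*projected)  # classes present in every genome
    pan = set.union(*projected)          # classes present in at least one
    return core, pan


# Toy data (hypothetical): only a1 and a2 are similar enough to be linked.
rates = {frozenset(("a1", "a2")): 0.9}


def rate(x, y):
    return rates.get(frozenset((x, y)), 0.0)


genomes = [["a1", "b1"], ["a2", "c1"]]
genes = [g for genome in genomes for g in genome]
pi = equivalence_classes(genes, rate, t=0.5)
core, pan = core_and_pan(genomes, pi)
```

In this toy example the two genomes share a single class (the one containing `a1` and `a2`), so the core genome has one element and the pan genome has three.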
  
diff --git a/closedgenomes.tex b/closedgenomes.tex

index d468f102bc25e700ba30a8981f56bcafdc561323..eb61eb7fbd2d4978a30b65e122c65180abeb5c05 100644 (file)
--- a/closedgenomes.tex
+++ b/closedgenomes.tex
@@ -1,4 +1,4 @@
-The approache is further based on the ability to decide how far is each 
+The approach is further based on the ability to decide how far each 
  genome is from the others. To achieve this, we combine XXX metrics which are 
  detailed in this part.
  
@@ -14,12 +14,12 @@ either a high metric or a very low metric}
  %1/ On SNPs of the core genome strict
  All the $y$ are thus aligned 
  thanks to a global alignment tool. The SNPs may thus be extracted.
-For each genome, one can thus compute the vector of boolean values 
-memorizing at index $i$ wether the SNP $i$ is present in one of its gene 
-(postive value) or  not (null value). 
+For each genome, one can thus compute the vector of Boolean values 
+memorizing at index $i$ whether the SNP $i$ is present in one of its genes 
+(positive value) or  not (null value). 
  The Hamming distance between two such vectors then gives the distance 
  between two genomes. 
-This metric is further refered as to $m_S$.
+This metric is hereafter referred to as $m_S$.
  
  % the more differences there are, the higher the number
  
@@ -28,19 +28,28 @@ This metric is further refered as to $m_S$.
  The $m_S$ method does not consider genes to have the same incidence in the 
  metric value. A gene with many SNPs has a larger influence in 
  the metric computation than a gene with fewer ones. 
-The metric further refered as to $m_{|S|}$ gives the same weight to each gene
+The metric hereafter referred to as $m_{|S|}$ gives the same weight to each gene
  without considering the number of SNPs it contains. 
  
  % the more differences there are, the higher the number
  
-
-%3/ On gene content (symmetric difference)
-The third metric consider the symetric difference $\Delta$ 
-between the two sets $G_1$ and $G_2$ of genes.
+\subsection{Symmetric Difference based metric}
+The third metric considers the symmetric difference $\Delta$ 
+between the two sets $G_1$ and $G_2$ of genes, recalled hereafter:
  $$
  G_1\Delta G_2 = 
-(G1\cup G_2)\setminus (G1\cap G_2) = (G1\setminus G_2)\cup(G_2\setminus G1) 
+(G_1\cup G_2)\setminus (G_1\cap G_2) = (G_1\setminus G_2)\cup(G_2\setminus G_1).
  $$
+The cardinality of $G_1\Delta G_2$ gives the metric.
+This metric is hereafter referred to as $m_{\Delta}$.
+
+In practice, let $k$ be the number of equivalence classes. By definition of the pan genome, this number is equal to the cardinality of the pan genome.
+For each genome, if we only consider which genes belong to it, \textit{i.e.}, if we abstract away the positions at which each gene appears, this genome may be 
+stored as a vector of $k$ Boolean values. The element at index $i$, $0 \le i \le k-1$, is true if and only if the $i$-th gene of the pan genome belongs to this 
+genome.  
+The metric $m_{\Delta}$ is then equal to the Hamming distance between the two corresponding  
+vectors of Boolean values.
+
  \end{document}
  
  % 4/ Using EPFL method
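
The equivalence stated in the new subsection of closedgenomes.tex, namely that the cardinality of the symmetric difference equals the Hamming distance between the Boolean presence vectors, can be checked on a small example. The sketch below is illustrative only: the helper names and the class labels `c0` to `c3` are hypothetical, and the genomes are assumed to be already projected onto equivalence classes.

```python
def m_delta(genome_1, genome_2):
    """Cardinality of the symmetric difference of the two gene sets."""
    return len(set(genome_1) ^ set(genome_2))


def presence_vector(genome, pan_genome):
    """Boolean vector of length k: entry i tells whether class i is present.

    Positions and repetitions of a gene inside the genome are ignored.
    """
    members = set(genome)
    return [cls in members for cls in pan_genome]


def hamming(u, v):
    """Number of indices at which two equal-length vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(u, v))


# Toy data (hypothetical): a pan genome of k = 4 classes in a fixed order.
pan_genome = ["c0", "c1", "c2", "c3"]
genome_1 = ["c0", "c1", "c2"]
genome_2 = ["c1", "c2", "c3", "c1"]  # a repeated class counts only once

v1 = presence_vector(genome_1, pan_genome)
v2 = presence_vector(genome_2, pan_genome)
distance_sets = m_delta(genome_1, genome_2)
distance_vectors = hamming(v1, v2)
```

Both computations give 2 here, since only `c0` and `c3` belong to exactly one of the two genomes. The vector form requires a fixed ordering of the pan genome shared by all genomes, whereas the set form does not.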