The approach further relies on the ability to decide how far each genome is
from the others. To achieve this, we combine XXX metrics which are

\subsection{Core SNP-based metric}
By definition of the core genome, each element $\dot{x}$
of this set contains a gene $x \in \dot{x}$ from every genome.
Let us consider such a class
$\dot{x} = \{y \mid x \sim y\}$.
\JFC{We should be consistent: two close genomes should everywhere have
either a high metric or a very low metric}
%1/ On SNPs of the strict core genome
All the genes $y$ of this class are then aligned
with a global alignment tool, from which the SNPs can be extracted.
For each genome, one can thus compute the Boolean vector
recording at index $i$ whether SNP $i$ is present in one of its genes
(positive value) or not (null value).
The Hamming distance between two such vectors then gives the distance
between the two corresponding genomes.
This metric is hereafter referred to as $m_S$.
% the more differences there are, the higher the value
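As an illustration, and with notation that is introduced here only for the example, write $v^A, v^B \in \{0,1\}^n$ for the Boolean SNP vectors of two genomes $A$ and $B$, where $n$ is the number of extracted SNPs. Under this assumption the Hamming distance reads
\[
m_S(A,B) = \sum_{i=1}^{n} \left| v^A_i - v^B_i \right| .
\]
For instance, with the hypothetical vectors $v^A = (1,0,1,1,0)$ and $v^B = (1,1,0,1,0)$, the two genomes differ at indices $2$ and $3$ only, so that $m_S(A,B) = 2$.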
%2/ On SNPs of the strict core genome, each gene having the same weight
The $m_S$ metric does not give all genes the same influence on its
value: a gene with many SNPs weighs more in
the computation than a gene with fewer ones.
The metric hereafter referred to as $m_{|S|}$ gives the same weight to each gene,
regardless of the number of SNPs it contains.
% the more differences there are, the higher the value
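The exact weighting is not fixed by the description above. One possible formalization, given only as a sketch, normalizes the contribution of each core gene by its number of SNPs. Assume that $v^A, v^B$ denote the Boolean SNP vectors of two genomes $A$ and $B$, and that $S_g$ denotes the set of SNP indices located in core gene $g$ (both notations are introduced for this example). Summing over the core genes that contain at least one SNP, this gives
\[
m_{|S|}(A,B) = \sum_{g \,:\, S_g \neq \emptyset} \frac{1}{|S_g|} \sum_{i \in S_g} \left| v^A_i - v^B_i \right| .
\]
For instance, if a first gene carries four SNPs, two of which differ between $A$ and $B$, and a second gene carries a single SNP that differs, then the unweighted count is $2 + 1 = 3$ whereas $m_{|S|}(A,B) = 2/4 + 1/1 = 1.5$. Each gene contributes at most $1$, whatever its number of SNPs.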
%3/ On gene content (symmetric difference)
The third metric considers the symmetric difference $\Delta$
between the two sets $G_1$ and $G_2$ of genes:
\[
G_1 \Delta G_2 = (G_1\cup G_2)\setminus (G_1\cap G_2) = (G_1\setminus G_2)\cup(G_2\setminus G_1).
\]
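As a small illustration with hypothetical gene sets, take $G_1 = \{a,b,c\}$ and $G_2 = \{b,c,d,e\}$. Then
\[
G_1 \setminus G_2 = \{a\}, \qquad G_2 \setminus G_1 = \{d,e\}, \qquad (G_1 \setminus G_2) \cup (G_2 \setminus G_1) = \{a,d,e\}.
\]
If, as we assume here, the metric is taken to be the cardinality of the symmetric difference, these two genomes are at distance $3$, and two genomes with identical gene content are at distance $0$.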
% 4/ Using EPFL method
% 5/ On size of the biggest synteny block
% 6/ On average size of synteny blocks
% 7/ On number of synteny blocks.