classEquiv.tex

   1 This step considers as input the set
   2 $\{((g_1,g_2),r_{12}), (g_1,g_3),r_{13}), (g_{n-1},g{n}),r_{n-1.n})\}$ of
   3 $\frac{n(n-1)}{2}$ elements.
   4 Each one $(g_i,g_j),r_{ij})$ where $i < j$,
   5 is a pair that gives the similarity rate $r_{ij}$ between the two genes
   6 $g_{i}$ and $g_{j}$.
   7
   8 The first step of this stage consists in building the following non-oriented
   9 graph further denoted as to \emph{similarity graph}.
  10 In this one, the vertices are the genes. There is an edge between
  11 $g_{i}$ and $g_{j}$ if the rate $r_{ij}$ is greater than a given similarity
  12 threshold $t$.
  13
  14 We then define the relation $\sim$  such that
  15 $ x \sim y$ if $x$ and $y$ belong in the same connected component.
  16 Mathematically speaking, it is obvious that this
  17 defines an equivalence relation.
  18 Let $\dot{x}= \{y  | x \sim y\}$
  19 denotes the equivalence class to which $x$ belongs.
  20 All the genes which are  equivalent to each other
  21 are also elements of the same equivalence class.
  22 Let us then consider the set of all equivalence classes of the set of genes
  23 by $\sim$, denoted $X/\sim = \{\dot{x} | x \textrm{ is a gene}\}$.
  24 defined by $\pi(x) = \dot{x}$
  25 which maps each gene  into it respective equivalence class by $\sim$.
  26
  27
  28
  29
  30 For each genome $[g_l,\ldots,g{l+m}]$, the second step computes
  31 the projection of each gene according to $\pi$.
  32 The resulting genome  which is
  33 $$
  34 [\pi(g_l),\ldots,\pi(g{l+m})]
  35 $$
  36 is again of size $m$.
  37
  38 Intuitively speaking, for two genes $g_i$ and $g_j$
  39 in the same equivalence class, there is path from  $g_i$ and $g_j$.
  40 It signifies that  each evolution step
  41 (represented by an edge in the similarity graph)
  42 has produced a gene s.t. the similarity with the previous one
  43 is greater than $t$.
  44 Genes $g_i$ and $g_j$ may thus have a common ancestor.
  45
  46
  47 We compute the core genome as follow.
  48 Each genome is projected according to $\pi$. We then consider the
  49 intersection of all the projected genomes which are considered as sets of genes
  50 and not as sequences of genes.
  51 This results as the set of all the class $\dot{x}$
  52 such that each genome has an gene $x$ in  $\dot{x}$.
  53 The pan genome is computed similarly: the union of all the
  54 projected genomes in computed here.
  55