This step takes as input the set
$\{((g_1,g_2),r_{12}), ((g_1,g_3),r_{13}), \ldots, ((g_{n-1},g_{n}),r_{n-1,n})\}$ of
$\frac{n(n-1)}{2}$ elements.
Each element $((g_i,g_j),r_{ij})$, where $i < j$,
is a pair that gives the similarity rate $r_{ij}$ between the two genes
$g_i$ and $g_j$.

The first step of this stage consists in building the following undirected
graph, hereafter called the \emph{similarity graph}.
In this graph, the vertices are the genes. There is an edge between
$g_{i}$ and $g_{j}$ if the rate $r_{ij}$ is greater than a given similarity
threshold.

We then define the relation $\sim$ such that
$x \sim y$ if $x$ and $y$ belong to the same connected component.
This defines an equivalence relation, since connectivity in an
undirected graph is reflexive, symmetric, and transitive.
Let $\dot{x}= \{y \mid x \sim y\}$
denote the equivalence class to which $x$ belongs.
Genes that are equivalent to each other
thus belong to the same equivalence class.
Let us then consider the set of all equivalence classes of the set of genes
by $\sim$, denoted $X/\sim = \{\dot{x} \mid x \textrm{ is a gene}\}$,
and the projection $\pi$ defined by $\pi(x) = \dot{x}$,
which maps each gene to its respective equivalence class by $\sim$.
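
The construction above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this work: the function name \texttt{similarity\_classes}, the dictionary layout of the rates, and the toy values are assumptions. It uses a union-find structure rather than an explicit graph traversal, which yields the same connected components, and it represents each class $\dot{x}$ by one of its genes.

```python
def similarity_classes(genes, rates, threshold):
    """Return a dict pi mapping each gene to a representative of its class.

    genes     : iterable of gene identifiers (hypothetical names)
    rates     : dict {(g_i, g_j): r_ij} holding one entry per pair with i < j
    threshold : similarity threshold; an edge exists when r_ij > threshold
    """
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links up to the root, shortening the path on the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (gi, gj), r in rates.items():
        if r > threshold:  # edge of the similarity graph
            ri, rj = find(gi), find(gj)
            if ri != rj:
                parent[rj] = ri  # merge the two connected components

    # pi(x) is the representative of the connected component containing x.
    return {g: find(g) for g in genes}


# Toy data (assumed values): 4 genes, hence 4 * 3 / 2 = 6 pairs.
genes = ["g1", "g2", "g3", "g4"]
rates = {
    ("g1", "g2"): 0.9, ("g1", "g3"): 0.2, ("g1", "g4"): 0.1,
    ("g2", "g3"): 0.8, ("g2", "g4"): 0.3, ("g3", "g4"): 0.4,
}
pi = similarity_classes(genes, rates, threshold=0.5)
```

In this toy example, $g_1$ and $g_3$ fall into the same class although $r_{13}$ is below the threshold, because both are linked to $g_2$: the relation $\sim$ is the transitive closure of the edge relation.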

For each genome $[g_l,\ldots,g_{l+m}]$, the second step computes
the projection of each gene according to $\pi$.
The resulting genome is
\[
[\pi(g_l),\ldots,\pi(g_{l+m})].
\]

Intuitively speaking, for two genes $g_i$ and $g_j$
in the same equivalence class, there is a path from $g_i$ to $g_j$.
This means that each evolution step
(represented by an edge in the similarity graph)
has produced a gene whose similarity with the previous one
exceeds the similarity threshold.
Genes $g_i$ and $g_j$ may thus have a common ancestor.

We compute the core genome as follows.
Each genome is projected according to $\pi$. We then consider the
intersection of all the projected genomes, which are considered as sets of genes
and not as sequences of genes.
This yields the set of all the classes $\dot{x}$
such that each genome has a gene $x$ in $\dot{x}$.
The pan genome is computed similarly: the union of all the
projected genomes is computed instead.
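
The core and pan genome computations can be illustrated as follows. This is a minimal sketch under stated assumptions: the function name \texttt{core\_and\_pan}, the class labels, and the toy genomes are hypothetical, $\pi$ is given as a dictionary from genes to class labels, and at least one genome is provided.

```python
def core_and_pan(genomes, pi):
    """Return (core, pan) as sets of equivalence classes.

    genomes : non-empty list of genomes, each a sequence of gene identifiers
    pi      : dict mapping each gene to its equivalence class
    """
    # Project each genome and forget gene order and multiplicity.
    projected = [{pi[g] for g in genome} for genome in genomes]
    core = set.intersection(*projected)  # classes present in every genome
    pan = set.union(*projected)          # classes present in some genome
    return core, pan


# Toy data (assumed): genes a1 and a2 are similar and share class "A".
pi = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
genomes = [["a1", "b1"], ["a2", "b1", "c1"]]
core, pan = core_and_pan(genomes, pi)
```

Here the class \texttt{A} belongs to the core genome even though the two genomes do not share a gene identifier for it: each genome contains some gene of that class, which is exactly the condition stated above.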