Identifying core genes is important for understanding evolutionary and
functional phylogenies. In this work, we therefore present methods to
build a gene-content evolutionary tree. More precisely, considering a
collection of 99~chloroplast genomes annotated with NCBI
\cite{Sayers01012011} and DOGMA \cite{RDogma}, we focus on the
following questions: how can we identify the best core genome, and what
is the evolutionary scenario of these chloroplasts?\\
Chloroplasts, like mitochondria, are fundamental elements in the
history of living organisms. In eukaryotes, chloroplasts are the
organelles responsible for photosynthesis, the main process by which
organic matter is produced from mineral matter using solar energy.
Consequently, photosynthetic organisms are at the base of the trophic
chains of most ecosystems, and photosynthesis in eukaryotes allowed
extensive speciation in this lineage, leading to great biodiversity.
From an ecological point of view, photosynthetic organisms are
responsible for the presence of dioxygen in the atmosphere, which made
extant life possible, and they are the main agents of mid- to long-term
carbon storage through the fixation of atmospheric CO$_2$, an important
feature in the context of climate change. Chloroplasts found in
eukaryotes have an endosymbiotic origin, meaning that they derive from
the incorporation of a photosynthetic bacterium (a cyanobacterium) into
a eukaryotic cell.\\
According to the principles of phylogenetic classification, a mutation
in the DNA shared by two or more taxa is more likely to have been
inherited from a common ancestor than to have arisen independently.
Shared changes in genomes thus make it possible to infer relationships
between species. In the case of chloroplasts, an important category of
genomic change is the loss of functional genes, either because they
became inoperative or because they were transferred to the nucleus. We
therefore hypothesize that a small number of gene losses between species
indicates that these species are close to each other and belong to the
same lineage, while a large number of losses indicates a relationship
between species from much more distant lineages.

Phylogenetic relationships are mainly built by comparing sets of coding
and non-coding sequences. Phylogenies of photosynthetic plants are
important to assess the origin of chloroplasts (REF) and the modalities
of gene loss among lineages. These phylogenies are usually built from
fewer than ten chloroplast genes (REF), some of which may not have been
conserved during evolution in every taxon. Since phylogenetic
relationships are better supported when inferred from data matrices that
are complete for every included species and share the same evolutionary
history, we selected core genomes for a new investigation of the
phylogeny of photosynthetic plants. To depict the links between species
clearly, we intend to build a phylogenetic tree showing their
relationships based on the distances among the gene sequences of a core
genome. Circumscribing the core chloroplast genome of a given set of
photosynthetic organisms requires bioinformatic tools for sequence
annotation and comparison, which we describe here.
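The notions of core genome and gene loss used above can be made concrete
with a short formalization. This is a sketch under our own assumptions:
the distance $d$ below, the size of the symmetric difference of two gene
sets, is one simple illustrative choice and not necessarily the measure
used in the rest of this work. Let $G_1, \dots, G_n$ denote the sets of
genes annotated in the $n$ chloroplast genomes under study ($n = 99$
here). The core genome is the set of genes shared by all of them,
\[
  \mathcal{C} = \bigcap_{i=1}^{n} G_i ,
\]
and a gene-content distance between genomes $i$ and $j$ can be defined
as the number of genes present in one of them but not in the other,
\[
  d(i,j) = |G_i \setminus G_j| + |G_j \setminus G_i| = |G_i \,\triangle\, G_j| .
\]
For instance, restricting to two genomes with the hypothetical gene sets
$G_1 = \{\textit{psbA}, \textit{rbcL}, \textit{ndhB}\}$ and
$G_2 = \{\textit{psbA}, \textit{rbcL}\}$, we obtain
$\mathcal{C} = \{\textit{psbA}, \textit{rbcL}\}$ and $d(1,2) = 1$: a
single gene loss separates the two genomes, which, under the hypothesis
stated above, suggests that they are close relatives.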
Other possible scientific questions to consider for improving the introduction:
Which bioinformatic tools are needed to compare genes across the selected complete chloroplast genomes? Which bioinformatic tools are needed to build a phylogeny from numerous genes and species?