In this paper, we applied three methods for extracting core genes from large chloroplast genomes. The extracted core genes depend on gene features and gene sequences, and we developed a Python program that implements the three methods. The first method extracts core genes by sequence comparison based on the NCBI annotation. It failed to produce a core gene at any of the similarity thresholds considered, because of problems in the NCBI annotation. We therefore turned to the DOGMA annotation tool to improve the core genes. The second and third methods use the annotations from NCBI and DOGMA. The second method extracts gene names from the gene features. An intersection core matrix (ICM) is built in which each entry stores the intersection score (IS) obtained by intersecting two genomes (\emph{i.e.}, sets of genes) at a time. Core genes are then constructed by selecting the maximum IS from the ICM, removing the two intersected genomes that produced it, and adding the corresponding core genes to the ICM. In the third method, a gene quality test ensures that a gene produced by the NCBI annotation is the same gene (\emph{i.e.}, same gene name and sequence) as the one produced by DOGMA. The quality test constructs new genomes from the genes that pass a similarity threshold of 65\%, and the ICM is then used to extract the core genes.\\
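The intersection core matrix (ICM) procedure and the gene quality test summarized above can be illustrated with a minimal Python sketch. It is not the program used in this work. The function names \texttt{extract\_core} and \texttt{quality\_filter}, the \texttt{core\_k} labels, and the toy gene sets are hypothetical. The sketch assumes that the procedure repeats until a single core genome remains, and it uses \texttt{difflib.SequenceMatcher} only as a stand-in for the sequence similarity measure, which is not specified here.

```python
from difflib import SequenceMatcher
from itertools import combinations


def quality_filter(ncbi, dogma, threshold=0.65):
    """Return gene names found in both annotations with similar sequences.

    ncbi, dogma: dicts mapping gene name -> sequence string.
    The similarity measure (difflib ratio) is an assumption of this sketch.
    """
    kept = set()
    for name in ncbi.keys() & dogma.keys():
        matcher = SequenceMatcher(None, ncbi[name], dogma[name], autojunk=False)
        if matcher.ratio() >= threshold:
            kept.add(name)
    return kept


def extract_core(genomes):
    """Greedy pairwise-intersection sketch of the ICM procedure.

    genomes: dict mapping a genome label -> set of gene names.
    Returns (core_genes, merges); each merge is recorded as
    (label_a, label_b, new_label, intersection_score).
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    pool = {label: set(genes) for label, genes in genomes.items()}
    merges = []
    while len(pool) > 1:
        # One ICM entry per unordered pair: the intersection score (IS)
        # is the number of gene names shared by the two genomes.
        a, b = max(
            combinations(sorted(pool), 2),
            key=lambda pair: len(pool[pair[0]] & pool[pair[1]]),
        )
        core = pool[a] & pool[b]
        new_label = f"core_{len(merges) + 1}"  # hypothetical naming scheme
        merges.append((a, b, new_label, len(core)))
        # Remove the two intersected genomes and add their core genes back.
        del pool[a], pool[b]
        pool[new_label] = core
    (final_core,) = pool.values()
    return final_core, merges


if __name__ == "__main__":
    toy = {
        "A": {"rbcL", "psbA", "atpB", "ycf1"},
        "B": {"rbcL", "psbA", "atpB"},
        "C": {"rbcL", "psbA", "ndhF"},
    }
    core, merges = extract_core(toy)
    print(sorted(core))
    for step in merges:
        print(step)
```

The recorded merge steps form a binary merge history, which is one possible starting point for drawing a core tree. For the third method, the genomes passed to \texttt{extract\_core} would be the gene sets returned by \texttt{quality\_filter}.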
Core trees are generated for each method to display the distribution of chloroplasts and core genomes. The tree from the second method, based on the DOGMA annotation, shows that the distribution of chloroplasts (\emph{i.e.}, green algae, red algae, and land plants) matches the evolutionary history of chloroplasts, with each endosymbiosis event appearing as a distinct branch in the tree.