-\textit{GenomeList} represents the local database.\\
-
-In the second method, annotating every genome is an exhausting task because of the large number of genomes, especially with Dogma: Dogma is offered as a web tool, so each genome must be submitted and annotated manually. We avoid this problem by choosing one reference chloroplast genome and querying each of its genes with \textit{Blastn} to examine whether it exists in the remaining unannotated genomes in the Blast database. For each gene, we collect all genomes that appear among its hits, in line with the hypothesis that ``a gene that exists in the maximum number of genomes also exists in the core genes''. In addition, we can extract the maximum core genes by examining how many genes are present in each genome. Algorithm \ref{Alg2:secondM} states the general algorithm for the second method. \\
-
-\begin{algorithm}[H]
-\caption{Extract Maximum Core genes based on Blast}
-\label{Alg2:secondM}
-\begin{algorithmic}
-\REQUIRE $Ref\_Genome \leftarrow \text{Accession No}$
-\ENSURE $Core \leftarrow \text{Genomes for each gene}$
-\FOR{$gene \in Ref\_Genome$}
- \STATE $G\_list \leftarrow \text{empty list}$
- \STATE $File \leftarrow Blastn(gene)$
- \STATE $G\_list \leftarrow File[\text{Genomes names}]$
- \STATE $Core \leftarrow [Accession\_No:G\_list]$
-\ENDFOR
-\RETURN $Core$
-\end{algorithmic}
-\end{algorithm}
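
The loop in Algorithm \ref{Alg2:secondM} can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the Blastn call is abstracted as a hypothetical callable \texttt{run\_blastn} that returns tabular output (the 12-column layout of \texttt{-outfmt 6}, where the second column is the subject identifier), and the function names are chosen for illustration only. The helper \texttt{genes\_per\_genome} illustrates the additional count of how many reference genes are found in each genome.

```python
from collections import defaultdict


def genomes_from_tabular(blast_output):
    """Collect subject (genome) identifiers from BLAST tabular output.

    Assumption: tab-separated rows in the -outfmt 6 layout, where the
    second column holds the subject sequence identifier.
    """
    genomes = []
    for line in blast_output.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        subject = line.split("\t")[1]
        if subject not in genomes:  # keep each genome once, in hit order
            genomes.append(subject)
    return genomes


def core_by_gene(reference_genes, run_blastn):
    """Map each reference gene name to the genomes in which it was hit.

    reference_genes: dict of gene name -> sequence (hypothetical input).
    run_blastn: hypothetical callable taking a sequence and returning
    BLAST tabular output as a string.
    """
    core = {}
    for gene_name, sequence in reference_genes.items():
        core[gene_name] = genomes_from_tabular(run_blastn(sequence))
    return core


def genes_per_genome(core):
    """Count how many reference genes were found in each genome."""
    counts = defaultdict(int)
    for genomes in core.values():
        for genome in genomes:
            counts[genome] += 1
    return dict(counts)
```

Passing the Blastn step as a callable keeps the sketch independent of how Blastn is actually invoked, so the loop can be exercised with canned output before it is connected to a real database.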
-
-The hypothesis of the last method states that we can predict the best annotated genome by merging the genomes annotated by NCBI and Dogma, based on the quality of gene names and sequences. To generate all quality genes of each genome, the hypothesis is stated as follows: a gene is included in the predicted genome if and only if its NCBI and Dogma annotations pass a specific threshold of a \textit{quality control test}. To carry out the quality test, we applied the Needleman-Wunsch algorithm to compare the two gene sequences against a threshold. If the alignment score passes this threshold, the gene is included in the predicted genome; otherwise the gene is ignored. This procedure is shown in Algorithm \ref{Alg3:thirdM}. After predicting all genomes, one of the previous two methods can be applied to extract the core genes.
-
-\begin{algorithm}[H]
-\caption{Extract new genome based on Gene Quality test}
-\label{Alg3:thirdM}
-\begin{algorithmic}
-\REQUIRE $Gname \leftarrow \text{Genome Name}, Threshold \leftarrow 65$
-\ENSURE $geneList \leftarrow \text{Quality genes}$
-\STATE $dir(NCBI\_Genes) \leftarrow \text{NCBI genes of Gname}$
-\STATE $dir(Dogma\_Genes) \leftarrow \text{Dogma genes of Gname}$
-\STATE $geneList \leftarrow \text{empty list}$
-\STATE $common=set(dir(NCBI\_Genes)) \cap set(dir(Dogma\_Genes))$
-\FOR{$\text{gene in common}$}
- \STATE $g1 \leftarrow open(NCBI\_Genes(gene)).read()$
- \STATE $g2 \leftarrow open(Dogma\_Genes(gene)).read()$
- \STATE $score \leftarrow geneChk(g1,g2)$
- \IF {$score > Threshold$}
- \STATE $geneList \leftarrow geneList + [gene]$
- \ENDIF
-\ENDFOR
-\RETURN $geneList$
-\end{algorithmic}
-\end{algorithm}
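
The quality test of Algorithm \ref{Alg3:thirdM} can be sketched in Python as follows. This is an illustrative sketch rather than the authors' \texttt{geneChk}: the scoring scheme (match $=1$, mismatch $=-1$, gap $=-1$) and the function names are assumptions, and since the text does not specify how the alignment score is scaled against the threshold of 65, the threshold is left as a parameter. Gene sets are passed as dictionaries from gene name to sequence instead of being read from directories.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Uses the Needleman-Wunsch recurrence with a linear gap penalty and
    keeps only two rows of the dynamic programming table.
    The default scoring values are assumptions made for this sketch.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty prefix
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


def quality_genes(ncbi_genes, dogma_genes, threshold,
                  score=needleman_wunsch_score):
    """Keep genes annotated by both sources whose score exceeds threshold.

    ncbi_genes, dogma_genes: dicts of gene name -> sequence
    (hypothetical inputs standing in for the two gene directories).
    """
    common = sorted(set(ncbi_genes) & set(dogma_genes))
    return [g for g in common
            if score(ncbi_genes[g], dogma_genes[g]) > threshold]
```

Because the scoring function is a parameter, a percentage-based similarity measure could be substituted without changing the filtering step.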