-\subsubsection{Genome annotation from Dogma}
-Dogma (\textit{Dual Organellar GenoMe Annotator}) is an annotation tool for plant chloroplast and animal mitochondrial genomes, developed at the University of Texas in 2004~\cite{RDogma}.
-Dogma translates the input genome in all six reading frames and queries its own amino acid sequence database using Blast\cite{altschul1990basic} (i.e., Blastx) with various parameters, in order to identify protein-coding genes\cite{parra2007cegma,RDogma} in the input genome based on their sequence similarity to genes in the Dogma database. Furthermore, it can identify \textit{transfer RNAs (tRNA)}\cite{RDogma} and \textit{ribosomal RNAs (rRNA)}\cite{RDogma} and verify their start and end positions, unlike the NCBI annotation tool. Once gene fragmentation has been resolved, the Dogma annotation contains no duplicated genes. \\
-Genome annotation with Dogma can make a key difference when extracting core genes. As shown in Figure~\ref{dog:Annotation}, the annotation step is divided into two tasks. The first task annotates complete chloroplast genomes (i.e., \textit{unannotated genomes} from NCBI) using the Dogma web tool. The whole annotation process was done manually. The output of Dogma is a file of coding genes for each genome, in the GeneVision\cite{geneVision} file format.\\
-The second task is to resolve gene fragments. Defragmentation starts immediately after the first task and resolves the fragments of coding genes in each genome to avoid gene duplication. After this stage, all genomes are fully annotated, their genes are de-fragmented, and their gene lists and counts are identified. This information is stored in a local database.\\
-\begin{figure}[H]
- \centering
- \includegraphics[width=0.7\textwidth]{Dogma_GeneName}
- \caption{Dogma Annotation for Chloroplast genomes}\label{dog:Annotation}
-\end{figure}
+\begin{algorithm}[H]
+\caption{Extract new genome based on Gene Quality test}
+\label{Alg3:thirdM}
+\begin{algorithmic}
+\REQUIRE $Gname \leftarrow \text{Genome Name}, Threshold \leftarrow 65$
+\ENSURE $geneList \leftarrow \text{Quality genes}$
+\STATE $dir(NCBI\_Genes) \leftarrow \text{NCBI genes of Gname}$
+\STATE $dir(Dogma\_Genes) \leftarrow \text{Dogma genes of Gname}$
+\STATE $geneList \leftarrow \text{empty list}$
+\STATE $common \leftarrow set(dir(NCBI\_Genes)) \cap set(dir(Dogma\_Genes))$
+\FOR{$\text{gene in common}$}
+ \STATE $g1 \leftarrow open(NCBI\_Genes(gene)).read()$
+ \STATE $g2 \leftarrow open(Dogma\_Genes(gene)).read()$
+ \STATE $score \leftarrow geneChk(g1,g2)$
+ \IF {$score > Threshold$}
+ \STATE $geneList.append(gene)$
+ \ENDIF
+\ENDFOR
+\RETURN $geneList$
+\end{algorithmic}
+\end{algorithm}
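
Algorithm \ref{Alg3:thirdM} can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: it assumes that each directory holds one file per gene, that the file name is the gene name, and that the file content is the bare sequence. The function name \texttt{quality\_genes} and its parameters are hypothetical, and the scoring routine is passed in as an argument so that any implementation of geneChk can be plugged in.

```python
from pathlib import Path


def quality_genes(ncbi_dir, dogma_dir, gene_chk, threshold=65):
    """Return the genes annotated by both NCBI and Dogma whose score exceeds threshold.

    Assumed layout (hypothetical): one file per gene in each directory,
    named after the gene, containing only the sequence.
    """
    ncbi = {p.name: p for p in Path(ncbi_dir).iterdir() if p.is_file()}
    dogma = {p.name: p for p in Path(dogma_dir).iterdir() if p.is_file()}

    gene_list = []
    # Only genes present in both annotations are compared.
    for gene in sorted(ncbi.keys() & dogma.keys()):
        g1 = ncbi[gene].read_text().strip()
        g2 = dogma[gene].read_text().strip()
        # Keep the gene only if the score is strictly above the threshold.
        if gene_chk(g1, g2) > threshold:
            gene_list.append(gene)
    return gene_list
```

The common genes are visited in sorted order only to make the returned list deterministic; the algorithm itself does not depend on the order.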
+
+\textbf{geneChk} is a subroutine that finds the best similarity score between two gene sequences after applying the operations \textit{reverse, complement, and reverse complement} to the second sequence. It is given in Algorithm~\ref{Alg3:genechk}.
+
+\begin{algorithm}[H]
+\caption{Find the Maximum similarity score between two sequences}
+\label{Alg3:genechk}
+\begin{algorithmic}
+\REQUIRE $gen1,gen2 \leftarrow \text{NCBI gene sequence, Dogma gene sequence}$
+\ENSURE $\text{Maximum similarity score}$
+\STATE $Score1 \leftarrow needle(gen1,gen2)$
+\STATE $Score2 \leftarrow needle(gen1,Reverse(gen2))$
+\STATE $Score3 \leftarrow needle(gen1,Complement(gen2))$
+\STATE $Score4 \leftarrow needle(gen1,Reverse(Complement(gen2)))$
+\RETURN $max(Score1, Score2, Score3, Score4)$
+\end{algorithmic}
+\end{algorithm}
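
A possible Python sketch of Algorithm \ref{Alg3:genechk} is given below. The function \texttt{nw\_score} is only a simplified stand-in for \textit{needle}: it computes a Needleman-Wunsch global alignment score with a linear gap penalty and illustrative parameters (match $+1$, mismatch $-1$, gap $-1$), so its raw scores are not on the same scale as the threshold used in Algorithm \ref{Alg3:thirdM}. The names \texttt{nw\_score} and \texttt{gene\_chk} are hypothetical.

```python
# Translation table for the DNA complement (A<->T, C<->G), upper and lower case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Simplified stand-in for needle; the scoring parameters are illustrative.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def gene_chk(gen1, gen2, score=nw_score):
    """Best score of gen1 against gen2 in its four orientations."""
    complement = gen2.translate(COMPLEMENT)
    candidates = (
        gen2,              # as given
        gen2[::-1],        # reverse
        complement,        # complement
        complement[::-1],  # reverse complement
    )
    return max(score(gen1, c) for c in candidates)
```

For instance, \texttt{gene\_chk("ATGC", "GCAT")} scores the pair as identical, because the reverse complement of the second sequence equals the first.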