-\begin{algorithm}[H]
-\caption{Extract Maximum Core genes based on Blast}
-\label{Alg2:secondM}
-\begin{algorithmic}
-\REQUIRE $Ref\_Genome \leftarrow \text{Accession No}$
-\ENSURE $Core \leftarrow \text{Genomes for each gene}$
-\FOR{$gene \leftarrow Ref\_Genome$}
- \STATE $G\_list= \text{empty list}$
- \STATE $File \leftarrow Blastn(gene)$
- \STATE $G\_list \leftarrow File[\text{Genomes names}]$
- \STATE $Core[gene] \leftarrow G\_list$
-\ENDFOR
-\RETURN $Core$
-\end{algorithmic}
-\end{algorithm}
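The bookkeeping of Algorithm \ref{Alg2:secondM} can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used here: it assumes that each Blastn result is available as tabular text (BLAST output format 6, where the second column is the subject sequence ID) and that the subject ID identifies the genome. The function names \texttt{genomes\_from\_blast\_table} and \texttt{build\_core}, the gene names, and the example rows are hypothetical.

```python
def genomes_from_blast_table(table_text):
    """Return the distinct subject IDs (column 2) of a tabular BLAST report.

    Assumption: tab-separated rows whose second field names the genome.
    The order of first appearance is preserved.
    """
    seen = {}
    for line in table_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip malformed rows
        seen[fields[1]] = None  # dict keys act as an ordered set
    return list(seen)


def build_core(blast_tables):
    """Map each reference gene to the genomes in which a hit was reported."""
    return {gene: genomes_from_blast_table(text)
            for gene, text in blast_tables.items()}


# Made-up example rows (query, subject, percent identity), for illustration only.
example_tables = {
    "geneX": "geneX\tGenomeA\t99.1\ngeneX\tGenomeB\t97.4\ngeneX\tGenomeA\t88.0\n",
    "geneY": "geneY\tGenomeB\t98.2\n",
}
core = build_core(example_tables)
```

In this sketch a gene whose list covers every genome under study would be a core gene candidate; running Blastn itself is deliberately left out.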
-
-The hypothesis of the last method is that the best annotated genome can be predicted by merging the annotated genomes from NCBI and Dogma according to the quality of the gene names and sequences. More precisely, a gene belongs to the predicted genome if and only if its NCBI and Dogma annotations pass a specific threshold of a \textit{quality control test}. To carry out the quality test, we applied the Needleman-Wunsch algorithm to compare the two gene sequences. If the alignment score exceeds the threshold, the gene is included in the predicted genome; otherwise, the gene is ignored. After all genomes have been predicted, either of the two previous methods can be applied to extract core genes. The procedure is shown in Algorithm \ref{Alg3:thirdM}.
-
-\begin{algorithm}[H]
-\caption{Extract new genome based on Gene Quality test}
-\label{Alg3:thirdM}
-\begin{algorithmic}
-\REQUIRE $Gname \leftarrow \text{Genome Name}, Threshold \leftarrow 65$
-\ENSURE $geneList \leftarrow \text{Quality genes}$
-\STATE $dir(NCBI\_Genes) \leftarrow \text{NCBI genes of Gname}$
-\STATE $dir(Dogma\_Genes) \leftarrow \text{Dogma genes of Gname}$
-\STATE $geneList=\text{empty list}$
-\STATE $common=set(dir(NCBI\_Genes)) \cap set(dir(Dogma\_Genes))$
-\FOR{$\text{gene in common}$}
- \STATE $g1 \leftarrow open(NCBI\_Genes(gene)).read()$
- \STATE $g2 \leftarrow open(Dogma\_Genes(gene)).read()$
- \STATE $score \leftarrow geneChk(g1,g2)$
- \IF {$score > Threshold$}
- \STATE $geneList.append(gene)$
- \ENDIF
-\ENDFOR
-\RETURN $geneList$
-\end{algorithmic}
-\end{algorithm}
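The quality filter of Algorithm \ref{Alg3:thirdM} can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each gene is assumed to be stored as one plain-text file per directory, with the same file name in the NCBI and Dogma directories. The function \texttt{quality\_genes}, the scorer \texttt{toy\_score} (a simple percentage of matching positions that stands in for the real similarity check), and the demo sequences are hypothetical.

```python
import tempfile
from pathlib import Path


def quality_genes(ncbi_dir, dogma_dir, gene_chk, threshold=65):
    """Return the genes found in both directories whose score exceeds threshold."""
    ncbi = {p.name: p for p in Path(ncbi_dir).iterdir() if p.is_file()}
    dogma = {p.name: p for p in Path(dogma_dir).iterdir() if p.is_file()}
    gene_list = []
    # Only genes annotated by both sources are compared.
    for gene in sorted(ncbi.keys() & dogma.keys()):
        g1 = ncbi[gene].read_text().strip()
        g2 = dogma[gene].read_text().strip()
        if gene_chk(g1, g2) > threshold:
            gene_list.append(gene)
    return gene_list


def toy_score(a, b):
    """Hypothetical stand-in scorer: percentage of matching positions."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / max(len(a), len(b))


# Small demonstration on temporary files with made-up sequences.
with tempfile.TemporaryDirectory() as tmp:
    ncbi_dir, dogma_dir = Path(tmp, "ncbi"), Path(tmp, "dogma")
    ncbi_dir.mkdir()
    dogma_dir.mkdir()
    (ncbi_dir / "geneA").write_text("ACGTACGT")
    (dogma_dir / "geneA").write_text("ACGTACGA")  # 7 of 8 positions match
    (ncbi_dir / "geneB").write_text("AAAAAAAA")
    (dogma_dir / "geneB").write_text("CCCCCCCC")  # no position matches
    (ncbi_dir / "geneC").write_text("ACGT")       # present in NCBI only
    demo_result = quality_genes(ncbi_dir, dogma_dir, toy_score)
```

Passing the scorer as an argument keeps the filter independent of how the similarity score is computed.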
-
-In Algorithm \ref{Alg3:thirdM}, geneChk is a Python subroutine that finds the best similarity score between two gene sequences. It compares the first sequence with the second sequence as given and after applying each of the operations \textit{reverse, complement, and reverse complement} to it. The geneChk procedure is given in Algorithm \ref{Alg3:genechk}.
-
-\begin{algorithm}[H]
-\caption{Find the Maximum similarity score between two sequences}
-\label{Alg3:genechk}
-\begin{algorithmic}
-\REQUIRE $gen1,gen2 \leftarrow \text{NCBI gene sequence, Dogma gene sequence}$
-\ENSURE $\text{Maximum similarity score}$
-\STATE $Score1 \leftarrow needle(gen1,gen2)$
-\STATE $Score2 \leftarrow needle(gen1,Reverse(gen2))$
-\STATE $Score3 \leftarrow needle(gen1,Complement(gen2))$
-\STATE $Score4 \leftarrow needle(gen1,Reverse(Complement(gen2)))$
-\RETURN $max(Score1, Score2, Score3, Score4)$
-\end{algorithmic}
-\end{algorithm}
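Algorithm \ref{Alg3:genechk} can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used here: the \texttt{needle} call is replaced by a small pure-Python Needleman-Wunsch scorer with an assumed scoring scheme (match $+1$, mismatch $-1$, linear gap penalty $-1$), since the scoring used by \texttt{needle} is not specified in the text. The names \texttt{needleman\_wunsch\_score} and \texttt{gene\_chk} are hypothetical.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score with a linear gap penalty.

    Only two rows of the dynamic-programming table are kept, because
    the score alone is needed (no traceback).
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# Translation table for the complement of a DNA sequence.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def gene_chk(gen1, gen2, score=needleman_wunsch_score):
    """Return the best score of gen1 against the four orientations of gen2."""
    complement = gen2.translate(_COMPLEMENT)
    candidates = (
        gen2,              # as given
        gen2[::-1],        # reverse
        complement,        # complement
        complement[::-1],  # reverse complement
    )
    return max(score(gen1, candidate) for candidate in candidates)
```

With this assumed scoring scheme the result is a raw alignment score, so it is not necessarily on the same scale as the $Threshold$ of 65 in Algorithm \ref{Alg3:thirdM}; a percentage-based score would need a different scorer passed as \texttt{score}.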