+Dogma stands for \textit{Dual Organellar GenoMe Annotator}. It is an
+annotation tool developed at the University of Texas in 2004 for plant
+chloroplast and animal mitochondrial genomes. This tool translates a
+genome in all six reading frames and queries its own amino acid
+sequence database using BLAST \cite{altschul1990basic} (\emph{i.e.},
+Blastx) with various parameters. Protein-coding genes are identified
+in an input genome by sequence similarity to the genes in the Dogma
+database. In addition, in comparison with the NCBI annotation tool,
+Dogma can identify both \textit{Transfer RNAs (tRNA)} and \textit{Ribosomal RNAs (rRNA)}
+and verify their start and end positions. Furthermore, no gene
+duplication remains in the Dogma annotations once the gene
+defragmentation process has been applied. In fact, genome annotation
+with Dogma can be the key difference when extracting core genes.
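
To make the notion of six reading frames concrete, the following minimal sketch enumerates the codons of the three forward frames and the three frames of the reverse complement. It is an illustration only and does not reproduce Dogma's implementation; the function names \texttt{reverse\_complement} and \texttt{six\_frames} are hypothetical, and the input is assumed to contain only the letters A, C, G, T and N.

```python
# Illustrative sketch only: not Dogma's code. Function names are hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def six_frames(seq):
    """Map (strand, offset) to the list of complete codons in that frame."""
    forward = seq.upper()
    backward = reverse_complement(forward)
    frames = {}
    for strand, strand_seq in (("+", forward), ("-", backward)):
        for offset in range(3):
            sub = strand_seq[offset:]
            # Drop trailing bases that do not form a complete codon.
            sub = sub[: len(sub) - len(sub) % 3]
            frames[(strand, offset)] = [sub[i:i + 3] for i in range(0, len(sub), 3)]
    return frames
```

Each codon list would then be translated into amino acids before being compared with a protein database; the translation table is omitted here to keep the sketch short.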
+
+The Dogma annotation process is divided into two tasks. First, we
+manually annotate chloroplast genomes using the Dogma web tool. The
+output of this step is expected to be a collection of coding gene
+files for each genome, organized in a GeneVision file. The second task
+is to solve the gene duplication problem, for which we have used two
+methods. The first method, based on gene names, translates each genome
+into a set of genes without duplicates. The second method avoids gene
+duplication through a defragmentation process. In each iteration, this
+process takes a gene from the gene list and searches for duplicates.
+If a duplicate is found, it checks the orientation of the fragment
+sequence. If the orientation is positive, the sequence is appended
+directly to the gene files. Otherwise, the sequence is reverse
+complemented and then appended to the gene files.
+Finally, a check for missing start and stop codons is performed. At
+the end of the annotation process, all the genomes are fully
+annotated, their genes are defragmented, and gene counts are
+available.
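
The defragmentation loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this work: fragments are assumed to be given as hypothetical \texttt{(name, strand, sequence)} tuples, fragments with the same name are concatenated in the order they are listed, and the start and stop codon sets are the standard ones (chloroplast genes may use others).

```python
# Illustrative sketch only. The (name, strand, sequence) layout, the
# concatenation order, and the codon sets below are assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
START_CODONS = {"ATG"}                 # assumption: standard start codon only
STOP_CODONS = {"TAA", "TAG", "TGA"}    # assumption: standard stop codons


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def defragment(fragments):
    """Merge fragments that share a gene name into one sequence per gene."""
    genes = {}
    for name, strand, seq in fragments:
        seq = seq.upper()
        if strand == "-":
            # Negative orientation: reverse complement before appending.
            seq = reverse_complement(seq)
        # Positive orientation (or converted sequence): append directly.
        genes[name] = genes.get(name, "") + seq
    return genes


def check_codons(seq):
    """Return (has_start, has_stop) for a defragmented gene sequence."""
    return seq[:3] in START_CODONS, seq[-3:] in STOP_CODONS
```

After \texttt{defragment}, the number of keys in the returned dictionary gives the gene count of the genome, and \texttt{check\_codons} flags genes whose start or stop codon is missing.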
+
+\subsection{Core genes extraction}
+
+The goal of this stage is to extract the largest possible set of core
+genes from the sets of genes. To find core genes, the following
+methodology is applied.
+
+\subsubsection{Preprocessing}
+
+In order to extract core genomes in a suitable manner, the genomic
+data are preprocessed with two methods: on the one hand a method based
+on gene name and count, and on the other hand a method based on a
+sequence quality control test.
+
+In the first method, we extract a list of genes from each chloroplast
+genome. We then store this list in the database under the genome
+name; gene counts can be obtained with a length command.
+The \textit{Intersection Core Matrix}, described in the next subsection,
+is then computed to extract the core genes. The problem with this
+method can be stated as follows: how can we ensure that a gene
+predicted in the core genes is the same gene in the leaf genomes? The
+answer is that the method is operational if the sequences of any gene
+in a genome annotated by Dogma and by NCBI are similar with respect to
+a given threshold. When the sequences are not similar, the problem of attributing a sequence to a gene in the core genome comes to light.
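
The bookkeeping of this name-based method can be sketched as follows. The sketch stores gene lists under genome names, obtains gene counts with a length call, and takes a plain intersection of the gene-name sets. It does not implement the \textit{Intersection Core Matrix} itself, and the genome names and gene lists are hypothetical.

```python
# Illustrative sketch only. Genome names and gene lists are hypothetical,
# and a plain set intersection stands in for the Intersection Core Matrix.
example_db = {
    "genome_A": ["rbcL", "psbA", "matK", "psbA"],
    "genome_B": ["rbcL", "psbA", "ndhF"],
    "genome_C": ["psbA", "rbcL", "atpB"],
}


def gene_counts(db):
    """Number of distinct gene names stored under each genome name."""
    return {genome: len(set(genes)) for genome, genes in db.items()}


def common_genes(db):
    """Gene names shared by every genome in the database."""
    gene_sets = [set(genes) for genes in db.values()]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)
```

Because only names are compared, two genes with the same name but different sequences are treated as identical, which is exactly the attribution problem raised above.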
+
+The second method is based on the idea that it is possible to predict the best annotated genome by merging the genomes annotated by NCBI
+and Dogma according to a quality test on gene names and sequences. To
+obtain all quality genes of each genome, we consider the following
+hypothesis: a gene appears in the predicted genome if and only
+if its annotations in NCBI and Dogma pass a specific threshold
+of the \textit{quality control test}. More precisely, the Needleman-Wunsch
+algorithm is applied to compare both sequences with respect to a
+threshold. If the alignment score is above the threshold, then the
+gene is retained in the predicted genome; otherwise the gene is
+ignored. Once all genomes have been predicted,
+the \textit{Intersection Core Matrix} is computed on these new genomes
+to extract core genes, as explained in Algorithm~\ref{Alg3:thirdM}.
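
The quality control test can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in this work: the scoring scheme (match $+1$, mismatch $-1$, gap $-1$), the normalization of the score by the length of the longer sequence, the threshold value, and the choice to keep the NCBI sequence for retained genes are all assumptions, and the function names are hypothetical.

```python
# Illustrative sketch only. Scoring values, score normalization, threshold,
# and the choice of retained sequence are assumptions, not from the paper.
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of a and b with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def predict_genome(ncbi, dogma, threshold=0.65):
    """Keep genes annotated by both tools whose sequences align well enough."""
    predicted = {}
    for name in sorted(ncbi.keys() & dogma.keys()):
        seq_ncbi, seq_dogma = ncbi[name], dogma[name]
        longest = max(len(seq_ncbi), len(seq_dogma))
        if longest == 0:
            continue
        # Assumption: normalize so one threshold applies to genes of any length.
        score = needleman_wunsch_score(seq_ncbi, seq_dogma) / longest
        if score > threshold:
            predicted[name] = seq_ncbi  # assumption: keep the NCBI sequence
    return predicted
```

Only two rows of the dynamic programming table are kept in memory, which is sufficient because the test needs the score and not the alignment itself.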