Genome annotation has attracted considerable attention, because the ability to collect and analyze genomic data provides strong indicators for the study of life~\cite{Eisen2007}. Many genome annotation centres offer various annotation tools (supported by cost-effective sequencing methods~\cite{Bakke2009}) that operate at different annotation levels. In this section, we consider a new annotation-based method for extracting the core genome from a large number of chloroplast genomes, as an alternative to the previous method described in Section 2. This method relies on extracting gene features from well-annotated genomes. The question is then how to obtain a well-annotated genome. To answer it, we need to study the accuracy of the genome's annotation systematically~\cite{Bakke2009}. A general overview of the system is given in Figure 1.

\begin{figure}[h]
  \caption{A general overview of the system}
  \centering
  \includegraphics[width=0.5\textwidth]{generalView}
\end{figure}

The system has three main stages: \textit{database}, \textit{gene extraction}, and \textit{relationships}. There are several international nucleotide sequence databases, such as GenBank/NCBI in the USA (http://www.ncbi.nlm.nih.gov/genbank/), EMBL-Bank/ENA/EBI in Europe (http://www.ebi.ac.uk/ena/), and DDBJ in Japan (http://www.ddbj.nig.ac.jp/). In our work, the database can be any trusted data source that stores annotated or unannotated chloroplast genomes; we use the GenBank/NCBI database as our nucleotide sequence database. By gene feature extraction, we refer to our main process of extracting the information needed to find the core genome from a large set of well-annotated genomes. A good annotation tool is what allows us to extract good gene features. Here, gene features can be gene names, gene sequences, protein sequences, and so on. To verify the results of our system, we organize and present them as tables, phylogenetic trees, graphs, etc., and compare them with those of another annotation tool such as DOGMA~\cite{RDogma}. The goal of this work is to reveal the relationships among our large population of chloroplast genomes and to find the core genome of the root ancestral node. Furthermore, in this stage we can visualize the evolutionary relationships among different chloroplast organisms.\\
The output of each stage serves as the input to the next. The rest of this section is organized as follows. Section 3.1 introduces some annotation problems with NCBI chloroplast genomes and discusses our method for extracting useful data from them. Section 3.2 presents our system for computing the evolutionary core genome based on an annotation tool other than NCBI.

\subsection{Gene extraction techniques from annotated NCBI genomes}
With NCBI, the idea is to use the existing NCBI annotations of chloroplast genomes to extract the core and pan genomes. The techniques used here rely on gene names and on gene contents, based on similarity criteria.

\subsubsection{Core genome based on NCBI gene names}
Our first, simple approach to constructing the core genome is based on extracting gene names from chloroplast genomes annotated by NCBI. At this stage, neither sequence comparison nor new annotation is performed: we simply extract the gene names as stored in each chloroplast genome in NCBI.
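As an illustration of this extraction step, the following minimal sketch reads gene names from the \texttt{/gene="..."} qualifiers of a GenBank flat-file record. It is only a sketch under stated assumptions: the function name \texttt{extract\_gene\_names} and the abbreviated sample record are hypothetical and are not taken from our pipeline or from a real NCBI entry.

```python
import re

# Matches GenBank feature qualifiers of the form /gene="psbA".
GENE_QUALIFIER = re.compile(r'/gene="([^"]+)"')


def extract_gene_names(genbank_text):
    """Return every gene name found in a GenBank flat-file record, in order.

    Duplicates are kept on purpose: the same name can appear on several
    features (for example on a gene feature and again on its CDS feature).
    """
    return GENE_QUALIFIER.findall(genbank_text)


# Hypothetical, heavily abbreviated feature table (not a real NCBI record).
SAMPLE_RECORD = '''
FEATURES             Location/Qualifiers
     gene            complement(1..1062)
                     /gene="psbA"
     CDS             complement(1..1062)
                     /gene="psbA"
                     /product="photosystem II protein D1"
     gene            2000..3500
                     /gene="matK"
     tRNA            4000..4072
                     /gene="trnH-GUG"
'''

gene_names = extract_gene_names(SAMPLE_RECORD)
print(gene_names)  # ['psbA', 'psbA', 'matK', 'trnH-GUG']
```

The list deliberately keeps repeated names, so that duplicate handling remains a separate, explicit step.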
From this collection process we build a dictionary that maps each genome to its list of gene names. Each list may contain duplicates, because a gene name is repeated when fragments of the gene occur at several places along the chloroplast DNA sequence. To obtain the core genome, we need to reach the identical state, in which each gene has only one name per genome, regardless of gene position or orientation. To remove duplicated gene names from the dictionary, we convert the list of gene names of each genome into a set of gene names. By the mathematical definition of a set, this removes all duplicates and yields the identical state.\\
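The deduplication described above, followed by a set intersection, can be sketched as follows. This is a minimal illustration and not our actual implementation: the genome identifiers, the gene lists and the helper names (\texttt{deduplicate}, \texttt{core\_genome}, \texttt{pan\_genome}) are hypothetical. The sketch assumes that the core genome is the set of names shared by all genomes and that the pan genome is the set of names found in at least one genome.

```python
def deduplicate(gene_lists):
    """Turn each genome's list of gene names into a set, dropping duplicates."""
    return {genome: set(names) for genome, names in gene_lists.items()}


def core_genome(gene_sets):
    """Gene names shared by every genome (set intersection)."""
    sets = list(gene_sets.values())
    return set.intersection(*sets) if sets else set()


def pan_genome(gene_sets):
    """Gene names present in at least one genome (set union)."""
    return set().union(*gene_sets.values())


# Hypothetical dictionary: genome identifier -> gene names as extracted,
# with duplicates caused by fragmented genes.
gene_lists = {
    "genome_A": ["psbA", "psbA", "matK", "rbcL", "ycf1"],
    "genome_B": ["psbA", "matK", "rbcL", "rbcL"],
    "genome_C": ["psbA", "rbcL", "ndhF"],
}

gene_sets = deduplicate(gene_lists)
print(sorted(core_genome(gene_sets)))  # ['psbA', 'rbcL']
print(sorted(pan_genome(gene_sets)))   # ['matK', 'ndhF', 'psbA', 'rbcL', 'ycf1']
```

Because sets ignore order and multiplicity, the result does not depend on gene position or orientation, which matches the identical state described above.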
By using the intersection among these genomes

\subsubsection{Extracting core genes from NCBI annotations}

\subsection{DOGMA annotation tool}

\subsubsection{Why DOGMA?}