Signals for Gene Finding
- There are many signals associated with genes, each of which suggests but does not prove the existence of a gene.
- Gene finding programs put together evidence from these various types of signals to suggest possible locations of genes in large unannotated sequences.
- Examples of signals that can be modeled with weight matrices (listed roughly in the order they occur along a gene, from upstream to downstream):
- Transcription factor binding sites.
- Cap site.
- Start codon.
- Splice sites.
- Stop codon.
- Polyadenylation site.
- Another important source of information for gene finding is the frequency of various codons in coding and non-coding regions. For example:
- The stop codons TAA, TAG and TGA are rare in coding regions: within the reading frame of a gene they occur only once, at its end.
- Codons appear with different frequencies depending on their neighboring codons and the local secondary structure.
- Gene finding programs account for the different codon frequencies in coding and non-coding regions by scoring a candidate region with a log-likelihood ratio, much as a weight matrix scores a signal site.
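The weight-matrix signals listed above (start codon, splice sites, and so on) can be illustrated with a minimal position weight matrix (PWM) sketch. It assumes aligned, equal-length training sites written in uppercase A/C/G/T, a uniform background of 0.25 per base, and a pseudocount of 1; the training sites and the function names `build_pwm`, `score` and `scan` are hypothetical and chosen only for illustration, not taken from any particular gene finder.

```python
import math

BASES = "ACGT"

def build_pwm(sites, background=None, pseudocount=1.0):
    """Return one {base: log2-odds score} dict per position of the aligned sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumption: uniform base composition
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all training sites must have the same length")
    pwm = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}  # pseudocounts avoid log(0)
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background[b]) for b in BASES})
    return pwm

def score(pwm, window):
    """Sum the per-position log-odds scores for one window of sequence."""
    return sum(col[base] for col, base in zip(pwm, window))

def scan(pwm, seq, threshold=0.0):
    """Return (start, score) for every window whose score reaches the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        s = score(pwm, seq[start:start + width])
        if s >= threshold:
            hits.append((start, s))
    return hits

# Hypothetical aligned start-codon contexts, for illustration only.
example_sites = ["ACCATGG", "GCCATGG", "ACCATGA", "GCAATGG"]
pwm = build_pwm(example_sites)
print(scan(pwm, "TTTTTACCATGGTTTTT", threshold=5.0))
```

A positive score means the window looks more like the training sites than like background; as the notes say, a high-scoring hit suggests but does not prove a real signal, which is why gene finders combine it with other evidence.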
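The point about stop codons can be made concrete by scanning one reading frame for in-frame TAA, TAG and TGA and measuring the longest stretch of codons without one. If all four bases were equally frequent, 3 of the 64 codons would be stops, so a run of n codons without a stop would arise by chance with probability (61/64)^n. This is a minimal sketch that assumes uppercase A/C/G/T input and looks only at the forward strand (a real program would also scan the reverse complement); the helper names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def stop_positions(seq, frame):
    """Start indices of in-frame stop codons in reading frame 0, 1 or 2 (forward strand only)."""
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]

def longest_open_stretch(seq, frame):
    """Length, in codons, of the longest run of complete codons in this frame with no stop codon."""
    best = run = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            run = 0
        else:
            run += 1
            best = max(best, run)
    return best

def prob_no_stop(n_codons):
    """Chance that n random codons contain no stop, assuming equal base frequencies (61/64 per codon)."""
    return (61 / 64) ** n_codons

seq = "ATGAAATGA"
for frame in range(3):
    print(frame, stop_positions(seq, frame), longest_open_stretch(seq, frame))
```

Because the chance of a stop-free run shrinks geometrically with its length, a long open stretch in one frame is itself weak evidence of coding sequence.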
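The codon log-likelihood ratio can be sketched as follows: estimate codon frequencies separately from known coding and non-coding training sequences, then score a candidate region by summing log2(P(codon | coding) / P(codon | non-coding)) over its in-frame codons. Each codon contributes a fixed score regardless of position, which is what makes it resemble a weight matrix. This sketch assumes reading frame 0 for training, a pseudocount of 1 per codon, and tiny hypothetical training sets; it ignores the neighbor effects mentioned above, which real programs often capture with higher-order (for example dicodon) models.

```python
import math
from collections import Counter
from itertools import product

CODONS = ["".join(p) for p in product("ACGT", repeat=3)]  # all 64 codons

def codon_frequencies(seqs, pseudocount=1.0):
    """Codon frequencies in frame 0 of each training sequence, smoothed with pseudocounts."""
    counts = Counter()
    for s in seqs:
        for i in range(0, len(s) - 2, 3):
            counts[s[i:i + 3]] += 1
    total = sum(counts[c] for c in CODONS) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}

def codon_llr(seq, coding, noncoding, frame=0):
    """Sum of log2(P(codon | coding) / P(codon | non-coding)) over the in-frame codons."""
    score = 0.0
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in coding:  # skip codons containing N or other non-ACGT symbols
            score += math.log2(coding[codon] / noncoding[codon])
    return score

# Hypothetical training data, for illustration only.
coding = codon_frequencies(["GCTGCTGAAGAAGCTAAA", "GCTGAAAAAGCTGCTGAA"])
noncoding = codon_frequencies(["TTATTATATTTATTATAT", "ATTTATTTATTATATTAT"])

print(codon_llr("GCTGAAGCT", coding, noncoding))  # positive: looks coding
print(codon_llr("TTATATTTA", coding, noncoding))  # negative: looks non-coding
```

A positive total favors the coding model and a negative one the non-coding model; in practice the score is usually computed in a sliding window and in all three frames.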