Introduction
Comparative genome analysis is a powerful method for aiding gene identification, inferring the function of a gene's product, and identifying novel functional elements such as those involved in transcriptional control. To identify the functionally important units it may be necessary to compare genome sequences from a variety of organisms, although this strategy will not detect features specific to a single organism. Distantly related organisms are likely to show sequence conservation only in coding regions; this may also be the case for distantly related vertebrates such as fish and human. More closely related organisms, such as two mammals or two species of worm, are likely to show conservation not only in coding regions but also in other functional elements such as regulatory sequences. However, the closer the evolutionary relationship between the two organisms being considered, the more "sequence noise" is likely to arise: non-functional sequence still appears similar because insufficient time has elapsed for the two sequences to diverge. The more closely related the organisms, the more important the differences between them become, e.g. between human and chimpanzee. The most extreme example of this is human sequence variation.

Comparative Analysis of Genes
Orthologous genes are homologous genes in different species that derive from a single ancestral gene and were separated by a speciation event. When inferring the function of one gene from the function of a predicted orthologue, it is important to be able to distinguish, where possible, between:
When analyzing potential orthologous sequences, it is important to be confident that you are dealing with true orthologues. This can be done using the following information:
Comparative Genome Analysis
When two species diverge from a common ancestor, the sequences that maintain their original function are likely to remain conserved in both species throughout their subsequent independent evolution. Comparing sequences from different species is therefore a powerful tool for increasing confidence in a predicted functional unit or for identifying novel functional units.

Click here for a table showing the major genomes sequenced or in progress, their web addresses, and useful sites for viewing the data. (Check Ensembl, UCSC and NCBI first to see what data they are making available, as these three sites present data in a uniform and linked fashion.) This table is not exhaustive.

The whole-genome shotgun (WGS) method is being used to generate large numbers of sequences for many genomes, and these unassembled sequence traces are made available through two trace archives: the Ensembl trace archive (http://trace.ensembl.org) and the NCBI trace archive (http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?).

Aims
Exercises - Comparing Genomic DNA
Exercise 1 - Comparative gene analysis: using the mouse Hoxc9 gene in Ensembl (or your favourite gene)
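A common heuristic for checking that two predicted genes are orthologues rather than paralogues is the reciprocal best hit (RBH) test. Gene a in genome A and gene b in genome B are paired only if each is the other's top-scoring match. Ensembl's own orthologue predictions are built from gene trees, not from this test. The Python sketch below is a minimal illustration that assumes you already have similarity hits (for example, rows of a BLAST tabular report) reduced to (query, subject, score) tuples. The gene IDs and scores in the usage example are hypothetical toy values.

```python
from typing import Dict, List, Tuple

# A similarity hit: (query_id, subject_id, score), e.g. one row of a
# BLAST tabular report reduced to IDs and bit score (assumed input format).
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (first hit wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)   # genome A genes searched against genome B
    reverse = best_hits(b_vs_a)   # genome B genes searched against genome A
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical IDs and toy scores: m1, m2 are mouse genes; h1, h2 are human genes.
a_vs_b = [("m1", "h1", 500.0), ("m1", "h2", 300.0),
          ("m2", "h1", 320.0), ("m2", "h2", 310.0)]
b_vs_a = [("h1", "m1", 500.0), ("h1", "m2", 320.0),
          ("h2", "m2", 310.0), ("h2", "m1", 300.0)]

print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('m1', 'h1')]
```

In this example, m2 is left unpaired because its best hit, h1, prefers m1. A strict RBH test can miss genuine orthologues when a gene has been duplicated in one lineage (one-to-many relationships). Unpaired genes therefore deserve closer inspection rather than automatic rejection.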
Exercise 2 - Comparative genome analysis: generate genomic sequence alignments using PIP and VISTA to look for conserved features.
Using the mouse Cpeb2 gene:
Now try the process yourself. Using the mouse Cpeb2 gene in Ensembl (save the following sequence files as plain-text files, not Word documents):
Exercise 2 Files
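PIP and VISTA both summarise a pairwise genomic alignment as a profile of sequence identity along one of the sequences. VISTA also highlights regions that exceed an identity threshold over a window. The Python sketch below illustrates that sliding-window idea on an alignment you already have as two gapped, equal-length strings. It is not the algorithm either tool actually uses. The 100-column window and 70% threshold are assumed parameters, of the kind often used for VISTA plots, so check each tool's own settings.

```python
from typing import List, Tuple


def window_identity(seq_a: str, seq_b: str, window: int = 100) -> List[float]:
    """Percent identity for every window of `window` alignment columns.

    seq_a and seq_b are assumed to be the two rows of a pairwise alignment:
    equal-length strings with '-' marking gaps.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # A column counts as a match only if both bases agree and neither is a gap.
    matches = [int(a == b and a != "-") for a, b in zip(seq_a.upper(), seq_b.upper())]
    if len(matches) < window:
        return []
    count = sum(matches[:window])
    scores = [100.0 * count / window]
    for i in range(window, len(matches)):
        # Slide the window one column: add the new column, drop the oldest one.
        count += matches[i] - matches[i - window]
        scores.append(100.0 * count / window)
    return scores


def conserved_regions(scores: List[float], window: int,
                      threshold: float = 70.0) -> List[Tuple[int, int]]:
    """Merge windows scoring >= threshold into half-open (start, end) column ranges."""
    regions: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            # The last qualifying window began at i - 1 and spans `window` columns.
            regions.append((start, i - 1 + window))
            start = None
    if start is not None:
        regions.append((start, len(scores) - 1 + window))
    return regions


# Toy alignment: 4 identical columns followed by 4 mismatches.
scores = window_identity("AAAAAAAA", "AAAATTTT", window=4)
print(scores)                                               # [100.0, 75.0, 50.0, 25.0, 0.0]
print(conserved_regions(scores, window=4, threshold=75.0))  # [(0, 5)]
```

The returned ranges are alignment columns, not sequence coordinates. To map them back to positions in the mouse Cpeb2 sequence, you would need to skip the gap columns in that row.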