
Exercises

Apocalyptic Biology

  1. Read about the Cohen Modal Haplotype
  2. View one of the associated variants

Investigating dbSNP

  1. Go to Entrez and search for rs669
  2. Click on the SNP result
  3. Check out the available views
  4. Look up rs300, a SNP in the lipoprotein lipase (LPL) gene
  5. How is this information represented at Ensembl and UCSC?
  6. What mutation does this SNP cause?
  7. Where is this amino acid located in the protein?

Extended Exercise

This exercise presents a true SNP workout: you will identify SNPs, haplotypes, and their relationships manually, search for SNPs in a human gene, discover how SNPs affect protein sequence and structure, and examine the relationship between SNPs and disease.


Exercise 1 - SNPs, haplotypes, and relationships

Determine the differences between the following six sequences, either manually or with a multiple sequence alignment tool such as Pasteur ClustalW, BCM ClustalW, EBI ClustalW, or DNALC Sequence Server.

  • Determine manually how the following six haplotypes could be derived from each other. (Haplotypes are groups of SNPs on the same chromosome that tend to be inherited together.) Draw a phylogenetic tree, assuming that only one nucleotide changes at each branching point. Thus two lineages proceed from each branching point: one carrying the parental haplotype and one carrying a haplotype with a change at one nucleotide position.
    1. --A--G--T--A--C--
    2. --A--C--T--A--T--
    3. --A--G--C--A--T--
    4. --A--G--T--A--T--
    5. --G--C--T--A--T--
    6. --A--G--T--G--C--
  • Predict a tree with a maximum likelihood program at WebPhylip.
    • In each haplotype above replace each double dash -- with agctggctgaagctggctga
    • Go to WebPhylip.
    • Select DNA.
    • Select Run under 5. Max. Likeli. with mol. clock.
    • In the lower right-hand frame, view the example input. Format the six sequences exactly as required: the first line gives the number of sequences and their length (six sequences of 125 nucleotides each, so write 6 125). Each sequence then gets one line, with the first 10 characters reserved for its name, so each sequence starts at position 11.
    • Enter the six sequences into the window and select Submit.
    • View the result and compare with your manual prediction.
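The manual comparison and the PHYLIP input formatting described above can be sketched in Python. This is a minimal sketch, assuming the sequential PHYLIP layout given in the steps (a header line "6 125", then each name padded to 10 characters followed by its sequence); the names hap1 to hap6 and the helper functions are illustrative choices, not part of WebPhylip.

```python
# Sketch: rebuild the six 125-nt haplotypes, list pairwise differences, and
# write sequential PHYLIP input (name in characters 1-10, sequence from 11).
from itertools import combinations

FILLER = "agctggctgaagctggctga"  # 20-nt spacer that replaces each "--"

HAPLOTYPES = {  # hypothetical names for haplotypes 1-6 from the exercise
    "hap1": "--A--G--T--A--C--",
    "hap2": "--A--C--T--A--T--",
    "hap3": "--A--G--C--A--T--",
    "hap4": "--A--G--T--A--T--",
    "hap5": "--G--C--T--A--T--",
    "hap6": "--A--G--T--G--C--",
}


def expand(haplotype: str) -> str:
    """Replace every double dash with the filler sequence."""
    return haplotype.replace("--", FILLER)


def differences(a: str, b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in a, base in b) for each mismatch."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def to_phylip(seqs: dict[str, str]) -> str:
    """Header 'N L', then one line per sequence: name padded to 10 chars + sequence."""
    length = len(next(iter(seqs.values())))
    lines = [f"{len(seqs)} {length}"]
    lines += [f"{name:<10}{seq}" for name, seq in seqs.items()]
    return "\n".join(lines)


full = {name: expand(h) for name, h in HAPLOTYPES.items()}

if __name__ == "__main__":
    # Pairs differing at exactly one position are candidate neighbours in the tree.
    for (n1, s1), (n2, s2) in combinations(full.items(), 2):
        diffs = differences(s1, s2)
        print(n1, n2, len(diffs), diffs)
    print(to_phylip(full))
```

Pairs that differ at a single position are the natural candidates for adjacent nodes in the one-change-per-branch tree; the printed PHYLIP text can be pasted into the WebPhylip input window.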

Exercise 2 - Find SNPs in a human gene

Find out what NGFR is and identify a SNP within its coding sequence (CDS).

  • Check out the gene at NCBI.
    • Go to NCBI's human genome browser.
    • Type NGFR into the Search for-window. Select Find
    • Hits are indicated by red bars. Select the number underneath the chromosome.
    • Zoom into the chromosome view until you can determine the length of the gene.
    • How long is NGFR and how many introns does it contain?
  • Find the ngfr gene in the human and mouse genomes at UCSC.
    • Go to UC Santa Cruz Genome Bioinformatics.
    • Make sure human is selected as the organism.
    • Select the browser button on left hand side.
    • Copy and paste the ngfr RefSeq ID (NP_002498) in the position box.
    • Selecting submit should take you to the browser displaying the ngfr gene (left column).
  • Locate and activate UCSC browser's SNP tracks.
    • Set the Overlap SNPs and Random SNPs tracks to full and refresh.
    • Zoom out 1.5x.
    • Select the SNP in the 4th exon (rs2072446).
    • The browser displays information on the SNP. Further information is available at dbSNP via the dbSNP link.
    • There, even more useful information is supplied, including the SNP's position in the gene model and whether it causes any amino acid substitution.
    • Find and click the Contig accession number; click again.
    • Under Edit select Find and type in 2072446; click Find next.
    • Back in the Reference SNP Cluster Report, find and click See rs2072446 in Sequence Viewer.
    • Is this a synonymous or non-synonymous change? Which amino acids do the codons carrying the different alleles encode?
  • Find out whether rs2072446 causes disease.
    • Go to NCBI's Human Genome Viewer.
    • Search for rs2072446.
    • Click on the chromosome number.
    • Is the SNP in an exon or an intron?
    • Change Region Shown: to 50050000 and 50080000.
    • Select Go.
    • Is the SNP in an exon or an intron?
    • Change Region Shown: to 50067000 and 50069000.
    • Select Go.
    • Is the SNP in an exon or an intron?
    • Under Maps & Options remove everything except Gene, Morbid/Disease, dbSNP haplotype and Variation. (Variation should be at the bottom of the list).
    • Change Region Shown: to 50067700 and 50068200.
    • Select Go.
    • Move the cursor to the Morbid/Disease column, then move it slowly down. It will pass two clusters of flags, one of six and one of five. What diseases do these flags denote?
    • Are some diseases flagged only in exons? How could you explain these? How could you explain the others?
    • Look up some of the diseases by clicking respective flags.
    • It is actually wrong to connect any of these diseases with the NGFR gene; why? (To answer this question change Region Shown: to 40000000 and 55000000).
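Answering the synonymous versus non-synonymous question in this exercise comes down to translating the reference and alternative codons. The Python sketch below does this with the standard genetic code ('*' marks a stop codon). The codons in the usage lines are hypothetical examples; read the actual codon and alleles for rs2072446 from the dbSNP report or the Sequence Viewer.

```python
# Sketch: classify a single-nucleotide change in a codon as synonymous or
# non-synonymous using the standard genetic code. The example codons below
# are hypothetical, not taken from rs2072446.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# Build the 64-entry codon table in TCAG order ('*' marks stop codons).
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(codon: str, position: int, alt_base: str) -> tuple[str, str, str]:
    """Return (reference aa, alternative aa, effect) for a change at codon position 1-3."""
    codon = codon.upper()
    if not 1 <= position <= 3:
        raise ValueError("position must be 1, 2 or 3")
    alt_codon = codon[: position - 1] + alt_base.upper() + codon[position:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[alt_codon]
    effect = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, effect


if __name__ == "__main__":
    print(classify_snv("GCT", 3, "C"))  # third-position change in an alanine codon
    print(classify_snv("GAT", 1, "A"))  # GAT (Asp) -> AAT (Asn)
```

Third-position changes are often synonymous because of the redundancy of the genetic code, while first- and second-position changes usually alter the amino acid.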

Exercise 3 - SNPs and Proteins

  • View effect of SNPs on protein 3D structure.
    • Go to NCBI dbSNP
    • Type rs300 into Search by IDs window.
    • Click Search.
    • Find NCBI Resource Links. Underneath, find 3D structure mapping: Hits to proteins with structure available: NP_000228.
    • Click NP_000228.
    • To view 3D structures, download and install Cn3D on your computer.
    • On the previous page go to the column Cn3D and check the boxes for rs1121923 and rs300.
    • Click Selected.
    • Click Open.
    • Click Open again.
    • The 3D structure appears in the Cn3D window.
    • On the previous page you can also examine all synonymous and/or non-synonymous changes at once. Just select the respective checkboxes and click Selected.
  • What protein is affected by SNP rs300?
    • On the previous page go up to Search (gray bar) and type in the id for the protein NP_000228.1.
    • Change the database from SNP to Protein.
    • Select Go.
    • What's the protein?
    • Click NP_000228.1.
    • Check out the links on the result page; what happens if you go to BLink? To Domains?
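The dbSNP and protein lookups in this exercise can also be scripted against NCBI's E-utilities web service. This is a minimal sketch, assuming the public esummary and efetch endpoints; it builds the request URLs and leaves the network call to a small helper, and NCBI's usage guidelines (including request-rate limits) still apply. The function names are illustrative.

```python
# Sketch: build NCBI E-utilities URLs for a dbSNP summary and a protein FASTA
# record. Function names are hypothetical helpers, not part of any NCBI library.
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def snp_summary_url(rs_id: str) -> str:
    """esummary URL for dbSNP; the db=snp id is the numeric part of the rs ID."""
    numeric = rs_id.lower().removeprefix("rs")
    params = {"db": "snp", "id": numeric, "retmode": "json"}
    return f"{EUTILS}/esummary.fcgi?" + urlencode(params)


def protein_fasta_url(accession: str) -> str:
    """efetch URL returning a protein record (e.g. a RefSeq NP_ accession) as FASTA."""
    params = {"db": "protein", "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?" + urlencode(params)


def fetch_text(url: str) -> str:
    """Download a URL and decode the response body as UTF-8 text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")
```

For example, fetch_text(protein_fasta_url("NP_000228.1")) should return the FASTA record of the protein examined above, and fetch_text(snp_summary_url("rs300")) the dbSNP summary as JSON.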


Exercise 4 - SNPs, haplotypes, and genes: Identify SNPs in SNP cluster/haplotype

  • Go to NCBI's dbSNP. Search for rs2211792.
  • What's the polymorphism?
  • Find the Handle|Submitter ID entry PERLEGEN|P00220042.
  • To view the Perlegen haplotypes for this SNP, select PERLEGEN on the previous page.
  • Which haplotypes can you identify?
  • What's the name of the Haplotype Block?
  • Go to NCBI's Human Genome Viewer.
  • Search for rs2211792.
  • Click on the chromosome number
  • Under Maps & Options remove everything except Gene, dbSNP haplotype and Variation. (Variation should be at the bottom of the list).
  • Zoom in to 8x.
  • Repeat (make sure rs2211792 stays centered).
  • What gene is this SNP in? (Change Maps & Options by moving Gene to the bottom).
  • Go to Sequence View and find SNP in the sequence. Is it in an intron or exon?
  • What haplotype block is this SNP in? (Change Maps & Options by moving dbSNP haplotype to the bottom.)
  • What other gene is covered by this haplotype block?
  • What are the four haplotypes for this haplotype block? (Change Maps & Options by moving dbSNP haplotype to the bottom; select PERLEGEN.)


Exercise 5 - SNPs in biomedicine

SNPs can be indicative of disease risk as well as of the ability to respond to treatments and medications. A recent study determined haplotypes of SNPs in the promoter and coding region of the β2-adrenergic receptor (β2-AR) gene (Drysdale et al. 2000). β2-AR is expressed on bronchial smooth muscle and acts to relax contracted smooth muscle, resulting in bronchodilation. Agonists of this receptor, such as albuterol, are the most effective acute treatment for reversing bronchospasm in asthma. The study correlated certain SNP haplotypes with patient responsiveness to the agonist and offers a glimpse of how SNPs can be applied in pharmacogenetics/pharmacogenomics.

  • Open the article (Drysdale et al. 2000) and try to answer some of the questions below.
    • How many different SNPs did the authors examine? How many different haplotypes could theoretically be expected if all possible combinations of these SNPs were realized in people? How did the authors determine this number? How many different haplotypes did the authors identify in cohorts of asthma patients?
    • Which single SNP is sufficient as a diagnostic tool to determine a person's predisposition to develop asthma?
    • Which haplotypes are prevalent in Hispanic Latino asthma patients as opposed to those from other ethnic groups?
    • Which haplotypes have a high occurrence across all populations?
    • Which haplotype pairs (=genotypes) are common in asthmatics in different groups?
    • Which SNPs have been found to be associated with patients' response to Albuterol? Which amino acids are being replaced as a consequence of these SNPs? As a consequence of respective haplotypes?
    • How did the authors examine the physiological responses associated with certain haplotypes? What responses did they find associated with specific haplotypes, and what molecular basis might explain them?
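The first question rests on simple combinatorics. Assuming every SNP is bi-allelic and every allele combination could occur, n SNPs allow 2^n haplotypes, and h haplotypes allow h(h+1)/2 unordered haplotype pairs (genotypes, counting homozygous pairs). The sketch below computes both; plug in the number of SNPs you find in the paper.

```python
# Sketch: theoretical haplotype and genotype counts, assuming every SNP is
# bi-allelic and every allele combination is possible.
from math import comb


def possible_haplotypes(n_snps: int, alleles_per_snp: int = 2) -> int:
    """Each SNP contributes an independent choice among its alleles."""
    return alleles_per_snp ** n_snps


def possible_genotypes(n_haplotypes: int) -> int:
    """Unordered haplotype pairs: heterozygous pairs plus homozygous pairs."""
    return comb(n_haplotypes, 2) + n_haplotypes


if __name__ == "__main__":
    for n in (1, 2, 3):
        h = possible_haplotypes(n)
        print(n, "SNPs ->", h, "haplotypes ->", possible_genotypes(h), "genotypes")
```

Comparing the theoretical count with the number of haplotypes actually observed shows how strongly linkage restricts which combinations occur in real populations.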

Exercise 6 - Human Disease and SNPs

  • Identify SNPs that cause disease.
    • Open this view of the human Chromosome 13. Find and select the link Disease Browser.
    • Type a disease into the search window in the upper portion of the page (e.g. Breast Cancer), select Lookup.
    • Select a gene (e.g. BRCA2).
    • Select Ensembl ID (e.g. ENSG00000139618).
    • Examine the information about the gene (length of gene; length and number of exons, introns; splice-site architecture).
    • Examine View Evidence (not View supporting evidence!) and View Protein. Note: the View Evidence link disappeared in January 2003, so skip this step.
    • Examine some of the links listed on the previous page, specifically:

      • HUGO ('BRCA2') (read about the genetics of this disorder)
      • LOCUS ('675') (hub to a variety of databases that contain information about the gene)
      • OMIM ('600185') (find a lot of clinical information about the disease here)
      • REFSEQ ('NM_000059') (NCBI entries on the gene)
      • SP ('BRC2_HUMAN [align]') (information on the protein, incl. aa sequence)
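The gene-structure figures asked for above (gene length, exon and intron lengths, splice-site architecture) follow directly from exon coordinates. Below is a minimal sketch, assuming 1-based inclusive coordinates on the forward strand and a toy sequence; the coordinates are hypothetical, not BRCA2's, and a real analysis would take them from the Ensembl gene page. It also flags introns that break the canonical GT-AG rule and tests whether a position (for example a SNP) lies in an exon.

```python
# Sketch: derive gene, exon and intron lengths from exon coordinates and check
# for canonical GT-AG splice sites. Coordinates are 1-based, inclusive, and on
# the forward strand; the toy sequence and exons below are hypothetical.

TOY_SEQUENCE = "ATGGCC" + "GTAAGTTTAG" + "GCTTAA"  # exon 1, intron, exon 2
TOY_EXONS = [(1, 6), (17, 22)]


def gene_structure(exons: list[tuple[int, int]]) -> dict:
    """Gene length, exon lengths, intron coordinates and intron lengths."""
    exons = sorted(exons)
    introns = [(prev_end + 1, next_start - 1)
               for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]
    return {
        "gene_length": exons[-1][1] - exons[0][0] + 1,
        "exon_lengths": [end - start + 1 for start, end in exons],
        "introns": introns,
        "intron_lengths": [end - start + 1 for start, end in introns],
    }


def splice_sites(sequence: str, introns: list[tuple[int, int]]) -> list[tuple[str, str, bool]]:
    """Return (donor dinucleotide, acceptor dinucleotide, is canonical GT-AG) per intron."""
    result = []
    for start, end in introns:
        donor = sequence[start - 1:start + 1].upper()  # first two intron bases
        acceptor = sequence[end - 2:end].upper()       # last two intron bases
        result.append((donor, acceptor, donor == "GT" and acceptor == "AG"))
    return result


def in_exon(position: int, exons: list[tuple[int, int]]) -> bool:
    """True if a 1-based position falls inside any exon."""
    return any(start <= position <= end for start, end in exons)


if __name__ == "__main__":
    structure = gene_structure(TOY_EXONS)
    print(structure)
    print(splice_sites(TOY_SEQUENCE, structure["introns"]))
    print(in_exon(10, TOY_EXONS), in_exon(18, TOY_EXONS))
```

Genes on the reverse strand would need their coordinates and sequence reverse-complemented first; the sketch leaves that out for brevity.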

Exercise 7 - Identify SNPs in db

  • Identify SNPs in BRCA1 and BRCA2.
    • Go to the National Cancer Institute site and explore the links to Locations ... and Search Candidate SNPs ...
    • Use the keywords BRCA1 and BRCA2 to identify SNPs in these genes utilizing GenBank, OMIM and the SNP Consortium Database links from the NCI web site.


Exercise 8 - Identify SNPs in sequences

  • Identify SNPs in newly sequenced sequences.
    • Go to the National Cancer Institute site and register for a free user ID and password.
    • Upon logging in you can create projects, upload your trace files, and identify SNPs in your sequences. The program performs basecalling, assembly and searches for SNPs. However, you can't do this today because it takes a little while to get your user name and password.


Exercise 9 - Identify SNP clusters/haplotypes in NCBI's HapMap

  • HapMap is a haplotype mapping project under development.

Resources

  • GeneDis, Human Genetic Disease Database
  • HGMD, Human Gene Mutation Database Cardiff
  • HGBASE, Human Genic Bi-Allelic Sequences
  • HGVbase, Human Genome Variation Database
  • HUGO, Human Genome Variation Society
  • Mouse SNPs, Roche mouse SNP database

  • HuGENet at CDC is a global collaboration of individuals and organizations who develop and communicate epidemiologic information on the human genome. Researchers are studying genetic variation to:
    • Develop population-specific prevalence data on human gene variants;
    • Develop epidemiologic data on the association between genetic variation and diseases in different populations;
    • Develop quantitative population-based data on gene-environment interaction;
    • Determine population impact of genetic tests and services.
    The results of these studies should lead to the use of genetic information to prevent disease and improve health.