Working with Electronic Sequence Databases

Electronic Sequence Formats

Converting Electronic Sequences

A full listing of conversion programs can be found at the Pasteur Institute. The most commonly used of these is Readseq, which is available from several sites:
Pasteur (Advanced; Java) | IUBio | BIMAS | Download (Java) | Baylor
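
To make the idea of format conversion concrete, here is a minimal sketch in Python of the GenBank-to-FASTA case. It is not how Readseq works internally; it only illustrates which parts of a GenBank flat file end up in a FASTA record. The function name genbank_to_fasta and the toy record (accession TOY00001) are made up for illustration, and the sketch assumes a single record with ACCESSION, DEFINITION and ORIGIN sections.

```python
def genbank_to_fasta(genbank_text, width=60):
    """Convert one GenBank flat-file record to a FASTA string.

    Assumes a single record; only ACCESSION, DEFINITION and the
    sequence under ORIGIN are used. Everything else is ignored.
    """
    accession = ""
    definition_parts = []
    sequence_parts = []
    in_definition = False
    in_origin = False

    for line in genbank_text.splitlines():
        if line.startswith("//"):          # end-of-record marker
            break
        if in_origin:
            # Sequence lines look like "   1 atggcc aagatt": keep letters only.
            sequence_parts.append("".join(ch for ch in line if ch.isalpha()))
            continue
        if line.startswith("ORIGIN"):
            in_origin = True
            in_definition = False
        elif line.startswith("DEFINITION"):
            definition_parts.append(line[len("DEFINITION"):].strip())
            in_definition = True
        elif in_definition and line.startswith(" "):
            # Indented lines continue a multi-line DEFINITION.
            definition_parts.append(line.strip())
        else:
            in_definition = False
            if line.startswith("ACCESSION"):
                fields = line.split()
                accession = fields[1] if len(fields) > 1 else ""

    header = ">" + " ".join([accession] + definition_parts).strip()
    sequence = "".join(sequence_parts)
    # Wrap the sequence to a fixed line width, as FASTA files usually do.
    body = "\n".join(sequence[i:i + width] for i in range(0, len(sequence), width))
    return header + "\n" + body + "\n"


# Hypothetical toy record, NOT a real database entry.
SAMPLE_RECORD = """\
LOCUS       TOY00001                  24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical toy record,
            for illustration only.
ACCESSION   TOY00001
ORIGIN
        1 atggccaaga ttctgacctg ataa
//
"""

fasta = genbank_to_fasta(SAMPLE_RECORD)
print(fasta, end="")
```

For real work, a tested converter such as Readseq is the safer choice, since GenBank records contain many features (multiple records per file, annotation tables) that this sketch deliberately ignores.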

Exercises

A. A Mammoth Undertaking

  1. Search Entrez protein using the keyword "mammoth"
  2. What confounding results show up?
  3. Do you get useful results searching Entrez nucleotide?
  4. Repeat the search at SRS. Are there any differences?

B. Format Conversion

  1. Search Entrez protein for accession number AY044919
  2. Search Entrez for the biglycan (BGN) gene in the Indian elephant (Elephas maximus)
  3. Download one of the sequences in GenBank format
  4. Convert the sequence to FASTA format using Readseq
  5. Look for the OMIM entry for BGN

C. Tasmanian Taxonomy

  1. Search the Taxonomy Browser for the Tasmanian Tiger
  2. Click on the Metatheria (Marsupial) link in the lineage of the tiger, then get the nucleotide sequences
  3. Refine the Entrez query to get only cytochrome b sequences
  4. Download the set in FASTA format
  5. Browse through the lineage to get an outgroup
  6. See what other extinct taxa are available
  7. See what LinkOut resources are available for Drosophila melanogaster
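
The query refinement in steps 2 and 3 can also be written out as an Entrez search term with field tags. The sketch below only builds the term and a matching E-utilities ESearch URL; it does not contact NCBI. Restricting on the [Title] field is an assumption about how best to pick out cytochrome b records (another field may work better), and the helper names build_query and build_esearch_url are made up for illustration.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(organism, title_phrase):
    """Combine an organism restriction with a phrase in the record title."""
    return f'{organism}[Organism] AND "{title_phrase}"[Title]'


def build_esearch_url(term, db="nucleotide", retmax=20):
    """Return an ESearch URL for the given term (nothing is fetched here)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


query = build_query("Metatheria", "cytochrome b")
url = build_esearch_url(query)
print(query)
print(url)
```

The printed term can be pasted into the Entrez search box as it is; the URL form is only useful if the search is to be scripted.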

D. Batch Entrez

  1. Construct a batch file of Indian elephant biglycan genes
  2. Retrieve the sequences using Batch Entrez
  3. Using this batch file, obtain biglycan genes for the Elephantidae
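
A batch file for step 1 is simply a plain-text file with one accession (or GI) number per line. The following sketch writes such a file and drops blanks and duplicates along the way. The accession strings XX000001 and XX000002 are placeholders, not real biglycan records, and the function name write_batch_file and the output file name are made up for illustration.

```python
import os
import tempfile


def write_batch_file(accessions, path):
    """Write unique, non-empty accessions to path, one per line.

    Returns the list of accessions actually written, in input order.
    """
    seen = set()
    unique = []
    for acc in accessions:
        acc = acc.strip()
        if acc and acc not in seen:
            seen.add(acc)
            unique.append(acc)
    with open(path, "w", encoding="ascii") as handle:
        for acc in unique:
            handle.write(acc + "\n")
    return unique


# Placeholder accessions; replace with the ones found in Entrez.
candidates = ["XX000001", "XX000002 ", "XX000001", ""]
batch_path = os.path.join(tempfile.gettempdir(), "biglycan_batch.txt")
written = write_batch_file(candidates, batch_path)
print(written)
```

The resulting file is what gets uploaded on the Batch Entrez page in step 2.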

E. Looking for Lubber

  1. Search Entrez for "lubber"
  2. Is there a human homolog for the first protein listed?
  3. Use BLink to find other promising areas to explore
  4. Investigate the pheromone-related genes. Are there any Bombyx homologs?
  5. In Entrez Structure, search for 1GM0
