V22.0480.002

Special Topics in CS: Intro to Bioinformatics

Fall 2004 Mon/Wed 2:00 p.m.-3:15 p.m. WW 102

Prof. Jack Schwartz (212)673-3242 jack@brainlink.com

719 Broadway Room 1226

Course Website: http://www.settheory.com/bioinformatics_syllabus/bioinformatics_syllabus.html

Course Grader: Chris Quackenbush: satkin@nyu.edu

Syllabus

Contacting Me: You can contact me by appointment, arranged preferably by E-mail, or by phone in more urgent cases. I can make appointments Monday through Wednesday, including meetings after the end of our Wednesday class (but not before).

Course Requirements: A small programming project will be assigned approximately every other week and is due at the end of the following week. Some projects count as double and cover three weeks. There will be 4 examinations: two in-class examinations, a mid-term, and a final. Your grade will be determined by your examination scores, the project work you submit, and your class participation. You can ask advice of fellow students, and do projects collaboratively with the explicit prior permission of the instructor, but concealed outright copying is forbidden. Starting in the second week of the course you will be asked to establish an Internet home page for yourself (if you do not have one already). Your project work should be submitted by E-mail to the grader.

Special projects: Various more advanced 'special projects' are also listed for students who wish to try something larger and more challenging. Each successfully completed special project excuses the student successfully submitting it from an indicated number of ordinary projects. More such projects will be announced in the course of the term.

Textbooks:

(1) Essential Cell Biology: An introduction to the Molecular Biology of the Cell
by Bruce Alberts, Dennis Bray, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, Peter Walter.
Hardcover: approx 800 pages; Publisher: Garland Pub; 2nd edition (2004)

A more advanced version of this material is available online at

http://www.ncbi.nlm.nih.gov:80/books/bv.fcgi?call=bv.View..ShowSection&rid=cell.part.1

(2) The NCBI Handbook. Bethesda (MD): National Library of Medicine (US), NCBI; Nov. 2002. This is available online at

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?call=bv.View..ShowTOC&rid=handbook.TOC&depth=2.

Some Websites of importance:

National Center for Biotechnology Information: http://www.ncbi.nlm.nih.gov/Entrez/

'Golden Path' human and other genome site at UCSC Genome Bioinformatics Home: http://genome.ucsc.edu/

Kyoto Genomics and Enzyme site: http://www.genome.ad.jp/kegg/kegg2.html

C. Elegans Nematode worm: http://elegans.swmed.edu/
and http://www.wormbase.org/

Useful free software (downloadable; locate via Google):

  1. Cn3D
  2. Rasmol
  3. Chime 2.6
  4. Protein Explorer
  5. ISIS Draw

Some very easy introductory material which you are required to master early in the course is found at the Cold Spring Harbor DNA learning website: http://www.dnaftb.org/dnaftb/15/concept/index.html/ Study modules 15-41 of this series as a quick introduction to the more detailed material in the textbook. A good knowledge of this material will be assumed on the first two exams.


** Week 1: (September 6) General Introduction. The macromolecules of biology. Cell life as the working of a program. Data yet to be supplied in the main databases. Viewing large molecules in 3D.

Readings: NCBI Handbook: 14. The Entrez Search and Retrieval System

** Week 2: (September 13) More about the macromolecules of biology. The genetic code.

Readings: Alberts: Chapter 1. Introduction to Cells

Readings: Alberts: Chapter 2. Chemical Components of cells

Readings: NCBI Handbook: 1. GenBank: The Nucleotide Sequence Database

Software: prepare_dna, translate_dna, rev_comp_dna, reverse_dna, complementary_dna and cds_start_and_translation from genetic_code_pak.stl
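For students who want to experiment before reading the course routines, the operations named above can be sketched in Python. This is an illustration only, not the code in genetic_code_pak.stl: the function names are made up for this example, and the standard genetic code is assumed (some mitochondrial genomes use variant codes).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
# Assumption: the standard code; mitochondrial variants are not handled here.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna):
    """Base-by-base complement, same orientation as the input."""
    return dna.upper().translate(COMPLEMENT)


def reverse_complement(dna):
    """Sequence of the opposite strand, read 5' to 3'."""
    return complement(dna)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )
```

For example, translate("ATGGCCTAA") reads the codons ATG, GCC, TAA and gives "MA*", with "*" marking the stop codon.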

** Week 3: (September 20) The protein synthesis path. Additional biologically significant molecules. The notion of 'open reading frame'. Survey of the major public databases.
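As a first concrete handle on the 'open reading frame' idea, here is a minimal Python sketch that scans the three forward reading frames for a start codon (ATG) followed in frame by a stop codon. It assumes the standard stop codons and ignores the reverse strand; the function name and the min_codons parameter are illustrative choices, not part of the course software.

```python
STOPS = {"TAA", "TAG", "TGA"}  # assumption: standard stop codons


def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of forward-strand ORFs.

    Each ORF runs from an ATG to the first in-frame stop codon, inclusive,
    so dna[start:end] begins with ATG and ends with the stop codon.
    min_codons counts the codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

A complete search would also run the same scan on the reverse complement of the sequence.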

Readings: Alberts: Chapter 4. Protein structure and function

Readings: NCBI Handbook: 15. The BLAST Sequence Analysis Tool

Software: Routine histo_dna from genetic_code_pak.stl

Project 1 (Due 2 weeks after assignment): Develop pair and triplet statistics for a selected one of the following mitochondrial genomes: NC_004926 (Chinese giant salamander); NC_002779 (bush moa); NC_002084 (African malaria mosquito); AB03855 (Japanese eel); NC_003415 (human hookworm); AB042952 (bowfin fish).
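One possible starting point for the pair and triplet statistics, sketched in Python: count every overlapping window of length k (k = 2 for pairs, k = 3 for triplets). The function names are hypothetical, and skipping windows that contain ambiguity codes such as N is an assumption about how you may want to treat them.

```python
from collections import Counter

VALID = set("ACGT")


def kmer_counts(dna, k):
    """Count overlapping k-mers, skipping windows with non-ACGT symbols."""
    dna = dna.upper()
    windows = (dna[i:i + k] for i in range(len(dna) - k + 1))
    return Counter(w for w in windows if set(w) <= VALID)


def kmer_frequencies(dna, k):
    """Relative frequency of each observed k-mer (empty dict if none)."""
    counts = kmer_counts(dna, k)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}
```

Calling kmer_counts(genome, 2) and kmer_counts(genome, 3) on the sequence text of the chosen genome gives the raw tables; comparing the observed pair frequencies with the products of single-base frequencies is one way to see which pairs are over- or under-represented.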

** Week 4: (September 27) Half hour exam, covering material of the first set of DNA Learning Center units, and also Alberts Chapters 1 and 2.

Lecture: More about biologically significant molecules and forces; Restriction enzymes; Reverse transcriptase; the 'Palindrome' theme

Readings: Alberts: Chapter 5. DNA and Chromosomes.

Readings: NCBI Handbook: 4. The Taxonomy Project

Software: Routine align_by_mers from genetic_code_pak.stl

Project 2: Use the BLAST genome searcher to identify the organisms from which the following pieces of sequenced DNA come. In which of these do there seem to have been point mutations?

Sequence1: ggccaatagagttcgtgccttgaaagttagataagtggatgttgacg
Sequence2: aactcagcagatattagttgatgaagctgagaaccaattaaacaaactga
Sequence3: cgccgctgcctgcatcttcgctggactgtctactactacaggtgctaatg
Sequence4: ggggaagctcctgtttgctcctaacttgctcctggacaaatcaaggta
Sequence5: aatacactatacaccagacacaataaccgccttctcatcagtcacaca
Sequence6: caaatgccccgcgagccagcactcaacgtgcaggtccattaagctagtga


** Week 5: (October 4) Structure and properties of DNA; Structure and properties of Proteins.

Readings: Alberts: Chapter 7. From DNA to Protein.

Readings: NCBI Handbook: 6. The Gene Expression Omnibus (GEO): A Gene Expression and Hybridization Repository

Software: Data warehousing considerations: the collection of routines in rix_flat_pak.stl

Project 3: Use the protein-protein BLAST searcher to identify the following proteins, the organisms from which they come, and various organisms in which related proteins occur.

Sequence1: HMWLGNECSQDESGAAAIFSTQLDDFLGGSPVQFREVQNNESLTFLGYFKSGIKYMQGGV
Sequence2: QPERKITRNQKRKHDEINHVQKTYAEMDPTTAALEKEHEAITKVKYVDKIHIGNYEIDAWYFSPFPEDYGKQPKLWLCEYCLKYMKYEK
Sequence3: RFVLTKLRVIQKGAFSGFGDLEKIEISQNDVLEVIEADVFSNLPKLHEIRIEKANNLLYITPEAFQNLPNLQYLLISNTGIKHL
Sequence4: ESVRGKDVFIIQTVSKDVNTTIMELLIMVYACRTSCARNIIGVIPYFPYSKQCKMRKRGSIVSKLLASMMCKAGLTHLITMDLHQKEIQG
Sequence5: EDFDRVKVIGRGAFGEVQLVRHKASQKVYAMKVLSKFEMIKRSDSAFFWEERDIMAFADSPWVVQ
Sequence6: YLLKFEQIYLSKPTHWERDGAPSPMMPNEARLRNLTYSAPLYVDITKTIIKDGEEQQQTQHQKTFIG

Advanced Project: Collect all internal exons of a few typical lengths using the Unigene-derived '.psl.n' files available from the course instructor, and align them either by eye or using some suitable alignment tool. Then group them by similarity. What groupings do you find?


** Week 6: (October 11) Midterm Exam: Covering material in second set of DNA Learning Center units, plus textbook Chapters 4 and 5

Lecture: Structure and properties of RNA

Readings: Alberts: Chapter 8. Control of gene expression

Readings: NCBI Handbook: 19. Using the Map Viewer to Explore Genomes

** Week 7: (October 18) Gene regulation. Basic experimental techniques and materials: DNA sequencing, Polymerase chain reaction, gene manipulation. Genomic libraries. Commercial sources of molecular biology materials, services, and equipment.

Readings: Alberts: Chapter 10. Manipulating genes and cells.

Readings: NCBI Handbook: 21. The Clusters of Orthologous Groups (COGs) Database: Phylogenetic Classification of Proteins from Complete Genomes

Project 4: Use the Unigene database to build a collection of the genes on human chromosome X with introns removed. Does this collection seem to have statistics which differ significantly from those of the full chromosome? Do the genes seem to be a statistically homogeneous collection?

** Week 8: (October 25) Basic experimental techniques and materials II: electrophoresis, affinity columns, biotin/avidin/streptavidin systems, microarrays and hybridization, phage display, artificial evolution of ribozymes, use of small interfering RNAs, GFP techniques, optical tweezers and force-sensing.

Readings: Alberts: Chapter 11. Membrane Structure

Readings: NCBI Handbook: 2. PubMed: The Bibliographic Database

Software: Use of histogramming to detect remote genomic relationships.

Advanced Project: Collect the RNA polymerase II genes of as many vertebrate species as you can find, and align them for comparison. How similar are the aligned genes? In how many arthropods can you find related genes? Can you find any in gastropods?

** Week 9: (November 1) More on cell structure; Mitochondria and taxonomic studies; Exon length stability, Use of hyperstable genes.

Readings: Alberts: Chapter 9. How genes and genomes evolve.

Readings: NCBI Handbook: 3. Macromolecular Structure Databases

Project 5: Survey the E. coli genome for palindromes. These are patterns like ACCGTTNNAACGGT, in which a short segment is followed, after a few arbitrary bases, by its reverse complement; such patterns could cause a single strand of RNA or DNA to bend in a 'hairpin' shape. Look for regions rich in such patterns. Do they carry any interesting annotations in the GenBank databases?
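A minimal Python sketch of one way to search for such hairpin-forming patterns: for each position, take a stem of fixed length (for instance ACCGTT) and look a few bases downstream for its reverse complement (AACGGT). The function name, the fixed stem length, and the loop-length limits are all assumptions chosen for illustration; only exact stem matches are found.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def find_hairpins(dna, stem=6, min_loop=0, max_loop=4):
    """Return (position, stem_length, loop_length) for each exact hit.

    A hit is a stem of `stem` bases starting at `position`, then
    `loop_length` arbitrary bases, then the stem's reverse complement.
    """
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - 2 * stem + 1):
        left = dna[i:i + stem]
        if set(left) - set("ACGT"):
            continue  # skip stems containing ambiguity codes
        target = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if dna[j:j + stem] == target:
                hits.append((i, stem, loop))
    return hits
```

On a whole genome this direct scan is slow but workable; counting hits per window of, say, a few thousand bases then shows which regions are rich in such patterns.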

** Week 10: (November 8) DNA replication, repair, and mutation; The cell membrane; More experimental techniques.

Readings: Alberts: Chapter 6. DNA replication, repair, and recombination

Readings: NCBI Handbook: 7. Online Mendelian Inheritance in Man (OMIM): A Directory of Human Genes and Genetic Disorders

** Week 11: (November 15) Membrane structure and transport; A look at future possibilities. Half hour exam, covering Alberts Chapters 6, 8, 9, and 10.

Readings: Alberts: Chapter 12. Membrane Transport

Readings: NCBI Handbook: 18. LocusLink: A Directory of Genes

Project 6: Get the GenBank mRNAs BC027933, Y15075, NM_002754, and BC063029. These encode 4 very similar proteins. Write software to align them (you can simply translate the 'align_by_mers' procedure from genetic_code_pak.stl into a convenient programming language). Submit the resulting alignments along with your software. Use BLAST to find other organisms containing similar proteins, and submit a summary of the results obtained.
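The 'align_by_mers' procedure itself is not reproduced here. As a loosely related illustration only, the Python sketch below shows one common k-mer idea: index every k-mer of one sequence, look up the k-mers of the other, and let each shared k-mer vote for the relative shift at which it matches. The function name and the choice k = 4 are assumptions made for this example, and the method finds a single ungapped shift, not a full gapped alignment.

```python
from collections import Counter, defaultdict


def best_offset(a, b, k=4):
    """Return (offset, votes) for the shift of b relative to a that is
    supported by the most shared k-mers, or None if no k-mer is shared.

    An offset of d means b[j] lines up with a[j + d].
    """
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)
    votes = Counter()
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            votes[i - j] += 1
    return votes.most_common(1)[0] if votes else None
```

For very similar sequences the winning offset usually accounts for most of the shared k-mers; places where the votes switch to a different offset suggest insertions or deletions, which a real alignment routine must handle.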

Advanced Project: Find all the complete mitochondrial genomes in species other than human which are not included in the list collected last year by the instructor. Find the mitochondrial gene order on these genomes.

** Week 12: (November 22) The cytoskeleton; Intracellular transport, motor molecules

Readings: Alberts: Chapter 16. Cell Communication

Readings: Alberts: Chapter 17. The Cytoskeleton

Readings: NCBI Handbook: 17. The Reference Sequence (RefSeq) Project

** Week 13: (November 29) Intercell communication, Mitosis, apoptosis, and telomeres

Readings: Alberts: Chapter 15. Intracellular compartments and transport

Readings: Alberts: Chapter 18. Cell cycle control and cell death

Readings: UCSC Genome Browser User Guide: http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html; see especially the 'Blat' genome searcher.

** Week 14: (December 6) Miscellaneous topics, Review.

Readings: Alberts: Chapter 19. Cell division.

Final Examination to follow.

All project submissions due one day before the final examination.