CS 690: Computational Genomics
Prerequisites
Familiarity with algorithms and data structures, probability and statistics, and computer programming. Knowledge of machine learning is also helpful.
About the Course
Computational genomics is a novel and very active application area of computer science in which biological mechanisms are deciphered from genome sequencing data through computational and statistical analysis. Over the past twenty years, an explosion of genomic data (from humans and several other organisms) has revolutionized a number of subfields of biology, including cell and molecular biology, developmental biology, and disease biology. Computer science plays a central role in genomics, from sequencing and assembling DNA and RNA sequences to analyzing genomes (or transcriptomes) to elucidate diverse biological mechanisms through innovations in machine learning, data structures, and algorithms.
In this course, you will be introduced to seminal machine learning and algorithmic approaches for sequence analysis as well as the most recent advances in the field. The course combines lectures with discussions of recent publications: the lectures introduce the topics and seminal algorithms, and the research paper discussions then cover advanced and recent developments.
Tentative Topics
- Fundamentals of Biological Sequence Analysis: Sequence alignment algorithms, Hidden Markov Models (HMMs) and modeling of biological sequences
- Genome-Scale Index Structures: Suffix arrays, suffix trees, the Burrows-Wheeler transform (BWT), the BWT index and its applications
- Genome-Scale Short Read Alignment: Dynamic programming along suffix tree paths, short read alignment algorithms
- Genome Assembly Algorithms: De Bruijn graphs
- Variant Calling Algorithms: Probabilistic dynamic programming
- Transcriptomics: Gene expression analysis, normalization, differential expression analysis, clustering
- Single-Cell Omics: Dimension reduction algorithms, generative processes, manifold learning, trajectory inference, regulatory networks
- Cancer Genomics: Probabilistic graphical models for tumor heterogeneity analysis, somatic mutation detection algorithms, driver mutation detection, matrix factorization problems in cancer genomics
- Deep Learning in Genomics: ChIP-seq data, transcription factor binding sites, graph convolutional neural networks, transfer learning
- Phylogenetics: Markov models of molecular evolution, character-based phylogeny algorithms – maximum parsimony, maximum likelihood and Bayesian inference
Recommended Books
- "Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids", by Durbin et al., Cambridge University Press.
- "Understanding Bioinformatics", by M. Zvelebil and J.O. Baum, Garland Science, 2008.
- "Bioinformatics Algorithms: An Active Learning Approach", by Phillip Compeau and Pavel Pevzner, Active Learning Publishers.
- "Inferring Phylogenies", by Joseph Felsenstein.
- "Genome-Scale Algorithm Design", by V. Mäkinen, D. Belazzougui, F. Cunial and A. Tomescu, Cambridge University Press, 2015.
The above books are recommended but not required. In addition, a number of research papers will be discussed.