CSE - IIT Kanpur

CS 690: Computational Genomics

Pre-Requisites:

Familiarity with algorithms and data structures, probability and statistics, and computer programming. Knowledge of machine learning is helpful but not required.

About the Course

Computational genomics is a young and very active application area of computer science in which biological mechanisms are deciphered from genome sequencing data using computational and statistical analyses. In the past twenty years, an explosion of genomic data from humans and several other organisms has revolutionized a number of subfields of biology, including cell and molecular biology, developmental biology and disease biology. Computer science plays a central role in genomics, from sequencing and assembling DNA and RNA sequences to analyzing genomes (or transcriptomes) to elucidate diverse biological mechanisms through innovations in machine learning, data structures and algorithms. In this course, you will be introduced to seminal machine learning and algorithmic approaches for sequence analysis as well as recent advances in the field. The course combines lectures with discussions of recent publications: the lectures introduce the topics and seminal algorithms, and the research paper discussions cover advanced and recent developments.

Tentative Topics

Fundamentals of Biological Sequence Analysis: Sequence alignment algorithms, Hidden Markov Models (HMMs) and modeling of biological sequences
Genome-Scale Index Structures: Suffix array, suffix tree, Burrows-Wheeler transform (BWT), BWT index and applications
Genome-Scale Short Read Alignment: Dynamic programming along suffix tree paths, short read alignment algorithms
Genome Assembly Algorithms: De Bruijn graphs
Variant Calling Algorithms: Probabilistic dynamic programming
Transcriptomics: Gene expression analysis, normalization, differential expression analysis, clustering
Single-Cell Omics: Dimension reduction algorithms, generative processes, manifold learning, trajectory inference, regulatory networks
Cancer Genomics: Probabilistic graphical models for tumor heterogeneity analysis, somatic mutation detection algorithms, driver mutation detection, matrix factorization problems in cancer genomics
Deep Learning in Genomics: ChIP-seq data, transcription factor binding sites, graph convolutional neural networks, transfer learning
Phylogenetics: Markov models of molecular evolution, character-based phylogeny algorithms – maximum parsimony, maximum likelihood and Bayesian inference

Recommended Books

"Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids", by Durbin et al., Cambridge University Press.
"Understanding Bioinformatics", by M. Zvelebil and J.O. Baum, Garland Science, 2008.
"Bioinformatics Algorithms: An Active Learning Approach", by Phillip Compeau and Pavel Pevzner, Active Learning Publishers.
"Inferring Phylogenies", by Joseph Felsenstein.
"Genome-Scale Algorithm Design", by V. Mäkinen, D. Belazzougui, F. Cunial and A. Tomescu, Cambridge University Press, 2015.

The above books are recommended but not required. In addition, a number of research papers will be discussed.
