PhD, MTech, MS(R), 4th-year UG
To obtain good performance, one needs to write correct and scalable parallel programs using programming language abstractions like threads. In addition, the developer needs to be aware of and utilize architecture-specific features like vectorization to extract the full performance potential. In this course, we will combine programming language abstractions with architecture-aware development to learn to write scalable parallel programs.
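For instance, here is a minimal sketch of the thread abstraction using the Pthreads API; the thread count, array size, and helper names are illustrative choices, not part of the course material:

    #include <pthread.h>
    #include <cstdio>

    constexpr int NUM_THREADS = 4;      // illustrative thread count
    constexpr int N = 1 << 20;          // illustrative array size

    static double a[N];
    static double partial[NUM_THREADS]; // one slot per thread avoids data races

    struct Arg { int tid; };

    // Each thread sums a contiguous chunk of the array.
    void *sum_chunk(void *p) {
        int tid = static_cast<Arg *>(p)->tid;
        int chunk = N / NUM_THREADS;    // assumes N divides evenly
        double s = 0.0;
        for (int i = tid * chunk; i < (tid + 1) * chunk; ++i)
            s += a[i];
        partial[tid] = s;
        return nullptr;
    }

    int main() {
        for (int i = 0; i < N; ++i) a[i] = 1.0;
        pthread_t t[NUM_THREADS];
        Arg args[NUM_THREADS];
        for (int i = 0; i < NUM_THREADS; ++i) {
            args[i].tid = i;
            pthread_create(&t[i], nullptr, sum_chunk, &args[i]);
        }
        double total = 0.0;
        for (int i = 0; i < NUM_THREADS; ++i) {
            pthread_join(t[i], nullptr);  // join, then combine partial sums
            total += partial[i];
        }
        printf("sum = %f\n", total);      // expect N * 1.0
        return 0;
    }

Even this small example raises the correctness and performance questions the course examines: how work is partitioned, how threads avoid data races, and how results are combined.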
The course will involve programming assignments to apply the concepts learnt in class and to appreciate the challenges in extracting performance.
The course will primarily focus on the following topics:
Introduction: challenges in parallel programming, correctness and performance errors, understanding performance, performance models
Exploiting spatial and temporal locality with caches, analytical cache miss analysis
Compiler transformations: dependence analysis, loop transformations
Shared-memory programming and Pthreads
Compiler vectorization: vector ISA, auto-vectorizing compiler, vector intrinsics, assembly
OpenMP: core OpenMP, advanced OpenMP, heterogeneous programming with OpenMP (a short usage sketch appears after this list)
Parallel programming models and patterns
Intel Threading Building Blocks
GPGPU programming: GPU architecture and CUDA programming
Performance bottleneck analysis: PAPI counters, using performance analysis tools
Fork-join parallelism
Concurrent data structures
Shared-memory synchronization
Memory consistency models
Transactional memory
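As a taste of the OpenMP material above, here is a hedged sketch of the same parallel array sum written with OpenMP; the parallel for simd directive combines fork-join parallelism with a compiler vectorization hint, the array size is again arbitrary, and the code assumes an OpenMP-capable compiler (e.g., compile with -fopenmp):

    #include <cstdio>

    int main() {
        constexpr int N = 1 << 20;   // illustrative size
        static double a[N];
        for (int i = 0; i < N; ++i) a[i] = 1.0;

        double total = 0.0;
        // Fork-join: a team of threads splits the loop iterations.
        // The reduction clause gives each thread a private copy of
        // total and combines the copies at the join. The simd clause
        // asks the compiler to also vectorize each thread's chunk.
        #pragma omp parallel for simd reduction(+ : total)
        for (int i = 0; i < N; ++i)
            total += a[i];

        printf("sum = %f\n", total); // expect N * 1.0
        return 0;
    }

Compared with the Pthreads sketch earlier, the directive-based style delegates work partitioning, race avoidance, and result combination to the runtime and compiler, which is a recurring trade-off discussed in the course.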