Class hours: Mon and Fri 8-9 AM, Tue 9-10 AM, in KD 101
Name | Email |
Swarnendu Biswas | swarnendu@cse.iitk.ac.in |
Name | Email |
Srinjoy Sarkar | srinjoys23@cse.iitk.ac.in |
Sahil Basia | sahilbasia24@cse.iitk.ac.in |
Nayan Das | nayand24@cse.iitk.ac.in |
Sangharsh Nagdevte | sangharsh@cse.iitk.ac.in |
To obtain good performance, one needs to write correct but scalable parallel programs using programming language abstractions like threads. In addition, the developer needs to be aware of and use architecture-specific features like vectorization to extract the full performance potential. This course combines programming language abstractions with architecture-aware development to teach how to write scalable parallel programs. This is not a "programming tips and tricks" course.
There will be 4-6 assignments that apply the concepts learned in class and expose the challenges of extracting performance.
Prerequisites |
|
The course will focus on a subset of the following topics.
The following is a tentative allocation and might change slightly depending on the strength of the class. Grading is relative.
Assignments | 40% |
Midsem | 30% |
Endsem | 30% |
I am open to feedback about the course content and presentation. Feel free to provide suggestions for improvements.
| Date | Topic | Resources | Recommended Reading |
|---|---|---|---|
| | First course handout | FCH | |
| 01/08, 04/08 | Compiler Challenges for Parallel Architectures | Slides | AK 1.1-1.6 |
| 05/08, 08/08, 09/08 | Cache Memory | Slides | HP APP B.1-B.4, 2.1-2.3; CSAPP 6.2-6.4 |
| 11/08, 12/08 | Write Cache-Friendly Code | Slides | CSAPP 6.5-6.6; DRAG 11.1-11.2 |
| 12/08 | PAPI Library | Slides | |
| 18/08, 19/08, 22/08 | Cache Coherence and False Sharing | Slides | MCM Chapters 2, 6, 8 (IITK has subscribed to the ebook) |
| 23/08, 25/08, 26/08 | Shared-Memory Synchronization | Slides | MP 2.3, 2.4, 2.6, 7.1-7.5, 8.3; SMS 4.1, 4.2, 4.3.1, 6.1 |
| | Dependence Testing | Slides | AK Chapters 2, 3; DRAG 11.6 |
| | Loop Transformations | | AK 5.2-5.4, 5.7.2, 5.9, 6.2.1, 6.2.2, 6.2.5, 6.3.1-6.3.4; AP 4.1, 4.2, 4.5, 5.1-5.6; HP 4.5; Compiler Transformations for High-Performance Computing |
| | Vectorization | | HP 4.1-4.2; Program Optimization Through Loop Vectorization; Topics in Loop Vectorization |
| | OpenMP | | PP Chapter 5 (IITK has subscribed to the ebook); LLNL OpenMP Tutorial; Introduction to OpenMP - Tim Mattson (Intel); OpenMP Application Programming Interface v5.2; OpenMP Application Programming Interface Examples v5.2 |
| | GPU Architecture and CUDA Programming | | NVIDIA CUDA C Programming Guide; NVIDIA CUDA C Best Practices Guide; KH Chapters 1-5, 13, 20 (IITK has subscribed to the ebook); HP 4.4 |
| | Concurrent Data Structures | | MP Chapters 3, 9, 10, 13 (IITK has subscribed to the ebook) |
[CSAPP] | Computer Systems: A Programmer's Perspective, 3rd edition - R. Bryant and D. O'Hallaron |
[DRAG] | Compilers: Principles, Techniques, and Tools - A. Aho, M. Lam, R. Sethi, and J. Ullman |
[HP] | Computer Architecture: A Quantitative Approach, 6th edition - J. Hennessy and D. Patterson |
[AK] | Optimizing Compilers for Modern Architectures - R. Allen and K. Kennedy |
[SMS] | Shared-Memory Synchronization, 2nd edition - M. Scott and T. Brown |
[AP] | Automatic Parallelization: An Overview of Fundamental Compiler Techniques - Samuel P. Midkiff |
[PP] | An Introduction to Parallel Programming - Peter S. Pacheco |
[KH] | Programming Massively Parallel Processors: A Hands-on Approach, 3rd edition - David B. Kirk and Wen-mei W. Hwu |
[MCM] | A Primer on Memory Consistency and Cache Coherence, 2nd edition - Vijay Nagarajan, Daniel J. Sorin, Mark D. Hill and David A. Wood |
[MP] | The Art of Multiprocessor Programming, 1st edition - Maurice Herlihy and Nir Shavit |