Programming for Performance

CS 610, Semester 2024-25-I, IIT Kanpur

Class hours: MF 8-9 AM, Tu 9-10 AM in KD 103


Instructor Information

Name Swarnendu Biswas
Email swarnendu AT cse.iitk.ac.in

TA Information

Name Srinjoy Sarkar
Email srinjoy AT cse.iitk.ac.in

Course Description

To obtain good performance, one needs to write correct and scalable parallel programs using programming language abstractions like threads. In addition, the developer needs to be aware of, and make use of, architecture-specific features like vectorization to extract the full performance potential of the hardware. This course combines programming language abstractions with architecture-aware development to teach how to write scalable parallel programs. This is not a “programming tips and tricks” course.

There will be 4-6 assignments to apply the concepts learned in class and to appreciate the challenges in extracting performance.
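
As a rough illustration of the thread abstraction mentioned above, here is a minimal C++ sketch (not course material). The hypothetical helper `parallel_sum` splits an array across `std::thread` workers. Each worker accumulates into a private local variable and writes its slot in `partial` only once, which limits the cache-line contention that repeated writes to adjacent slots could cause. The function name, the chunking scheme, and the choice of thread count are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sums `data` using `num_threads` worker threads.
// Precondition (an assumption of this sketch): num_threads > 0.
long long parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads > 0);

    std::vector<long long> partial(num_threads, 0);  // one result slot per worker
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    // Ceiling division so that every element falls into exactly one chunk.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = std::min(data.size(), t * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            long long local = 0;  // private accumulator, kept off the shared array
            for (std::size_t i = begin; i < end; ++i) {
                local += data[i];
            }
            partial[t] = local;  // single write to shared memory per worker
        });
    }

    for (auto& w : workers) {
        w.join();
    }
    // Combine the per-thread results sequentially after all workers finish.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

With GCC or Clang this would typically be compiled with `-std=c++17 -O2 -pthread`. Whether it actually scales depends on the array size, the memory bandwidth, and the number of cores, which is the kind of question the course examines.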

Prerequisites
  • Exposure to the following courses (or equivalent) is desirable: CS220 (Computer Organization) and CS330 (Operating Systems).
  • Programming maturity with popular programming languages like C, C++, and Java.



Course Syllabus and Policies

Syllabus

The course will focus on a subset of the following topics.

We may add new, drop existing, or reorder topics depending on progress and class feedback. The course may also involve reading and critiquing related research papers.

Policies

Feedback

I am open to feedback about the course content and presentation. Feel free to provide suggestions for improvements.


Academic Integrity


Evaluation Scheme

Assignments 40%
Midsem 30%
Endsem 30%

Resources

| Date | Topic | Resources | Recommended Reading |
|  | First Course Handout |  |  |
| 30/07, 02/08 | Compiler Challenges for Parallel Architectures | Slides | AK 1.1-1.6 |
| 05/08, 06/08, 09/08 | Cache Memory | Slides | HP Appendix B.1-B.4, 2.1-2.3; CSAPP 6.2-6.4 |
| 12/08, 13/08 | Write Cache-Friendly Code | Slides | CSAPP 6.5-6.6; DRAG 11.1-11.2 |
| 16/08 | PAPI Library | Slides |  |
| 19/08, 20/08, 23/08, 27/08, 30/08 | Cache Coherence and False Sharing | Slides | MCM Chapters 2, 6, 8 (IITK has subscribed to the ebook) |
|  | Dependence Testing | Slides | AK Chapters 2, 3; DRAG 11.6 |
|  | Loop Transformations |  | AK 5.2-5.4, 5.7.2, 5.9, 6.2.1, 6.2.2, 6.2.5, 6.3.1-6.3.4; AP 4.1, 4.2, 4.5, 5.1-5.6; HP 4.5; Compiler Transformations for High-Performance Computing |
|  | Vectorization |  | HP 4.3; Intel oneAPI DPC++/C++ Compiler Developer Guide and Reference; Intel C++ Compiler Classic Developer Guide and Reference; Guide for Intel Compilers |
|  | OpenMP |  | OpenMP Application Programming Interface v5.2; OpenMP Application Programming Interface Examples v5.2.1; PP Chapter 5 (IITK has subscribed to the ebook); LLNL OpenMP Tutorial |
|  | GPU Programming with CUDA |  | NVIDIA CUDA C Programming Guide; NVIDIA CUDA C Best Practices Guide; KH Chapters 1-5 (IITK has subscribed to the ebook); HP 4.3 |
|  | Concurrent Data Structures |  | MP Chapters 9, 10, 13 (IITK has subscribed to the ebook) |


References

I have listed (NOT in any particular order) a few popular references. We may read and discuss related materials and research papers, which we will announce in class.