Introduction: Why parallel computing; Ubiquity of parallel hardware/multi-cores; Processes and threads; Programming models: shared memory and message passing; Speedup and efficiency; Amdahl's Law.
Introduction to parallel hardware: Multi-cores and multiprocessors; shared memory and message passing architectures; cache hierarchy and coherence; sequential consistency.
Introduction to parallel software: Steps involved in developing a parallel program; Dependence analysis; Domain decomposition; Task assignment: static and dynamic; Performance issues: 4C cache misses, inherent and artifactual communication, false sharing, computation-to-communication ratio as a guiding metric for decomposition, hot spots and staggered communication.
Shared memory parallel programming: Synchronization: Locks and barriers; Hardware primitives for efficient lock implementation; Lock algorithms; Relaxed consistency models; High-level language memory models (such as Java and/or C++); Memory fences. Developing parallel programs with the UNIX fork model: IPC with shared memory and message passing; UNIX semaphore and its all-or-none semantics. Example case studies (see note below for some details). Developing parallel programs with the POSIX thread library: Thread creation; Thread join; Mutex; Condition variables. Example case studies (see note below for some details). Developing parallel programs with OpenMP directives: Parallel for; Parallel section; Static, dynamic, guided, and runtime scheduling; Critical sections and atomic operations; Barriers; Reduction. Example case studies (see note below for some details).
Message passing programming: Distributed memory model; Introduction to the Message Passing Interface (MPI); Synchronization as Send/Recv pair; Synchronous and asynchronous Send/Recv; Collective communication: Reduce, Broadcast, Data distribution, Scatter, Gather; MPI derived data types. Example case studies (see note below for some details).
Introduction to GPU programming: GPU architecture; Introduction to CUDA programming; Concept of SIMD and SIMT computation; Thread blocks; Warps; Global memory; Shared memory; Thread divergence in control transfer. Example case studies (see note below for some details).
Additional topics: PGAS and APGAS programming paradigms; Transactional memory paradigm; Introduction to speculative parallelization.
** Notes **: