CS698F - Advanced Data Management

Fall 2016



About

Instructor: Medha Atre
Office: KD-219
Teaching Assistant: Priyank Agarwal
Email: a t r e m @ cse . iitk . ac . in
(Add [CS698F] to the subject line; otherwise the email will be ignored.)
Class time and location: Mon & Thu 10:30am--12:00pm @ KD-102
Office hours: Mon & Thu 12:00pm--1:00pm (or by prior appointment).

Important Announcements

Background and Course Description

With the growth of the internet and Web 2.0 (user-generated content, e.g., Facebook, Twitter, Wikipedia), the amount of data to be handled has exploded. Data is no longer purely relational, as it once was. Much of it is "graph shaped" and lacks the strict structure of relational data. Hence the challenges of storing and querying this data differ from those of relational database systems, and this course will cover important topics in the storage and querying of "graph shaped" data.

At the same time, relational data management has also come a long way, and relational databases are often used to store these large graphs and to process queries over them. The course will cover various methods, tools, and techniques currently used to handle graph data and process queries over it. The tools covered will include (but are not limited to) Apache Hadoop, Apache Spark, the Neo4j graph database, and the MonetDB and Virtuoso relational databases. We will also read and discuss cutting-edge research papers from top data management conferences such as SIGMOD, VLDB, and ICDE. The topics will be focused on data rather than on any particular application area. Students are encouraged to bring ideas from other areas into the course, as well as to apply methods learnt in the course to other areas.

By the end, this course will give students knowledge of cutting-edge data management techniques (e.g., those used in industry or large research labs). The course project will provide hands-on programming experience with the practical aspects of big data management. Additionally, an introduction to open (challenging) problems, both theoretical and practical, can suggest directions for further research to interested students.

Pre-requisites

A UG-level course in database management and knowledge of SQL queries. Good programming skills. A basic understanding of computational complexity. Some knowledge of computer networks is helpful but not required.

Grading

Reference materials

There is no dedicated textbook or reference material. Lecture slides/notes will cover the required material, and the instructor will provide pointers to any other material that students need to read.

Academic Integrity

The CSE department's academic integrity (anti-cheating) policy applies to all registered students of this course.

Schedule

Day, Date | Topic | Slides/Notes | Assignments | Deadlines | Readings/References
Thu, July 28 | Introduction | Lecture-1 | | |
Mon, Aug 01 | Graph pattern queries with BitMat | Lecture-2 | Assignment-1 | | BitMat details; From any database systems book or online resources, read about basic query cost estimation and optimization techniques; Semi-joins; About bitmap compression methods
Thu, Aug 04 | BitMat details continued | Lecture-3 | | |
Mon, Aug 08 | Recap of BitMat algorithm, start of overview of contemporary systems | Lecture-4 | | | RDF-3X, Virtuoso, gStore, TripleBit, N-way multi-joins
Thu, Aug 11 | Overview of RDF-3X and TripleBit | Lecture-5 | | |
Mon, Aug 15 | No class, institute holiday. | | | Assignment-1 due 23:59 IST |
Thu, Aug 18 | Review/Recap | | | |
Mon, Aug 22 | Introduction to distributed systems | Lecture-7 | | Project groups due 18:00 IST |
Thu, Aug 25 | No class, institute holiday. | | | |
Mon, Aug 29 | Data distribution over Hadoop | Lecture-8 | | |
Thu, Sept 1 | Join query processing over Hadoop/P2P networks | Lecture-9 | | |
Mon, Sept 5 | No class, instructor away. | | | |
Thu, Sept 8 | No class, instructor away. | | | Assignment-2 due 23:59 IST |
Mon, Sept 12 | No class, mid-sem exams. | | | Course project proposal due 23:59 IST |
Thu, Sept 15 | No class, mid-sem exams. | | | |
Sun, Sept 18 | Mid-sem SOTA presentations, 13:00--17:00 at KD-102 | | | |
Mon, Sept 19 | Review: tying it all together | | | SOTA doc and project proposals due 23:59 IST |
Thu, Sept 22 | Special topics in graph processing (intro to reachability) | Lecture-11 | | |
Mon, Sept 26 | Graph reachability, part 1 | Lecture-12 | | | Read the PVLDB 2014 paper and all the approaches covered in its Related Work section (Section 3).
Thu, Sept 29 | Graph reachability, part 2 | | | |
Mon, Oct 3 | Regular path queries, part 1 | Lecture-14 | | | Tarjan 1979, Mendelzon Wood 1995, Abiteboul Vianu 1997, Milo Suciu 1999, PODS 2013, J.ACM 2014
Thu, Oct 6 | Regular path queries, part 2 | Lecture-15 | | | SIGMOD 2010, ICDE 2011, SSDBM 2012; (additional topics) PODS 2003, ICDT 2003, TCS DB theory 2003, ICDE 2000, ICDE 1998
Mon, Oct 10 | No class, mid-sem recess. | | | |
Thu, Oct 13 | No class, mid-sem recess. | | | |
Mon, Oct 17 | Keyword search over graphs, part 1 | Lecture-16 | | Assignment-3 due 23:59 IST |
Thu, Oct 20 | Keyword search over graphs, part 2 | Lecture-17 | | |
Mon, Oct 24 | Graph indexing methods, part 1 | Lecture-18 | | |
Thu, Oct 27 | Graph indexing methods, part 2 | Lecture-19 | | |
Mon, Oct 31 | Advanced topics in query optimization | | | |
Thu, Nov 3 | Review, Q&A, discussion | | | |
Fri, Nov 4 | N.A. | | | Assignment-4 due 23:59 IST |
Mon, Nov 7 | Course project presentations 1 | | | |
Thu, Nov 10 | Course project presentations 2 | | | |
Sat, Nov 26 | | | | Project code and report due 23:59 IST |
Sun, Nov 27 | Project demo (schedule TBA) | | | |

Copyright

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers, or to redistribute to lists, requires prior specific permission.