Medha Atre
Assistant Professor
KD-219, Computer Science and Engineering
Indian Institute of Technology, Kanpur
atrem at cse.iitk.ac.in or firstname.lastname at gmail

CS698F - Advanced Data Management (Fall 2017)

About

Email: add [CS698F] to the subject line; emails without it will be ignored.
Class location: Wed & Fri 15:30--17:00 @ KD-314
Office hours: By appointment.

Announcements

  • Answer sheets will be shown 19-Nov-2017, 20:00 -- 20:30 @ KD314.
  • End semester exam 18-Nov-2017, 16:00 -- 19:00 @ KD314.
  • Course project presentation and demo 15, 17 Nov, 2017 in class.
  • Course project report and code due 14-Nov-2017, 23:59.
  • Assignment-2 presentations 1, 3 Nov 2017 in class.
  • Assignment-2 paper selection due 20-Oct-2017, 23:59.
  • Course project topic with brief proposal due by Aug 25, 11:59pm by email.
  • Assignment-1 papers and topics due by Aug 30, 11:59pm by email.
    • Sept 6, 8 class presentations.
  • Class location changed from KD 101 to KD 314.

Description

With the growth of the web, "big data" problems have become central to many "industrial strength" solutions, i.e., solutions that handle data at a scale beyond the capabilities of prototype systems (e.g., billions of data items such as graphs, images, videos, and documents). Much of this data is semi-structured (graphs, e.g., Facebook, Twitter, and LinkedIn networks), unstructured (videos, images, text documents, etc.), or a mixture of the two. Hence the challenges of storage and query processing over this data differ from those of traditional relational database systems, which focused on strictly structured data, even though many of their robust storage and indexing features form the core of the new solutions.

The broad objectives of this course are as follows:
  • The course will first give an overview of traditional data management and query optimization techniques.
  • It will then focus on methods for query processing over mainly "graph shaped" data, including centralized and distributed solutions (Hadoop, Spark, and others).
  • We will read research papers from the top conferences.
  • The course will carry a large project component.
  • The instructor will also introduce open (challenging) problems in both the theoretical and practical domains of "graph data management" to give directions for further research, whether purely theoretical, purely practical, or a combination of the two. This is especially useful for developing a postgraduate research plan.

Prerequisites

An undergraduate-level course in DBMS and knowledge of undergraduate-level data structures and algorithms. Good knowledge of programming.

Grading

  • Assignments (2): 20%
    • Assignment-1: first week of September
    • Assignment-2: last week of October
  • Mid-semester: 20%
    • Presentation of literature survey for course project and course project intermediate demo
  • Course project implementation: 30%
  • Course project report: 10%
  • End-semester (in-class written): 20%
    • Questions on papers read through the semester and topics covered in the class

References

There is no dedicated textbook or reference material. Lecture slides/notes will cover the required material, and the instructor will provide pointers to any other material that students need to read.

Schedule

Day, Date | Topic | Notes | Readings/References
Wed, Aug 02 | Introduction | Lecture1 |
Fri, Aug 04 | Recap | Lecture2 |
Wed, Aug 09 | Relational Algebra and Query rewriting | Lecture3 | Relevant chapters in std DBMS book.
Fri, Aug 11 | Query plans and Indexes | Lecture4 | Relevant chapters in std DBMS book.
Wed, Aug 16 | Types of Joins, Cost estimation | Lecture5 | Relevant chapters in std DBMS book.
Fri, Aug 18 | Cost estimation remaining, Intro to graphs | Lecture6 |
Wed, Aug 23 | Graph pattern queries | Lecture7 |
Fri, Aug 25 | Graph pattern queries continued | Lecture8 |
Wed, Aug 30 | Cyclicity, Acyclicity, Compression | Lecture9 | BitMat, RDF3X, Compressed bitmaps, Semi-joins proofs
Fri, Sept 1 | Finish Q proc, Intro to Distributed Data Management | Lecture10 |
Wed, Sept 6 | Distributed Data Management, Assign-1 presentation | Lecture11, Presentations |
Fri, Sept 8 | Distributed Data Management.. contd, Assign-1 presentation | Lecture12, Presentations |
Wed, Sept 13 | Assign-1 remaining presentations | Presentations |
Fri, Sept 15 | Distributed joins.. contd | Lecture13 |
Wed, Sept 20 | Midsem exam presentations | Presentations |
Wed, Oct 4 | Recap of things till midsem | Lecture14 |
Fri, Oct 6 | Cancelled (instructor unwell) | |
Wed, Oct 11 | Map Reduce framework | Lecture15 |
Fri, Oct 13 | Joins with Map Reduce | Lecture16 |
Wed, Oct 18 | Multi-Joins with Map Reduce | |
Fri, Oct 20 | Special topic -- Reachability queries | Lecture18 |
Fri, Oct 25 | Special topic -- Reachability queries contd... | Lecture19 |
Wed, Nov 1 | Assignment-2 presentations | Presentations |
Fri, Nov 3 | Assignment-2 presentations | Presentations |
Wed, Nov 8 | Keyword search on database/graphs | Lecture20 |
Fri, Nov 10 | Review | |
Wed, Nov 15 | Course proj pres and demo | |
Fri, Nov 17 | Course proj pres and demo | |
Sat, Nov 18 | Endsem Exam @ KD314 16:00 -- 19:00 | |




@Copyright Contents of this website cannot be copied without prior permission.