Seminar by Mayur Datar
Algorithms for Data Stream Systems
Mayur Datar
Department of Computer Science
Stanford University
California, USA
Date: Thursday, September 11, 2003
Time: 4:00 PM
Venue: CS-101
Abstract
In a growing number of information processing applications, data takes the form of "continuous data streams" rather than traditional stored databases. These applications share several distinguishing features like the need for real time analysis, huge volumes of data, and unpredictable and bursty data arrivals. These applications have spawned a considerable and growing body of research into data stream processing, ranging from algorithms for data streams to full-fledged data stream systems. In this talk, I will present some of my work in this area. For this presentation, we will focus on some issues related to the design of a general purpose data stream management system (DSMS). We will look at two runtime resource allocation issues, namely operator scheduling and load shedding. The problem of operator scheduling is to design a strategy that decides at every time instant what is the next job (operator) to schedule for execution on the processor. We will present an almost optimal scheduling strategy called "Chain" that is designed to minimize the memory requirement of the system. In the second part of the talk I will discuss techniques for intelligently dropping unprocessed data (tuples) from the system, i.e. load shedding, so as to reduce the load on the system during moments of high volumes of input data. We present an algorithm for load shedding that minimizes the inaccuracy in query results, while at the same time making sure that the load on the system is below the required threshold.
About the speaker
Mayur Datar is a PhD student in the theory group of the computer science department at Stanford university. He will be receiving his PhD degree by the end of Sept 2003. He obtained his Bachelor of Technology degree from the Indian Institute of Technology, Bombay and a M.S. degree from Stanford university. His research interests include design and analysis of algorithms in applied areas like data bases and data mining.