ARNAB BHATTACHARYA
Areas of Research Interest: Databases, Data Mining, Information Retrieval, Natural
Language Processing, Artificial Intelligence.
Experience:
- Associate Professor, Dept. of Computer Science and Engineering,
Indian Institute of Technology (IIT), Kanpur, India. June 2014 -
present.
- Assistant Professor, Dept. of Computer Science and Engineering,
Indian Institute of Technology (IIT), Kanpur, India. December 2007 -
June 2014.
- Project Scientist, Dept. of Computer Science, University of
California, Santa Barbara, USA. September 2007 - November 2007.
- Software Design Engineer, Texas Instruments (India) Ltd.,
Bangalore, India. July 2001 -
July 2002.
Education:
- Ph.D. in Computer Science, Dept. of Computer Science,
University of California, Santa Barbara, USA. 2007.
- M.S. in Computer Science, Dept. of Computer Science, University
of California, Santa Barbara, USA. 2007.
- Bachelor of Computer Science and Engineering (B.C.S.E.),
Jadavpur University, India. 2001.
Books:
- “Fundamentals of Database Indexing and Searching”. Arnab
Bhattacharya. CRC Press, 2014.
Selected Publications:
- Framework for Question-Answering in Sanskrit through Automated
Construction of Knowledge Graphs. Hrishikesh Terdalkar, Arnab
Bhattacharya. 6th International Sanskrit Computational
Linguistics Symposium (ISCLS), 2019, to appear, Kharagpur, India.
- TIPS: Mining Top-K Locations to Minimize User-Inconvenience for
Trajectory-Aware Services. Shubhadip Mitra, Priya Saraf, Arnab
Bhattacharya. IEEE Transactions on Knowledge and Data Engineering
(TKDE), 2019, to appear.
- RAQ: Relationship-Aware Graph Querying in Large Networks.
Jithin Vachery, Akhil Arora, Sayan Ranu, Arnab Bhattacharya.
International World Wide Web Conference (WWW), 2019, pages 1886-1896,
San Francisco, USA.
- HD-Index: Pushing the Scalability-Accuracy Boundary for
Approximate kNN Search in High-Dimensional Spaces. Akhil Arora,
Sakshi Sinha, Piyush Kumar, Arnab Bhattacharya. Proceedings of
the VLDB Endowment (PVLDB), 2018, 11(8), pages 906-919.
- Finding Largest Rectangle inside a Digital Object and
Rectangularization. Apurba Sarkar, Arindam Biswas, Mousumi Dutt,
Arnab Bhattacharya. Journal of Computer and System Sciences,
2018, 95, pages 204-217.
- Image Management for Biological Data. Arnab Bhattacharya,
Vebjorn Ljosa. Book chapter in Encyclopedia of Database Systems
(2nd Edition) edited by L. Liu and M. T. Ozsu. Springer, 2018.
- MineAr: Using Crowd Knowledge for Mining Association Rules in
the Health Domain. Milan Someswar, Arnab Bhattacharya. ACM
Joint International Conference on Data Science & Management of Data
(CoDS-COMAD), 2018, pages 108-117, Goa, India.
- Finding Shell Company Accounts using Anomaly Detection.
Devendra K. Luna, Girish K. Palshikar, Manoj Apte, Arnab
Bhattacharya. ACM Joint International Conference on Data Science
& Management of Data (CoDS-COMAD), 2018, pages 167-174, Goa, India.
- Tracking the Impact of Fact Deletions on Knowledge Graph Queries
using Provenance Polynomials. Garima Gaur, Arnab Bhattacharya,
Srikanta J. Bedathur.
International Conference on Information and Knowledge Management
(CIKM), 2017, pages 2079-2082, Singapore.
- SkyGraph: Retrieving Regions of Interest using Skyline Subgraph
Queries. Shiladitya Pande, Sayan Ranu, Arnab Bhattacharya.
Proceedings of the VLDB Endowment (PVLDB), 2017, 10(11), pages
1382-1393.
- NetClus: A Scalable Framework for Locating Top-K Sites for
Placement of Trajectory-Aware Services. Shubhadip Mitra, Priya Saraf,
Richa Sharma, Arnab Bhattacharya, Sayan Ranu, Harsh Bhandari.
International Conference on Data Engineering (ICDE), 2017, pages 87-90,
San Diego, USA.
- K-Dominant Skyline Join Queries: Extending the Join Paradigm to
K-Dominant Skylines. Anuradha Awasthi, Arnab Bhattacharya, Sanchit
Gupta, Ujjwal K. Singh. International Conference on Data
Engineering (ICDE), 2017, pages 99-102, San Diego, USA.
- Neighbor-Aware Search for Approximate Labeled Graph Matching
using the Chi-Square Statistics. Sourav Dutta, Pratik Nayek, Arnab
Bhattacharya. International World Wide Web Conference (WWW),
2017, pages 1281-1290, Perth, Australia.
- Automatic Grading and Feedback using Program Repair for
Introductory Programming Courses. Sagar Parihar, Ziyaan Dadachanji,
Praveen Kumar Singh, Rajdeep Das, Amey Karkare, Arnab Bhattacharya.
ACM Conference on Innovation and Technology in Computer Science
Education (ITiCSE), 2017, pages 92-97, Bologna, Italy.
- GARUDA: A System for Large-Scale Mining of Statistically
Significant Connected Subgraphs. Satyajit Bhadange, Akhil Arora, Arnab
Bhattacharya. Proceedings of the VLDB Endowment (PVLDB), 2016,
9(13), pages 1449-1452.
- SMS: Stable Matching Algorithm using Skylines. Rohit Anurag,
Arnab Bhattacharya. International Conference on Scientific and
Statistical Database Management (SSDBM), 2016, pages 24:1-24:4,
Budapest, Hungary.
- SkyCover: Finding Range-Constrained Approximate Skylines with
Bounded Quality Guarantees. Shubhendu Aggarwal, Shubhadip Mitra,
Arnab Bhattacharya. International Conference on Management of
Data (COMAD), 2016, pages 1-12, Pune, India.
- Finding Largest Rectangle inside a Digital Object. Apurba
Sarkar, Arindam Biswas, Mousumi Dutt, Arnab Bhattacharya.
Computational Topology in Image Context (CTIC), 2016, pages 157-169,
Marseille, France.
- Probabilistic Aggregate Skyline Join Queries: Skylines with
Aggregate Operations over Existential Uncertain Relations. Arnab
Bhattacharya, Shrikant Awate. International Conference on
Scientific and Statistical Database Management (SSDBM), 2015, pages
5:1-5:12, San Diego, USA.
- Trajectory Aware Macro-cell Planning for Mobile Users.
Shubhadip Mitra, Sayan Ranu, Vinay Kolar, Arnab Bhattacharya, Ravi
Kokku, Aditya Telang, Sriram Raghavan. IEEE International
Conference on Computer Communications (INFOCOM), 2015, pages 792-800, Hong
Kong, China.
- Generation of Random Triangular Digital Curves using
Combinatorial Techniques. Apurba Sarkar, Arindam Biswas, Mousumi
Dutt, Arnab Bhattacharya. International Conference on Pattern
Recognition and Machine Intelligence (PReMI), 2015, pages 136-145,
Warsaw, Poland.
- Using Social Connections to Improve Collaborative Filtering.
Kanish Manuja, Arnab Bhattacharya. IKDD Conference on Data
Sciences (CoDS), 2015, pages 140-141, Bengaluru, India.
- Generation of Random Digital Curves using Combinatorial
Techniques. Apurba Sarkar, Arindam Biswas, Mousumi Dutt, Arnab
Bhattacharya. Conference on Algorithms and Discrete Applied
Mathematics (CALDAM), 2015, pages 286-297, Kanpur, India.
- Mining Statistically Significant Connected Subgraphs in Vertex
Labeled Graphs. Akhil Arora, Mayank Sachan, Arnab Bhattacharya.
SIGMOD International Conference on Management of Data (SIGMOD), 2014,
pages 1003-1014, Snowbird, USA.
- Efficient and Effective Route Planning in Road Networks with
Probabilistic Data using Skyline Paths. Arzoo Katiyar, Arnab
Bhattacharya, Shubhadip Mitra. IKDD Conference on Data Sciences
(CoDS), 2014, New Delhi, India.
- Emotion Recognition from Audio and Visual Data using F-score
based Fusion. Abhishek Gera, Arnab Bhattacharya. IKDD
Conference on Data Sciences (CoDS), 2014, New Delhi, India.
- RCached-tree: An Index Structure for Efficiently Answering
Popular Queries. Manash Pal, Arnab Bhattacharya, Debjyoti Paul.
International Conference on Information and Knowledge Management
(CIKM), 2013, pages 1173-1176, San Francisco, USA.
- Efficient Edit Distance based String Similarity Search using
Deletion Neighborhoods. Shashwat Mishra, Tejas Gandhi, Akhil Arora,
Arnab Bhattacharya. EDBT/ICDT Workshops, 2013, pages 375-383,
Genoa, Italy.
- Hybrid HBase: Leveraging Flash SSDs to Improve Cost per
Throughput of HBase. Anurag Awasthi, Avani Nandini, Arnab
Bhattacharya, Priya Sehgal. International Conference on
Management of Data (COMAD), 2012, pages 68-79, Pune, India.
- A Plant Identification System using Shape and Morphological
Features on Segmented Leaflets: Team IITK, CLEF 2012. Akhil Arora,
Ankit Gupta, Nitesh Bagmar, Shashwat Mishra, Arnab Bhattacharya.
CLEF (Online Notes/Labs/Workshop), 2012, Rome, Italy.
- Mining Statistically Significant Substrings using the Chi-Square
Statistic. Mayank Sachan, Arnab Bhattacharya. Proceedings of
the VLDB Endowment (PVLDB), 2012, 5(10), pages 1052-1063.
- Mining Statistically Significant Substrings Based on the
Chi-Square Measure. Sourav Dutta, Arnab Bhattacharya. Book chapter
in Pattern Discovery Using Sequence Data Mining: Applications and
Studies edited by P. Kumar, P. R. Krishna and S. B. Raju. IGI
Global, 2012.
- Minimally Infrequent Itemset Mining using Pattern-Growth
Paradigm and Residual Trees. Ashish Gupta, Akshay Mittal, Arnab
Bhattacharya. International Conference on Management of Data
(COMAD), 2011, pages 57-68, Bengaluru, India. (Best paper)
- Caching Stars in the Sky: A Semantic Caching Approach to
Accelerate Skyline Queries. Arnab Bhattacharya, B. Palvali Teja,
Sourav Dutta. International Conference on Database and Expert
Systems Applications (DEXA), 2011, pages 493-501, Toulouse, France.
- A Continuous Query System for Dynamic Route Planning. Nirmesh
Malviya, Samuel Madden, Arnab Bhattacharya. International
Conference on Data Engineering (ICDE), 2011, pages 792-803, Hannover,
Germany.
- Finding the Bias and Prestige of Nodes in Networks based on
Trust Scores. Abhinav Mishra, Arnab Bhattacharya. International
World Wide Web Conference (WWW), 2011, pages 567-576, Hyderabad,
India.
- Aggregate Skyline Join Queries: Skylines with Aggregate
Operations over Multiple Relations. Arnab Bhattacharya, B. Palvali
Teja. International Conference on Management of Data (COMAD),
2010, pages 15-26, Nagpur, India. (Best student paper)
- INSTRUCT: Space-Efficient Structure for Indexing and Complete
Query Management of String Databases. Sourav Dutta, Arnab
Bhattacharya. International Conference on Management of Data
(COMAD), 2010, pages 27-38, Nagpur, India.
- Simulated Evolution and Learning. Proceedings of the 8th
International Conference on Simulated Evolution and Learning (SEAL).
Co-edited by K. Deb, A. Bhattacharya, N. Chakraborti, P. Chakroborty,
S. Das, J. Dutta, S. K. Gupta, A. Jain, V. Aggarwal, J. Branke, S. J.
Louis, K. C. Tan, Springer, 2010.
- Minimum Spanning Tree on Spatio-Temporal Networks. Viswanath
Gunturi, Shashi Shekhar, Arnab Bhattacharya. International
Conference on Database and Expert Systems Applications (DEXA), 2010,
pages 149-158, Bilbao, Spain.
- Finding Top-k Similar Pairs of Objects Annotated with Terms from
an Ontology. Arnab Bhattacharya, Abhishek Bhowmick, Ambuj K. Singh.
International Conference on Scientific and Statistical Database
Management (SSDBM), 2010, pages 214-232, Heidelberg, Germany.
- Most Significant Substring Mining based on Chi-square Measure.
Sourav Dutta, Arnab Bhattacharya. Pacific-Asia Conference on
Knowledge Discovery and Data Mining (PAKDD), 2010, pages 319-327,
Hyderabad, India.
- Querying Spatial Patterns. Vishwakarma Singh, Arnab
Bhattacharya, Ambuj K. Singh. International Conference on
Extending Database Technology (EDBT), 2010, pages 418-429, Lausanne,
Switzerland.
- Image Management for Biological Data. Arnab Bhattacharya,
Vebjorn Ljosa. Book chapter in Encyclopedia of Database Systems
edited by M. T. Ozsu and L. Liu. Springer, 2009.
- On Low Distortion Embeddings of Statistical Distance Measures
into Low Dimensional Spaces. Arnab Bhattacharya, Purushottam Kar,
Manjish Pal. International Conference on Database and Expert
Systems Applications (DEXA), 2009, pages 164-172, Linz, Austria.
- FTDP-17 Mutations in Tau Alter the Regulation of Microtubule
Dynamics: An “Alternative Core” Model for Normal and Pathological Tau
Action. Adria LeBoeuf, Sasha F. Levy, Michelle Gaylord, Arnab
Bhattacharya, Ambuj K. Singh, Mary Ann Jordan, Leslie Wilson, Stuart
C. Feinstein. Journal of Biological Chemistry, 2008, 283(52),
pages 36406-36415.
- A General Modeling and Visualization Tool for Comparing Different
Members of a Group: Application to Studying Tau-Mediated Regulation of
Microtubule Dynamics. Arnab Bhattacharya, Sasha Levy, Adria LeBoeuf,
Michelle Gaylord, Leslie Wilson, Ambuj K. Singh, Stuart C. Feinstein.
BMC Bioinformatics, 2008, 9, page 339.
- Efficient Computation of Statistical Significance of Query
Results in Databases. Vishwakarma Singh, Arnab Bhattacharya, Ambuj K.
Singh. International Conference on Scientific and Statistical
Database Management (SSDBM), 2008, pages 509-516, Hong Kong, China.
- MIST: Distributed Indexing and Querying in Sensor Networks using
Statistical Models. Arnab Bhattacharya, Anand Meka, Ambuj K. Singh.
International Conference on Very Large Data Bases (VLDB), 2007,
pages 854-865, Vienna, Austria.
- Indexing Spatially Sensitive Distance Measures Using
Multi-Resolution Lower Bounds. Vebjorn Ljosa, Arnab Bhattacharya,
Ambuj K. Singh. International Conference on Extending Database
Technology (EDBT), 2006, pages 865-883, Munich, Germany.
- LB-Index: A Multi-Resolution Index Structure for Images.
Vebjorn Ljosa, Arnab Bhattacharya, Ambuj K. Singh. International
Conference on Data Engineering (ICDE), 2006, pages 144-145, Atlanta,
USA.
- ViVo: Visual Vocabulary Construction for Mining Biomedical
Images. Arnab Bhattacharya, Vebjorn Ljosa, Jia-Yu Pan, Mark R.
Verardo, Hyung-Jeong Yang, Christos Faloutsos, Ambuj K. Singh.
International Conference on Data Mining (ICDM), 2005, pages 50-57,
Houston, USA. (One of the top five student papers)
- ProGreSS: Simultaneous Searching of Protein Databases by
Sequence and Structure. Arnab Bhattacharya, Tolga Can, Tamer Kahveci,
Ambuj K. Singh, Yuan-Fang Wang. Pacific Symposium on
Biocomputing (PSB), 2004, pages 264-275, Hawaii, USA.
Invited Talks:
- “Querying Statistically Significant Subgraphs” at the NetApp
Corporation, Bengaluru, India, 2019.
- “Graph Querying using Statistical Significance” at the Indian Institute
of Engineering Science and Technology, Shibpur, 2019.
- “Data Mining” at the Andhra Pradesh Human Resource Development
Institute (APHRDI), 2018.
- “Trajectory Aware Service Location Problems” at the NetApp
Corporation, Bengaluru, India, 2016.
- “Mining Statistically Significant Connected Subgraphs” at the NetApp
Corporation, Bengaluru, India, 2015.
- “Mining Statistically Significant Substructures based on the
Chi-square Statistic” at IBM, New Delhi, India, 2015.
- “Mining Statistically Significant Substrings based on the Chi-square
Measure” at the NetApp Corporation, Bengaluru, India, 2014.
- “Mining Statistically Significant Substructures based on the
Chi-square Statistic” at the Indian Statistical Institute, Kolkata,
India, 2014.
- “Skylines: Databases' Answer to Multiple Preferences” at the NetApp
Corporation, Bengaluru, India, 2013.
- “Skylines: Databases' Answer to Multiple Preferences” at the Dept.
of Computer Science and Engineering, Indian Institute of Technology,
Kanpur, India, 2012.
- “Finding the Bias and Prestige of Nodes in Networks based on Trust
Scores” at Yahoo! Labs, Bengaluru, India, 2011.
- “Earth Mover's Distance: An Adaptable and Universally Applicable
Distance Measure” at the Dept. of Computer Science, Andhra University,
Vishakhapatnam, India, 2010.
- “Earth Mover's Distance: An Adaptable and Universally Applicable
Distance Measure” at Tata Consultancy Services (TCS), Gurgaon, India,
2010.
- “On Earth Mover's Distance: A Spatially Sensitive Distance Measure”
at the Dept. of Computer Science, Free University of Bozen-Bolzano,
Italy, 2009.
- Popular lecture on “Game Theory” at the Business Club meeting of the
Indian Institute of Technology, Kanpur, India, 2009.
- “Distributed Indexing and Querying in Sensor Networks using
Statistical Models” at the Dept. of Computer Science, Université
Libre de Bruxelles, Belgium, 2008.
Patents:
- Multiple Criteria Decision Analysis
  - US patent number US8504581B2: 2013
  - India patent number INDEL20123027A: 2014
- Multiple Criteria Decision Analysis in Distributed Databases
  - Global patent number WO2015104591A1: 2015
Important Courses Taught:
- Data Mining
- Indexing and Searching Techniques in Databases
- Information Retrieval
- Skyline Queries in Databases
- Data-Driven Program Analysis
- Topics in Biocomputing
- Principles of Database Systems
- Fundamentals of Computing
- Computing Laboratory
Awards, Scholarships and Certificates:
- IBM Faculty Research Award, 2014.
- Award from Yahoo! Faculty Research and Engagement Program, 2011.
- Best paper award at the International Conference on Management of Data
(COMAD), 2011 for the paper “Minimally Infrequent Itemset Mining using
Pattern-Growth Paradigm and Residual Trees”.
- Best student paper award at the International Conference on Management
of Data (COMAD), 2010 for the paper “Aggregate Skyline Join Queries:
Skylines with Aggregate Operations over Multiple Relations”.
- One of the top-five student paper awards at the International
Conference on Data Mining (ICDM), 2005 for the paper “ViVo: Visual
Vocabulary Construction for Mining Biomedical Images”.
- ICDM Student Travel Award sponsored by IBM at the International
Conference on Data Mining (ICDM), 2005 awarded to the top five student
papers.
Major Sponsored Projects:
- “Scalable Spatio-Temporal Measurement and Analysis of Air Pollution
Data for Delhi-NCR using Vehicle Mounted Sensors” under IMPRINT-II
scheme from SERB, 2019-2022.
- “NYAYA: A Legal Assistance System for Legal Experts and the Common
Man in India” from SERB, India, 2019-2022.
- “Continuous Monitoring of Sampreeti Setu (New Jubilee Bridge):
Instrumentation, Design and Health Assessment” from Eastern Railway
Zone, Indian Railways, 2018-2023.
- “Development of Optimal Eco Driving System in HEV/PHEV based on
Vehicle Environment” from KEIT (Korea Evaluation Institute of
Industrial Technology), 2018-2021.
- “Development of Novel Materials and Methods for Removal of
Recalcitrant Organics from Water” from Indo-Taiwan Programme in
Science and Technology, 2017-2020.
- “Provenance in Graph Databases” from IBM, India, 2016-2018.
- “A Smart Phone Based Dark Field Microscope for Point of Care (Poc)
Diagnosis of Blood Cell Disorder in Lethal Diseases” under IMPRINT-I
scheme from SERB, 2017-2020.
- “Identifying Fake Product Listings and Sellers” from Flipkart,
India, 2016-2018.
- “Mining Statistically Significant Substructures using the Chi-Square
Measure and Setup of Big Data Lab” from IBM, India, 2014-2017.
- “Extending Skyline Queries to Distributed and Uncertain Databases”
from SERB, India, 2014-2017.
- “Development of Air Quality Index (AQI) for Indian Cities” from
Central Pollution Control Board (CPCB), India, 2014-2015.
- “Deciphering the BMP Signaling Network in Developing Bone: An
Inter-disciplinary Approach Combining Bioinformatic Data Mining Tools
along with Molecular, Genetic and Developmental Biology” from DBT,
Govt. of India, 2013-2016.
- “Flash-Aware Optimizations for Columnar Databases” from NetApp
Corporation, 2011-2018.
- “Reputation Framework for Ad Networks” from Yahoo! Research,
2011.
- “Data Storage and Backup Solutions” from BITCOE (BSNL IITK Telecom
Centre of Excellence), 2010-2011.
Students Advised:
- Ph.D. students: Graduated: 2, In progress: 5
- M.Tech. and M.S. students: Graduated: 78, In progress: 7
Chairs and Organizers:
- Program Chair for the ACM India Joint International Conference
on Data Science & Management of Data (CoDS-COMAD), 7th ACM IKDD CoDS
and 25th COMAD, 2020.
- Organizer for FIRE 2019 AILA Track: Artificial Intelligence for
Legal Assistance at Forum for Information Retrieval (FIRE), 2019.
- Workshop Organizer of 2nd Joint International Workshop on Graph
Data Management Experiences & Systems (GRADES) and Network Data
Analytics (NDA) co-located with SIGMOD 2019.
- Workshop Organizer of 1st Workshop on Legal Data Analytics and
Mining (LeDAM) co-located with CIKM 2018.
- Workshop Organizer of 1st Joint International Workshop on Graph
Data Management Experiences & Systems (GRADES) and Network Data
Analytics (NDA) co-located with SIGMOD 2018.
- Workshop Organizer of 2nd International Workshop on Network Data
Analytics (NDA@SIGMOD) co-located with SIGMOD 2017.
- Organizer for FIRE 2017 IRLeD Track: Information Retrieval from
Legal Documents at Forum for Information Retrieval (FIRE), 2017.
- Program Chair for the 19th International Conference on
Management of Data (COMAD), 2013.
- Publication Chair for the 18th International Conference on
Management of Data (COMAD), 2012.
- Program Chair for the 8th International Conference on Simulated
Evolution and Learning (SEAL), 2010.
- Publicity Chair for the 14th Pacific-Asia Conference on
Knowledge Discovery and Data Mining (PAKDD), 2010.
Professional Activities:
- Executive Member of the Computer Society of India's (CSI)
Special Interest Group in Data (SIGDATA) since 2012.
- Review Editor in Data Mining and Management part of the journal
Frontiers in Big Data since 2018.
- Member of the CODATA National Committee since 2016.
- Member of the Association for Computing Machinery (ACM) since
2010.
- Member of the Institute of Electrical and Electronics Engineers
(IEEE) since 2005.
- Reviewer and Program Committee member for many international
journals and conferences, including VLDB, ICDE, TKDD, etc.
- Panelist for discussions on AI and Its Impact on Jobs at IIT
Kanpur.