Aditya Parameswaran

I am an assistant professor of Computer Science at the University of Illinois (UIUC). My research interests are broadly in simplifying and improving data analytics, i.e., helping users make better use of their data.

My work involves building real data analytics systems with principled foundations, designing algorithms (with formal guarantees) for the systems, as well as mining data obtained from such systems.

Biographical Sketch

Aditya Parameswaran is an Assistant Professor in Computer Science at the University of Illinois (UIUC), with affiliate appointments at the Institute for Genomic Biology and the Beckman Institute for Advanced Science and Technology. He spent a year as a PostDoc at MIT CSAIL following his PhD at Stanford University, before starting at Illinois in August 2014. He develops systems and algorithms for "human-in-the-loop" data analytics, synthesizing techniques from database systems, data mining, and human computation.

He has received the NSF CAREER Award (2017), the TCDE Early Career Award (2017), the C. W. Gear Junior Faculty Award from the University of Illinois (2017), multiple "best" Doctoral Dissertation Awards (from SIGMOD, SIGKDD, and Stanford in 2014), an "Excellent" Lecturer award from Illinois, a Google Faculty award, the Key Scientific Challenges award from Yahoo!, five best-of-conference citations (VLDB 2010, KDD 2012, ICDE 2014, ICDE 2016, AISTATS 2017), and a best demo honorable mention (SIGMOD 2017). He is an associate editor of SIGMOD Record, serves on the steering committee of the HILDA (Human-in-the-loop Data Analytics) Workshop, and has served on program committees of various database, data mining, web, systems, and crowdsourcing conferences. His research group is supported with funding from the NSF (CAREER, Medium, AITF, BigData), the NIH (2X), Adobe, the Siebel Energy Institute, and Google.


News

  • October 15, 2017: My O'Reilly Blog post on "Enabling Data Science for the Majority" is live! In it, I lay out 5 BIG challenges in democratizing data science, and describe some of our work as well as some of the other work in this space. Read this if you want to find out what's new and cool in data science research.
  • October 10, 2017: New preprint on characterizing the spectrum of scalability issues in Microsoft Excel via Reddit posts here as part of our DataSpread project. Led by Kelly, our intrepid undergrad!
  • October 1, 2017: Our multi-year effort in participatory design with Zenvisage, carried out with scientists from material science, genetics, and astrophysics, is chronicled by the Zenvisage gang here. Many interesting insights on how visual exploration systems like Zenvisage can fit into scientific data exploration workflows + many real instances of valuable scientific findings gained from the process!
  • September 11, 2017: Thanks to new funding from the NSF Algorithms in the Field (AitF) program, we can advance scalable visualization by applying sublinear time techniques, along with the super smart theory duo of Ronitt Rubinfeld (MIT) and Ilias Diakonikolas (USC). NSF page here.
  • September 1, 2017: VLDB Blog Posts! Here they are:
    • "Towards Automating Insight", here.
    • "Drawing Conclusions Early with Incvisage", here.
    • "Painless Data Versioning for Collaborative Data Science", here.
    • "Crowdsourcing in Practice: Our Findings", here.
  • August 15, 2017: Grateful to receive the C.W. Gear Junior Faculty Award from the University of Illinois! Thanks to the Department of CS for being such a supportive environment for junior faculty!
  • August 1, 2017: New/Updated preprints:
    • on Needletail, our "any-k" browsing and sampling engine, here.
    • on FastMatch, an algorithm for rapidly matching histograms to a target, applying a variety of systems and algorithmic ideas, here; a key component of Zenvisage.
    • on DataSpread, studying representation and indexing schemes for spreadsheet data, here.
    • on Datamaran, our unsupervised extraction tool for large-scale extraction from data lakes, here.
  • May 18, 2017: The OrpheusDB demo received a best demo honorable mention! Congrats to Liqi + Silu! Missed it at SIGMOD? You can still catch it here: video.
  • May 15, 2017: Paper on IncVisage: our incrementally improving visualization algorithm and interface has been accepted to VLDB'17! Paper here. Joint work with theorists at MIT and Waterloo, and HCI/Viz folks at Illinois. Perhaps the first paper that has theory, DB, and HCI co-authors? (Would love to be corrected if not.)
  • April 15, 2017: Thrilled and honored to receive:
    • The NSF CAREER Award: Abstract here. Excited to pursue the vision of optimizing "open-ended" crowdsourcing! Vision paper from IEEE Data Engg. Bulletin here.
    • The TCDE (Technical Committee on Data Engineering) Early Career Award, awarded for an individual's whole body of work in the first 5 years after the PhD. The award citation: "The award is for developing new interactive tools and techniques that expand the reach of data analytics, enabling powerful data-driven discoveries by experts and non-experts alike."
  • April 15, 2017: Orpheus Updates: demo accepted at SIGMOD 2017; paper accepted at VLDB 2017 (no revisions!); open-source release here.
  • April 3, 2017: The New York Times cited my book with Adam Marcus on crowdsourced data management. Article here.
  • April 3, 2017: The HILDA 2017 workshop (co-located with SIGMOD) program is up.
  • March 1, 2017: Manas's TKDE paper on smart drill-down (from the "best of ICDE 2016") was accepted.
  • February 20, 2017: Vision paper on next-gen visualization recommendation systems with Manasi Vartak, Sam et al. is out at SIGMOD Record. Link here.
  • January 30, 2017: My student Silu Huang won the MSR PhD Fellowship: the first Illinois student since 2011! A great honor! Silu has recently been working on Orpheus.
  • January 30, 2017: Yihan's paper on calibrating classifiers has been accepted as an ORAL presentation at AISTATS'17!
  • January 15, 2017: Our paper analyzing a very large log of all tasks from a popular crowdsourcing marketplace has been accepted at VLDB'17. Learn all about how a marketplace operates, what the distribution of tasks looks like, and how the workers behave here.
  • January 10, 2017: Three of our key analytics tools, DataSpread, Zenvisage, and OrpheusDB, are moving out of private betas with a few interested parties to the public, available for easy download and deployment. More details and download links here: http://tiny.cc/three-tools.
  • January 1, 2017: New preprint release on Datamaran, our new fully unsupervised tool for extracting data from machine-generated data: no examples or supervision needed! Preprint here.
  • December 1, 2016: Two new paper updates:
    • Our paper on SlimFast, a data fusion algorithm, spearheaded by Theo Rekatsinas, has been accepted at SIGMOD'17!
    • Our vision paper on Open-Ended Crowdsourcing, spearheaded by the amazing Tova Milo, was accepted to appear in the IEEE Data Engineering Bulletin.
  • December 1, 2016: I've given a bunch of talks on our three tools for human-in-the-loop data analytics: a distinguished colloquium at Northwestern, a keynote at the Enterprise Intelligence workshop at KDD'16, and BigData events at Illinois and Chicago. Grab the slides here.
  • December 1, 2016: My exceptional PhD student, Silu Huang, was a finalist in the prestigious Microsoft Research PhD fellowship competition, with an in-person interview at MSR HQ -- so proud of her! Fingers crossed for the eventual outcome.
  • November 15, 2016: Many new preprints! Grab 'em while they're hot:
    • From the Zenvisage project: a paper on visualizations that incrementally improve over time, and a paper on our rapid sampling engine for visualizations.
    • From the Orpheus project: a paper describing data models and partitioning schemes for relational dataset versioning.
    • From the Populace project: a paper on consensus-based clustering of unstructured data.
    • From the DataSpread project: a paper evaluating representation schemes and indexing structures for billion cell spreadsheets.
  • November 15, 2016: We delivered our tutorial on crowdsourced data management at HCOMP'16. Slides: part 1, part 2.
  • November 1, 2016: Thanks to Adobe for supporting our research efforts!
  • November 1, 2016: New releases for Zenvisage: a paper on the Zenvisage query language, ZQL, and our smart-fuse query optimizer, accepted at VLDB'17 here, plus a demonstration paper accepted at CIDR'17 here.
  • October 15, 2016: I am one of the chairs of the Human-in-the-loop Data Analytics (HILDA) Workshop at SIGMOD'17, along with the peerless Joe Hellerstein, from Berkeley, and Carsten Binnig from Brown. Website here. Follow us on twitter.
  • October 1, 2016: My outstanding MS student, Vipul Venkataraman, won the Siebel Scholarship: cool cash prize of $20K. Well-deserved!
  • September 15, 2016: Participated in a fun panel on "Will AI eat us all?" with the eminent team of Sunita Sarawagi, Sihem Amer-Yahia, H. Jagadish, and Ihab Ilyas at VLDB'16. Short answer: no.
  • September 1, 2016: Thanks to NSF for funding our work on DataSpread with an NSF BigData grant. Some Illinois press here.
  • September 1, 2016: Participated in an invited workshop on the "Theory and Models for Crowds and Networks" with an eminent team of researchers in Oaxaca, Mexico. I presented a tutorial on the data management community's take on crowdsourcing. Slides here.
  • August 1, 2016: New slick websites for projects:
    • DataSpread, our spreadsheet-database hybrid: here.
    • Orpheus, our versioned database system: here.
    • Zenvisage, our visualization recommendation system: here.
    • Populace, our project on optimizing crowdsourcing: here.
  • July 15, 2016: Our new paper on producing intelligent summaries of facets of papers, with Xiang Ren, Tarique Siddiqui, and Jiawei Han has been accepted at CIKM 2016!
  • June 20, 2016: Our paper on data exploration at ICDE 2016 was invited to the TKDE "best of conference" issue, an honor reserved for the top few papers at the conference. Great job Manas!
  • June 15, 2016: After two years of extensive collaborations with folks at the two institutes, I am now an "official" affiliate of the Institute for Genomic Biology, and the Beckman Institute for Advanced Science and Technology.
  • June 1, 2016: Our paper on Squish: a tool for compression of relational datasets was accepted at KDD 2016! Our code is open-source and available on Github.
  • May 1, 2016: New release on our visual data exploration platform Zenvisage. Paper here, and website dedicated to Zenvisage here. Contact us if you'd like to test run Zenvisage on your datasets!
  • April 15, 2016: We just received a small seed grant from the Siebel Energy Institute to develop Zenvisage in collaboration with battery scientists at Carnegie Mellon! Excited to see what happens next.
  • April 10, 2016: We received a whopping 3X the number of submissions for the SIGMOD 2016 undergraduate research contest compared to previous years. Who knows what these young researchers will accomplish next?
  • April 1, 2016: Our paper on Decibel, the storage engine underlying DataHub, was accepted at SIGMOD 2016!
  • March 1, 2016: Thrilled to be among the "List of Teachers Ranked as Excellent by their Students" at Illinois! Happy to see that students enjoy my classes.
  • January 6, 2016: Adam and I are proud to finally release a book on crowdsourced data management, a labor of love under development for two years. The book not only covers the state of the art, but also contains a survey of both industry users of crowdsourcing and managers of crowdsourcing marketplaces. We hope that this book will be the definitive reference for how crowdsourcing is used in practice. Do send us comments!
  • January 1, 2016: Our vision paper on the unsolved challenges in large-scale data crowdsourcing was accepted at TKDE.
  • December 15, 2015: Our paper on interactive exploration using a more expressive drill-down operator was accepted at ICDE 2016 in Finland.
  • November 25, 2015: Some Illinois press on our NSF-funded DataHub grant. Thrilled and honored to be working with the amazing Sam Madden and Amol Deshpande on solving the problems underlying collaborative data analytics.
  • November 15, 2015: Our paper on optimally managing worker and answer quality in crowdsourcing was accepted at SIGMOD 2016.
  • October 1, 2015: We just heard word that NIH has funded our BD2K commons supplement. Looking forward to working with folks at UChicago to improve data publication workflows!

Synergistic Activities

I am an Associate Editor for SIGMOD Record, focusing on vision articles. Please consider sending us your most controversial and/or interesting papers!

I served as a co-chair for the HILDA (Human-In-the-Loop Data Analytics) Workshop at SIGMOD 2017. Website here.

I served as an Area Chair for SIGMOD 2017. I've served on the program committees of VLDB, KDD, SIGMOD, WSDM, WWW, SOCC, HCOMP, ICDE, and EDBT, many of them multiple times.

I was the SIGMOD 2016 Undergraduate Research Chair. Our competition has concluded; we had 3X the number of submissions this year compared to previous years.

Selected Projects

Zenvisage: A visualization recommendation system

Zenvisage is a tool for effortlessly visualizing insights from very large data sets. It automates finding the right visualization for a query, significantly simplifying the laborious task of identifying appropriate visualizations.

Project page: here


DataSpread: A Spreadsheet-Database Hybrid

DataSpread is a tool that marries the best of databases and spreadsheets.

Project page: here


Orpheus: Relational Dataset Version Management at Scale

DataHub (or "GitHub for Data") is a system that enables collaborative data science by keeping track of large numbers of versions and their dependencies compactly, and allowing users to progressively clean, integrate and visualize their datasets. OrpheusDB is a component of DataHub focused on using a relational database for versioning.

Project page: here


Populace: A Suite of Crowd-Powered Algorithms

Our work has developed a number of algorithms for gathering, processing, and understanding data obtained from humans (or crowds), while minimizing cost, latency, and error. Since 2014, our focus has been on optimizing open-ended crowdsourcing: an understudied and challenging class.

Project page: here