Aditya Parameswaran

I am an assistant professor of Computer Science at the University of Illinois (UIUC).

My research interests are broadly in building tools for simplifying data analytics, i.e., empowering individuals and teams to leverage and make sense of their datasets more easily, efficiently, and effectively.

Biographical Sketch

Aditya Parameswaran is an Assistant Professor in Computer Science at the University of Illinois (UIUC), with affiliate appointments at the Illinois Informatics Institute, the Institute for Genomic Biology and the Beckman Institute for Advanced Science and Technology. He spent a year as a PostDoc at MIT CSAIL following his PhD at Stanford University, before starting at Illinois in August 2014. He develops systems and algorithms for "human-in-the-loop" data analytics, synthesizing techniques from database systems, data mining, and human computation.

He has received the Army Research Office Young Investigator Program Award (2018), the NSF CAREER Award (2017), the TCDE Early Career Award (2017), the Dean's Excellence in Research Award (2018) and the C. W. Gear Junior Faculty Award from the University of Illinois (2017), multiple "best" Doctoral Dissertation Awards (from SIGMOD, SIGKDD, and Stanford in 2014), "Excellent" teacher awards from Illinois (2015, 2017), a Google faculty award (2015) and focused research award (2017), the Key Scientific Challenges award from Yahoo!, six best-of-conference citations (VLDB 2010, KDD 2012, ICDE 2014, ICDE 2016, AISTATS 2017, VLDB 2017), and a best demo honorable mention (SIGMOD 2017).

He is an associate editor of SIGMOD Record, serves on the steering committee of the HILDA (Human-in-the-loop Data Analytics) Workshop @ SIGMOD and the DSIA (Data Systems for Interactive Analysis) Workshop @ VIS, and has served on program committees of various database, data mining, web, systems, and crowdsourcing conferences. His research group is supported with funding from the NSF (CAREER, Medium, AITF, BigData), the NIH (2X), the Army Research Office, the Toyota Research Institute, Adobe, the Siebel Energy Institute, and Google.

Quick Project Links



  • December 15, 2018: Congratulations to Doris and gang for our IUI 2019 paper identifying a new fallacy in data exploration--the drill-down fallacy--and developing techniques to work around it.
  • December 1, 2018: Yay! My student Mangesh Bendre (coadvised with Kevin Chang) defended his thesis on DataSpread. Mangesh has spearheaded the development of DataSpread, and was instrumental in many of the key innovations so far: the hybrid data model, positional indexing, and asynchronous formula computation.
  • November 15, 2018: Congrats to Doris and the rest of the Helix team for the VLDB 2019 paper on the design of Helix, our human-in-the-loop machine learning system.
  • August 13, 2018: Congratulations to Silu, Liqi et al. (w/ Aaron Elmore) for a "best of conference" nomination for our VLDB 2017 paper on the design and implementation of Orpheus, our versioned database system!
  • August 13, 2018: Doris's IEEE D.E. Bulletin paper articulating our vision for a visual discovery assistant, called VIDA, will be out soon. Thanks to Alexandra for inviting us.
  • August 10, 2018: Shreya's paper on incorporating constraints for more accurate crowd-powered sorting was accepted as a short paper at CIKM 2018.
  • July 11, 2018: Thrilled to receive the Army Research Office Young Investigator Program Award for our work on decoupling perspectives in crowdsourcing. Thanks to the CS Department for the generous article!
  • June 30, 2018: Our SIGMOD blog post on why visual data exploration introduces a number of new data management challenges is up. Thank you to Georgia Koutrika for inviting me!
  • June 15, 2018: Our project page for Helix is up!
  • June 15, 2018: Short papers accepted at IDEA on iteration in machine learning workflows, and at HCOMP on quality evaluation methods for crowdsourced segmentation.
  • May 27, 2018: I am serving on the steering committee of the DSIA workshop @ VIS 2018. Consider submitting your latest and greatest work here!
  • May 2, 2018: Demos on Helix, our human-in-the-loop machine learning tool, and ShapeSearch, our flexible shape-based trend-line querying tool, were accepted at VLDB 2018.
  • April 25, 2018: Our paper on Needletail, an efficient sampling engine for browsing, was accepted at the HILDA Workshop at SIGMOD 2018. We've used Needletail in a number of papers on scalable approximate visualization generation, so we're glad to have this finally out there!
  • April 15, 2018: Our paper on accelerating human-in-the-loop ML, a vision paper for the Helix project, was accepted at the DEEM Workshop at SIGMOD 2018. Lots more to come on Helix in the near future. Congrats Doris!
  • April 1, 2018: Thrilled to receive the 2018 Dean's Excellence in Research Award from the University of Illinois, given to assistant professors with an outstanding research profile + impact. Delighted to be able to celebrate with the group (photo on the right)!
  • March 1, 2018: Happy to be recognized with a spot on the "List of teachers rated as Excellent" for the second year in a row!
  • February 15, 2018: Our demo paper (w/ folks at UChicago) on generating succinct diffs between data versions was accepted at SIGMOD 2018.
  • February 10, 2018: Mangesh's paper on data models and indexes for scalable spreadsheets has been accepted to ICDE 2018! This paper lays out the groundwork for our DataSpread project, many years in the making.
  • December 12, 2017: New paper on quickly identifying a succinct difference (or "diff") between two relational datasets here. We characterize the complexity of this problem, based on varying the classes of operators and types of attributes.
  • December 10, 2017: More Kelly news! Kelly received the Snap Research Scholarship and the CRA Undergraduate Research Award honorable mention. Woohoo!
  • December 1, 2017: Interested in trying out our latest version of Zenvisage? Here's the link: More at our Medium blog post.
  • November 18, 2017: Paper studying scalability issues in Microsoft Excel by analyzing a large collection of Reddit posts, accepted at CHI 2018. Congrats Kelly Mack (an amazing achievement for an undergrad)! In other news, Kelly was also nominated for the CRA undergraduate research award.
  • November 10, 2017: Paper on our automatic data lake extraction tool accepted at SIGMOD 2018. Our tool automatically identifies the components corresponding to formatting and filters them out to extract a structured representation, with high accuracy on log files from GitHub. Congrats Yihan and Silu!
  • October 15, 2017: My O'Reilly Blog post on "Enabling Data Science for the Majority" is live! In here, I articulate that there are 5 BIG challenges in democratizing data science, and describe some of our work as well as some of the other work in this space. Read this if you want to find out what's new and cool in data science research.
  • October 10, 2017: New preprint on characterizing the spectrum of scalability issues in Microsoft Excel via Reddit posts here as part of our DataSpread project. Led by Kelly, our intrepid undergrad!
  • October 1, 2017: The Zenvisage gang chronicles our multi-year effort in participatory design with Zenvisage, along with scientists from materials science, genetics, and astrophysics, here. Many interesting insights on how visual exploration systems like Zenvisage can fit into scientific data exploration workflows + many real instances of valuable scientific findings gained from the process!
  • September 11, 2017: Thanks to new funding from the NSF Algorithms in the Field (AitF) program, we can advance scalable visualization by applying sublinear time techniques, along with the super smart theory duo of Ronitt Rubinfeld (MIT) and Ilias Diakonikolas (USC). NSF page here.
  • September 1, 2017: VLDB Blog Posts! Here they are:
    • "Towards Automating Insight", here.
    • "Drawing Conclusions Early with Incvisage", here.
    • "Painless Data Versioning for Collaborative Data Science", here.
    • "Crowdsourcing in Practice: Our Findings", here.
  • August 15, 2017: Grateful to receive the C.W. Gear Junior Faculty Award from the University of Illinois! Thanks to the Department of CS for being such a supportive environment for junior faculty!
  • August 1, 2017: New/Updated preprints:
    • on Needletail, our "any-k" browsing and sampling engine, here.
    • on FastMatch, an algorithm for rapidly matching histograms to a target, applying a variety of systems and algorithmic ideas, here; a key component of Zenvisage.
    • on DataSpread, studying representation and indexing schemes for spreadsheet data, here.
    • on Catamaran (formerly known as Datamaran), our unsupervised extraction tool for large-scale extraction from data lakes, here.
  • May 18, 2017: The OrpheusDB demo received a best demo honorable mention! Congrats to Liqi + Silu! Missed it at SIGMOD? You can still catch it here: video.
  • May 15, 2017: Paper on IncVisage, our incrementally improving visualization algorithm and interface, has been accepted to VLDB'17! Paper here. Joint work with theorists at MIT and Waterloo, and HCI/Viz folks at Illinois. Perhaps the first paper that has theory, DB, and HCI co-authors? (Would love to be corrected if not.)
  • April 15, 2017: Thrilled and honored to receive:
    • The NSF CAREER Award: Abstract here. Excited to pursue the vision of optimizing "open-ended" crowdsourcing! Vision paper from IEEE Data Engg. Bulletin here.
    • The TCDE (Technical Committee on Data Engineering) Early Career Award, awarded for an individual's whole body of work in the first 5 years after the PhD. The award citation: "The award is for developing new interactive tools and techniques that expand the reach of data analytics, enabling powerful data-driven discoveries by experts and non-experts alike."
  • April 15, 2017: Orpheus Updates: demo accepted at SIGMOD 2017; paper accepted at VLDB 2017 (no revisions!); open-source release here.
  • April 3, 2017: The New York Times cited my book with Adam Marcus on crowdsourced data management. Article here.
  • April 3, 2017: The HILDA 2017 workshop (co-located with SIGMOD) program is up.
  • March 1, 2017: Manas's TKDE paper on smart drill-down (from the "best of ICDE 2016") was accepted.
  • February 20, 2017: Vision paper on next-gen visualization recommendation systems with Manasi Vartak, Sam et al. is out at SIGMOD Record. Link here.
  • January 30, 2017: My student Silu Huang won the MSR Faculty Fellowship: the first Illinois student since 2011! A great honor! Silu has recently been working on Orpheus.
  • January 30, 2017: Yihan's paper on calibrating classifiers has been accepted as an ORAL presentation at AISTATS'17!
  • January 15, 2017: Our paper analyzing a very large log of all tasks from a popular crowdsourcing marketplace has been accepted at VLDB'17. Learn all about how a marketplace operates, what the distribution of tasks looks like, and how the workers behave here.
  • January 10, 2017: Three of our key analytics tools, DataSpread, Zenvisage, and OrpheusDB, are moving out of private betas with a few interested parties to the public, available for easy download and deployment. More details and download links here:
  • January 1, 2017: New preprint release on Catamaran (formerly known as Datamaran), our new fully-unsupervised data extraction tool from machine generated data: no examples or supervision needed! Preprint here.
        More News

Synergistic Activities

I am an Associate Editor for SIGMOD Record, focusing on vision articles. Please consider sending us your most controversial and/or interesting papers!

I serve on the steering committees of HILDA (Human-in-the-loop Data Analytics) at SIGMOD and DSIA (Data Systems for Interactive Analysis) at VIS. Lots of excitement around this nascent area at the intersection of databases, data mining, and visualization/HCI -- join us!

I've served on the program committees of VLDB, KDD, SIGMOD, WSDM, WWW, SOCC, HCOMP, ICDE, and EDBT, many of them multiple times.

I served as an Area Chair for SIGMOD 2017 and as a co-chair for HILDA 2017. I was the SIGMOD 2016 Undergraduate Research Chair.

Recent Releases

Medium Blog

Selected Projects


Zenvisage: A visualization recommendation system

Zenvisage is a tool for effortlessly visualizing insights from very large data sets. It automates finding the right visualization for a query, significantly simplifying the laborious task of identifying appropriate visualizations.

Project page here. Try it live here!


Helix: An Accelerated Human-in-the-loop Machine Learning System

Helix accelerates the iterative development of machine learning pipelines with a human developer "in the loop" via intelligent caching and reuse.

Project page here.


DataSpread: A Spreadsheet-Database Hybrid

DataSpread is a tool that marries the best of databases and spreadsheets.

Project page: here


Orpheus: Relational Dataset Version Management at Scale

DataHub (or "GitHub for Data") is a system that enables collaborative data science by keeping track of large numbers of versions and their dependencies compactly, and allowing users to progressively clean, integrate and visualize their datasets. OrpheusDB is a component of DataHub focused on using a relational database for versioning.

Project page: here


Populace: A Suite of Crowd-Powered Algorithms

Our work has developed a number of algorithms for gathering, processing, and understanding data obtained from humans (or crowds), while minimizing cost, latency, and error. Since 2014, our focus has been on optimizing open-ended crowdsourcing: an understudied and challenging class of crowdsourcing problems.

Project page: here