Brown CS Announces Major Achievements In Data Science



Only a few months after the founding of Brown University's Data Science Initiative (DSI), the Department of Computer Science (Brown CS) is eager to share three innovative projects that represent the recent fruits of long-standing efforts to advance the state of the art in data science research:

Upfal, Binnig, Kraska, And Van Dam Win A $3.1M DARPA Grant For Quality-Aware Interactive Curation Of Models

Professor Eli Upfal (PI) and Professors Carsten Binnig, Tim Kraska, and Andy van Dam (co-PIs) have just won a Defense Advanced Research Projects Agency (DARPA) grant of more than 3.1 million dollars, and domain experts of all kinds stand to benefit.

In the past, researchers have attempted to make large-scale data analysis more accessible, but existing machine learning (ML) tools are still far from allowing non-technical users to explore data interactively. They require a deep technical understanding and do not address the risk factors in iterative model development. Upfal and his collaborators have responded by developing the first system for Quality-aware Interactive Curation of Models (QuIC-M). The new project will enable domain experts to build models themselves, without involving a data scientist, and to do so in a risk-aware manner. The software continuously monitors the user’s interactions so that it can automatically warn about potential false discoveries and, where possible, suggest solutions or even correct common mistakes automatically.

Beginning with a selection of algorithms from a model family and hyper-parameter tuning techniques implemented in MLBase, the researchers have integrated these techniques into their interactive human-in-the-loop data exploration and model building suite, Vizdom/IDEA. Vizdom is a novel pen-and-touch interface that allows domain experts to curate ML workflows in an intuitive, fluid way, while IDEA is the back-end for Vizdom that enables the interactive curation and evaluation of models on large data sets.

The result will be interactive and quality-assured data exploration that continuously monitors users and warns them about potentially wrong conclusions or models. Upfal and his colleagues expect that domain experts who have taken an introductory class in statistics and ML but are not data scientists will be able to use QuIC-M to build models an order of magnitude faster than an ML expert, with quality that's only slightly (and acceptably) inferior to the expert solution.

Tim Kraska Wins A Sloan Award For Democratizing Data Exploration And Analysis

Picture yourself in a meeting, looking up at a conference room wall. Not far in the future, Professor Tim Kraska expects it to be equipped with an interactive whiteboard that will enable a broad range of users to work together in a single meeting to visualize, transform, and analyze complex data in real time. Getting from the present to that future requires a complete rethinking of the full analytics stack, from its user interface to its smallest components, including the algorithms it relies on.

Tim has just been named an Alfred P. Sloan Research Fellow in one of the oldest and most competitive fellowship programs in the country. He's the eighth Brown CS faculty member to receive the honor, which Brown CS has now received for four years in a row. The fellowships, which take the form of a $50,000 grant used over a two-year period, honor and promote the science of early-career researchers who show outstanding promise for fundamental contributions to new knowledge.

The fellowship will fund Tim's work to democratize data science by enabling a broader range of users to unlock the potential of their data through a new generation of algorithms and systems for interactive and sustainable data-driven discovery. The work has three major components: Tupleware (a parallel high-performance UDF processing system designed for “normal” users, not the world's Googles and Microsofts), Vizdom, and new techniques to control the multi-hypothesis error.

At SIGMOD, Zhao, De Stefani, Zgraggen, Binnig, Upfal, And Kraska Will Present On Controlling False Discoveries In Interactive Data Exploration

Have you ever been skeptical of recent news items claiming that the secret to winning a Nobel Prize is eating more chocolate and that drinking a glass of wine is as good as spending an hour at the gym? Data-driven stories like these are prone to inaccuracy due to false inferences, and with the rise of interactive data exploration tools, the likelihood of error is significantly increased.
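
To see why more exploration means more risk, consider a simplified case that is an assumption for illustration, not part of the Brown CS work: every hypothesis tested is actually false, the tests are independent, and each uses the same significance level. The chance of at least one false discovery is then 1 - (1 - alpha)^m for m tests. The function name below is hypothetical.

```python
def prob_at_least_one_false_discovery(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of one or more false positives across independent tests.

    Assumes every null hypothesis is true and all tests are independent,
    which is a simplification of real exploration sessions.
    """
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    # Each extra chart or filter a user tries counts as another test.
    for m in (1, 10, 50):
        print(m, round(prob_at_least_one_false_discovery(m), 3))
```

At the conventional 0.05 level, the risk is about 40% after ten tests and above 90% after fifty under these assumptions.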

But after analyzing the think-aloud protocols of various user studies with more than 50 participants, researchers at Brown CS have an answer. At SIGMOD '17, a leading international forum for database researchers, PhD candidates Zheguang Zhao, Lorenzo De Stefani, and Emanuel Zgraggen, and Professors Carsten Binnig, Eli Upfal, and Tim Kraska will present the first end-to-end system, QUDE (Quantifying Uncertainty in Data Exploration), to automatically control the risk of false discovery for visual, interactive data exploration.

Their work offers a user interface and an initial set of meaningful default hypotheses to control the ratio of false discoveries without interrupting the exploration process, provides a superior and more modern criterion for controlling the false discovery rate, and demonstrates how the system controls false discovery for experts and novice users alike using generated and real-world data.
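
For readers unfamiliar with false discovery rate control, here is a minimal sketch of the classic Benjamini-Hochberg step-up procedure. It is offered only as a well-known baseline: the article does not say which criterion QUDE uses, and this is not presented as the researchers' method. The function name and the target rate `q` are illustrative assumptions.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Return one reject flag per hypothesis, in the same order as the input.

    Classic Benjamini-Hochberg step-up procedure; its guarantee assumes
    independent (or positively dependent) tests.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k whose p-value is at most (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(example, q=0.05))
```

Note that this procedure needs all p-values before it can decide anything, whereas in an interactive session hypotheses arrive one at a time and the total number is not known in advance.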

Looking Forward

"Data science has been a Brown CS strength for a long time," says Department Chair Ugur Cetintemel, "and the latest work of my colleagues in this area is rigorous, transformative, and will be extremely far-reaching because it addresses real pain points and pitfalls commonly experienced by data scientists. We're excited about the real impact ahead and all the collaborations this line of work will enable across the methods and applications of data science."

Jeff Brock, Director of the DSI, is also eager: "The DSI is extremely fortunate to count among its senior leadership and founding architects these path-breaking researchers in computer science and data science. These breakthrough research projects, each funded through an extremely competitive process, represent the driving philosophy for the DSI: that data and its tools should be accessible to the domain specialist, distributable in an equitable and transparent manner, and interactive, in a way that maintains the virtuous cycle from model, to inference, to refinement at the hands of the human expert. The DSI is proud to support their work and count these innovative efforts among its first guiding projects."

For more information, click the link that follows to contact Brown CS Communication Outreach Specialist Jesse C. Polhemus.