Andrew Crotty of Brown University's Department of Computer Science (Brown CS) has just received a Google PhD Fellowship for his research in data management, particularly the design of big data analytics systems. His current work focuses on developing a new high-performance analytics platform, Tupleware, which is geared toward complex computations like machine learning.
The fellowship, first launched in 2009, recognizes and supports outstanding graduate students doing exceptional research in computer science and related disciplines. It includes a monetary award and assigns each student a Google Research Mentor to serve as a resource. This year, there were 39 recipients from three different continents. Andrew is one of three winners in the Systems and Networking category, joining colleagues from the University of Cambridge and the University of California, Berkeley. "This was by far the most accomplished group of students we've seen, and each and every nominee should be proud," says Michael Rennaker of Google University Relations.
Andrew's research, he explains, is predicated on the fact that data analytics has grown to include increasingly sophisticated techniques, such as machine learning and advanced statistics. "Frequently," he says, "users express these complex analytics tasks as workflows of user-defined functions (UDFs) that specify each algorithmic step. However, given typical hardware configurations and dataset sizes, the core challenge of complex analytics is no longer sheer data volume but rather the computation itself, and the next generation of analytics frameworks must focus on optimizing for this computation bottleneck. While query compilation has gained widespread popularity as a way to tackle the computation bottleneck for traditional SQL workloads, relatively little work addresses UDF-centric workflows in the domain of complex analytics."
Crotty's research has primarily focused on creating a novel architecture for automatically compiling workflows of UDFs and on co-developing several related optimizations that consider properties of the data, UDFs, and hardware together in order to generate different code on a case-by-case basis. These techniques are currently being implemented in Tupleware, a new high-performance distributed analytics system that, in benchmarks, shows performance improvements of up to three orders of magnitude compared to alternative systems.
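To make the idea of a UDF workflow concrete, here is a minimal Python sketch. It is not Tupleware code and does not reflect how Tupleware compiles anything; the UDFs (scale, is_positive) and both runner functions are hypothetical names invented for illustration. The first runner applies each UDF as a separate step and builds intermediate lists, while the second is a hand-written single pass of the kind a workflow compiler might aim to generate automatically.

```python
from typing import Iterable

# Hypothetical UDFs for illustration only; they are not taken from Tupleware.
def scale(x: float) -> float:
    return x * 2.0

def is_positive(x: float) -> bool:
    return x > 0.0

def run_operator_at_a_time(data: Iterable[float]) -> float:
    """Apply each UDF as its own step, materializing intermediate lists."""
    scaled = [scale(x) for x in data]
    kept = [x for x in scaled if is_positive(x)]
    return sum(kept)

def run_fused(data: Iterable[float]) -> float:
    """Hand-fused equivalent: one pass over the data, no intermediate lists."""
    total = 0.0
    for x in data:
        y = x * 2.0      # inlined body of scale
        if y > 0.0:      # inlined body of is_positive
            total += y
    return total
```

Both functions compute the same result. The difference is only in how much intermediate work is done per element, which is the kind of cost that matters once computation, rather than data volume, is the bottleneck.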
The fellowship is the latest recognition in a busy year for Andrew: he has recently published papers on a variety of topics, from compiling UDF-centric workflows, to redesigning traditional data management algorithms for high-performance networks, to providing interactive analytics through pen and touch. He and other Brown CS colleagues also won a Best Demo Award at VLDB 2015.
"It's a big honor to be awarded the Google fellowship," Andrew says, "especially this year being included among so many other outstanding recipients. I'm really looking forward to continuing the exciting work we've been doing with Tupleware and exploring applications to new areas, including interactive data analysis and genomics pipelines."