Rodrigo Fonseca And Collaborators Win An NSDI Test Of Time Award



At the USENIX Symposium on Networked Systems Design and Implementation (NSDI) this week in Boston, Massachusetts, Professor Rodrigo Fonseca of Brown University's Department of Computer Science (Brown CS) and collaborators from University of California San Diego and University of California Berkeley accepted an award for the most influential paper among those presented a decade ago at the annual conference. At a luncheon on March 26, Rodrigo and co-author George Porter accepted the 2017 NSDI Test of Time Award for their NSDI 2007 paper (“X-Trace: A Pervasive Network Tracing Framework”) along with their co-author and former advisor, Professor Ion Stoica. George is now a professor at the University of California San Diego, where he is a co-director of the university's Center for Networked Systems (CNS).

Rodrigo and George were at University of California Berkeley when they wrote the original paper, which was also co-authored with Professors Randy H. Katz and Scott Shenker of the same institution. “We wrote X-Trace while we were PhD students,” recalls George. “It was really an honor to work with my colleagues on this project, which formed the basis of Rodrigo’s and my PhD dissertations.”

Modern Internet systems often combine different applications, span different administrative domains, and function in the context of network mechanisms (tunnels, VPNs, overlays, and so on). In their 2007 paper, Rodrigo and his collaborators argued that “diagnosing these complex systems is a daunting challenge.” Rodrigo says, “Many diagnostic tools existed at the time, but none existed for reconstructing a comprehensive view of service behavior.”

X-Trace was not the first tracing framework, but it was influential because it was effectively the first end-to-end tracing framework to focus on generality and pervasiveness. “It was based on the observation that an increasing number of systems would be built from heterogeneous components, built and operated by different people,” Rodrigo explains. “In contrast, existing tracing frameworks required a specific language, or were targeted to a particular system.”

The researchers implemented X-Trace in protocols and software systems, and in their prize-winning paper, they described three usage scenarios: domain name system (DNS) resolution; a three-tiered photo-hosting website; and a service accessed through an overlay network.


The image at left represents an HTTP request going through CoralCDN, a distributed content distribution network. The request is tried in parallel in four different web caches (the different colored paths) before succeeding at the last one.

Hari Balakrishnan, who co-chaired NSDI in 2007, broke the news of the Test of Time Award to the recipients. “We’re very pleased to share that your X-Trace paper from NSDI 2007 has been selected for an NSDI Test of Time Award,” he wrote. “The award honors a paper published ten years earlier at NSDI with retrospectively the most impact on research or practice.”

The X-Trace paper has proved to be prescient in both research and practice. “Today, many Internet-scale backend systems are built using a ‘microservices’ approach, with hundreds of loosely connected components tied together to offer larger services,” noted George. “Debugging these systems effectively requires what X-Trace provided: the ability to correlate events in one component to events in other arbitrary components, even if they were many steps far removed from the first.”

The rapid adoption of tracing began with Google’s introduction of Dapper in 2010, which offered a similar primitive to X-Trace. Twitter’s Zipkin and Cloudera’s HTrace were open-source implementations of Dapper. Another current competitor in the market, called Traceview, also has X-Trace in its DNA after a series of startups and acquisitions dating back to 2010.

“By 2015, many companies such as Netflix, Baidu, Uber, Facebook, and Etsy were deploying internal trace solutions very similar to our ideas presented in the X-Trace paper,” observes Rodrigo. “And the interest persists in a rather recent initiative called OpenTracing, which is trying to standardize end-to-end tracing.” In 2017, the excitement surrounding tracing continues unabated. For example, earlier this year, Amazon released X-Ray, which offers distributed tracing for Amazon Web Services, and another company, Datadog, also released a new end-to-end tracing product.

The NSDI award is not Rodrigo's first for work on tracing: he co-authored a paper on "pivot tracing" that received a Best Paper award at the 2015 Symposium on Operating Systems Principles. That same year, Rodrigo won an NSF CAREER Award for his work on "causal tracing," which aims to improve understanding of the performance of distributed systems. (Causal tracing covers a wide variety of tracing systems and frameworks, including X-Trace itself, as well as Dapper, Zipkin, HTrace, and many others.)

“It’s becoming increasingly difficult to understand how a system behaves, and, especially, how and why it fails,” he says. “Causal tracing is a technique that captures the causality of events across all components, layers, and machines, and it eases the task of understanding complex distributed systems.”
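For readers curious about what capturing causality across components can look like in practice, here is a minimal sketch in Python. It is not the X-Trace implementation or API. Every name in it (new_trace, log_event, causal_edges, REPORTS) is hypothetical and chosen only for illustration. The sketch assumes that each request carries a small piece of metadata (a task identifier plus the identifier of the most recent event) and that every component reports its events to a shared collector, which can then rebuild the chain of cause and effect.

```python
import itertools

# Hypothetical stand-ins: a global event-ID counter and an in-memory
# list playing the role of a central report collector.
_event_ids = itertools.count(1)
REPORTS = []


def new_trace(task_id):
    """Create the metadata that travels with a request."""
    return {"task_id": task_id, "parent_id": None}


def log_event(metadata, component, label):
    """Report one event and return the metadata for the next hop.

    The report records which earlier event caused this one, so events
    from different components can be linked later.
    """
    event_id = next(_event_ids)
    REPORTS.append({
        "task_id": metadata["task_id"],
        "event_id": event_id,
        "parent_id": metadata["parent_id"],
        "component": component,
        "label": label,
    })
    return {"task_id": metadata["task_id"], "parent_id": event_id}


def causal_edges(task_id):
    """Rebuild the (cause, effect) pairs for one task from the reports."""
    return [
        (r["parent_id"], r["event_id"])
        for r in REPORTS
        if r["task_id"] == task_id and r["parent_id"] is not None
    ]


# One request fans out from a web tier to a database and a cache.
md = new_trace("req-1")
client = log_event(md, "client", "send request")
web = log_event(client, "web", "handle request")
db = log_event(web, "db", "run query")
cache = log_event(web, "cache", "look up key")

edges = causal_edges("req-1")
print(edges)
```

Because each component only has to pass the metadata along and report its own events, components written in different languages or run by different teams could in principle take part in the same trace, which is the kind of generality the researchers describe.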

Besides his work on tracing, Rodrigo is more broadly interested in distributed systems and computer networking: "Ultimately, I'm always trying to design systems that are more useful, efficient, and understandable to developers, operators, and end users."

Brown CS gratefully acknowledges Doug Ramsey and University of California San Diego for the research and content creation that originated this news item. For more information, please click the link that follows to contact Brown CS Communication Outreach Specialist Jesse C. Polhemus.