PhD candidate, UC Berkeley
yifanwu@berkeley.edu
I create new interfaces and DSLs for data analysis, using database techniques.
I'm graduating fall 2020.


Scalable Data Analysis Interfaces
Large-scale data analysis requires work on multiple fronts: faster backends (the usual suspect), code changes, and interface changes. This line of research focuses on the latter two.
Reification of Interactions in Computational Notebooks
DIEL suggests a close connection between queries and interactive visualizations---they both perform some data manipulation. Based on this insight, we developed B2 to allow data analysts to switch smoothly between writing code and interacting with charts. We are currently investigating DSLs beyond relational algebra that express the shared semantics of code and interactions.
  • B2: Bridging Code and Interactive Visualization in Computational Notebooks Yifan Wu, Arvind Satyanarayan, Joe Hellerstein @ UIST 2020 (code) (demo)

Grad school has been a mind-expanding ride. I got to work on some projects that do not fit into my main research narrative, but they all relate to the investigation of "human data interfaces"---be it API design, ways to search with DNN-generated labels, or how we update our beliefs based on visual data.


University of California, Berkeley 2015-present
PhD Candidate in Computer Science
Harvard College 2010-2014
AB in Computer Science (Economics minor)


Graduate Student Researcher 2015-present
University of California, Berkeley Berkeley, CA
Research Intern Summer, 2017
Microsoft Research Redmond, WA
Undergraduate Student Researcher 2013-2014
Harvard University Cambridge, MA


Teaching Assistant at UC Berkeley, for DS100, Principles and Techniques of Data Science, where I guest lectured about visualizations (slides) 2020
Mentor for undergrads on research projects 2018-2019
Speaker at BiD Seminar, talk titled Reification of Interactions in Computational Notebooks Dec, 2019
Reviewer at Data Systems for Interactive Analysis (DSIA) 2019
Speaker at ForwardJS, talk titled, Managing Frontend State with Relational Transducers (slides) Jan, 2019
Speaker at RISELab Seminar, talk titled Mid PhD Reflection (slides) 2019
Speaker at RISE Lab Retreat, talk titled, Asynchronous and Concurrent Interactive Visualization (recording) 2017
Attendee at Schloss Dagstuhl, for a seminar on Connecting Visualization and Data Management Research (link) 2017
Speaker at StrangeLoop, talk titled Let's Talk About Front-End Consistency (recording) 2016
Teaching Assistant at UC Berkeley, for CS186, Introduction to Database Systems 2015
Teaching Assistant at Harvard College, for CS20, Discrete Mathematics for Computer Science; awarded the Bok Center's Certificate of Excellence in Teaching. 2014


Full Stack Engineer Spring, 2015
Clever, San Francisco, CA
Technical Program Manager Intern Summer, 2014
Twitter, San Francisco, CA
Software Engineer Intern Summer, 2013
Bing, Microsoft Redmond, WA
Explorer Intern Summer, 2012
Visual Studio, Microsoft Redmond, WA


I plan to graduate in fall 2020, and I hope to continue building data tools. I'm looking to explore projects on scaling teams that work with data. The history of an individual's or organization's programming and interaction traces has huge potential to make analysis easier: tools could use it to build custom, on-the-fly interactive visualizations and models, and to surface relevant information synthesized from that history.

PhD Summary

Interactive visualization is increasingly popular for data analysis---questions can be quickly specified and answered. However, these interfaces are difficult to program even for those well versed in query languages, such as SQL, and visualization libraries, such as Vega-Lite.

The difficulty manifests on two different levels. First is the effort taken to "wire up" frontends with backends, which involves mundane and repetitive work like mapping application-level manipulations to query languages, as well as tough issues like coordinating concurrent and asynchronous events. Second is the more subtle effort of typing the code---compared to direct manipulation techniques like brushing, coding is orders of magnitude slower and requires the analyst to context switch.

During my Ph.D., I created two new projects.

DIEL is a framework that helps developers build scalable interactive data visualizations under a simple, declarative interface. DIEL treats UI events as a stream of data that is captured in an event history for reuse. Developers declare what the state of the interface should be after the arrival of events. DIEL compiles these declarative specifications into relational queries over both the event history and the data to be visualized. In doing so, DIEL makes it easier to develop visualizations that are robust against changes to the size and location of data. To read more: scaling, concurrency and consistency.
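
The idea of treating UI events as data can be sketched with plain SQL. The example below is a minimal illustration in Python with SQLite, not DIEL's actual API or syntax: the table and view names (flights, brush_events, current_brush, chart_data) and the helper functions are hypothetical, chosen only to show how interface state can be declared as a relational query over an event history joined with the data.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with data, an event history, and views.

    All names here are hypothetical and only illustrate the idea.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- The data to be visualized.
        CREATE TABLE flights (origin TEXT, delay INTEGER);
        INSERT INTO flights VALUES
            ('SFO', 5), ('SFO', 40), ('JFK', 20), ('JFK', 75), ('ORD', 10);

        -- The event history: every brush interaction is appended as a row.
        CREATE TABLE brush_events (ts INTEGER PRIMARY KEY, lo INTEGER, hi INTEGER);

        -- Declared state: the active brush is the most recent event.
        CREATE VIEW current_brush AS
            SELECT lo, hi FROM brush_events ORDER BY ts DESC LIMIT 1;

        -- Declared state: the chart shows counts of flights inside the brush.
        CREATE VIEW chart_data AS
            SELECT f.origin AS origin, COUNT(*) AS n
            FROM flights AS f
            JOIN current_brush AS b ON f.delay BETWEEN b.lo AND b.hi
            GROUP BY f.origin;
        """
    )
    return conn


def brush(conn, ts, lo, hi):
    """Record a brush event instead of mutating interface state directly."""
    conn.execute("INSERT INTO brush_events VALUES (?, ?, ?)", (ts, lo, hi))


def chart(conn):
    """Read the chart state, which is derived entirely from the views."""
    return conn.execute(
        "SELECT origin, n FROM chart_data ORDER BY origin"
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    brush(conn, 1, 0, 30)
    print(chart(conn))
    brush(conn, 2, 30, 100)
    print(chart(conn))
```

Because the chart is defined as a query rather than updated by event handlers, earlier events stay available in brush_events for reuse, and the same declaration would apply whether the flights table is small and local or large and remote.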

B2 builds on this duality of queries and interactions in DIEL. It is a notebook extension that creates interactive visualizations inferred from queries already written. We identified additional design gaps that prevent the developer from moving smoothly between the two mediums: layout, temporal, and semantic. We address these gaps in a Jupyter Notebook extension, which tightens the feedback loop for programming with data. Check out our code and demo (paper coming soon).


Joe Hellerstein, Eugene Wu, Remco Chang, Arvind Satyanarayan


Before the US, I grew up in China (Taiyuan + Beijing) and the UK (Nottingham). I love good books (a big Woolf fan), conversations, and activities that help me move with intention (like rock climbing). After a stint with Soylent and "rationality", I now spend a lot of my non-professional time cooking and being silly.