PhD candidate, UC Berkeley
yifanwu@berkeley.edu
I create new interfaces and DSLs for data analysis, using database techniques.
I'm looking for a job for fall 2020.


Scalable Data Analysis Interfaces
Large-scale data analysis takes effort on multiple fronts: a faster backend (the usual suspect), code changes, and interface changes. This line of research focuses on the latter two.
Reification of Interactions in Computational Notebooks
DIEL suggests that queries (i.e. code) can inform interactions and vice versa---they both perform some data manipulation. Midas is built on this insight, allowing data analysts to switch smoothly between writing code and interacting with charts. If you are feeling adventurous, check out the repo. Stay tuned for the paper!


University of California, Berkeley 2015-present
PhD Candidate in Computer Science
Harvard College 2010-2014
AB in Computer Science (Economics minor)


Graduate Student Researcher 2015-present
University of California, Berkeley Berkeley, CA
Research Intern Summer, 2017
Microsoft Research Redmond, WA
Undergraduate Student Researcher 2013-2014
Harvard University Cambridge, MA


Teaching Assistant at UC Berkeley, for DS100, Principles and Techniques of Data Science, where I guest lectured about visualizations (slides) 2020
Mentor for undergrads for research projects 2018-2019
Speaker at BiD Seminar, talk titled Reification of Interactions in Computational Notebooks Dec, 2019
Reviewer at Data Systems for Interactive Analysis (DSIA) 2019
Speaker at ForwardJS, talk titled Managing Frontend State with Relational Transducers (slides) Jan, 2019
Speaker at RISELab Seminar, talk titled Mid PhD Reflection (slides) 2019
Speaker at RISE Lab Retreat, talk titled Asynchronous and Concurrent Interactive Visualization (recording) 2017
Attendee at Schloss Dagstuhl, for a seminar on Connecting Visualization and Data Management Research (link) 2017
Speaker at StrangeLoop, talk titled Let's Talk About Front-End Consistency (recording) 2016
Teaching Assistant at UC Berkeley, for CS186, Introduction to Database Systems 2015
Teaching Assistant at Harvard College, for CS20, Discrete Mathematics for Computer Science; awarded the Bok Center's Certificate of Excellence in Teaching. 2014


Full Stack Engineer Spring, 2015
Clever, San Francisco, CA
Technical Program Manager Intern Summer, 2014
Twitter, San Francisco, CA
Software Engineer Intern Summer, 2013
Bing, Microsoft Redmond, WA
Explorer Intern Summer, 2012
Visual Studio, Microsoft Redmond, WA


I plan to graduate in summer/fall 2020, and I hope to continue building tools for data analysts. These days, I am particularly interested in scaling analysis teams. I think an individual's or organization's history of programming traces has huge potential to make analysis easier: tools could build custom interactive visualizations and models on the fly, and surface relevant information synthesized from that history.

PhD Summary

Interactive visualization is increasingly popular for data analysis---questions can be quickly specified and answered. However, these interfaces are difficult to program even for those well versed in query languages, such as SQL, and visualization libraries, such as Vega-Lite.

The difficulty manifests on two different levels. First is the effort taken to "wire up" frontends with backends, which involves mundane and repetitive efforts like mapping application-level manipulations to query languages, as well as tough issues like coordinating concurrent and asynchronous events. Second is the more subtle effort to type the code---compared to direct manipulation techniques like brushing, coding is orders of magnitude slower and requires the analyst to context switch.

During my PhD, I created two new projects.

DIEL, a framework that helps developers build scalable interactive data visualizations under a simple, declarative interface. DIEL treats UI events as a stream of data that is captured in an event history for reuse. Developers declare what the state of the interface should be after the arrival of events. DIEL compiles these declarative specifications into relational queries over both event history and the data to be visualized. In doing so, DIEL makes it easier to develop visualizations that are robust against changes to the size and location of data. To read more: scaling, concurrency and consistency.
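To make the idea concrete, here is a minimal sketch in Python with SQLite. It is not DIEL's actual syntax or API; it only illustrates the general pattern of storing UI events in a history table and declaring interface state as a relational query over that history and the data. The table and view names (flights, brush_events, current_brush, chart_state) and the helper functions are hypothetical.

```python
import sqlite3


def build_app(conn):
    """Create the data table, the event history, and the declared UI state."""
    conn.executescript("""
        -- Data to be visualized (hypothetical schema).
        CREATE TABLE flights (origin TEXT, delay INTEGER);

        -- Event history: every brush interaction is appended, never overwritten.
        CREATE TABLE brush_events (
            ts   INTEGER PRIMARY KEY AUTOINCREMENT,
            low  INTEGER,
            high INTEGER
        );

        -- The "current" brush is simply the most recent event.
        CREATE VIEW current_brush AS
            SELECT low, high FROM brush_events ORDER BY ts DESC LIMIT 1;

        -- Declared UI state: per-origin counts of flights inside the brush.
        -- Before any brush event arrives, this view is empty.
        CREATE VIEW chart_state AS
            SELECT f.origin AS origin, COUNT(*) AS n
            FROM flights AS f, current_brush AS b
            WHERE f.delay BETWEEN b.low AND b.high
            GROUP BY f.origin;
    """)


def load_flights(conn, rows):
    """Insert (origin, delay) rows into the data table."""
    conn.executemany("INSERT INTO flights (origin, delay) VALUES (?, ?)", rows)


def record_brush(conn, low, high):
    """Append a brush interaction to the event history."""
    conn.execute("INSERT INTO brush_events (low, high) VALUES (?, ?)", (low, high))


def read_chart_state(conn):
    """Evaluate the declared state; the UI would render whatever this returns."""
    return conn.execute(
        "SELECT origin, n FROM chart_state ORDER BY origin"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_app(conn)
    load_flights(conn, [("SFO", 5), ("SFO", 40), ("SFO", 55),
                        ("OAK", 12), ("OAK", 70), ("SJC", 25)])
    record_brush(conn, 0, 30)
    print(read_chart_state(conn))
    record_brush(conn, 30, 80)
    print(read_chart_state(conn))
```

In this sketch, the event handler only appends to the history, and the chart is always derived by a query. Because the state is a query rather than hand-written update logic, the same declaration would still apply if the flights table lived in a remote database, which is the kind of robustness to data size and location described above.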

Midas builds on this duality of queries and interactions in DIEL. It is a notebook extension that creates interactive visualizations inferred from queries already written. The developer can move smoothly between the two mediums, code and interaction.


Joe Hellerstein, Eugene Wu, Remco Chang, Arvind Satyanarayan


I grew up in China (Taiyuan, Beijing) and the UK (Nottingham). I love good books (a big Woolf fan), conversations, and activities that help me move with intention (like rock climbing). After a stint with Soylent and "rationality", I now spend a lot of my non-professional time cooking and being silly.