PhD Candidate, UC Berkeley
yifanwu@berkeley.edu
I create new interfaces and DSLs for data analysis, using database techniques.
I'm looking for a position starting in fall 2020.


Scalable Data Analysis Interfaces
Large-scale data analysis calls for effort on three fronts: faster backends (the usual suspect), changes to code, and changes to interfaces. This line of research focuses on the latter two.
Reification of Interactions in Computational Notebooks
DIEL suggests that queries (i.e., code) can inform interactions and vice versa, since both perform data manipulation. Midas builds on this insight, allowing data analysts to switch smoothly between writing code and interacting with charts. Stay tuned for the code and paper!


University of California, Berkeley 2015-present
PhD Candidate in Computer Science
Harvard College 2010-2014
AB in Computer Science (Economics minor)


Graduate Student Researcher 2015-present
University of California, Berkeley Berkeley, CA
Research Intern Summer, 2017
Microsoft Research Redmond, WA
Undergraduate Student Researcher 2013-2014
Harvard University Cambridge, MA


Teaching Assistant at UC Berkeley for DS100, Principles and Techniques of Data Science, where I guest lectured on visualization (slides) 2020
Mentor for undergraduate research projects 2018-2019
Speaker at BiD Seminar, talk titled Reification of Interactions in Computational Notebooks Dec 2019
Reviewer for Data Systems for Interactive Analysis (DSIA) 2019
Speaker at ForwardJS, talk titled Managing Frontend State with Relational Transducers (slides) Jan 2019
Speaker at RISELab Seminar, talk titled Mid PhD Reflection (slides) 2019
Speaker at RISELab Retreat, talk titled Asynchronous and Concurrent Interactive Visualization (recording) 2017
Attendee at Schloss Dagstuhl, for a seminar on Connecting Visualization and Data Management Research (link) 2017
Speaker at StrangeLoop, talk titled Let's Talk About Front-End Consistency (recording) 2016
Teaching Assistant at UC Berkeley for CS186, Introduction to Database Systems 2015
Teaching Assistant at Harvard College for CS20, Discrete Mathematics for Computer Science; awarded a Bok Center Certificate of Excellence in Teaching 2014


Full Stack Engineer Spring, 2015
Clever, San Francisco, CA
Technical Program Manager Intern Summer, 2014
Twitter, San Francisco, CA
Software Engineer Intern Summer, 2013
Bing, Microsoft Redmond, WA
Explorer Intern Summer, 2012
Visual Studio, Microsoft Redmond, WA


I plan to graduate in summer 2020, and I hope to continue building tools for data analysts. These days, I am particularly interested in scaling analysis teams. I think the programming history of individuals and organizations has huge potential to make analysis easier. Tools could use that history to build custom interactive visualizations and models on the fly, and to surface relevant information synthesized from it.

PhD Summary

Interactive visualization is increasingly popular for data analysis because questions can be specified and answered quickly. However, these interfaces are difficult to program, even for those well versed in query languages such as SQL and visualization libraries such as Vega-Lite.

The difficulty shows up at two levels. The first is the effort of "wiring up" frontends to backends. This involves mundane, repetitive work, such as translating application-level manipulations into queries. It also involves hard problems, such as coordinating concurrent and asynchronous events. The second is the subtler cost of typing the code itself. Compared with direct-manipulation techniques such as brushing, coding is an order of magnitude slower and forces the analyst to switch contexts.

In my PhD work, I created two projects to address these difficulties.

DIEL is a library that makes building interactive visualizations as easy as writing relational queries. It adds "event tables" to a client-server federated database, so interaction logic is specified the same way whether the data lives on the client or on the server. One writes the same query whether the tables are spread across two countries or sit on the same laptop. In the same way, DIEL developers do not need to care where the data is located. The caveat to this location independence is latency, which often becomes a design issue. To create a responsive and consistent UI, the developer needs to reason about the relative ordering of events, and DIEL exposes this ordering in its design. To read more: scaling, concurrency, and consistency.
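The following is a minimal sketch of the event-table idea in Python, using the standard `sqlite3` module. It is not DIEL's actual API or syntax. The `flights`, `brush_events`, and `responses` tables and the `brush` helper are hypothetical illustrations. The sketch works as follows:

* Each interaction is appended as a row stamped with a logical timestep.
* Interaction logic is an ordinary view over those rows.
* The timestep lets a view ignore a stale response that arrives after a newer one. This is one way to make event ordering something the developer can query.

```python
import sqlite3

# Hypothetical tables for illustration only; this is NOT DIEL's API or syntax.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Base data, which could live on either the client or the server.
    CREATE TABLE flights (id INTEGER PRIMARY KEY, delay INTEGER);
    INSERT INTO flights (delay) VALUES (5), (12), (30), (45), (60);

    -- Event table: each interaction becomes a row; the INTEGER PRIMARY KEY
    -- auto-assigns an increasing logical timestep that records its order.
    CREATE TABLE brush_events (timestep INTEGER PRIMARY KEY, low INTEGER, high INTEGER);

    -- Interaction logic is an ordinary query: the selection is whatever
    -- the most recent brush covers.
    CREATE VIEW current_selection AS
        SELECT f.id, f.delay
        FROM flights AS f, brush_events AS b
        WHERE b.timestep = (SELECT MAX(timestep) FROM brush_events)
          AND f.delay BETWEEN b.low AND b.high;

    -- Simulated asynchronous results, tagged with the timestep of the
    -- brush that requested them; they may arrive out of order.
    CREATE TABLE responses (request_timestep INTEGER, avg_delay REAL);

    -- Only display the result that matches the latest interaction.
    CREATE VIEW displayed_result AS
        SELECT r.avg_delay
        FROM responses AS r
        WHERE r.request_timestep = (SELECT MAX(timestep) FROM brush_events);
""")

def brush(low: int, high: int) -> None:
    """Record a brush interaction as a new event row."""
    conn.execute("INSERT INTO brush_events (low, high) VALUES (?, ?)", (low, high))

brush(0, 20)   # timestep 1
brush(25, 50)  # timestep 2 supersedes timestep 1
print(conn.execute("SELECT id, delay FROM current_selection ORDER BY id").fetchall())
# [(3, 30), (4, 45)]

# The response for timestep 2 arrives first; the stale one for timestep 1 arrives late.
conn.execute("INSERT INTO responses VALUES (2, 37.5)")
conn.execute("INSERT INTO responses VALUES (1, 8.5)")
print(conn.execute("SELECT avg_delay FROM displayed_result").fetchall())
# [(37.5,)]
```

Showing only the result for the latest interaction is just one possible consistency policy. Under the same assumptions, a different view could show the most recent result that has arrived, trading consistency for responsiveness.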

MIDAS builds on the duality of queries and interactions in DIEL. It is a notebook extension that infers interactive visualizations from queries the analyst has already written. Each interaction is reified into code, so the analyst can move smoothly between the two media and pick whichever one best suits the task at hand.
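The following is a minimal sketch of what "reifying an interaction into code" could look like, again in Python with `sqlite3`. The `Brush` class, the `reify` function, and the `trips` table are hypothetical illustrations, not the Midas API. In this sketch, a range selection on a chart becomes an equivalent SQL query. The analyst can then read, edit, and re-run that query like any other notebook cell.

```python
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class Brush:
    """Hypothetical record of a range selection dragged on a chart."""
    table: str
    column: str
    low: float
    high: float

def reify(b: Brush) -> str:
    """Turn a brush interaction into equivalent, editable SQL code."""
    # Identifiers are interpolated directly; acceptable only because they
    # come from the chart's own schema, never from free-form input.
    return f"SELECT * FROM {b.table} WHERE {b.column} BETWEEN {b.low} AND {b.high}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (id INTEGER, distance REAL)")
conn.executemany("INSERT INTO trips VALUES (?, ?)", [(1, 0.5), (2, 2.0), (3, 7.5)])

code = reify(Brush("trips", "distance", 1.0, 5.0))
print(code)
# SELECT * FROM trips WHERE distance BETWEEN 1.0 AND 5.0
print(conn.execute(code).fetchall())
# [(2, 2.0)]
```

The generated query is plain text. Editing it by hand and dragging a new brush therefore both produce the same kind of artifact, which is what lets an analyst switch between the two.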


Joe Hellerstein, Eugene Wu, Remco Chang


I grew up in China (Taiyuan, Beijing) and the UK (Nottingham). I love good books (I'm a big Woolf fan), conversations, and activities that help me move with intention (like rock climbing). After a stint with Soylent and "rationality", I now spend a lot of my non-professional time cooking and being silly.