I plan to graduate in fall 2020, and I hope to continue building data tools. I'm looking to explore projects on scaling teams that work with data. The history of an individual's or organization's programming and interaction traces has huge potential to make analysis easier: tools can use it to build custom, on-the-fly interactive visualizations and models, and to surface relevant information synthesized from that history.
Interactive visualization is increasingly popular for data analysis---questions can be quickly specified and answered. However, these interfaces are difficult to program even for those well versed in query languages, such as SQL, and visualization libraries, such as Vega-Lite.
The difficulty manifests on two different levels. The first is the effort it takes to "wire up" frontends with backends, which involves mundane and repetitive work like mapping application-level manipulations to query languages, as well as tough issues like coordinating concurrent and asynchronous events. The second, more subtle, is the effort of typing the code: compared to direct manipulation techniques like brushing, coding is orders of magnitude slower and requires the analyst to context switch.
In my Ph.D. I created two new projects.
DIEL is a framework that helps developers build scalable interactive data visualizations under a simple, declarative interface. DIEL treats UI events as a stream of data that is captured in an event history for reuse. Developers declare what the state of the interface should be after the arrival of events. DIEL compiles these declarative specifications into relational queries over both the event history and the data to be visualized. In doing so, DIEL makes it easier to develop visualizations that are robust against changes to the size and location of data. To read more: scaling, concurrency and consistency.
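To make the idea of "interface state as a query over event history" concrete, here is a minimal sketch in Python with SQLite. It is not DIEL's syntax or API; the table names (`cars`, `brush_events`), the views, and the helper functions are all hypothetical, chosen only for illustration. Each brush interaction is appended to an event table, and the chart's state is a relational view that joins the latest event with the data.

```python
import sqlite3


def make_db():
    """Create an in-memory database with data, an event history, and views.

    All names here are illustrative assumptions, not part of DIEL.
    """
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        -- The data to be visualized.
        CREATE TABLE cars (origin TEXT, horsepower INTEGER);

        -- The event history: every brush is recorded, never overwritten.
        CREATE TABLE brush_events (
            ts INTEGER PRIMARY KEY AUTOINCREMENT,
            lo INTEGER,
            hi INTEGER
        );

        -- The most recent brush selection.
        CREATE VIEW current_brush AS
            SELECT lo, hi FROM brush_events ORDER BY ts DESC LIMIT 1;

        -- Declared chart state: counts per origin within the latest brush.
        CREATE VIEW filtered_counts AS
            SELECT c.origin, COUNT(*) AS n
            FROM cars c
            JOIN current_brush b ON c.horsepower BETWEEN b.lo AND b.hi
            GROUP BY c.origin
            ORDER BY c.origin;
        """
    )
    db.executemany(
        "INSERT INTO cars VALUES (?, ?)",
        [("USA", 150), ("USA", 90), ("Japan", 70), ("Japan", 95), ("Europe", 110)],
    )
    return db


def brush(db, lo, hi):
    """Record a brush interaction as a new row in the event history."""
    db.execute("INSERT INTO brush_events (lo, hi) VALUES (?, ?)", (lo, hi))


def chart_state(db):
    """Read the chart state, which is derived entirely from events and data."""
    return db.execute("SELECT origin, n FROM filtered_counts").fetchall()


db = make_db()
brush(db, 60, 100)
first = chart_state(db)
brush(db, 100, 200)
second = chart_state(db)
```

The application code never updates the chart directly: it only appends events, and the view recomputes what should be shown. Because earlier events stay in `brush_events`, they remain available for reuse, for example to replay or revisit a previous selection.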
B2 builds on this duality of queries and interactions in DIEL. It is a notebook extension that creates interactive visualizations inferred from queries already written. We identified additional design gaps that prevent the developer from moving smoothly between the two mediums: layout, temporal, and semantic. We address these gaps in a Jupyter Notebook extension, which tightens the feedback loop for programming with data. Check out our code and demo (paper coming soon).
Before the US, I grew up in China (Taiyuan + Beijing) and the UK (Nottingham). I love good books (a big Woolf fan), conversations, and activities that help me move with intention (like rock climbing). After a stint with Soylent and "rationality", I now spend a lot of my non-professional time cooking and being silly.