I plan to graduate in summer 2020, and I hope to continue building tools for data analysts. These days, I am particularly interested in scaling analysis teams. I think the history of an individual's or organization's programming traces has huge potential to make analysis easier by allowing tools to build custom, on-the-fly interactive visualizations and models, and to surface relevant information synthesized from that history.
Interactive visualization is increasingly popular for data analysis: questions can be quickly specified and answered. However, these interfaces are difficult to program even for those well versed in query languages, such as SQL, and visualization libraries, such as Vega-Lite.
The difficulty manifests on two levels. The first is the effort required to "wire up" frontends with backends, which involves mundane, repetitive work like mapping application-level manipulations to query languages, as well as tough issues like coordinating concurrent and asynchronous events. The second, more subtle, is the effort of typing the code: compared to direct manipulation techniques like brushing, coding is orders of magnitude slower and requires the analyst to context switch.
In my PhD work, I created two new projects.
DIEL is a library that makes building interactive visualizations as easy as writing relational queries. By adding "event tables" to a client-server federated database, DIEL makes the specification of the interaction logic agnostic to where the data lives, client or server. Just as one writes the same query whether the tables are spread over two countries or sit on the same laptop, developers using DIEL need not care where the data is located. The caveat is that latency often raises design issues: to create a responsive and consistent UI, the developer needs to be able to reason about the relative ordering of events, which the DIEL design exposes. To read more: scaling, concurrency and consistency.
MIDAS builds on this duality of queries and interactions in DIEL. It is a notebook extension that creates interactive visualizations inferred from queries already written. Because each interaction is reified into code, the developer can move smoothly between the two mediums, picking whichever is best suited to the task at hand.
I grew up in China (Taiyuan, Beijing) and the UK (Nottingham). I love good books (a big Woolf fan), conversations, and activities that help me move with intention (like rock climbing). After a stint with Soylent and "rationality", I now spend a lot of my non-professional time cooking and being silly.