Here are some projects I have worked on for school, work, or fun. You can find the code for most of them on GitHub.
Define an accessible representation of your chart for VoiceOver to generate an audio graph.
Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing. I am a committer to the project.
Falcon is an interactive system for real-time brushing and linking interactions among multiple visualizations of billion-record datasets.
Danyel Fisher and I developed a visualization technique to show millions of time series at once.
Pangloss is a big-data visualization system that uses approximation to answer queries quickly. I developed Pangloss with Danyel Fisher at MSR.
To visualize values and uncertainty together, Michael Correll and I created Value-Suppressing Uncertainty Palettes (VSUPs for short).
I created Draco, a formal model of visualization design with shareable design guidelines, formal reasoning over the design space, and visualization recommendation.
I am the co-author of Vega-Lite, a high-level visualization grammar for rapid specification of interactive, multi-view graphics.
I contribute to Vega, a declarative visualization grammar that introduces novel primitives for interactive visualization design.
I co-created Voyager. The system accelerates exploratory data analysis by augmenting visual analysis tools with visualization recommendations.
Query language and engine that unifies visualization specifications and recommendations.
As a control condition for Voyager, I co-created Polestar, a Tableau-like data exploration tool.
I maintain the Vega and Vega-Lite online editor.
I maintain Vega-Embed, a library to embed Vega and Vega-Lite visualizations on the web.
I maintain a tool to generate JSON schema from TypeScript code. The tool uses the TypeScript compiler to parse the code into an AST and then generates an equivalent JSON schema. We use this tool to generate the schema for Vega-Lite.
I maintain the library to add tooltips to charts.
I maintain a collection of reusable themes for Vega and Vega-Lite.
I maintain the Jupyter Notebook extension for Vega and Vega-Lite.
I contribute code to support Vega and Vega-Lite in JupyterLab.
I developed new operators and debugging tools for Myria, a distributed database system from the UW database group.
I led the development of an interactive tool to investigate query execution in distributed database systems.
During my internship at Google in NYC in the summer of 2014, I implemented new out-of-core join and aggregation operators for a large-scale time series database inside Monarch. Monarch processes production monitoring data from various systems at Google.
Not much about this system is public, but there is a talk from John Banning about the system.
CKAN is the world’s leading open-source data portal platform used by data.gov.uk, data.gov and publicdata.eu among many others. I developed the new data store, a data ingestion service, and a preview extension API.
For a class project, I extracted 500k labeled images of figures from roughly 1M papers. For each image I created a mask that shows where the text in the image is. I then used the images to train a neural network that would find text in images.
I implemented Conway’s Game of Life in Python, Go, Rust, and C#. All projects are on GitHub.
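For readers unfamiliar with the rules, here is a minimal Python sketch of one Game of Life generation. It is an illustration written for this page, not the code from the repositories; the function name `step` and the set-of-cells representation are assumptions.

```python
from collections import Counter


def step(live):
    """Advance one generation. `live` is a set of (x, y) cells that are alive."""
    # Count, for every cell next to a live cell, how many live neighbors it has.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is alive next if it has 3 live neighbors,
    # or if it has 2 live neighbors and is alive now.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}


# A horizontal "blinker" flips to vertical and back again.
blinker = {(0, 1), (1, 1), (2, 1)}
vertical = step(blinker)
```

Storing only the live cells keeps the board unbounded and makes each step proportional to the number of live cells.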
A simple Leaflet map control to find your current location on a Leaflet map. Very popular and used on the OpenStreetMap home page.
Visualize coverage on a Leaflet map. The data is stored in a quadtree to make queries to the data super fast.
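To show why a quadtree speeds up such queries, here is a minimal point quadtree sketch in Python. It is not the implementation used in the plugin; the class name `QuadTree` and the `capacity` and `max_depth` parameters are assumptions made for this illustration.

```python
class QuadTree:
    """Point quadtree covering the square with corner (x, y) and side `size`."""

    def __init__(self, x, y, size, capacity=4, max_depth=16):
        self.x, self.y, self.size = x, y, size
        self.capacity = capacity    # points a leaf holds before it splits
        self.max_depth = max_depth  # remaining splits allowed below this node
        self.points = []
        self.children = None

    def insert(self, px, py):
        """Insert a point; the caller must keep it inside the root square."""
        if self.children is None:
            if len(self.points) < self.capacity or self.max_depth == 0:
                self.points.append((px, py))
                return
            self._split()
        self._child_for(px, py).insert(px, py)

    def _child_for(self, px, py):
        half = self.size / 2
        index = (2 if px >= self.x + half else 0) + (1 if py >= self.y + half else 0)
        return self.children[index]

    def _split(self):
        half = self.size / 2
        # Child order matches the index computed in _child_for.
        self.children = [
            QuadTree(self.x + dx, self.y + dy, half, self.capacity, self.max_depth - 1)
            for dx in (0, half)
            for dy in (0, half)
        ]
        old, self.points = self.points, []
        for px, py in old:
            self._child_for(px, py).insert(px, py)

    def query(self, x0, y0, x1, y1):
        """Return all points with x0 <= px <= x1 and y0 <= py <= y1."""
        # Skip whole subtrees whose square cannot overlap the query rectangle.
        if (x1 < self.x or x0 > self.x + self.size
                or y1 < self.y or y0 > self.y + self.size):
            return []
        found = [(px, py) for px, py in self.points
                 if x0 <= px <= x1 and y0 <= py <= y1]
        if self.children is not None:
            for child in self.children:
                found.extend(child.query(x0, y0, x1, y1))
        return found


tree = QuadTree(0, 0, 16)
for gx in range(10):
    for gy in range(10):
        tree.insert(gx, gy)
visible = tree.query(2, 2, 4, 4)
```

A query for the visible map area only descends into the squares that overlap it, so most of the stored points are never touched.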
For this project, I combined the MaskCanvas layer and heatmap.js from my friend Patrick.
A Bomberman clone written in Squeak Smalltalk.
For Patrick Baudisch’s HCI class at HPI, we had to implement an application that usually requires large screens on an iPod nano (enough for 4 buttons). We implemented a GIS application to geo-reference images by simply aligning points. We introduced the concept of an X-Ray layer to align points on small devices.
Together with another student at HPI, I built a music player in VHDL, a hardware description language.
A SAT solver that uses different statistical optimization algorithms to solve SAT problems encoded in the DIMACS format. This solver is written in Python and uses NumPy to speed up calculations. The two main algorithms in this solver are an ant colony optimization algorithm and a genetic algorithm. To support these algorithms, there are some pre-processing algorithms.
Tagshot is a photo management tool in the browser. We designed and developed it as a class project. Our goal was to create a tool that lets users efficiently manage large numbers of photos and, in particular, add tags.
Our contribution to the Informaticup 2011. The algorithms we developed compute optimal placements of ATMs in a city. We won the competition and were invited to present our work in Bonn.
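As a rough illustration of the input format, here is a minimal Python sketch that reads a DIMACS CNF problem and counts the clauses an assignment satisfies, which is the kind of score a genetic or ant colony search could maximize. It is not the solver's actual code; `parse_dimacs` and `count_satisfied` are hypothetical names, and it uses plain lists rather than NumPy to stay self-contained.

```python
def parse_dimacs(text):
    """Parse DIMACS CNF text into (num_vars, clauses)."""
    num_vars = 0
    clauses = []
    current = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("c"):
            continue  # blank line or comment
        if line.startswith("p"):
            # Problem line: "p cnf <num_vars> <num_clauses>"
            num_vars = int(line.split()[2])
            continue
        for token in line.split():
            literal = int(token)
            if literal == 0:  # 0 terminates a clause
                clauses.append(current)
                current = []
            else:
                current.append(literal)
    return num_vars, clauses


def count_satisfied(clauses, assignment):
    """Count satisfied clauses; assignment[i] is the value of variable i + 1."""
    return sum(
        any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
        for clause in clauses
    )


example = """c tiny example
p cnf 3 2
1 -3 0
2 3 -1 0
"""
num_vars, clauses = parse_dimacs(example)
score = count_satisfied(clauses, [True, False, True])
```

An assignment solves the problem when the score equals the number of clauses, so a stochastic search can stop as soon as it reaches that value.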
Our contribution to the Informaticup 2012. We developed algorithms to find the optimal tour to buy items at stores.