Projects

Here are some projects I have worked on for school, work, or fun that you might find interesting. You can find the code for most of them on GitHub.

Research

Vega stack

With the Interactive Data Lab, we created the Vega stack and a number of tools for data exploration based on it.

Vega-Lite

I am a co-author of Vega-Lite, a high-level visualization grammar. It provides a concise JSON syntax for rapidly generating visualizations to support analysis. Vega-Lite specifications can be compiled to Vega specifications.
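To give a feel for the JSON syntax, here is a minimal sketch of a bar chart specification, built as a Python dictionary and serialized to JSON. The data values and the field names `category` and `amount` are made up for illustration; the top-level keys (`data`, `mark`, `encoding`) follow the Vega-Lite specification format.

```python
import json

# A small Vega-Lite bar chart specification.
# The inline data and the field names are hypothetical examples.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {
        "values": [
            {"category": "A", "amount": 28},
            {"category": "B", "amount": 55},
            {"category": "C", "amount": 43},
        ]
    },
    "mark": "bar",
    "encoding": {
        # One bar per category on the x axis, bar height from amount.
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "amount", "type": "quantitative"},
    },
}

# Serialize to the JSON text that a Vega-Lite renderer would consume.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The resulting JSON can be pasted into any tool that accepts Vega-Lite specifications.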

Vega-Lite examples.

Polestar and Voyager

We built the data exploration tools Polestar and Voyager on top of Vega-Lite. We have written a paper about these systems.

ipyvega

IPython/Jupyter notebook module for Vega and Vega-Lite. The code is on GitHub.

Myria

Myria is a distributed, cloud-based Big Data management system that is being developed in the Database group at the University of Washington.

Jobs

Data Search at Google Research

During my internship with Google Research in Mountain View, I worked on the UX for Goods. The system is described in this paper.

Production System Monitoring at Google

During my internship at Google in NYC in the summer of 2014, I implemented new out-of-core join and aggregation operators for a large-scale time series database. The system processes production monitoring data from various systems at Google.

Not much about this system is public, but there is a talk by John Banning about it.

CKAN at the Open Knowledge Foundation

CKAN is the world’s leading open-source data portal platform used by data.gov.uk, data.gov and publicdata.eu among many others.

It is a powerful data management system that makes data accessible – by providing tools to streamline publishing, sharing, finding and using data. CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available. CKAN is mainly developed by the Open Knowledge Foundation.

Development happens on GitHub.

I also developed and contributed to many other tools related to CKAN like messytables and the datapusher.

School

Text detection with neural networks

For a class project, I extracted 500k labeled images of figures from roughly 1M papers. For each image I created a mask that shows where the text in the image is. I then used the images to train a neural network that would find text in images. The project is on GitHub.

Text detection output.

Space Clean Up

A Bomberman clone written in Squeak Smalltalk. The code is on GitHub.

Screenshot of Space Clean Up.

MapLink/Gis.ly

For Patrick Baudisch’s HCI class at HPI, we had to implement an application that usually requires a large screen on an iPod nano (enough space for 4 buttons). We implemented a GIS application that allows users to geo-reference images by simply aligning points. We introduced the concept of an X-Ray layer to enable users on small devices to align points.

You can find more details in the paper.

MapLink.

Singing VHDL board

Together with another student at HPI, I built a music player in VHDL. The code as well as schematics to print the board are on GitHub. I wrote a short blog post about the project.

The glass is half full

In this class project, I wrote about how we can use an optimistic approach to concurrency control. You can find the paper here.

SoSat

A SAT solver that uses different statistical optimization algorithms to solve SAT problems encoded in the DIMACS format. The solver is written in Python and uses NumPy to speed up calculations. The two main algorithms in the solver are an ant colony optimization algorithm and a genetic algorithm. Some pre-processing algorithms support them.

With Matthias Springer. The code is on GitHub.
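For readers unfamiliar with the DIMACS format, here is a minimal sketch of how a CNF problem can be parsed and how a candidate assignment can be scored by the number of clauses it satisfies. A score like this is a common objective for genetic or ant colony approaches, but this is an illustration only and not SoSat's actual code; the function names `parse_dimacs` and `count_satisfied` and the example formula are made up, and plain Python is used instead of NumPy to keep the sketch dependency-free.

```python
def parse_dimacs(text):
    """Parse DIMACS CNF text into (num_vars, clauses).

    Each clause is a list of non-zero integers; a negative number
    stands for a negated variable.
    """
    num_vars = 0
    clauses = []
    current = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("c"):
            continue  # skip blank lines and comments
        if line.startswith("p"):
            # Problem line: p cnf <num_vars> <num_clauses>
            num_vars = int(line.split()[2])
            continue
        for token in line.split():
            literal = int(token)
            if literal == 0:
                # A 0 terminates the current clause.
                clauses.append(current)
                current = []
            else:
                current.append(literal)
    return num_vars, clauses


def count_satisfied(clauses, assignment):
    """Count satisfied clauses; assignment[i] is the value of variable i + 1."""
    return sum(
        any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
        for clause in clauses
    )


# Hypothetical example problem with 3 variables and 3 clauses.
EXAMPLE = """c (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
p cnf 3 3
1 -2 0
2 3 0
-1 -3 0
"""

num_vars, clauses = parse_dimacs(EXAMPLE)
score = count_satisfied(clauses, [True, True, False])
print(score, "of", len(clauses), "clauses satisfied")
```

An assignment is a solution when the score equals the number of clauses, so a search algorithm can use the score to compare candidates.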

Tagshot

Tagshot is a photo management tool in the browser. We designed and developed it as a class project. Our goal was to create a tool that lets users efficiently manage large numbers of photos and, in particular, add tags. The code is on GitHub and I wrote a blog post about it.

Side projects

Himawari 8

A Chrome extension that shows the latest image from the Himawari 8 weather satellite when you open a new tab. The code is on GitHub and you can install the extension from the Chrome Web Store.

Himawari Chrome extension new tab page.

Game of Life

I implemented Conway’s Game of Life in Python, Go, Rust, and C#. All projects are on GitHub.
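As a reminder of the rules, here is a minimal Python sketch of one generation step on an unbounded grid. It is an illustration and not taken from the repositories mentioned above; the function name `step` and the set-of-coordinates representation are choices made for this example.

```python
from collections import Counter


def step(live):
    """Compute the next generation from a set of live (x, y) cells."""
    # Count, for every cell, how many live neighbours it has.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is alive next if it has exactly 3 live neighbours,
    # or if it is alive now and has exactly 2.
    return {
        cell
        for cell, count in neighbour_counts.items()
        if count == 3 or (count == 2 and cell in live)
    }


# A "blinker" oscillates between a horizontal and a vertical bar.
blinker = {(0, 1), (1, 1), (2, 1)}
next_generation = step(blinker)
print(sorted(next_generation))
```

Storing only the live cells keeps the grid unbounded and makes the work proportional to the number of live cells.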

Leaflet plugins

Leaflet is an open-source JavaScript library for mobile-friendly interactive maps. I used it for various projects and also wrote a few extensions.

LocateControl

A simple control to find your current location on a Leaflet map. It is popular and is used on the OpenStreetMap home page.

The code is at github.com/domoritz/leaflet-locatecontrol.

Leaflet locate extension.

MaskCanvas

Visualizes coverage on a Leaflet map. The data is stored in a quadtree, which makes queries on the data very fast.

The code is at github.com/domoritz/leaflet-maskcanvas. I used it to visualize public transit coverage around Berlin.
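To illustrate why a quadtree makes such queries fast, here is a minimal Python sketch of a point quadtree with a rectangular range query. It is not the plugin's implementation (the plugin is written in JavaScript); the class name `QuadTree`, the `capacity` and `max_depth` parameters, and the sample points are assumptions made for this example.

```python
class QuadTree:
    """Point quadtree over the half-open square [x, x+size) x [y, y+size)."""

    def __init__(self, x, y, size, capacity=4, depth=0, max_depth=8):
        self.x, self.y, self.size = x, y, size
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth
        self.points = []
        self.children = None  # four sub-quadrants once this node is split

    def _contains(self, px, py):
        return (self.x <= px < self.x + self.size
                and self.y <= py < self.y + self.size)

    def insert(self, px, py):
        """Insert a point; return False if it lies outside this node."""
        if not self._contains(px, py):
            return False
        if self.children is None:
            # Store in this leaf while there is room (or at maximum depth).
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((px, py))
                return True
            self._split()
        return any(child.insert(px, py) for child in self.children)

    def _split(self):
        """Create four children and move this node's points into them."""
        half = self.size / 2
        self.children = [
            QuadTree(self.x + dx, self.y + dy, half,
                     self.capacity, self.depth + 1, self.max_depth)
            for dx in (0, half)
            for dy in (0, half)
        ]
        for px, py in self.points:
            any(child.insert(px, py) for child in self.children)
        self.points = []

    def query(self, x0, y0, x1, y1):
        """Return all points with x0 <= px <= x1 and y0 <= py <= y1."""
        # Skip whole subtrees that do not overlap the query rectangle.
        if (x1 < self.x or x0 >= self.x + self.size
                or y1 < self.y or y0 >= self.y + self.size):
            return []
        found = [(px, py) for px, py in self.points
                 if x0 <= px <= x1 and y0 <= py <= y1]
        if self.children is not None:
            for child in self.children:
                found.extend(child.query(x0, y0, x1, y1))
        return found


# Hypothetical sample points in a 100 x 100 area.
tree = QuadTree(0, 0, 100)
for point in [(10, 10), (20, 80), (55, 55), (60, 40), (90, 90), (12, 14)]:
    tree.insert(*point)

nearby = sorted(tree.query(0, 0, 30, 30))
print(nearby)
```

The query only descends into quadrants that overlap the search rectangle, so most of the data is never touched when the rectangle is small.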

Public transit coverage around Berlin.

Heatmap layer

For this project I combined the MaskCanvas layer and heatmap.js from my friend Patrick. You can learn more about it at patrick-wied.at/static/heatmapjs/example-heatmap-leaflet.html

Informaticup 2011: Optimal ATM placement

Blog post about our entry.

Visualization of optimal ATM placement.

Informaticup 2012: Shopping tour optimizer

Blog post about our entry.