Cozy, the collection synthesizer

Cozy is a tool that automatically writes data structure implementations from high-level specifications. Our 2016 paper (PDF) describes the techniques that make Cozy possible.

The source code is available on GitHub. Documentation on using Cozy for your own projects can be found there as well.

Instructions for downloading the case study programs and other evaluation materials for our 2016 paper are available here.


This specification builds a graph data structure that stores edges:

    src : int,
    dst : int

assume src != dst

query outgoingEdges(node : int)
    src == node

query allEdges(node : int)
    src == node or dst == node

Each edge has a source node src and a destination node dst. The assume statement means that this data structure does not store edges that connect a node to itself (often called self-loops).

Cozy can use this specification to generate an efficient data structure:

$ python src/   \
    --java  \
    --java-class Graph \

...which creates

public class Graph {
    public static class Record {
        public Record(int src, int dst) { ... }

    public Graph() { ... }

    public void add(Record e) { ... }
    public void remove(Record e) { ... }
    public void updateSrc(Record e, int newVal) { ... }
    public void updateDst(Record e, int newVal) { ... }

    public Iterator<Record> outgoingEdges(int node) { ... }
    public Iterator<Record> allEdges(int node) { ... }

Of course, Cozy fills in the implementations of each method so you don't have to! Cozy can also generate C++ code. If you provide a benchmark program, Cozy can even do additional fine-tuning to your specific workload.


We have found some evidence that Cozy can avoid the sorts of errors human programmers make when implementing data structures by hand.

We have also investigated the performance of Cozy-generated code relative to human-written code for four benchmark applications.

benchmark results

These graphs show how execution time of the application on a real-world workload changes with the size of the workload, to give a sense for how Cozy's data structures scale. Lower times mean faster and better. For two out of four (ZTopo and Sat4J, graphs e and f), the performances of the human-written and synthesized implementations are nearly identical. In one (Myria, graphs a and b) Cozy far outperforms the human-written implementation, and in another (Bullet, graphs c and d) the human-written implementation is much more efficient. However, Cozy can be made to match the human performance by adding one additional primitive to the core synthesis algorithm (graph d). This new primitive can be enabled in the tool using the --enable-volume-trees flag.

A full discussion of our results can be found in the paper.


Cozy was written by Calvin Loncaric with help from Haoming Liu. The research was carried out with a great deal of help from Michael Ernst and Emina Torlak.