Luca's blog

Discovering Vega-lite

I am in the middle of my journey discovering Clojure and its ecosystem of libraries. So far I have explored quite thoroughly the areas of Data Science (most notable mention: tech.ml.dataset) and R/Python interoperability (mentions: libpython-clj and clojisr). The incredible developers behind these libraries often discuss data visualization, and they seem fond of one "framework" in particular: Vega-Lite.

Of course, there is a particularly popular Clojure library for working with these visualizations, and that is Oz. Out of curiosity, I watched the canonical introductory video to Vega-Lite and I was really amazed by its simplicity and power of expression. In brief:

- Vega is designed by following guidelines outlined in the Grammar of Graphics
- Vega is built "on top" of d3.js
- Vega-Lite is a "lighter" version of Vega, less verbose and with "sane defaults"

After reading about it and experimenting with it, I understand why the smart people of the Clojure community are fascinated by this library. Consider this plot:

This is the plot specification, described in JSON:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "description": "The PM2.5 value of Beijing observed 15 days, highlighting the days when PM2.5 level is hazardous to human health. Data source https://chartaccent.github.io/chartaccent.html",
    "layer": [{
      "data": {
        "values": [
          {"Day": 1, "Value": 54.8},
          {"Day": 2, "Value": 112.1},
          {"Day": 3, "Value": 63.6},
          {"Day": 4, "Value": 37.6},
          {"Day": 5, "Value": 79.7},
          {"Day": 6, "Value": 137.9},
          {"Day": 7, "Value": 120.1},
          {"Day": 8, "Value": 103.3},
          {"Day": 9, "Value": 394.8},
          {"Day": 10, "Value": 199.5},
          {"Day": 11, "Value": 72.3},
          {"Day": 12, "Value": 51.1},
          {"Day": 13, "Value": 112.0},
          {"Day": 14, "Value": 174.5},
          {"Day": 15, "Value": 130.5}
        ]
      },
      "layer": [{
        "mark": "bar",
        "encoding": {
          "x": {"field": "Day", "type": "ordinal", "axis": {"labelAngle": 0}},
          "y": {"field": "Value", "type": "quantitative"}
        }
      }, {
        "mark": "bar",
        "transform": [
          {"filter": "datum.Value >= 300"},
          {"calculate": "300", "as": "baseline"}
        ],
        "encoding": {
          "x": {"field": "Day", "type": "ordinal"},
          "y": {"field": "baseline", "type": "quantitative", "title": "PM2.5 Value"},
          "y2": {"field": "Value"},
          "color": {"value": "#e45755"}
        }
      }
    ]}, {
      "data": {
         "values": [{}]
      },
      "encoding": {
        "y": {"datum": 300}
      },
      "layer": [{
        "mark": "rule"
      }, {
        "mark": {
          "type": "text",
          "align": "right",
          "baseline": "bottom",
          "dx": -2,
          "dy": -2,
          "x": "width",
          "text": "hazardous"
        }
      }]
    }
  ]
}

It is not the simplest example, but I think it speaks well of Vega-Lite's expressiveness. The visualization is divided into two layers and each layer has two sublayers:

The first layer has simple inline data associated with it, with the fields "Day" and "Value"
- The first sublayer has a mark specifying a "bar" plot
  - The field "Day" is encoded to the x axis and "Value" to the y axis
- The second sublayer is another bar mark, which also defines transforms:
  - Rows with a value below 300 are filtered out, and the constant 300 is added to the remaining rows as "baseline"
  - In the encoding, y is the baseline
  - y2 is the original value, so the bar covers only the part above 300, and it is colored red
The second layer has a single empty datum and a fixed y encoding at 300
- The first sublayer has mark "rule", which draws a horizontal line at that y
- The second sublayer has mark "text", with options describing its position and the label "hazardous"
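
To make the transforms of the second bar sublayer more concrete, here is a rough sketch of what the filter and calculate steps do to the inline data, written in plain Clojure. It only mimics the semantics; it is not how Vega-Lite evaluates its expressions internally, and the names pm25, threshold and hazardous-rows are made up for this illustration.

```clojure
;; The inline data from the spec, as plain Clojure maps.
(def pm25
  [{:Day 1 :Value 54.8} {:Day 2 :Value 112.1} {:Day 3 :Value 63.6}
   {:Day 4 :Value 37.6} {:Day 5 :Value 79.7} {:Day 6 :Value 137.9}
   {:Day 7 :Value 120.1} {:Day 8 :Value 103.3} {:Day 9 :Value 394.8}
   {:Day 10 :Value 199.5} {:Day 11 :Value 72.3} {:Day 12 :Value 51.1}
   {:Day 13 :Value 112.0} {:Day 14 :Value 174.5} {:Day 15 :Value 130.5}])

;; Hypothetical name for the constant used in the spec.
(def threshold 300)

;; Mimics {"filter": "datum.Value >= 300"} followed by
;; {"calculate": "300", "as": "baseline"}.
(defn hazardous-rows [rows]
  (->> rows
       (filter #(>= (:Value %) threshold))
       (mapv #(assoc % :baseline threshold))))

(hazardous-rows pm25)
;; => [{:Day 9, :Value 394.8, :baseline 300}]
```

Only day 9 survives the filter, which is why a single red segment is drawn from the baseline (y) up to the original value (y2).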

I really like this way of composing a visualization. It is very simple to express the components and quite intuitive how to layer them together.

Oz

Well, Vega-Lite is nothing new, so it might not be that exciting for the majority of people. What makes it extremely interesting for me is that it has something in common with Clojure: the visualization spec is just data. Instead of JSON, it can be represented in YAML or EDN. In fact, it's nothing more than a map of vectors and maps.
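
As a small illustration of the "just data" point, this is how the second top-level layer of the JSON spec above could look as EDN. It is a hand translation; using keyword keys instead of strings is an assumption of this sketch, and threshold-layer is a name I made up.

```clojure
;; The rule + text layer of the PM2.5 spec, written as a Clojure map.
;; Keyword keys are an assumption of this sketch.
(def threshold-layer
  {:data     {:values [{}]}
   :encoding {:y {:datum 300}}
   :layer    [{:mark "rule"}
              {:mark {:type     "text"
                      :align    "right"
                      :baseline "bottom"
                      :dx       -2
                      :dy       -2
                      :x        "width"
                      :text     "hazardous"}}]})

;; Because it is just data, core functions are enough to inspect it.
(get-in threshold-layer [:encoding :y :datum])     ;; => 300
(count (:layer threshold-layer))                   ;; => 2
(get-in threshold-layer [:layer 1 :mark :text])    ;; => "hazardous"
```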

The library called Oz allows us to define a Vega-Lite spec in Clojure; it compiles the spec to Vega and renders it in a browser with minimal effort. It even allows us to export the plot to a self-contained HTML file using the JavaScript library vega-embed.

Consider this neat code found in the repo's README:

(ns org.core
  (:require [oz.core :as oz]))

;; Generate 20 points per name: a power curve scaled by the length
;; of the name, plus some random integer noise.
(defn play-data [& names]
  (for [n names
        i (range 20)]
    {:time     i
     :item     n
     :quantity (+ (Math/pow (* i (count n)) 0.8)
                  (rand-int (count n)))}))

;; The Vega-Lite spec is a plain Clojure map.
(def line-plot
  {:data     {:values (play-data "monkey" "slipper" "broom")}
   :encoding {:x     {:field "time" :type "quantitative"}
              :y     {:field "quantity" :type "quantitative"}
              :color {:field "item" :type "nominal"}}
   :mark     "line"})

;; Write the plot to an HTML file.
(oz/export! line-plot "public/html/line.html")

Here three random time series are generated, one per name, and encoded in the most obvious, concise, simple way. The result is what you would expect:

What I love about this example is that you work with raw, naked data. There is no class, no weird API syntax or function kwargs to memorize.
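
A minimal sketch of what working with naked data means in practice: variations of a plot can be derived with ordinary map functions. The spec below is a cut-down, dependency-free stand-in with made-up values, and the names small-plot, point-plot and titled-plot are hypothetical.

```clojure
;; A tiny spec with made-up inline values, so the sketch needs no libraries.
(def small-plot
  {:data     {:values [{:time 0 :quantity 1.0 :item "monkey"}
                       {:time 1 :quantity 2.5 :item "monkey"}]}
   :encoding {:x {:field "time" :type "quantitative"}
              :y {:field "quantity" :type "quantitative"}}
   :mark     "line"})

;; Same data and encoding, drawn as points instead of a line.
(def point-plot (assoc small-plot :mark "point"))

;; Same plot with a custom y axis title.
(def titled-plot (assoc-in small-plot [:encoding :y :title] "Quantity"))

(:mark point-plot)                          ;; => "point"
(get-in titled-plot [:encoding :y :title])  ;; => "Quantity"
(:mark small-plot)                          ;; => "line", the original is unchanged
```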

Blog development:

Another interesting thing I learned was how to embed Vega into this blog! Because I don't know enough about web development, it took me way longer than it should have. In the end it was really easy, since you can just write plain HTML in markdown and it will be correctly parsed by markdown.core and reagent.

When I start from a compiled Vega spec, I can just put the resulting SVG in a div tag. When exporting from Clojure, I can put the HTML file in an iframe. It is a bit annoying that the iframe does not resize automatically, but I can control it with the CSS property min-height, set to the height I specify in Vega-Lite.

Oz also has a facility to render a Reagent component directly from Clojure. At the moment I am not using it, as my blog posts are written in markdown/HTML, but I could use it to populate another page of the website.

As a final cherry on top, writing my blog in Org mode has already shown its value. Executing yarn develop starts the shadow-cljs server which will watch for changed files. At the same time it will expose a Clojure REPL that I can connect to in order to execute my org src blocks. When I am satisfied, I can just export to markdown and see the blog post reloading.
