don't worry, it's probably fine

Notes from the Week #20

06 Apr 2019

weeknotes

I’m back writing my weeknotes after a break needed for my own mental bandwidth.

A lot has changed, some of which I’m not able to write about (yet), so what follows is an accurate but incomplete account of the last couple of weeks.

Ain’t no party like a crowd-source party

I volunteered to help run the London version of Democracy Club’s SoPN Data Party at Newspeak House.

On Thursday, the vast majority of candidate lists (properly, Statements of Persons Nominated or SoPN) were published by councils across the UK for the local elections taking place on May 2nd 2019.

Why do we need to do this? SoPNs are published as PDFs, in a painfully unhelpful format immune to even copy-and-paste, so we process candidates for all 5000+ wards (with an estimated 30,000 candidates) into our candidates tracker, which is then exposed as open data.

In an ideal world, there would be a single well-defined format (C/TSV, XML) that each brand of electoral management software would produce so we don’t need to do this manual data collection in the first place, but that’s not the state of the world at the moment.
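To make the "ideal world" a bit more concrete, here is a minimal sketch of how little code it would take to consume SoPNs if they arrived as CSV. The column names, the `Nomination` class and the sample rows are all hypothetical placeholders; no such standard exists today.

```python
import csv
import io
from dataclasses import dataclass
from typing import List

# Hypothetical column layout and placeholder rows, purely for illustration.
SAMPLE_SOPN_CSV = """\
election_date,council,ward,candidate_name,party
2019-05-02,Example Council,Example Ward,Example Candidate A,Example Party
2019-05-02,Example Council,Example Ward,Example Candidate B,Independent
"""


@dataclass(frozen=True)
class Nomination:
    """One row of a (hypothetical) machine-readable SoPN."""
    election_date: str
    council: str
    ward: str
    candidate_name: str
    party: str


def parse_sopn_csv(text: str) -> List[Nomination]:
    """Turn CSV text into Nomination records, one per candidate row."""
    reader = csv.DictReader(io.StringIO(text))
    return [Nomination(**row) for row in reader]


nominations = parse_sopn_csv(SAMPLE_SOPN_CSV)
```

Compare that with re-keying names out of a scanned PDF and the appeal of a shared format is obvious.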

Special shout-outs to Ed for hosting us and Alice for fielding questions from our new and eager data wombles.

Getting Graphite in a better state

Our Graphite setup is the core of our observability infrastructure, but it had evolved incrementally to the point where doing cleverer things for reliability and performance was hard.

After a good amount of work, we finally declared the second-generation infrastructure done. That let us do what we call “the Indiana Jones moment”: replacing parts of the infrastructure underneath the whole of production without so much as a blip in the metrics collection.

We introduced a layer of carbon-c-relay to multiplex data between multiple Graphite nodes, and we now have a funnel of metrics from Edge -> Relay -> Relay -> Cache -> Disk that we can monitor for degradation.
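As a rough illustration of what a relay layer does, the sketch below formats a datapoint in Graphite's plaintext protocol (`path value timestamp`) and picks which backend nodes should receive it. This is not how carbon-c-relay is implemented: the simple CRC-modulo routing here stands in for its real hashing schemes, and the function names and node names are made up for the example.

```python
import time
import zlib
from typing import List, Optional


def format_metric(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Render one datapoint as a Graphite plaintext line: 'path value timestamp\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def pick_nodes(path: str, nodes: List[str], replication: int = 1) -> List[str]:
    """Deterministically choose `replication` distinct nodes for a metric path.

    Assumption: a plain CRC32-modulo scheme, used here only as a stand-in
    for the routing a real relay would perform.
    """
    if not 1 <= replication <= len(nodes):
        raise ValueError("replication must be between 1 and the number of nodes")
    start = zlib.crc32(path.encode("utf-8")) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication)]


# Hypothetical node names; a real relay would hold socket connections instead.
NODES = ["graphite-a:2003", "graphite-b:2003", "graphite-c:2003"]
line = format_metric("edge.requests.count", 42, 1554508800)
targets = pick_nodes("edge.requests.count", NODES, replication=2)
```

Because the same path always maps to the same nodes, each stage of the funnel can be swapped out independently, which is what makes it possible to replace parts without losing metrics.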

A spike I’d like to try in future is to build readers/writers for Graphite that leverage AWS Timestream, Amazon’s not-yet-out-of-preview hosted timeseries database. The worst thing about running most metrics collection systems (I’d guess) is the state, so being able to point it to a hosted datastore would be ideal.

Legislation as code

The library I was building to calculate SoPN publish dates from the date and type of election finally reached 1.0.0. I was happy with the API, and decided to invest a bit of time learning Sphinx so that I could leverage ReadTheDocs to host its documentation.

I’m going to write a much bigger post about this topic soon as a “release post” so I won’t spoil too much, but I do like the type hints that modern versions of Python support.
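To give a flavour of the problem without pre-empting the release post, here is a minimal sketch of the kind of calculation involved: counting back a number of working days from polling day, skipping weekends and bank holidays. This is not the library's API. The function name, the 18-working-day offset and the holiday set are assumptions chosen for illustration, and the real rules differ by election type.

```python
from datetime import date, timedelta
from typing import AbstractSet


def working_days_before(
    poll_date: date,
    working_days: int,
    holidays: AbstractSet[date] = frozenset(),
) -> date:
    """Return the date that falls `working_days` working days before `poll_date`.

    Saturdays, Sundays and anything in `holidays` are not counted.
    """
    if working_days < 0:
        raise ValueError("working_days must not be negative")
    current = poll_date
    remaining = working_days
    while remaining > 0:
        current -= timedelta(days=1)
        # weekday() is 0-4 for Monday to Friday
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


# Assumed inputs: Good Friday and Easter Monday 2019 as the only holidays,
# and an 18-working-day offset for the SoPN deadline.
EASTER_2019 = frozenset({date(2019, 4, 19), date(2019, 4, 22)})
sopn_deadline = working_days_before(date(2019, 5, 2), 18, EASTER_2019)
```

Under those assumptions the result is Thursday 4 April 2019, which lines up with the Thursday publication day mentioned earlier. The type hints make it hard to pass a string where a `date` is expected, which matters when the inputs come from legislation rather than from a database.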


I did a bunch of other stuff over this period:

  • Went to the UK finals of the International Championship of Collegiate A Cappella. This was excellent, and I hope winners Aquapella go on to kick some arse in the international finals.
  • One of my Christmas presents last year was tickets to see No Such Thing As A Fish, the QI podcast, recorded live. Absolutely great show!
  • Took a holiday down to the New Forest. Beautiful and quiet part of the world, and exactly what was needed as certain things in politics were reaching fever pitch.