don't worry, it's probably fine

A hammer in a kingdom of screws

30 Dec 2020

sql advent of code

I decided to have a crack at Advent of Code (AoC) this year, and try to get as far as I could using only a database and SQL to solve the problems.

tl;dr - great fun but don’t do it if you care about “winning” AoC

(warning: huge spoilers ahead for puzzles in AoC 2020)

Advent of What?

Every December for the last 6 years Eric Wastl has run Advent of Code, a set of daily programming challenges with gradually rising difficulty. A lot of people take part, some for the competitive leaderboard, some to practice their existing skills, some to learn a new language.

I fell into the third category - my knowledge of SQL is “enough to be dangerous” and I’ve accidentally DROP‘d at least one database in my career.

I was up for a bit of a challenge this year, so I attempted to finish all 25 days’ problems, each having two parts. I chose to use SQLite, and jump to Postgres if I desperately needed the features.

First things first, some ground rules for “Advent of SQL”.

  1. Do as much as I can in just SQL
  2. Other languages are allowed provided there’s no logic e.g. runner scripts
  3. C is allowed for generic SQLite extensions but not for solving the problem in wholesale
  4. Aim for halfway, anything on top of that is gravy

It went okay, stretching up to pretty well.

The Good

Most of the difficulty in trying to solve AoC in SQL is getting the input into a decent format. Provided the data could be transformed into a nice table structure, some problems were close to trivial.

Examples of this were:

  • Day 1 (“what is the product of two/three numbers in this list which sum to 2020?”)
  • Day 7 (“given a set of containment rules, how many nested containers are there?”)
  • Day 17 (implement 3D, followed by 4D, Game of Life)
  • Day 21 (“given a set of foods, ingredients, and allergens, which ingredients are safe?”)

Recursive common table expressions (CTEs) and window functions are both incredibly powerful and I’d not used either for a real task before.

One of my big takeaways is that with a runner to make generational changes, SQLite is pretty darn good at Game of Life.

I ended up writing three extensions using SQLite’s C API, to enable key functionality that SQLite does not have out of the box:

  • splitting a string e.g. split("a,b,c",",") gives three rows ("a", "b", and "c")
  • multiplying all integer/floating point entries in a column
  • transforming a binary string to an integer e.g. binary_to_int("01101") returns 15

The Bad

Quite a few of the problems required iterating over the input in a particular order, or relying on past state. You can emulate it by incrementing a counter in the recursion step and using it to join against a particular row in the input table, but it feels like a hack.

At least two problems required infinite recursion and recognition of a stopping state, which as far as I knew was outside of SQLite’s capabilities so I ended up setting a very high query limit and hoping it reached the desired state (for at least one of them, it did).

Anything that required a specific data structure to solve was likely to be an instant game-over. You can fake a certain amount with JSON object storage, as SQLite provides immutable operations over arrays and objects, but beyond that you’re on your own.

Performance gets really bad when you’re using SQLite’s JSON datatypes for something they’re not supposed to do. I chalked up one of the problems as unfeasible in SQL because my back-of-the-envelope maths showed that it would take almost two weeks to complete the calculations.

Unfortunately the problem in question (Day 15) is a Van Eck sequence for which there exists no closed form as a shortcut to the answer. I was so annoyed at this that I broke my own rule and wrote a solution in Java just to make the problem go away.

The Unexpected

SQLite’s extensibility, documentation, and programmer interface are all stellar. I have nothing but positive things to say about it.

One of the problems (Day 13) had a closed form solution in the shape of the Chinese Remainder Theorem. I took a peek at Rosetta Code to see if anyone had previously done it in SQL. The answer was no so … I added it.

I ended up implementing a full shunting-yard algorithm for one of the questions in order to parse infix arithmetic expressions. If you squint, SQLite’s immutable JSON array looks quite a bit like a stack, which was the key player in this puzzle. This is probably up there with my favourite achievement during AoC this year, even if I daren’t look at the code again.

I took a look at DuckDB as an alternative to SQLite towards the end of the competition - it looks really cool, and its columnar format might have provided useful performance wins in the grittier puzzles. During this process, I accidentally found a segfault-causing bug in the query parser, which I’ve raised as a GitHub issue.

This was my first real dive into side-effect-free programming since I studied Haskell at university. After I’d gotten my head around the idea that the closest I could get to variables was using with to create temporary tables, I started to enjoy “thinking with portals databases”.

Conclusion

I learned a lot, and I’m proud of how far I actually got. At time of writing, I received 38/49 stars for SQL-based solutions (not including my previously mentioned Java solution), significantly higher than halfway.

There are several problems I have not yet attempted due to time constraints and wanting to enjoy the rest of my leave doing something other than AoC. I think only a couple are truly out of reach for “Advent of SQL” and would require a solution in a different language.

Would I try this again? Probably not for a future Advent of Code, but I might come back in future and look at the problems I passed on.

Was it a “good” idea? Absolutely not, if what you care about is getting 50/50 stars.

But sometimes it’s fun do something not because it’s sensible, but just because you can.

My solutions and extensions are on GitHub should you wish to peruse them.