
Immutable Infrastructure and Sources of Truth

22 Jul 2014

puppet ansible

My first foray into configuration management software was through Puppet, which is also our configuration management tool of choice at work. When we originally made the transition from rat-king shell scripts to a dedicated provisioning system, we liked the idea of a centralised puppetmaster, and it suited us well.

More recently we have been making the effort to shift the focus of our infrastructure deployment process towards one emphasising “infrastructure immutability”. To us, this means making machines “write once”: provisioning happens largely at boot, and after that the machine is frozen from a provisioning point of view. If we want to make fundamental changes, we tear down and rebuild.

As our infrastructure has grown, we have been using PuppetDB as our Source of Truth. It contains all of the information about the current state of our infrastructure: which machines are alive and where, which machines can talk to each other, and where applications are located.
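
To make that concrete, here is a minimal sketch of the kind of question we ask our Source of Truth all the time: “which machines do you think exist?”. It assumes a PuppetDB instance on localhost:8080 exposing the v3 query API; adjust for your own setup.

```python
# Minimal sketch: asking PuppetDB which nodes it currently knows about.
# Assumes a PuppetDB instance on localhost:8080 exposing the v3 query API.
import requests

PUPPETDB = "http://localhost:8080"

def active_nodes():
    """Return the names of the nodes PuppetDB considers active."""
    resp = requests.get(PUPPETDB + "/v3/nodes",
                        headers={"Accept": "application/json"})
    resp.raise_for_status()
    return [node["name"] for node in resp.json()]

if __name__ == "__main__":
    for name in active_nodes():
        print(name)
```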

The conclusion that I’ve drawn after developing in this way for a while is that centralised configuration management with its own Source of Truth (in this case, PuppetDB) does not dovetail well with the ideals of immutable infrastructure.

The benefits of immutable infrastructure

We can look at infrastructure configuration the same way we look at our applications. This practice of “Infrastructure-as-Code” is not a new idea, but it means we can apply concepts from software development to our infrastructure and achieve interesting results.

The main virtue of immutable objects, from a programmer’s standpoint, is that they are easy to reason about: an object is guaranteed not to change its attributes after creation, and any “modification” returns an entirely new object.
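
The same idea in a few lines of Python, using a namedtuple as a stand-in for an immutable record:

```python
# Illustration of immutability: "modifying" a namedtuple returns a
# brand-new object and leaves the original untouched.
from collections import namedtuple

Machine = namedtuple("Machine", ["hostname", "role"])

web1 = Machine(hostname="web1", role="frontend")
web2 = web1._replace(hostname="web2")  # a new object, not a mutation

print(web1)  # Machine(hostname='web1', role='frontend')
print(web2)  # Machine(hostname='web2', role='frontend')
```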

I have been bitten by this before: the fact that a machine is under configuration management that runs without errors is no guarantee that it can be rebuilt from scratch. Indeed, only by rebuilding your infrastructure regularly and deterministically can you approach a similar level of immutability.

This is what “immutable” infrastructure gives us: by removing the ability to change the current state and forcing rebuilds instead, we encourage rebuildability in the long term. In the process we would expect anything that drags out the deploy cycle, like slow provisioning steps, to be exposed and fixed.

All our eggs

We started off wanting our machines to be flexible in production - changing configuration files, for example, depending on which other machines have a certain application deployed on them. But to achieve this, we have had to pour more and more (duplicated) data into PuppetDB. This has the side-effect of massively slowing down deploys - templating files against all that data is slow, in my experience.

The problem with implementing an immutable infrastructure on top of a centralised (and self-updating) configuration system comes down to making that system responsible for too much. For example, we use AWS for our web hosting, but we have duplicated “facts” in Facter for EC2 region and availability zone. We have iptables rules, but we also use AWS security groups. Not to mention: do our machines really need to talk to each other if their services are unconnected?
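
Those region and availability zone facts, for instance, are already answerable from inside the machine itself. Here is a sketch that reads them straight from the EC2 instance metadata service instead of duplicating them (this only works from within a running EC2 instance):

```python
# Sketch: reading placement information straight from the EC2 instance
# metadata service, rather than duplicating it as configuration facts.
import requests

METADATA = "http://169.254.169.254/latest/meta-data"

def availability_zone():
    return requests.get(METADATA + "/placement/availability-zone").text

def region():
    # The region is the availability zone minus its trailing letter,
    # e.g. "eu-west-1a" -> "eu-west-1".
    return availability_zone()[:-1]
```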

So let’s strip these things out of our setup and start using AWS as our Source of Truth. Our hosting provider, which already handles useful things like private clouds, load balancing, and DNS, now takes sole responsibility for the state of all of our machines. We have managed to strip out our duplicate data, but we haven’t achieved immutability yet: our machines are still re-puppeting themselves. Why? This is the point at which centralised, auto-updating configuration loses its value.
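
As a sketch of what “AWS as Source of Truth” looks like in practice, we can just ask EC2 directly. This uses boto3, and the “role” tag is a hypothetical convention for illustration, not anything AWS mandates:

```python
# Sketch: treating AWS as the Source of Truth by asking EC2 which
# running instances carry a given (hypothetical) "role" tag.
import boto3

def instances_with_role(role, region="eu-west-1"):
    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_instances(Filters=[
        {"Name": "tag:role", "Values": [role]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ])
    ips = []
    for reservation in resp["Reservations"]:
        for instance in reservation["Instances"]:
            ips.append(instance["PrivateIpAddress"])
    return ips

print(instances_with_role("frontend"))
```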

Truth from within

Throughout this post I have referred almost exclusively to AWS, but there are of course alternatives. I don’t just mean other hosting providers: there are also technology solutions for managing inventory in a distributed way - rolling our own Source of Truth. These applications normally fall under the umbrella of Service Discovery: machines register themselves with the system and remove themselves from it, and the global state is (ideally) replicated and consistent everywhere.

Consul is an example I recently came across which exposes the features you would want in a Source of Truth: replicated service discovery, inventory (with healthchecks), and a key-value store on top. At a basic level, all you really need is an available key-value store like etcd or ZooKeeper. These can store configuration data, keep track of which nodes are around, and expose service locations (you can even wrap one in a thin DNS server!).
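
As a rough sketch of the register/locate cycle, here is what it might look like against a local Consul agent, assuming the default HTTP API on port 8500 and an illustrative “web” service:

```python
# Sketch: a machine registering itself with its local Consul agent at
# boot, and anyone else asking the catalog where "web" lives.
# Assumes a Consul agent on the default HTTP port, 8500.
import requests

CONSUL = "http://localhost:8500"

def register(name, port):
    """Tell the local agent that this node provides a service."""
    requests.put(CONSUL + "/v1/agent/service/register",
                 json={"Name": name, "Port": port}).raise_for_status()

def locate(name):
    """Ask the catalog which nodes provide a service."""
    resp = requests.get(CONSUL + "/v1/catalog/service/" + name)
    resp.raise_for_status()
    return [(s["Address"], s["ServicePort"]) for s in resp.json()]

register("web", 80)
print(locate("web"))  # [("10.0.1.5", 80), ...]
```

Conveniently, Consul also ships that thin DNS server built in, so the same lookups can be answered over DNS as well as HTTP.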

Conclusion

The main virtue of immutable infrastructure is how it forces us to keep our deployment process fast and clean, which in my experience is hard to reconcile with a mastered configuration that constantly updates its internal state. Instead, I would (and personally do) use tooling that emphasises one-shot configuration, with regular tear-down and rebuild cycles. So when you’re setting up your machines, whether at home or at work, ask yourself: when was the last time you did this from scratch?