Released by CoreOS, etcd is designed to be a highly-available key-value store with a simple HTTP/JSON API and distributed replication via an implementation of the Raft consensus algorithm. Datastores such as etcd are ideal for holding information that needs to be looked up relatively often and propagated quickly when it is updated, which fits the objective of a dynamically-updatable DNS server.

HelixDNS is an implementation of such a server, which reads records from an etcd instance and serves them appropriately.

This post also touches on choosing a suitable TTL for such records, to find a compromise between fast updates and load on both the server and the backing etcd instance.

Setting up a HelixDNS instance

The project is implemented entirely in the Go language, mainly due to the ease of etcd integration with CoreOS’s go-etcd client library and a DNS server framework which abstracts away the work of parsing requests and building the responses.

Download and install the project using go get github.com/mrwilson/helixdns which will create the hdns binary.

Run without configuration, HelixDNS will default to trying to use an etcd instance running at http://localhost:4001/, which is the default etcd port.

Record format in etcd

Records are stored as a directory-like structure under /helix in etcd’s key-value store, where “sub-directories” correspond to the labels of the queried address in reverse order, and the final key is the record type. For example, adding an A record that points foo.example.com at 123.123.123.123 using curl would look like the following:

  curl -XPUT http://etcd-server:4001/v2/keys/helix/com/example/foo/A \
    -d value="123.123.123.123"
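The mapping from a queried name to an etcd key can be sketched in a few lines of Go. This is a minimal illustration of the layout described above, not HelixDNS's actual code: the function name etcdKey and its signature are made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// etcdKey maps a queried name and record type to a key under /helix by
// reversing the labels of the name, so that
// foo.example.com. + A becomes /helix/com/example/foo/A.
// Hypothetical helper: the name and signature are not taken from HelixDNS.
func etcdKey(name, recordType string) string {
	// Drop the trailing root dot of a fully-qualified name, if present.
	name = strings.TrimSuffix(name, ".")
	labels := strings.Split(name, ".")

	parts := []string{"helix"}
	// Walk the labels right to left so the TLD becomes the top-level directory.
	for i := len(labels) - 1; i >= 0; i-- {
		parts = append(parts, labels[i])
	}
	parts = append(parts, recordType)
	return "/" + strings.Join(parts, "/")
}

func main() {
	fmt.Println(etcdKey("foo.example.com.", "A"))
}
```

Prefixing the result with /v2/keys gives the path used in the curl command above.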

etcd supports TTL (time-to-live) on both keys and directories, which will be honoured: if a value has existed beyond its TTL, it will be deleted from etcd and will no longer be able to be served by HelixDNS.
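As a rough sketch, here is how the same record could be written with an expiry from Go using only the standard library. It relies on the ttl form field of etcd's v2 keys API; the helper name newRecordRequest, the etcd-server host and the 60 second TTL are placeholders chosen for this example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strconv"
	"strings"
)

// newRecordRequest builds (but does not send) a PUT request that stores
// value at key with a TTL in seconds, using etcd's v2 keys API.
// Hypothetical helper: the name and signature are illustrative only.
func newRecordRequest(etcdURL, key, value string, ttlSeconds int) (*http.Request, error) {
	form := url.Values{}
	form.Set("value", value)
	form.Set("ttl", strconv.Itoa(ttlSeconds))

	req, err := http.NewRequest(http.MethodPut, etcdURL+"/v2/keys"+key, strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	// etcd's v2 API reads value and ttl from a form-encoded body.
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

func main() {
	req, err := newRecordRequest("http://etcd-server:4001", "/helix/com/example/foo/A", "123.123.123.123", 60)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
	// To actually store the record, send it with http.DefaultClient.Do(req).
}
```

Once the TTL elapses, etcd removes the key and the name stops resolving until the record is written again.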

Running a HelixDNS instance

Start a server using the hdns executable compiled previously. This will then establish a connection to the etcd instance (either default or specified in configuration) and be available to start serving records on port 9000.

To test the previously inserted A record, we use dig to query HelixDNS:

dig foo.example.com. @localhost -p 9000

which will spit out

... 
;; ANSWER SECTION:
foo.example.com.  5 IN  A 123.123.123.123
...

Which is, of course, the result we were expecting, with a TTL of 5.
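The same check can be made from Go rather than dig. The sketch below points the standard library's resolver at the HelixDNS port instead of the system nameservers; it assumes the instance is listening on 127.0.0.1:9000 as above, and the helper name newHelixResolver is invented for this example.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newHelixResolver returns a resolver that sends every query to addr rather
// than to the system's configured nameservers.
// Hypothetical helper: the name is illustrative only.
func newHelixResolver(addr string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true, // use Go's built-in resolver so that Dial is honoured
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			// Ignore the address Go picked and dial the given server instead.
			d := net.Dialer{Timeout: 2 * time.Second}
			return d.DialContext(ctx, network, addr)
		},
	}
}

func main() {
	r := newHelixResolver("127.0.0.1:9000")

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Ask for IPv4 addresses only, matching the A record inserted earlier.
	ips, err := r.LookupIP(ctx, "ip4", "foo.example.com.")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(ips)
}
```

Unlike dig, this does not show the TTL of the answer, so it is only useful for checking that the record resolves at all.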

Low record TTL

HelixDNS serves all its records with a relatively low TTL of 5 seconds. This is low enough that caches expire quickly and updated records are picked up almost immediately, but high enough (that is to say, not zero) to avoid the inconsistent handling of zero TTLs (some DNS servers interpret a TTL of 0 as ‘cache forever’ as opposed to ‘never cache’).
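To get a feel for the load side of the trade-off, here is a back-of-the-envelope estimate in Go. It assumes every caching resolver honours the TTL and so asks for each name at most once per TTL window; the resolver and name counts are made-up numbers, and the function name is invented for this example.

```go
package main

import "fmt"

// maxQueriesPerSecond gives a rough upper bound on the query rate reaching
// the authoritative server, assuming each caching resolver re-fetches each
// name at most once per TTL. Hypothetical helper for illustration only.
func maxQueriesPerSecond(resolvers, names int, ttlSeconds float64) float64 {
	return float64(resolvers*names) / ttlSeconds
}

func main() {
	// 10 resolvers and 100 names are arbitrary example figures.
	for _, ttl := range []float64{5, 60, 300} {
		fmt.Printf("TTL %3.0fs: at most %.1f queries/s\n", ttl, maxQueriesPerSecond(10, 100, ttl))
	}
}
```

The bound scales inversely with the TTL, so halving the TTL doubles the worst-case rate. If each DNS query also results in a read from etcd, the same bound applies to the backing store.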

Configuring BIND to delegate zones to HelixDNS

Some sample configuration follows for delegating a specific subdomain to a HelixDNS instance. For example, let’s say we have a domain, b.probablyfine.co.uk, whose address is subject to frequent change. This is unlike the rest of our domains, which are unlikely to change and aren’t suited to being served by HelixDNS. Here is the BIND 9 zone configuration to delegate this zone to a HelixDNS instance:

; delegate b.probablyfine.co.uk to a hdns instance
b     IN   NS  hdns.probablyfine.co.uk.

; hdns server glue record
hdns  IN   A   <ip-address-for-helixdns-instance>

The glue record is not essential and can be stored elsewhere in the configuration, but it reduces query load when looking up the delegated nameserver (the A record is returned in the response’s Additional section, so no further lookup is required).

Further work

There is a fairly long list of standardised record types, so for now the prototype just supports A and AAAA. Apart from enforcing that the IP addresses are parseable, there is no validation: for example, nothing checks that a CNAME, where one is present, is the only record type available for a domain lookup.
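Both kinds of check are easy to sketch with the standard library. The following is an illustration of what such validation might look like, not the code HelixDNS uses: validRecordValue and cnameConflict are hypothetical names, and treating an IPv4-mapped IPv6 address as IPv4 is simply how net.IP.To4 behaves.

```go
package main

import (
	"fmt"
	"net"
)

// validRecordValue reports whether value parses as an IP address of the
// family implied by recordType ("A" means IPv4, "AAAA" means IPv6).
// Hypothetical helper: not taken from HelixDNS.
func validRecordValue(recordType, value string) bool {
	ip := net.ParseIP(value)
	if ip == nil {
		return false // not parseable as an IP address at all
	}
	switch recordType {
	case "A":
		// To4 also accepts IPv4-mapped IPv6 forms such as ::ffff:1.2.3.4.
		return ip.To4() != nil
	case "AAAA":
		return ip.To4() == nil
	default:
		return false // other record types are out of scope for this sketch
	}
}

// cnameConflict reports whether a name holds a CNAME alongside any other
// record type. recordTypes is expected to list distinct types for one name.
func cnameConflict(recordTypes []string) bool {
	hasCNAME := false
	for _, t := range recordTypes {
		if t == "CNAME" {
			hasCNAME = true
		}
	}
	return hasCNAME && len(recordTypes) > 1
}

func main() {
	fmt.Println(validRecordValue("A", "123.123.123.123"))
	fmt.Println(validRecordValue("A", "2001:db8::1"))
	fmt.Println(cnameConflict([]string{"CNAME", "A"}))
}
```

A check like cnameConflict could run either when a record is written to etcd or when a lookup finds more than one key under a name.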

Source is available on github and pull requests are welcome.
