At Unruly, we use Apache Flume to handle parts of our event-streaming architecture, as it is easy both to set up and to extend with custom sources and sinks. As part of my innovation time, I set up some Flume topologies as a way to learn about Docker and containerisation.
Setting up a base image
Docker starts containers from images, so the first step was to create an image with Flume pre-installed. Flume’s only dependency is Java (it is a Java project), so I built the image on top of the Ubuntu base image with the following steps:
- Install Java and wget
- Download and untar the Flume release into /opt/flume
- Set JAVA_HOME and add flume-ng to the PATH
Each of these steps becomes one or more instructions in a Dockerfile.
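A minimal sketch of a Dockerfile for these steps, generated with a shell heredoc so it can be written in one go. The Ubuntu release (20.04), the OpenJDK 8 package, the Flume version (1.9.0) and its Apache archive URL, and the flume-base directory name are all assumptions rather than details from the original setup; pin whichever versions you actually need.

```shell
#!/bin/sh
# Generate the base image's Dockerfile in its own build directory.
# Assumed (not from the original post): ubuntu:20.04, OpenJDK 8, Flume 1.9.0.
set -e
mkdir -p flume-base
cat > flume-base/Dockerfile <<'EOF'
FROM ubuntu:20.04

# Install Java and wget (ca-certificates lets wget fetch over HTTPS)
RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
      openjdk-8-jre-headless wget ca-certificates \
 && rm -rf /var/lib/apt/lists/*

# Download the Flume binary release and untar it into /opt/flume
RUN mkdir -p /opt/flume \
 && wget -qO- https://archive.apache.org/dist/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz \
  | tar -xz -C /opt/flume --strip-components=1

# Set JAVA_HOME (amd64 path; it differs on other architectures) and put flume-ng on the PATH
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
ENV PATH=/opt/flume/bin:$PATH
EOF
```

The heredoc delimiter is quoted ('EOF') so that $PATH is written literally into the Dockerfile instead of being expanded by the shell. Running docker build -t flume . from inside flume-base then builds the base image.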
Building an image from this Dockerfile (with docker build -t flume .) gives us a base from which to make Dockerised Flume containers, and the image is available on the Docker index (now Docker Hub).
A basic Flume topology
A Flume topology consists of agents, each built from three core components: sources, channels, and sinks.
Sources receive data and put it into one or more channels as events, and sinks read those events from the channels and process them. The most basic topology consists of a single agent, which we construct below as an agent called docker, with:
- A NetcatSource, listening on a port and turning each line of text it receives into an event.
- A MemoryChannel, buffering events in memory.
- A LoggerSink, which just logs the events it receives.
The configuration for this topology goes in a file that we’ll refer to as flume-example.conf.
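A sketch of what flume-example.conf might contain, again written out from a heredoc. The property keys follow Flume's agent.component.property naming scheme for the agent called docker; the component names r1, c1 and k1, the channel capacities, and the flume-example directory are illustrative choices, not values from the original post.

```shell
#!/bin/sh
# Generate the single-agent configuration for the agent named "docker".
set -e
mkdir -p flume-example
cat > flume-example/flume-example.conf <<'EOF'
# Name the components of the agent called "docker"
docker.sources = r1
docker.channels = c1
docker.sinks = k1

# NetcatSource: listen on port 44444 on all interfaces; each line becomes an event
docker.sources.r1.type = netcat
docker.sources.r1.bind = 0.0.0.0
docker.sources.r1.port = 44444
# Sources take a list of channels (note the plural key)
docker.sources.r1.channels = c1

# MemoryChannel: buffer events in memory (capacities are illustrative)
docker.channels.c1.type = memory
docker.channels.c1.capacity = 1000
docker.channels.c1.transactionCapacity = 100

# LoggerSink: log every event it takes from its single channel (singular key)
docker.sinks.k1.type = logger
docker.sinks.k1.channel = c1
EOF
```

Binding the NetcatSource to 0.0.0.0 rather than localhost matters here: inside a container, a source bound to 127.0.0.1 would not receive the connections that Docker forwards from the host's mapped port.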
Next, we’ll build a new image that adds this configuration file and starts the docker agent.
The flume-ng command in the ENTRYPOINT instruction runs when the container starts, and takes the configuration directory, the configuration file, and the agent name as arguments. The EXPOSE instruction declares the port that the NetcatSource listens on, so that it can be mapped to a port on the host at run time.
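A sketch of the flume-example Dockerfile under the same assumptions. It builds on the image tagged flume, copies flume-example.conf to a hypothetical path (/var/tmp/flume-example.conf), assumes Flume's configuration directory is /opt/flume/conf as in the base image sketch, and passes -Dflume.root.logger=INFO,console, as the Flume user guide does for its own logger example, so that the LoggerSink's output appears on the container's console.

```shell
#!/bin/sh
# Generate the Dockerfile for the flume-example image.
# Assumes flume-example.conf sits in the same flume-example directory.
set -e
mkdir -p flume-example
cat > flume-example/Dockerfile <<'EOF'
# Start from the base image built earlier with: docker build -t flume .
FROM flume

# Add the agent configuration (target path chosen for illustration)
COPY flume-example.conf /var/tmp/flume-example.conf

# Declare the port the NetcatSource listens on
EXPOSE 44444

# Run on container start: config directory, config file, agent name, console logging
ENTRYPOINT ["flume-ng", "agent", "--conf", "/opt/flume/conf", "--conf-file", "/var/tmp/flume-example.conf", "--name", "docker", "-Dflume.root.logger=INFO,console"]
EOF
```

Running docker build -t flume-example . from inside flume-example produces the image started by the docker run command that follows.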
Once we’ve built this new image (which we’ll call flume-example), we can start a container from it with docker run -p 444:44444 -t flume-example. The -p 444:44444 flag maps port 44444 on the container to port 444 on the host machine. Now we can write messages to it with echo foo bar baz | nc localhost 444 and see the events being logged.
Cool! We now have a working Flume agent ingesting and processing data.
The next post in this series will show some more interesting Flume topologies and how we can easily integrate Docker’s features (such as shared volumes and read-only mounting) into a Flume setup.