Don't worry, it's ... Probably Fine

Faster puppet package installation with multipkg

21 Mar 2014

puppet

Recently I realised that puppet’s package resource, even when passed a list of packages, runs a separate, serial installation command for each package it installs. Since the ability to install a list of packages via the name param is being removed in newer versions of puppet, I wanted to create a custom resource type that represents a group of packages but installs them all in a single command.

Enter multipkg, a puppet module that implements exactly this behaviour.

Using multipkg

A multipkg declaration takes a transaction name as its namevar. This would normally describe the group of packages (e.g. “editor-plugins”, “apache-modules”), and it is the name other resources use to refer to the group, for example in require/subscribe params.

multipkg { 'project-utils':
  packages => ['foo', 'bar', 'baz']
}
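Because the transaction name is an ordinary resource title, other resources should be able to depend on the whole group through a standard resource reference. A minimal sketch, assuming a hypothetical `project` service that needs the packages from the example above:

```puppet
# Install the whole group of packages as one resource.
multipkg { 'project-utils':
  packages => ['foo', 'bar', 'baz'],
}

# Hypothetical service: ordered after the multipkg resource and
# refreshed whenever that resource reports a change.
service { 'project':
  ensure    => running,
  subscribe => Multipkg['project-utils'],
}
```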

A multipkg installs all of the given packages in a single installation command. This removes the significant overhead of repeatedly starting and stopping the package manager (I’m looking at you, yum) and lets the package manager parallelise its work where it is able to, something that puppet’s vanilla package resource cannot offer.
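For comparison, the closest vanilla equivalent is a package resource with an array title. Puppet expands this into one package resource per element, and, as described above, each one is installed by its own run of the package manager:

```puppet
# Vanilla puppet: this declares three separate package resources,
# so the provider runs the package manager once per package.
package { ['foo', 'bar', 'baz']:
  ensure => installed,
}
```

The multipkg declaration above is instead a single resource, so the same three packages go to the package manager in one command.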

Achieved speedup

As a quick test, I installed 5 pseudo-randomly chosen apache modules on Ubuntu 12.04. With a regular package block:

Notice: Scope(Class[Package_example]): Installing packages normally...
...
Notice: Finished catalog run in 34.61 seconds

And installed with a multipkg block:

Notice: Scope(Class[Multipkg_example]): Installing packages with multipkg...
Notice: /Stage[main]/Multipkg_example/Multipkg[apache-mods]/ensure: created
Notice: Finished catalog run in 15.57 seconds

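In this case the run time was roughly halved (34.61 s down to 15.57 s, about 55% less, or 2.2x faster). The gain should grow with the number of packages specified, since the per-command overhead is paid once per group rather than once per package.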
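The two runs can be reproduced with a pair of classes along these lines. This is a sketch: the class names and the `apache-mods` title come from the log output above, but the five module names are placeholders, since the original list isn't given.

```puppet
# Baseline: one package resource, and one package manager run, per module.
class package_example {
  notice('Installing packages normally...')

  # Placeholder names: substitute the apache modules you want to test.
  package { ['mod-one', 'mod-two', 'mod-three', 'mod-four', 'mod-five']:
    ensure => installed,
  }
}

# multipkg: the same modules installed in a single command.
class multipkg_example {
  notice('Installing packages with multipkg...')

  multipkg { 'apache-mods':
    packages => ['mod-one', 'mod-two', 'mod-three', 'mod-four', 'mod-five'],
  }
}
```

For a fair comparison, apply only one class per run on an otherwise identical clean machine, e.g. with `puppet apply -e 'include package_example'`.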

Taking advantage of transactions

Removing the overhead of starting installations in serial is the initial benefit, but in certain cases (where the package manager supports it) we can also take advantage of transactionality: if one package in a group fails to install, none of them are installed, and the system (with regard to these packages) remains unchanged.

This makes it easier to reason about the current state of the machine, and avoids leaving it in a state where some dependencies are installed but others are not, or where the wrong versions have been pulled in. The main caveat of this provider is that versions have to be specified manually in the package name rather than through a version param.
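Since there is no version param, pins go into the package names themselves, in whatever syntax the underlying package manager accepts. A sketch assuming the names are passed through to apt-get unchanged (the versions here are made up):

```puppet
# apt-get accepts name=version; these versions are hypothetical.
# 'baz' is left unpinned and gets whatever version apt selects.
multipkg { 'project-utils':
  packages => ['foo=1.2-1', 'bar=0.9-3', 'baz'],
}
```

On a yum-based system the equivalent pin would use yum's name-version form instead, e.g. `foo-1.2-1`.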

The source is available on GitHub, and the module can be downloaded from the Puppet Forge.