Managing Puppet Secrets with Jerakia and Vault

A new approach to managing encrypted secrets in Puppet using Vault and Jerakia

Introduction and History

Over the past couple of years I’ve talked a lot about a project called Jerakia. Jerakia is a data lookup system inspired by Hiera, but built as a stand-alone solution decoupled from Puppet, or any particular configuration management system. Thanks to its REST API server architecture it offers opportunities to integrate with tools other than Puppet, and it offers a solution to people with far-reaching edge cases around data complexity that are hard or impossible to solve in Hiera – largely because it is configurable with a native Ruby DSL. If you’ve never heard of Jerakia before, you can read my initial blog post that covers the basics or see the official website.

Being able to deal with secret data, such as passwords and other sensitive values served by Puppet, has proved to be a very important requirement for Puppet users. Shortly after Hiera was first released as a third-party tool by R.I. Pienaar in 2012, I developed one of the first pluggable backends for it, the now deprecated hiera-gpg. hiera-gpg became hugely popular very quickly, as people finally had a way to store sensitive production data alongside other non-production data (e.g. in the same Git repo) without compromising it, since anyone browsing the repo could only see the encrypted form of the key values.

As hiera-gpg grew in popularity as the first plugin of its kind able to solve the problem, it also suffered from a few design limitations, and eventually hiera-eyaml was developed as the next evolutionary step for handling sensitive data in Hiera. hiera-eyaml had a better and more modern design than hiera-gpg and has served many users well over the years, but it re-implements a lot of what the YAML backend does, with added capabilities to handle encryption. Hiera has always been able to support pluggable backends so you can source your data from a variety of different systems, whether they be files, databases or REST API services, but to support encryption within a Hiera lookup you are tied to a file-based YAML backend.

Jerakia initially released with the ability to handle encrypted values from any data source, and up until now it has done that using a mish-mash of the hiera-eyaml library to provide the decryption mechanism. I’ve always felt this level of integration wasn’t ideal: hiera-eyaml was never designed as a standalone solution for use outside of Puppet and Hiera, and the role of providing reliable and secure encryption for your sensitive data is an important one. So I started looking at platforms built specifically for encryption – and, more importantly, a shared encryption solution that I could use throughout my toolchain while maintaining the flexibility to store data where and how I want. I’ve settled on Vault (but you don’t have to!).


Vault is an open source encryption platform by HashiCorp, the makers of many great software platforms such as Vagrant and Terraform. Vault is a highly feature-rich system for handling all of your encryption and cryptography needs; there are so many features that most of them won’t even get a mention in this post. To take a quote directly from the website:

Vault secures, stores, and tightly controls access to tokens, passwords, certificates, API keys, and other secrets in modern computing. Vault handles leasing, key revocation, key rolling, and auditing. Through a unified API, users can access an encrypted Key/Value store and network encryption-as-a-service, or generate AWS IAM/STS credentials, SQL/NoSQL databases, X.509 certificates, SSH credentials, and more.

Many people use Vault as a place to store their secret data, like an encrypted database, using either the command line or the HTTP API to authenticate and retrieve the encrypted values. Jerakia strives not to be a database but to offer users flexibility in where and how they store their data, while performing hierarchical lookups in a uniform fashion regardless of the source. So I was particularly interested in Vault when I read about the introduction of the transit backend that is now available.

In a nutshell, the transit backend turns Vault from an encrypted database into “cryptography as a service”. You create an encryption key that authenticated clients can use to encrypt and decrypt data on the fly using Vault’s API, but Vault itself never stores the encrypted or decrypted data; as a dedicated encryption platform it offers an excellent level of protection around authentication and key storage.

This immediately seemed like a great fit for providing the encryption functions needed to support sensitive data in Puppet and other tools, and for a tool like Jerakia. So the 2.0.0 release of Jerakia now has native support for Vault integration using the transit secret backend.

Jerakia Encryption

Jerakia has always had the concept of output filters. These are pluggable and data source agnostic, and give you the ability to pass the results of any data lookup, from any source, through a filter that can modify the results before they are sent back to the requestor. In Jerakia 1.x there was an output filter called encryption that tried to pick out hiera-eyaml style encoded strings and decrypt them using the slightly hacky integration I touched on earlier.

In Jerakia 2.0 encryption has become more of a first class citizen, and it’s also pluggable. You can enable encryption and specify a provider for the encryption mechanism you want to use – the provider shipped in 2.0 is vault, but the API allows you to extend Jerakia with your own providers if you wish, even hiera-eyaml.

The encryption output filter will now use whichever encryption provider has been configured to decrypt data regardless of the source, based on a signature (a regular expression) advertised by the provider. If a returned value matches that regular expression, the filter will flag it as an encrypted value and attempt to decrypt it; otherwise the value is returned unaltered. We’re also not tied to a particular data source – Jerakia will detect and decrypt the data no matter which data source is used for the lookup.

Furthermore, an encryption provider can advertise an encrypt capability, which lets you encrypt and decrypt values right on the command line using the Jerakia CLI. You can use the CLI to encrypt a secret string, copy that into your data – whether that be YAML files, a database or some other data source – and that’s it.

Jerakia + Vault

So I’ve covered Vault and the encryption capabilities provided by Jerakia – now let’s look at how to tie the two together, using Vault’s transit backend as an encryption provider for Jerakia, and therefore handling secrets in Puppet / Hiera.

To start with, you’ll need to install and configure Vault and unseal it. Once unsealed, there are a few steps we need to cover in Vault before we integrate Jerakia. The following steps assume you have an unsealed Vault installation that you can run root commands against.

Configuring Vault

The first thing we need to do is enable the transit backend in Vault, which we can do by mounting it with the following command:
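On the Vault 0.x releases current when this was written, the backend is mounted like this (on Vault 0.10 and later, the equivalent command is `vault secrets enable transit`):

```shell
# Mount the transit secret backend (Vault 0.x syntax)
vault mount transit

# On Vault 0.10+ the same operation is:
# vault secrets enable transit
```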

Once the backend is mounted, we need to create a key for Jerakia to use for encryption and decryption of values.  The name of the key is configurable, but by default Jerakia will try and use a key called jerakia.
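A sketch of creating that key via the transit keys endpoint:

```shell
# Create a named encryption key called "jerakia"
vault write -f transit/keys/jerakia
```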

Now we have a dedicated key that Jerakia will use for encrypting and decrypting data.  The second step is to create a policy to restrict activity to just the endpoints used for encrypting and decrypting, we’ll also call that policy jerakia. To create a policy, we’re going to create a new file called jerakia_policy.hcl and then import that policy into Vault.

The file should contain the following rules;
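A minimal policy granting access to only the encrypt and decrypt endpoints for the jerakia key might look like the following (both transit endpoints are written to, hence the update capability):

```hcl
path "transit/encrypt/jerakia" {
  capabilities = ["update"]
}

path "transit/decrypt/jerakia" {
  capabilities = ["update"]
}
```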

Once created and saved, we need to import the policy into Vault:
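Using the CLI syntax of the Vault versions current at the time:

```shell
# Import the policy file into Vault under the name "jerakia"
vault policy-write jerakia jerakia_policy.hcl

# On Vault 0.10+ this is:
# vault policy write jerakia jerakia_policy.hcl
```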

We can do a quick test now to make sure everything is OK by trying to encrypt a value on the command line using the jerakia transit key and the policy that we’ve just created:
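The transit API expects base64-encoded plaintext, so a quick smoke test might look like this (on success, Vault prints a ciphertext field beginning vault:v1:):

```shell
# Encrypt the string "s3cr3t" with the jerakia transit key
vault write transit/encrypt/jerakia plaintext=$(echo -n "s3cr3t" | base64)
```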

If you get similar output, then we’re good to go! But before we can plug Jerakia into this mix we need to give Jerakia something to authenticate with against Vault. We could use a simple Vault token, which is supported, but this raises issues of expiry and renewal that we probably don’t want to be dealing with every 30 days (or whatever you set your token TTL to). The recommended way of authenticating to Vault is the AppRole authentication backend. When using this method, we configure Jerakia with a role_id and a secret_id, and Jerakia uses these to obtain a limited-lifetime token from the Vault server for interacting with the transit backend API. When that token expires, Jerakia will automatically use its role_id and secret_id to request a new one.

First we need to create a new AppRole for Jerakia. We’ll give it a token TTL of 10 minutes (optional), but it’s important that we tie this role to the access policy that we created earlier:
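A sketch using the AppRole endpoints (the auth backend must be enabled first if it isn’t already):

```shell
# Enable the AppRole auth backend (Vault 0.x syntax)
vault auth-enable approle

# Create the jerakia role, tied to the jerakia policy
vault write auth/approle/role/jerakia token_ttl=10m policies=jerakia
```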

Now we can read the jerakia AppRole and determine the role_id:
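Reading the role-id endpoint returns the role_id value:

```shell
vault read auth/approle/role/jerakia/role-id
```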

We will use this role_id later on. But we’re not quite done yet: we need to create a secret_id to go along with our role_id, and the combination of these two values will give Jerakia the authentication it needs to request tokens. So let’s create that:
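Writing to the secret-id endpoint generates a new secret_id for the role:

```shell
vault write -f auth/approle/role/jerakia/secret-id
```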

Now we have the two crucial pieces of information needed to integrate Jerakia with Vault: the role_id and the secret_id. Vault is now ready to be a cryptography provider for Jerakia; we just need to add some simple configuration to Jerakia to glue all this together.

Configuring Jerakia

With an out-of-the-box installation of Jerakia, encryption is not configured by default; it must be enabled. If we look at the options on the command line for the sub-command secret, we’ll see that there are no sub-commands available.

So the first thing we need to do is enable an encryption provider and give it the configuration that it needs. We can do that in jerakia.yaml. In the configuration file we set the encryption option with a provider of vault and the specific configuration that our provider requires. In this example I’m using a Vault instance over HTTP, not HTTPS, so I need to set vault_use_ssl to false (see the documentation for options to enable SSL). I also need to set the vault_addr option, as well as the secret_id and role_id.
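A sketch of what that might look like in jerakia.yaml – the exact option names here are illustrative, so check the Jerakia 2.0 documentation for the authoritative list:

```yaml
encryption:
  enabled: true
  provider: vault
  options:
    vault_addr: http://127.0.0.1:8200
    vault_use_ssl: false
    vault_role_id: YOUR-ROLE-ID
    vault_secret_id: YOUR-SECRET-ID
```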

Now that I’ve configured an encryption provider in jerakia.yaml, the provider advertises its capabilities to Jerakia, and on the CLI I now see some new options available when running jerakia help secret.

Now we should be all set to test encrypting and decrypting data using the Vault encryption provider in Jerakia. We can use the CLI commands to encrypt and decrypt data… let’s give that a spin:
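Something along these lines (sub-command names sketched from the 2.0 CLI; the ciphertext shown is a placeholder):

```shell
# Encrypt a value using the configured provider
jerakia secret encrypt 's3cr3t'

# ...and decrypt it again
jerakia secret decrypt 'vault:v1:...'
```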

Now Jerakia can use Vault as a provider of “cryptography as a service”, in a secure, authenticated way. The only thing left is to make our Jerakia data lookups encryption aware, and we do that by calling the encryption output filter in the lookup written in our policy. We use the output_filter method to add a filter to our lookup, like this:
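A minimal sketch of a lookup with the filter enabled (the datasource options are illustrative):

```ruby
policy :default do
  lookup :main do
    datasource :file, {
      :format     => :yaml,
      :docroot    => '/var/lib/jerakia/data',
      :searchpath => ['common'],
    }

    # Pass all results through the encryption output filter
    output_filter :encryption
  end
end
```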

The inclusion of output_filter :encryption in this lookup tells Jerakia to pass all results to the encryption filter, which matches all returned data values against the signature provided by the encryption provider; if a value matches, the filter uses the encryption provider to decrypt it before it is passed to the requestor.

Looking up secrets

So let’s add our encrypted value from earlier to this lookup…

This encrypted value can then be imported into any type of data source that you are using with Jerakia, here we’re using the default file data source so we’ll add it to test.yaml in our common path.
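Assuming the ciphertext we produced earlier, common/test.yaml might look something like this (the vault:v1: prefix is what the provider’s signature matches on; the ciphertext is a placeholder):

```yaml
password: vault:v1:...
```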

Because this string matches the regular expression provided by the vault encryption provider’s signature, and we’ve enabled the encryption output filter, if we try to look up the key password from the namespace test (test::password in Hiera speak), Jerakia automatically decrypts the data using the Vault transit backend.
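We can verify this with a CLI lookup (flag names sketched from memory, so check jerakia help lookup):

```shell
jerakia lookup password -n test
```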

Tying it up all into Puppet

Of course, this also means that when Puppet / Hiera is integrated with Jerakia, all of this becomes transparent to Puppet: our Puppet secrets are stored encrypted in any data source, with decryption provided by Vault.


There’s a whole lot of really awesome functionality in Vault that I haven’t even touched on in this post. Having one tool for your cryptography needs across your infrastructure, rather than a variety of smaller, less dedicated tools doing their own thing, simplifies things a lot. If you don’t want to use Vault, the encryption feature of Jerakia is entirely pluggable and can be swapped out for whatever platform you want to use.

The subject of handling sensitive data in Puppet and other tools is an ongoing challenge, and I’d certainly welcome any feedback on the approach used here.



Follow and share if you liked this

Composite namevars in Puppet

An advanced look at everything you never wanted to know about composite namevars in Puppet resource types

In this post I’m going to look at one of the more advanced concepts around Puppet resource types, known as composite namevars. I’m going to assume you have a reasonable understanding of types and providers at this point; if not, you should probably go and read Fun with providers part 1 and Seriously, what is this provider doing, two excellent blog posts from Gary Larizza.

A composite what-now?… let’s start with the namevar

Maybe I’m getting ahead of myself; before we delve in, it’s worth a quick refresher on the basics. Let’s start with a fairly primitive concept of any Puppet resource type, the namevar. Take this simple example:
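A basic package declaration of the kind discussed below:

```puppet
package { 'mysql':
  ensure => installed,
}
```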

It’s important to get our terminology right here. The above piece of code is a Puppet resource declaration: the resource type is package, the resource title is mysql, and we have declared an attribute. When we run this code, Puppet will elect a provider to configure the managed entity. When we talk about the managed entity in this example, we are referring to the actual mysql package on the node (deb, rpm, etc.). When we refer to the package resource we are talking about the resource within the Puppet catalog.

Puppet needs a way to map the resource declaration to the actual managed entity that it needs to configure; this is where the namevar comes in. Each resource type has a namevar that the provider uses to uniquely identify the managed entity. The namevar attribute is normally, and sensibly, called “name” – although this should never be assumed, as we will find out later. This means I can use the namevar in a resource declaration like this:
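The same package, this time with an explicit namevar:

```puppet
package { 'mysql database':
  ensure => installed,
  name   => 'mysql',
}
```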

In this example, we’ve changed our resource title to “mysql database” and added the name attribute with a value of “mysql”. The title of the resource in the world of Puppet is “mysql database”, and we would refer to the resource within Puppet as Package['mysql database']. But the provider will identify the managed entity to configure using the namevar, the name attribute. That is to say, whilst the resource is called “mysql database”, the actual thing that the provider will try and manage is a package called mysql.

So, you might be asking yourself, what’s the deal with the first example? We didn’t specify a namevar, so what happened there? The short answer is that in the absence of a namevar being specified in the resource declaration, Puppet will use the title of the resource as the namevar, or more correctly, the value of the name parameter will magically be set to the resource title.

Not all resources have name as their namevar. To find out which attribute is the namevar, you can use the puppet describe command to view the documentation of the resource’s attributes. For example, if we look at the file resource type:
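A sketch of checking this for file, whose namevar is path (the exact wording of the output varies between Puppet versions):

```shell
puppet describe file | less
# In the attribute list, path is the namevar, documented as:
#   - **path**
#       The path to the file to manage.  Must be fully qualified.
```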

Note that puppet describe will only tell you which attribute is the namevar if it isn’t name, which is confusing.

So, for the file resource, both of the following examples do the same thing, they both manage a file called /etc/foo.conf
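One uses the title as the namevar, the other sets path explicitly:

```puppet
# Title used as the namevar
file { '/etc/foo.conf':
  ensure => file,
}

# Explicit namevar, arbitrary title
file { 'foo config':
  ensure => file,
  path   => '/etc/foo.conf',
}
```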

Puppet is a stickler for uniqueness

You’ve probably already figured out that things in Puppet must be unique; that is to say, only one resource can define the desired state of a managed entity. We already know that you can’t declare multiple resources with the same title, so what happens if two resource declarations have different titles but resolve to the same namevar? Let’s see…
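Applying both file declarations together produces a duplicate-namevar error; the exact message varies by version but looks roughly like this:

```shell
puppet apply manifest.pp
# Error: Cannot alias File[foo config] to ["/etc/foo.conf"];
# resource ["File", "/etc/foo.conf"] already declared
```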

Puppet is smart enough to figure out that although the resource titles are different, the namevar is resolving to the same value, which means you are trying to configure the same managed entity twice, which is firmly against the law of Puppet.

So, that was a quick recap of namevars and what they are. Phew, that was easy – are we done? Not by a long shot!

When a name is not enough

Sometimes you can’t identify a managed entity by a single name alone. A classic example of this is actually the package example used earlier. Let’s revisit that; it sounded simple, right? If we want to manage the mysql package we just declare:
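The same simple declaration as before:

```puppet
package { 'mysql':
  ensure => installed,
}
```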

This is all well and dandy if you’re only using one packaging system, but sadly modern systems come with a whole host of packaging systems, and we can use other providers of the package resource type to install different types of packages. If I want to install a rubygem I can declare something like:
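A gem-provider declaration:

```puppet
package { 'mysql':
  ensure   => installed,
  provider => 'gem',
}
```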

But now consider what happens if you want to manage the yum package mysql and the rubygem that is also called mysql. We clearly can’t do this:
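Two resources with the same title:

```puppet
package { 'mysql':
  ensure => installed,
}

package { 'mysql':
  ensure   => installed,
  provider => 'gem',
}
```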

This will obviously fail to compile: we have two resources declared in Puppet with the same title, and that breaks one of the number one rules of Puppet, that resources must be unique. So what if we change the titles to be different and use the namevar tricks we’ve just talked about?
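Different titles, same name attribute:

```puppet
package { 'mysql yum package':
  ensure => installed,
  name   => 'mysql',
}

package { 'mysql gem':
  ensure   => installed,
  name     => 'mysql',
  provider => 'gem',
}
```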

Of course, as we’ve already discussed, name is the package resource’s namevar and we’ve defined it twice, and as we saw with the file resource, if we try to run this code it is obviously going to fail… right? Let’s run it:
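Sketched output from applying the above (the notices are illustrative – and perhaps not what you expected):

```shell
puppet apply packages.pp
# Notice: /Stage[main]/Main/Package[mysql yum package]/ensure: created
# Notice: /Stage[main]/Main/Package[mysql gem]/ensure: created
```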

WTF now? Why did that work? Surely we’ve just broken one of Puppet’s golden rules? In normal circumstances, yes, but this brings us to the crux of this post: Puppet has the concept of composite namevars to solve this very issue. That is, a resource type can actually have more than one namevar, and rather than basing uniqueness solely on one parameter, Puppet can evaluate uniqueness based on a combination of different resource attributes.

Confused?? Me too. Let’s write some code then…

We’re going to craft a fairly basic type and provider to manage a fictional config file, /etc/conf.ini. We want to be able to manage the value of configuration settings within a particular section of the INI file, something like:
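For example, a settings file along these lines (contents are illustrative):

```ini
[server]
hostname = myserver.example.com
```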

Sounds easy enough! We have three basic elements to our managed entity: the section, the setting and the value. So let’s create a module called “config” and start with a very primitive resource type.
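A sketch of lib/puppet/type/config.rb, with setting as the (single, for now) namevar:

```ruby
# lib/puppet/type/config.rb
Puppet::Type.newtype(:config) do
  desc 'Manage a setting in /etc/conf.ini'

  newparam(:setting, :namevar => true) do
    desc 'The name of the setting to manage'
  end

  newparam(:section) do
    desc 'The INI section the setting lives in'
  end

  newproperty(:value) do
    desc 'The value of the setting'
  end
end
```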

Now let’s add a provider so our resource type actually does something. For the sake of keeping the code to a minimum and focusing on the relevant topics, I’m going to re-use the ini_file library from the puppetlabs/inifile module – so if you want to follow along at home you’ll need to install that module so that puppet/util/ini_file is in your Ruby load path. Here’s the provider we’ll use:
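A minimal sketch of lib/puppet/provider/config/ini.rb; it leans on Puppet::Util::IniFile from puppetlabs/inifile (method names as I remember that API, so treat this as illustrative):

```ruby
# lib/puppet/provider/config/ini.rb
require 'puppet/util/ini_file'

Puppet::Type.type(:config).provide(:ini) do
  desc 'Manage settings in /etc/conf.ini using the inifile library'

  def ini_file
    @ini_file ||= Puppet::Util::IniFile.new('/etc/conf.ini')
  end

  # Current value of the setting in our section
  def value
    ini_file.get_value(resource[:section], resource[:setting])
  end

  # Set the value and write the file back out
  def value=(val)
    ini_file.set_value(resource[:section], resource[:setting], val)
    ini_file.save
  end
end
```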

Whilst not the most comprehensive provider ever written, this should do the job. All we need to do now is write some Puppet code to use it, so let’s do that now.
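A declaration that relies on the title supplying the setting’s name:

```puppet
config { 'hostname':
  section => 'server',
  value   => 'myserver.example.com',
}
```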

Let’s just recap on that resource declaration. In our resource type we said that the config resource type has three attributes: the section and setting parameters, and the value property. We don’t have an attribute called name, but we have made the setting parameter the namevar. As in our file example at the beginning, this means that if we don’t explicitly give the setting parameter in the resource declaration, the title of the resource – in this case hostname – will automatically be used. Now to make sure everything is working…

Now it works; let’s break it.

So far so good, but now consider that we want to manage settings in other sections of the config file that may have the same name, for example:
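An illustrative conf.ini where the same setting name appears under two sections:

```ini
[server]
hostname = myserver.example.com

[client]
hostname = myclient.example.com
```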

Both sections have a setting called hostname, so how can we express this in our Puppet manifest? We can’t declare both resources with the same title, and neither can we declare them with different titles but with setting set to hostname in both, because, as we saw with the file example earlier, Puppet will fail to compile: the namevar values must be unique as well as the resource titles, and setting is our namevar.

To solve this, we must tell Puppet that a config setting is not uniquely identifiable by its setting name alone, but rather by the combination of the setting name and the section together. To do this, we need to make the section attribute of our type a namevar too. That’s not rocket science – we just add a :namevar argument to our section parameter, like this:
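The one-line change to the type:

```ruby
newparam(:section, :namevar => true) do
  desc 'The INI section the setting lives in'
end
```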

What about the title? Introducing title_patterns!

We’re not done yet! Remember that in our earlier example using the package resource we discussed how, in the absence of the namevar, the resource title is allocated as the namevar. This is still the case, but now that we have two namevars we need to give Puppet a hand in deciding what to do with the resource title. We do this by creating a method within our type called self.title_patterns, and it goes something like this:
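A minimal sketch that simply maps any title onto the setting attribute:

```ruby
def self.title_patterns
  [
    # Match any title and assign it to :setting
    [/(.*)/m, [[:setting]]]
  ]
end
```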

The self.title_patterns method returns a simple array of arrays of arrays containing arrays, or something like that. This particular nugget of insanity is provided by the Puppet::Type class, with a comment saying # The entire construct is somewhat strange…. No shit. If we delve into the Puppet core code in lib/puppet/type.rb we see that the method should return this mad data structure:
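Paraphrasing the shape documented in lib/puppet/type.rb (simplified; an entry may also carry an optional proc to transform its capture):

```ruby
# [
#   [ regexp,                      # pattern to match against the title
#     [                            # one entry per capture group:
#       [ :attribute ],            #   attribute to receive the capture
#       [ :attribute, proc { } ],  #   optionally with a transform proc
#     ]
#   ],
#   # ...more [regexp, assignments] pairs...
# ]
```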

What we are saying with the above method is that any resource title matching /(.*)/ – which matches anything – will be assigned to the setting attribute if we have not declared it explicitly, meaning that we can still run our original Puppet code and get the same behaviour.

Now back to the problem at hand. Now that we have composite namevars, if we need to manage the setting hostname in both the [server] and [client] sections of our INI file, this is possible by using our composite namevars with differing titles:
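Two resources, distinguished by the section/setting combination:

```puppet
config { 'server hostname':
  section => 'server',
  setting => 'hostname',
  value   => 'myserver.example.com',
}

config { 'client hostname':
  section => 'client',
  setting => 'hostname',
  value   => 'myclient.example.com',
}
```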

Let’s do more with title_patterns

You’re probably wondering why the title_patterns method is so complicated (even by Puppet internals standards) for something that does so little. Actually, it’s a rather powerful, albeit cryptic, beast. Our current method assigns any title to the setting attribute; we can make this smarter. We can enhance the method to also look for patterns matching section/setting and assign the relevant parts of the title to the right attributes. So let’s change our original regexp and add another element to the main array:
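A sketch of the two-pattern version: titles without a slash populate setting alone, while section/setting titles populate both attributes:

```ruby
def self.title_patterns
  [
    # Plain title, no slash: the whole title is the setting
    [/^([^\/]+)$/m, [[:setting]]],
    # section/setting: first capture is the section, second the setting
    [/^([^\/]+)\/([^\/]+)$/m, [[:section], [:setting]]]
  ]
end
```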

No, I haven’t broken my keyboard and you’re not going blind (yet!). We now have title patterns with two matches. If the title does not contain a /, the type behaves as before and the title is assigned to the setting attribute; however, if the title does contain a /, it is parsed as section/setting and the section and setting attributes are assigned from the title. This means that, as well as using the Puppet declarations above, we could also shorten them and write:
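The shortened form, with both namevars encoded in the title:

```puppet
config { 'server/hostname':
  value => 'myserver.example.com',
}

config { 'client/hostname':
  value => 'myclient.example.com',
}
```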

The above shortened version has the same behaviour, and the provider sees the attributes in the same way.

How cool was that!

Here’s a recap of what our final type looks like
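Pulling it all together, a sketch of the final lib/puppet/type/config.rb:

```ruby
Puppet::Type.newtype(:config) do
  desc 'Manage a setting in /etc/conf.ini'

  def self.title_patterns
    [
      [/^([^\/]+)$/m, [[:setting]]],
      [/^([^\/]+)\/([^\/]+)$/m, [[:section], [:setting]]]
    ]
  end

  newparam(:setting, :namevar => true) do
    desc 'The name of the setting to manage'
  end

  newparam(:section, :namevar => true) do
    desc 'The INI section the setting lives in'
  end

  newproperty(:value) do
    desc 'The value of the setting'
  end
end
```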

Final thoughts

Hopefully this has given you a basic understanding of what we mean by composite namevars, and maybe you didn’t break your brain reading this. I think I’ve only scratched the surface of what you can do with title_patterns – there are some very scary patterns involving procs out there! But I’ll leave off here. So there you have it: composite namevars and title_patterns, clear as mud, right?


Solving real world problems with Jerakia


I’ve always been a great admirer of Hiera, and I still remember the pain and turmoil of living in a pre-Hiera world trying to come up with insane code patterns within Puppet to try and organize my data in a sensible way. Hiera was, and still is, the answer to lots of problems.

For me, however, when I moved beyond a small-scale, single-customer oriented Puppet implementation into larger, complex and diverse environments, I started to find that I was spending a lot of time trying to figure out how to model things in Hiera to meet my requirements. It’s a great tool, but it has some limitations in the degree of flexibility it offers around how to store and look up your data.

Some examples of problems I was trying to solve were; How can I…

  • use a different backend for one particular module?
  • give a team a separate hierarchy just for their app?
  • give access to a subset of data to a particular user or team?
  • enjoy the benefits of eyaml encryption without having to use YAML?
  • implement a dynamic hierarchy rather than hard coding it in config?
  • group together application specific data into separate YAML files?

There are many more examples, and after some time I began exploring some solutions. Initially I started playing around with the idea of a “smart backend” for Hiera that could give me more flexibility in my implementation, and that eventually grew into what is now Jerakia. In fact, you can still use Jerakia as a regular Hiera backend, or you can wire it directly into Puppet as a data binding terminus.

Introducing Jerakia

Jerakia is a lookup tool built around the concept of a policy, which contains a number of lookups to perform. Policies are written in a Ruby DSL, allowing maximum flexibility to get around those pesky edge cases. In this post we will look at how to deploy a standard lookup policy and then enhance it to solve one of the use cases above.

Define a policy

After installing Jerakia, the first step is to create our default policy in /etc/jerakia/policy.d/default.rb
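A skeleton policy might look like this (a sketch of the 0.x DSL):

```ruby
# /etc/jerakia/policy.d/default.rb
policy :default do
  # lookups will be defined here
end
```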

Jerakia policies are containers for lookups. A policy can have any number of lookups, and they are run in the order they are defined.

Writing our first lookup

A lookup must contain, at the very least, a name and a datasource to use for the lookup. The datasource that currently ships with Jerakia is the file datasource. This takes several options, including format and searchpath, to define how lookups should be processed. Within the lookup we have access to scope[], which contains all the information we need to determine what data should be returned. In Puppet-speak, the scope contains all facts and top-level variables passed from the agent.
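A fairly standard lookup, sketched from memory of the 0.x file datasource options (the docroot and scope keys are illustrative):

```ruby
policy :default do
  lookup :main do
    datasource :file, {
      :format     => :yaml,
      :docroot    => '/var/lib/jerakia/data',
      :searchpath => [
        "hostname/#{scope[:fqdn]}",
        "environment/#{scope[:environment]}",
        'common',
      ],
    }
  end
end
```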

We now have a fairly standard lookup policy that should be familiar to Hiera users. A Jerakia lookup request contains two parts: a lookup key and a namespace. This allows us to group lookup keys such as port, docroot and logroot into a namespace such as apache. When integrating with Hiera/Puppet, the module name is used as the namespace, and the variable name as the key. In Puppet we declare:
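For example, in the classic Puppet 3 era Hiera style:

```puppet
$port = hiera('apache::port')
```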

This reaches Jerakia as a lookup request for the key port in the namespace apache, and with our lookup policy above a typical request would look for the key port in the following files, in order:
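Given an illustrative hostname/environment/common searchpath, that resolution order would be:

```
/var/lib/jerakia/data/hostname/<fqdn>/apache.yaml
/var/lib/jerakia/data/environment/<environment>/apache.yaml
/var/lib/jerakia/data/common/apache.yaml
```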

This is slightly different behaviour from what you would find in Puppet using Hiera. If you are using Jerakia against an existing Hiera filesystem layout, which has namespace::key in path rather than key in path/namespace.yaml, then check out the hiera plugin, which provides a lookup method called plugin.hiera.rewrite_lookup to mimic Hiera behaviour. More on lookup plugins in the next post!

Adding some complexity

So far, what we have done is not rocket science, and certainly nothing that can’t easily be achieved with Hiera. So let’s mix it up a bit by defining a use case that will change our requirements. This use case is based on a real world scenario.

We have a team based in Ireland. Their servers are identified by the top-level variable location. They need to be able to manage PHP and Apache using Puppet, but they need a data lookup hierarchy based on their project, which is something only they use. Furthermore, we wish to give them access to manage data specifically for the modules they are responsible for, without being able to read, override or edit data for other modules (e.g. network, firewall, kernel).

So, the requirements are to provide a different lookup hierarchy for servers in the location “ie”, but only when configuring the apache or php modules, and to source the data from a location separate from our main data repo. With Jerakia this is easily solvable; let’s first look at creating the lookup for the Ireland team…
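A first cut of the extra lookup, placed before the main lookup in the policy (paths and scope keys are illustrative):

```ruby
lookup :ireland do
  datasource :file, {
    :format     => :yaml,
    :docroot    => '/var/lib/jerakia/ireland',
    :searchpath => [
      "project/#{scope[:project]}",
      'common',
    ],
  }
end
```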

So now we have defined a separate lookup for our Ireland-based friends. The problem is that every request will first load the ireland lookup and then proceed down to the main lookup. This is no different from just adding new hierarchy entries in Hiera – they are global. This means potentially bad data creeping in if, for example, they accidentally override the firewall rules or network configuration.

To get around this we can use the confine method in the lookup block to restrict this lookup to requests that have location: ie in the scope and are requesting keys in the apache or php namespaces, meaning the requesting modules. If the confine criteria are not met, the lookup is invalidated and skipped, and the next one is used. Finally, we do not want to risk dirty configuration from default values in our main hierarchy for apache and php, so we need to tell Jerakia that if this lookup is considered valid (i.e. it has met all the criteria of confine), only this lookup should be used, and not to proceed down the chain of available lookups. To do this, we use the stop method.

The confine method takes two arguments: a value and a match. The match is a string that can contain regular expressions, and the method supports either a single match or an array of matches to compare. So in order to confine this lookup to the location ie, we can confine it as follows:
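Sketched from the Jerakia 0.x DSL:

```ruby
confine scope[:location], 'ie'
```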

By confining in this way we tell Jerakia to invalidate and skip this lookup unless location is “ie”. Similarly, we can add another confine statement to ensure that only lookups for the apache and php namespaces are handled by this lookup. Our final policy would look like this:
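The assembled policy, again as an illustrative sketch (the request.namespace accessor is from memory of the DSL, so check the Jerakia docs):

```ruby
policy :default do
  lookup :ireland do
    datasource :file, {
      :format     => :yaml,
      :docroot    => '/var/lib/jerakia/ireland',
      :searchpath => [
        "project/#{scope[:project]}",
        'common',
      ],
    }

    # Only valid for the Ireland team's servers and modules...
    confine scope[:location], 'ie'
    confine request.namespace[0], ['apache', 'php']

    # ...and when valid, don't fall through to :main
    stop
  end

  lookup :main do
    datasource :file, {
      :format     => :yaml,
      :docroot    => '/var/lib/jerakia/data',
      :searchpath => [
        "hostname/#{scope[:fqdn]}",
        "environment/#{scope[:environment]}",
        'common',
      ],
    }
  end
end
```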


This example demonstrates that using Jerakia lookup policies you can tailor your data lookups quite extensively, giving a high degree of flexibility. This is especially useful in larger organisations with many customers using one central Puppet infrastructure.

This is just one example of using Jerakia to solve a use case. I hope to blog a small mini-series on other use cases and solutions, and welcome any suggestions from the real world!

Next up…

Jerakia is still fairly experimental at the time of writing (0.1.6) and there is still a lot of room for improvement, both in exposed functionality and in the underlying code. I’d like to see it mature, and there are still plenty of features to add and code to be tidied up. There is some excellent work being done in Puppet 4.0 with regards to the internal handling of data lookups that I think would complement our aims very well (currently all work has been done against 3.x), and the next phase of major development will be exploring those options.

    Also, I talk about Puppet a lot because I am a Puppet user and the problems I was trying to solve were Puppet/Hiera related; that doesn’t mean that Jerakia is exclusively a Puppet tool. The plan is to integrate it with other tools in the devops space, which given the policy driven model should be fairly straightforward.

    My next post will focus on extending Jerakia and will cover writing and using lookup plugins to enhance the power of lookups and output filters to provide features like eyaml style decryption of data regardless of the data source. I will also cover Jerakia’s pluggable datastore model that encourages community development.


    Puppet data from CouchDB using hiera-http

    Introducing hiera-http

    I started looking at the various places people store data, and ways to fetch it, and realized that a lot of data storage applications are RESTful, yet there didn’t seem to be any support in Hiera for querying these things. So I whipped up hiera-http, a Hiera back end that connects to any RESTful HTTP API and returns data based on a lookup. It’s very new, and support for other things like SSL and auth is coming, but what it does support is a variety of handlers to parse the returned output of the HTTP document; at the moment these are limited to YAML and JSON (or just ‘plain’ to simply return the whole response of the request). The following is a quick demonstration of how to plug CouchDB into Puppet using Hiera and the hiera-http backend.
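    The output-handler idea can be illustrated with a small Ruby sketch. Note this is a conceptual illustration, not hiera-http’s actual internals: parse_response is a hypothetical name, and a real backend would fetch the body over HTTP rather than take a string.

```ruby
require 'json'
require 'yaml'

# Pick a parser based on the configured output handler, or return the
# raw body untouched for the 'plain' handler.
def parse_response(body, handler)
  case handler
  when :json  then JSON.parse(body)
  when :yaml  then YAML.load(body)
  when :plain then body
  else raise ArgumentError, "unknown output handler: #{handler}"
  end
end

# A CouchDB document is JSON, so the :json handler turns the response
# body into a hash that keys can be looked up in.
doc = '{"webname": "dev.example.com", "port": 8080}'
puts parse_response(doc, :json)["webname"]   # prints dev.example.com
```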

    Hiera-http is available as a rubygem, or from GitHub:


    Apache CouchDB is a scalable database that uses no set schema and is ideal for storing configuration data as everything is stored and retrieved as JSON documents. For the purposes of this demo I’m just going to do a very simple database installation with three documents and a few configuration parameters to demonstrate how to integrate this in with Puppet.

    After installing CouchDB and starting the service I’m able to access Futon, the web GUI front end for my CouchDB service. Using this I create three documents: “dev”, “common” and “puppet.puppetlabs.lan”.

    CouchDB documents

    Next I populate my common and dev documents with some variables.

    Common document populated with data

    Now that CouchDB is configured, I should be able to query the data over HTTP

    Query with Hiera

    After installing hiera-http I can query this data directly from Hiera…

    First I need to configure Hiera with the HTTP back end. The search hierarchy is determined by the :paths: configuration parameter and since CouchDB returns JSON I set that as the output handler.
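    A hiera.yaml along these lines would do it. The database name “configuration” and the path layout are assumptions for this demo; CouchDB’s default port of 5984 is the one real detail:

```yaml
---
:backends:
  - http

:http:
  :host: localhost
  :port: 5984
  :output: json
  :paths:
    - /configuration/%{fqdn}
    - /configuration/%{env}
    - /configuration/common
```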

    I can now query this directly from Hiera on the command line

    And of course, that means this data is now available from Puppet; and if I add some overriding configuration variables to my dev document in CouchDB, my lookup will resolve based on my environment setting in Puppet

    Hiera-http is fully featured and supports all standard Hiera back end functions such as hiera_hash, hiera_array and order overrides.

    Future stuff

    I’m going to carry on working on new features for hiera-http – including basic auth, HTTPS/SSL, proxies and a wider variety of output handlers. I would like this back end to be flexible enough to allow users to configure Hiera to perform data lookups against any network service that uses a RESTful API. Keep watching.


    Designing Puppet – Roles and Profiles.

    Update, Feb 15th.

    Since writing this post some of the concepts have become quite popular and have generated quite a lot of comments and questions in the community. I recently did a talk at Puppet Camp Stockholm on this subject, and hopefully I explained it a bit better there than I did below :-). The slides are available here and a YouTube video will be uploaded shortly.


    So you’ve installed Puppet, downloaded some Forge modules, and probably written a few yourself too. So, now what? You start applying those modules to your nodes and you’re well on your way to super-awesomeness of automated deployments. Fast forward a year or so: your infrastructure has grown considerably in size, your organisation’s business requirements have become diverse and complex, and your architects have designed technical solutions to solve business problems with little regard for how they might actually be implemented. They look great in the diagrams, but you’ve got to fit them into Puppet. From personal experience, this often leads to a spell of fighting with square pegs and round holes, and the if statement starts becoming your go-to guy because you just can’t do it any other way. You’re probably now thinking it’s time to tear down what you’ve got and re-factor. Time to think about higher level design models to ease the pain.

    There is a lot of very useful guidance in the community surrounding Puppet design patterns for modules, managing configurable data and class structure, but I still see people struggling to tie all the components of their Puppet manifests together. This seems to me to be an issue with a lack of higher level code base design. This post explains one such design model, which I refer to as “Roles/Profiles”, that has worked quite well for me in solving some of the more common issues encountered as your infrastructure grows in size and complexity and the requirements of good code base design become paramount.

    The design model laid out here is by no means my suggestion on how everyone should design Puppet; it’s an example of a model that I’ve used with success before. I’ve seen many varied designs, some good and some bad, and this is just one of them – I’m very interested in hearing other design models too. The point of this post is to demonstrate the benefits of adding an abstraction layer in front of your modules.

    What are we trying to solve?

    I’ve spent a lot of time trying to come up with what I see as the most common design flaws in Puppet code bases. One source of problems is that users spend a lot of time designing great modules, then include those modules directly to the node. This may work but when dealing with large and complex infrastructures this becomes cumbersome and you end up with a lot of node level logic in your manifests.

    Consider a network consisting of multiple different server types. They will all share some common configuration, some subsets of servers will also share configuration while other configuration will be applicable only to that server type. In this very simple example we have three server types. A development webserver (www1) that requires a local mysql instance and PHP logging set to debug, a live webserver (www2) that doesn’t use a local mysql, requires memcache and has standard PHP logging, and a mail server (smtp1). If you have a flat node/module relationship with no level of abstraction then your nodes file starts to look like this:
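    Such a flat nodes file might look something like this. The module names are the ones from the example above; the mailserver module name is a hypothetical stand-in:

```puppet
node 'www1' {
  include networking
  include users
  include apache
  include tomcat
  include jdk
  include mysql
  class { 'php': loglevel => 'debug' }
}

node 'www2' {
  include networking
  include users
  include apache
  include tomcat
  include jdk
  include memcache
  include php
}

node 'smtp1' {
  include networking
  include users
  include mailserver
}
```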

    Note: if you’re already thinking about ENCs, this will be covered later

    As you can see, the networking and users modules are universal across all our boxes; Apache, Tomcat and the JDK are used for all webservers; some webservers have mysql; and the PHP logging options vary depending on what type of webserver it is.

    At this point most people try to simplify their manifests by using node inheritance. In this very simple example that might be sufficient, but it’s only workable up to a point. If your environment grows to hundreds or even thousands of servers, made up of 20 or 30 different types of server, some with shared attributes and subtle differences, spread out over multiple environments, you will likely end up with an unmanageable tangled web of node inheritance. Nodes can also inherit only one other node, which will be restrictive in some edge cases.

    Adding higher level abstraction

    One way I have found to minimise the complexity of node definitions, and to make handling nuances between different server types and edge case scenarios a lot easier, is to add a layer (or in this case, two layers) of separation between my nodes and the modules they end up calling. I refer to these as roles and profiles.

    Consider for a moment how you would represent these servers if you weren’t writing a Puppet manifest. You wouldn’t say “www1 is a server that has mysql, tomcat, apache, PHP with debug logging, networking and users” on a high level network diagram. You would more likely say “www1 is a dev web server” so really this is all the information I want to be applying directly to my node.

    So after analysing all our nodes we’ve come up with three distinct definitions of what a server can be: a development webserver, a live webserver and a mailserver. These are your server roles; they describe what the server represents in the real world. In this design model a node can only ever have one role – it can’t be two things simultaneously. If your business now has an edge case for QA webservers that are the same as live servers but incorporate some extra software for performance testing, then you’ve just defined another role, a QA webserver.

    Now we look at what a role should contain. If you were describing the role “Development webserver” you would likely say “A development webserver has a Tomcat application stack, a webserver and a local database server”. At this level we start defining profiles.

    Unlike roles, which are named in a more human representation of the server function, a profile incorporates individual components to represent a logical technology stack. In the above example, the profile “Tomcat application stack” is made up of the Tomcat and JDK components, whereas the webserver profile is made up of the httpd, memcache and php components. In Puppet, these lower level components are represented by your modules.



    Now our node definitions look a lot simpler and are representative of their real world roles…
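    For example, something along these lines (role class names are illustrative):

```puppet
node 'www1' {
  include role::www::dev
}

node 'www2' {
  include role::www::live
}

node 'smtp1' {
  include role::mailserver
}
```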

    Roles are simply collections of profiles that provide a sensible mapping between human logic and technology logic. In this scenario our roles may look something like:
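    A sketch of such roles, using class inheritance to share the common parts (all class names here are illustrative):

```puppet
class role {
  include profile::base        # every server gets the base profile
}

class role::www inherits role {
  include profile::tomcat      # all webservers share the Tomcat stack
}

class role::www::dev inherits role::www {
  include profile::webserver::dev
  include profile::database    # dev webservers run a local mysql
}

class role::www::live inherits role::www {
  include profile::webserver::live
}

class role::mailserver inherits role {
  include profile::mailserver
}
```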

    Whether or not you choose to use inherited classes in the way I have done is up to you, of course; some people steer clear of inheritance completely, others overuse it. Personally I think it works for the purposes of laying out roles and profiles to minimise duplication.

    The profiles included above would look something like the following
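    A sketch of the profiles, mapping each logical stack onto the component modules named earlier (class names and the log-level parameter are assumptions for illustration; variations like dev vs live PHP logging could be expressed as subclasses):

```puppet
class profile::base {
  include networking
  include users
}

class profile::tomcat {
  include jdk
  include tomcat
}

class profile::webserver {
  include httpd
  include memcache
  include php
}

class profile::database {
  include mysql
}
```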

    In summary, the “rules” surrounding my design can be simplified as:

    • A node includes one role, and one only.
    • A role includes one or more profiles to define the type of server
    • A profile includes and manages modules to define a logical technical stack
    • Modules manage resources
    • Modules should only be responsible for managing aspects of the component they are written for

    Let’s just clarify what we mean by “modules”

    I’ve talked about profiles and roles like they are some special case and modules being something else. In reality, all of these classes can be, and should be modularised. I make a logical distinction between the profile and role modules, and everything else (e.g.: modules that provide resources).

    Other useful stuff to do with profiles.

    So far I’ve demonstrated using profiles as collections of modules, but they have other uses too. As a rule of thumb, I don’t define any resources directly from roles or profiles; that is the job of my modules. However, I do realise virtual resources, and occasionally do resource chaining, in profiles – these can solve problems that would otherwise have meant editing modules, along with other functionality that doesn’t quite fit in the scope of an individual module. Adding some of this functionality at the module level would reduce the re-usability and portability of your module.

    Hypothetically, let’s say I have a module – let’s call it foo for originality’s sake. The foo module provides a service called foo; in my implementation I have another module called mounts that declares some mount resource types. I want all mount resources to be applied before the foo service is started, as without the filesystems mounted the foo service will fail. I’ll go even further and say that foo is a Forge module that I really don’t want to (and shouldn’t have to) edit, so where do I put this configuration? This is where having the profiles level of abstraction is handy. The foo module is coded perfectly; it’s the use case determined by my own technology stack that requires my mount points to exist before the foo service, so since my stack is defined in the profile, this is where I should specify it. e.g.:
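    A sketch of such a profile, chaining all collected mount resources before the foo service (the profile class name is an assumption; foo and mounts are the hypothetical modules from above):

```puppet
class profile::fooserver {
  include foo
  include mounts

  # Ensure every mount resource is applied before the foo service
  # starts; neither module needs editing to express this ordering.
  Mount <| |> -> Service['foo']
}
```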

    It’s widely known that good modules are modules that you don’t need to edit. Quite often I see people reluctant to use Forge modules because their set up requires some peripheral set up or dependencies not included in the module. Modules exist to manage resources directly related to what they were written for. For example, someone may choose to edit a Forge mysql module because their set up has a dependency on MMM being installed after MySQL (purely hypothetical). The mysql module is not the place to do this; mysql and mmm are separate entities and should be configured and contained within their own modules. Tying the two together is something you’ve defined in your stack, so again, this is where your profiles come in…

    This approach is also potentially helpful for those using Hiera. Although Hiera and Puppet are to become much more fused in Puppet 3.0, at the moment people writing Forge modules have to choose whether or not to make them work with Hiera, and people running Hiera have to edit the modules that aren’t Hiera-enabled. Take a hypothetical module from the Forge called fooserver. This module exposes a parameterized class that has an option for port; I want to source this variable from Hiera but the module doesn’t support it. I can add this functionality in the profile without needing to edit the module.
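    A profile wrapper along these lines would do it (fooserver and its port parameter are the hypothetical example from the paragraph above, and the Hiera key name is invented):

```puppet
class profile::fooserver {
  # Source the port from Hiera ourselves, since the Forge module
  # doesn't do a hiera() lookup internally.
  class { 'fooserver':
    port => hiera('fooserver_port'),
  }
}
```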

    What about using an ENC?

    So you’re probably wondering why I haven’t mentioned using an ENC (External Node Classifier). The examples above don’t use any kind of ENC, but the logic behind adding a layer of separation between your nodes and your modules stays the same. You could decide to use an ENC to determine which role to include on a node, or you could build/configure an ENC to perform all the logic and return the list of components (modules) to include. I prefer using an ENC in place of node definitions to determine which role to include, and to keep the actual roles and profiles logic within Puppet. My main reason for this is that I get far greater control of things such as resource chaining, class overrides and integration with things like Hiera at the profile level, and this helps overcome some tricky edge cases and complex requirements.


    None of the above is set in stone; what I hope I’ve demonstrated, though, is that adding a layer of abstraction to your Puppet code base design can have some significant benefits that will help you avoid pitfalls when you start dealing with extremely complex, diverse and large scale set ups. These include:

    • Reducing complexity of configuration at a node level
    • Real-world terminology of roles improves “at-a-glance” visibility of what a server does
    • Definition of logical technology stacks (profiles) gives greater flexibility for edge cases
    • Profiles provide an area to add cross-module functionality such as resource chaining
    • Modules can be granular and self-contained, and tied together in profiles, thus reducing the need to edit modules directly
    • Reduced code duplication

    I use Hiera to handle all of my environment configuration data, which I won’t go into detail about in this post. So, at a high level my Puppet design can be represented as:



    As I said previously, this is not the way to design Puppet, but an example of one such way. The purpose of this post is to explore higher level code base design for larger and more complex implementations of Puppet. I would love to hear other design models that people have used, either successfully or not, and what problems they solved for you (or introduced :)) so please get in touch with your own examples.


    Introducing hiera-mysql MySQL Backend for Hiera


    Some time ago I started looking at Hiera, a configuration datastore with pluggable back ends that also plugs seamlessly into Puppet for managing variables. When I wrote hiera-gpg a few months ago I realised how easy extending Hiera was, and the potential for really useful backends that can consolidate all your configuration options from a variety of systems and locations into one streamlined process that systems like Puppet and other tools can hook into. This, fuelled by a desire to learn more Ruby, led to hiera-mysql, a MySQL back end for Hiera.


    hiera-mysql is available as a ruby gem and can be installed with:

    Note: this depends on the Ruby mysql gem, so you’ll need the gcc, ruby-devel and mysql-devel packages installed. Alternatively, the source can be downloaded here

    MySQL database

    To demonstrate hiera-mysql, here’s a simple MySQL database with some sample data:
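    Something like the following would be enough for the demo; the database, table and column names here are made up, and only the colour/env variables mentioned later are assumed:

```sql
CREATE DATABASE config;
USE config;

CREATE TABLE configdata (
  var VARCHAR(64),
  val VARCHAR(128),
  env VARCHAR(32)
);

INSERT INTO configdata VALUES ('colour', 'red',   'live');
INSERT INTO configdata VALUES ('colour', 'green', 'dev');
```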

    Configuring Hiera

    In this example we’re going to pass the variable “env” in the scope. hiera-mysql will interpret any scope variables defined in the query option, and also has a special case for %{key}. Example:
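    The corresponding hiera.yaml might look like this. The database, table and column names are hypothetical, as are the credentials; the point is that %{env} is interpolated from the scope and %{key} is replaced with the key being looked up:

```yaml
---
:backends:
  - mysql

:mysql:
  :host: localhost
  :user: root
  :pass: examplepassword
  :database: config
  :query: SELECT val FROM configdata WHERE var='%{key}' AND env='%{env}'
```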

    Running Hiera

    With the above example, I want to find the value of the variable colour in the scope of live

    If I add more rows to the database that match the criteria, and use Hiera’s array search function by passing -a I can make Hiera return all the rows

    Hiera’s pluggable nature means that you can use this back end alongside other back ends such as YAML or JSON and configure your search order accordingly.


    Currently hiera-mysql will only return the first element of a row, or an array of first elements, so you can’t do things like SELECT foo,bar FROM table. I intend to introduce this feature by implementing Hiera’s hash search in a future release. Also, the module could do with slightly better exception handling around the mysql code. Please let me know if there’s anything else that would improve it.


    And of course, because Hiera is completely transparent, accessing these variables from Puppet couldn’t be easier!


  • Github homepage for hiera-mysql
  • Official Hiera Project Homepage
  • Hiera – A pluggable hierarchical data store

    Secret variables in Puppet with Hiera and GPG

    Last week I wrote an article on Puppet configuration variables and Hiera. This sorted out almost all my configuration variable requirements, bar one: what do I do with sensitive data like database passwords, hashed user passwords and so on that I don’t want to store in my VCS repo as plaintext?

    Hiera allows you to quite easily add new backends, so I came up with hiera-gpg, a backend plugin for Hiera that will GPG decrypt a YAML file on the fly. It’s quite minimal and there is some stuff I’d like to do better – for instance it currently shells out to the GPG command, hopefully someone has some code they can contribute that’ll use the GPGME gem instead to do the encryption bit.

    Once you’re up and running with Hiera, you can get the hiera-gpg backend from Rubygems…

    We run several Puppetmasters, so for each one I create a GPG key and add the public key to a public keyring that’s kept in my VCS repo. For security reasons I maintain a dev and a live keyring so only live Puppetmasters can see live data.

    Currently hiera-gpg doesn’t support key passwords, I’ll probably add this feature in soon but it would mean having the password stored in /etc/puppet/hiera.yaml as plaintext anyway, so I don’t see that as adding much in the way of security.

    So I have my GPG secret key set up in root’s home directory:

    Next I add my GPG public key to the keyring for live puppetmasters (in my set up, /etc/puppet/keyrings/live is a subversion checkout)

    Now I can create a YAML file in my hieradata folder and encrypt it for the servers in my live keyring.

    If, like me, you have more than one puppetmaster in your live keyring, multiple -r entries can be specified on the gpg command line; you should encrypt your file for all the puppetmaster keys that are allowed to decrypt it.

    Now you just need to tell Hiera about the GPG backend. My previous Hiera configuration now becomes:
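    Assuming a keyring layout like the one described above, a sketch of the updated hiera.yaml might be (the :key_dir option name and datadir paths are assumptions for illustration):

```yaml
---
:backends:
  - yaml
  - gpg

:hierarchy:
  - %{calling_module}

:yaml:
  :datadir: /etc/puppet/hieradata

:gpg:
  :datadir: /etc/puppet/hieradata
  :key_dir: /etc/puppet/keyrings/live
```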

    Here we’re telling Hiera to behave exactly as it used to when we just had the YAML back end, and if it doesn’t find the value you are requesting from YAML it will query the GPG back end which will pick up on your %{calling_module}.gpg.

    Now I can query Hiera on the command line to find my live MySQL root password with:

    In Puppet, I reference my variables in exactly the same way as any other variable

    There’s probably lots of stuff I can improve here, but I have the basics of what I need: a transparent method of data storage using GPG encryption, with no sensitive data stored in my VCS repo as plain text.


    Puppet configuration variables and Hiera.

    Managing configuration variables within Puppet has always given me a bit of a headache, and I’ve never really found a way to do it that I’m all together happy with, particularly when dealing with the deployment of complex applications that require a lot, sometimes hundreds, of different configuration variables and multiple environments. I started thinking a while ago that Puppet wasn’t the best place to be keeping these variables in the first place. For starters, this is really valuable data we’re talking about, there may be lots of other applications that may benefit from having access to the way your software is configured, so why should Puppet retain all of this information exclusively for itself? The original extlookup() function in Puppet provides some decoupling of configuration data from Puppet manifests, but I found it a bit limiting and not very elegant having to maintain a bunch of CSV files. I’ve been interested in R.I.Pienaar’s Hiera for a while and thought I’d give it a proper spin and see if it meets my needs.

    Hiera itself is a standalone configuration data store that supports multiple back ends including YAML, JSON and Puppet itself, and adding more back ends to it is a fairly non-challenging task for anyone competent with Ruby. Thanks to hiera-puppet it plugs nicely into Puppet.

    Configuring a basic Hiera setup

    After installing hiera (gem install hiera), I want to test it by setting up a pretty basic configuration store that will override my configuration variables based on environment settings of dev, stage or live. Let’s take a variable called $webname. I want to set it correctly in each of my three environments, or default it to localhost.

    Firstly, I create four YAML files in /etc/puppet/hieradata
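    For instance (the hostname values here are placeholders, not the originals):

```yaml
# /etc/puppet/hieradata/common.yaml
---
webname: localhost

# /etc/puppet/hieradata/dev.yaml
---
webname: dev.example.com

# /etc/puppet/hieradata/stage.yaml
---
webname: stage.example.com

# /etc/puppet/hieradata/live.yaml
---
webname: www.example.com
```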

    Now that I have a YAML file representative of each environment, I create a simple config in /etc/puppet/hiera.yaml that tells Hiera to search my environment YAML file followed by common.yaml.
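    A minimal hiera.yaml along these lines would do it:

```yaml
---
:backends:
  - yaml

:hierarchy:
  - %{env}
  - common

:yaml:
  :datadir: /etc/puppet/hieradata
```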

    Now using hiera from the command line, I can look up the default value of $webname with the following command

    But now if I want to know the value for the live and dev environments I can pass an env flag to Hiera

    Accessing this from Puppet

    I can now access these variables directly from my Puppet modules using the hiera() function provided by hiera-puppet. In this example, I already have a fact called ${::env} that is set to dev, stage or live (in my particular set up we use the puppet environment variable for other things)
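    Accessing the value from a manifest is then a one-liner; the myapplication class name here is a placeholder:

```puppet
class myapplication {
  # ${::env} is already in scope as a fact, so Hiera resolves the
  # right environment file automatically.
  $webname = hiera('webname')
}
```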

    Adding more scoping

    OK, that’s a fairly simple set up, but it demonstrates how easy it is to get up and running with Hiera. The requirements I had were a little more complex. Firstly, our hierarchy is broken down by both environment (live, stage, dev, etc.) and location. I have multiple environments in multiple locations; a particular location will be either a live, stage or dev environment. So some variables I want to override at the environment level, and some at the more granular location level.

    Secondly, I don’t like the idea of asking Hiera for $webname. That doesn’t tell me anything: what is $webname, and what uses it? Consider a more generic variable called $port – that’s going to be confusing. So I started thinking about ways of grouping and scoping my variables. The way I solved this was to introduce a module parameter as well as environment and location in Hiera, and to place the variables for a particular module in its own YAML file, using a filesystem layout to determine the hierarchy.

    My new hieradata file system looks a little like this

    Now for each of my modules, I create a YAML file at the folder level that I want to override, containing the values for my module. Taking the previous example, let’s say that I want to set $webname for all live environments, except for Dublin, which I want to be a special case. To accomplish this I create the following two files:

    Hiera-puppet will pass the value of $calling_module from Puppet to Hiera, and we can use this in our hierarchy in hiera.yaml. NOTE: Currently you will need this patch to hiera-puppet in order for this to work!

    So our new /etc/puppet/hiera.yaml file looks like:
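    With env, location and calling_module all in play, a sketch of the hierarchy might be (the exact path layout is an assumption matching the filesystem approach described above):

```yaml
---
:backends:
  - yaml

:hierarchy:
  - %{env}/%{location}/%{calling_module}
  - %{env}/%{calling_module}
  - common/%{calling_module}

:yaml:
  :datadir: /etc/puppet/hieradata
```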

    On the command line, we can now see that environment, location and calling module are now used when looking up a configuration variable

    In Puppet, I have ${::env} and ${::location} already set as facts, and since $calling_module will get automatically passed to Hiera from Puppet, my myapplication class looks no different…

    But knowing the module name means I can easily find where this value is set, and I can easily see what configuration a module requires by examining its YAML files under /etc/puppet/hieradata


    In conclusion, I’m now convinced that moving configuration variable data out of Puppet is a very good thing. Now other systems can easily query this valuable information either on the command line or directly with Ruby. By forcing the use of $calling_module I’ve introduced a sort of pseudo scoping for my variables, so, for example… “port” now becomes “port calling_module=apache” and gives me a lot more meaning.

    Many thanks to R.I.Pienaar for help in setting this up, as well as providing the patch to scope.rb that enabled me to get this working.


    Puppet, Parameterized classes .vs. Definitions

    Firstly, a little background on this topic. During PuppetConf this year I attended a very interesting talk by Digant C. Kasundra about the Puppet implementation at Stanford University. At one point he asked, “Who uses definitions?”, and I raised my hand. The next question was, “Who uses parameterized classes?”, and I also raised my hand, this was followed by “Who uses both?”, and I was one of a small minority of people who raised a hand again. The next question was, “Who knows what the difference between a parameterized class and a definition is?”. I took a second or two to think about this and the talk moved on after no-one in the audience raised their hand, and I didn’t get a chance to answer this in the Q&A due to running out of time, but I’ve been thinking about it since. Digant’s view was that the two are very similar and he advocates the use of definitions, which is certainly not a bad point of view, but I don’t think you should be using one or the other, but rather, use either one appropriately, and given the fact that no-one could answer Digant’s question in his talk, I felt it worth expanding on the issue in a blog post and would really welcome any feedback.

    So, what are the core differences in how they are used? Well, the answer is, as Digant rightly pointed out, not a lot – but the small difference that does exist is very important, and should easily dictate which one you use in a given situation. Firstly, let’s look at what they actually are:

    A class, parameterized or not, is a grouping of resources (including resources provided by definitions) that can be included in a manifest with one statement. If you make your class parameterized then you can include that class with dynamic parameters that you can override depending on how your catalog is compiled. A definition is a template that defines what is effectively no different from any other resource type, and gives you a boilerplate solution for applying a series of resources in a certain way.

    So now it’s fairly clear that these two things are actually quite different, but you may be thinking, “sure, they’re different, but there is nothing that a parameterized class does that you can’t do with a definition, right?” – well, yes, that’s right, but there is plenty that a definition does that you may not want it to do; namely, Puppet allowing it to be instantiated multiple times.

    As an example of this, at the BBC we have a core application, let’s call it acmeapp for semantics’ sake. The core application takes almost 50 different parameters for configuration and can be deployed differently depending on what type of server profile we are deploying to. We only ever want acmeapp defined once, as it is responsible for some core resources such as creating the base install directories, therefore we use a parameterized class for this, and subclasses can inherit from it and override any number of variables. Either way, I’m only ever going to apply the acmeapp class once in my catalog. The end result looks something like:
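    A heavily cut-down sketch of that pattern (the parameter names and paths are invented for illustration, not the real acmeapp code):

```puppet
class acmeapp (
  $version     = 'latest',
  $install_dir = '/opt/acmeapp',
) {
  # Core resources that must only ever be declared once.
  file { $install_dir:
    ensure => directory,
  }

  package { 'acmeapp':
    ensure => $version,
  }
}

# Declaring the class a second time anywhere in the catalog is a
# compile error, which is exactly the safety we want here.
class { 'acmeapp':
  version => '2.1.0',
}
```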

    Whilst the above is possible with a definition, it would allow it to be defined multiple times, which means the resources within the definition would end up duplicated and my catalog would fail. I would rather Puppet fail because I’ve tried to use my class twice than fail because I’ve duplicated a resource, such as the file type that creates the installation directory. It makes it very clear to anyone working with my Puppet code that this class should only be applied once.

    Now, let’s look at another example. At multiple points in our manifests we need to set up MySQL grants – to initiate an Exec for every grant would be bad practice, as we’d end up with a lot of duplicated code. This is where definitions come in. At the BBC we have a mysql class that not only provides the MySQL packages and services, but also exposes some useful functions to manage your databases and grants through a series of definitions. This is the code for the definition that controls MySQL grants…
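    The real BBC code isn’t reproduced here; the following is a simplified sketch of how such a definition could look, with invented parameter names and a naive exec-based approach:

```puppet
define mysql::grant (
  $user,
  $password,
  $host,
  $database,
) {
  # Apply the grant once; the unless guard keeps the exec idempotent
  # by skipping it when the user already has grants on this host.
  exec { "mysql-grant-${name}":
    command => "/usr/bin/mysql -e \"GRANT ALL ON ${database}.* TO '${user}'@'${host}' IDENTIFIED BY '${password}'; FLUSH PRIVILEGES;\"",
    unless  => "/usr/bin/mysql -e \"SHOW GRANTS FOR '${user}'@'${host}'\"",
  }
}
```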

    As you can see, this is very different to our usage of a parameterized class; here we’ve templated a significant amount of functionality into one easy-to-use defined resource type. We can re-use this functionality as many times as we need. In the above example we set up two grants for the acmeapp application, one for a VIP IP address and one for an application IP range, by specifying something like:
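    For instance (addresses, usernames and the grant parameter names are placeholders):

```puppet
mysql::grant { 'acmeapp-vip':
  user     => 'acmeapp',
  password => 'secret',
  host     => '192.0.2.10',   # the VIP address
  database => 'acmeapp',
}

mysql::grant { 'acmeapp-range':
  user     => 'acmeapp',
  password => 'secret',
  host     => '192.0.2.%',    # the application IP range
  database => 'acmeapp',
}
```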

    Hopefully this gives you a good idea of the subtle differences between parameterized classes and definitions, but also that they are both very independent features of Puppet that have their own uses.


    Configuring Tomcat properties files with Augeas and Puppet.


    This post covers quite a few different things. It is taken from a real-world example of something I was asked to do recently, which not only involved some cool Puppetmastery using exported resources, storeconfigs and custom definitions, but also forced me to start learning Augeas, which I’ve been meaning to get around to. So, here’s the story.

    Some background.

    To put this into context, I recently had a requirement to add some Puppet configuration to manage some Tomcat properties files. On further investigation this turned out to be a little more complicated as the requirements weren’t as simple as chucking some templated configuration files out with a few variables replaced.

    The requirement was for a group of Tomcat servers to contain one properties file with a chunk of configuration in for each server in the group. So for example, each node in the group needed to have something like
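    The original file contents were not preserved in this post, but the idea was one properties file where every node in the group contributes its own block of keys; something along these lines (key names and addresses are illustrative):

```
# one block like this for every node in the group
node.tomcat01.address=10.0.1.11
node.tomcat01.port=8080
node.tomcat02.address=10.0.1.12
node.tomcat02.port=8080
```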

    There could be many, many servers in a given group and I don’t want to be maintaining a massive list of variables as that will just get messy, so the answer here is to use exported virtual resources and Puppet’s storeconfigs feature. My thinking here is that now I can configure each node in the group with an exported resource that looks something like
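    A sketch of what that exported resource might look like; the `application::instance` parameters are my assumption, chosen to match the properties file above:

```puppet
# The double @@ exports the resource to storeconfigs so every
# other node in the group can collect it.
@@application::instance { $::hostname:
  address => $::ipaddress,
  port    => '8080',
}
```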

    … and then simply collect them all with something like …
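    Collecting exported resources uses the spaceship collector syntax; with an empty query it realises every exported `application::instance` from every node:

```puppet
Application::Instance <<| |>>
```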

    All good so far. Then I started thinking about what application::instance() would look like. The requirement was for one properties file with all of the nodes’ configuration in, so I can’t spit out several files from a template; that would be too easy. I looked around at various solutions for building files from fragments, but to be honest nothing really appealed to me as elegant or clean, so I started investigating solutions for line-by-line editing. Traditionally this has been done by wrapping a series of piped echo commands and greps in a couple of execs (various examples of this exist, often called “line()”), but why do that when we have Augeas, a ready-made tool for managing elements within a configuration file in a structured and controlled fashion?

    So, I thought this would be worth experimenting with!

    Creating an Augeas lens

    Augeas uses the term lenses to describe a class that defines how it interacts with different types of files. There are lenses for all sorts of configuration files, such as hosts, httpd.conf, yum.repos.d…etc. You name it, there is probably a lens for it. At the time of writing however, Augeas is not bundled with a lens that can process Tomcat properties files, although I’ve been told this is coming out soon. Thinking that a Tomcat properties file is pretty uncomplicated, I decided that instead of searching for someone else’s pre-written version of a Tomcat lens I would write my own to gain a better understanding of how Augeas works.

    My first thought when reading through the Augeas documentation was, “Oh my god, what have I got myself into?”. I soon discovered that this was no simple little tool, and the configuration for lenses seemed immensely complicated. However, I then found this tutorial and ran through it. Slowly it started to make a bit more sense, and I realised that this is actually one powerful application.

    Creating a test file

    My first job was to create a test file; it’s useful to do this first as you’ll want to run augparse periodically to test your lens. The main function of the test file is to parse your example configuration into an Augeas tree, and then vice versa, and compare the outcomes.

    My Tomcat test file looks like this
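    The original test file was not preserved here; the sketch below shows the shape such a file might take, using Augeas’s test-module syntax against a hypothetical `Tomcatprops` lens. The sample keys and values are illustrative.

```
(* Sketch of a test module for a Tomcat properties lens *)
module Test_tomcatprops =

let conf = "# a comment
token = value
   indented.token=value2

spaced.token   =   value3
"

(* the "get" direction: raw text should parse into this tree *)
test Tomcatprops.lns get conf =
  { "#comment" = "a comment" }
  { "token" = "value" }
  { "indented.token" = "value2" }
  {  }
  { "spaced.token" = "value3" }

(* the "put" direction: changing a tree value should round-trip
   back to text with all other formatting preserved *)
test Tomcatprops.lns put conf after
  set "token" "newvalue"
= "# a comment
token = newvalue
   indented.token=value2

spaced.token   =   value3
"
```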

    Here I’m testing a variety of scenarios, including indentation, spaces around “=” and comments. The first part tests that when I parse my configuration file using my lens I get the expected tree; this is the get part. The second is the put part, which tests that setting a couple of variables in the Augeas tree and parsing it back out as raw configuration will produce output in an expected manner. The augparse tool will use the lens I create to compare both of these outcomes and ensure my lens is doing what it should.

    Creating the lens

    At a very basic level, a lens describes a file. So before I started writing the lens for Tomcat properties file I thought about describing my file in plain English, and I came up with

  • Any one line is either a comment, a property or a blank line
  • A comment is a series of spaces/tabs followed by a hash, followed by text
  • A property consists of alphanumerical values separated by periods
  • A value is a string of text
  • An equals sign separates the property from the value
  • Any line can be indented with tabs and spaces
  • White spaces or tabs can surround the separator

    That doesn’t seem so complicated, so then I thought about how to represent these in Augeas. Firstly, I thought about the primitive types I can use to build up a comment, a key/value pair and a blank line, the 3 functions of any one line. These break down to

  • Blank line
  • End of line
  • Separator
  • Property name part
  • Value part
  • Indentation

    Using these building blocks, I can define a comment, a key/value pair and a blank line, for example, a standard key/value pair line would be…

    (spaces, tabs or null)(alphanumeric characters and periods)(spaces, tabs or null)(=)(spaces, tabs or null)(characters that are not end-of-line)(end-of-line)

    So, when I write regular expressions to define the above, the Augeas configuration looks something like this.
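    The original lens code was lost from this page; reconstructing it from the plain-English description above, the primitives might look something like this (the names and exact regular expressions are my own sketch):

```
(* Primitive building blocks, reconstructed as a sketch *)
let eol       = del /[ \t]*\n/ "\n"      (* end of line, eats trailing whitespace *)
let indent    = del /[ \t]*/ ""          (* optional leading tabs/spaces *)
let separator = del /[ \t]*=[ \t]*/ "="  (* "=" with optional surrounding space *)
let prop_name = key /[a-zA-Z0-9.]+/      (* alphanumerics separated by periods *)
let prop_val  = store /[^ \t\n](.*[^ \t\n])?/  (* the value: text up to end of line *)
```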

    Now I’ve defined my building blocks I can tell Augeas how these apply to the basic 3 elements of my configuration file; comments, blank lines and properties.
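    Continuing the sketch, the three line types can be assembled from those primitives roughly as follows; again, this is my reconstruction rather than the original code:

```
(* The three kinds of line: comments, key/value properties, blank lines *)
let comment  = [ label "#comment" . indent . del /#[ \t]*/ "# "
                 . store /([^ \t\n].*[^ \t\n]|[^ \t\n])?/ . eol ]
let property = [ indent . prop_name . separator . prop_val . eol ]
let empty    = [ del /[ \t]*\n/ "\n" ]
```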

    Finally I set up my lens and filter by telling Augeas that my lens consists of my 3 basic elements, and define which files I wish to be parsed using my lens
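    In sketch form, and assuming an installation path for the properties files (the incl path below is my assumption):

```
(* A file is any number of the three line types *)
let lns = ( comment | property | empty )*

(* Which files to parse with this lens *)
let filter = incl "/opt/tomcat/conf/*.properties"

let xfm = transform lns filter
```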

    So my final lens file, which I install into /usr/share/augeas/lenses looks like this
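    Assembled from the building blocks above, a complete lens file might look like this; the module name and filter path are assumptions on my part:

```
(* tomcatprops.aug: sketch of the finished lens *)
module Tomcatprops =
  autoload xfm

  (* primitives *)
  let eol       = del /[ \t]*\n/ "\n"
  let indent    = del /[ \t]*/ ""
  let separator = del /[ \t]*=[ \t]*/ "="
  let prop_name = key /[a-zA-Z0-9.]+/
  let prop_val  = store /[^ \t\n](.*[^ \t\n])?/

  (* the three line types *)
  let comment  = [ label "#comment" . indent . del /#[ \t]*/ "# "
                   . store /([^ \t\n].*[^ \t\n]|[^ \t\n])?/ . eol ]
  let property = [ indent . prop_name . separator . prop_val . eol ]
  let empty    = [ del /[ \t]*\n/ "\n" ]

  (* a file is any number of those lines *)
  let lns = ( comment | property | empty )*

  let filter = incl "/opt/tomcat/conf/*.properties"

  let xfm = transform lns filter
```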

    Testing my lens

    I use the augparse command to run the test file I created earlier against my new lens, to make sure there are no parsing errors.
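    Assuming the lens and test files are saved as tomcatprops.aug and test_tomcatprops.aug in the current directory (filenames are my assumption), the invocation looks like:

```
$ augparse -I . test_tomcatprops.aug
```

If augparse produces no output and exits cleanly, the get and put tests in the test module have passed.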

    As a final test, I create a file with the following example configuration
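    The original example was not preserved; a small properties file of this shape would do (placed under the path the lens filter matches):

```
# Example application properties
application.name=myapp
application.port=8080
```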

    Now I can use augtool to view and change one of my configuration variables.
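    A session along these lines, assuming the example file lives at /opt/tomcat/conf/application.properties (the path is illustrative):

```
$ augtool
augtool> print /files/opt/tomcat/conf/application.properties
augtool> set /files/opt/tomcat/conf/application.properties/application.port 8081
augtool> save
```

The print command shows the parsed tree for the file, and after save the change is written back with the rest of the file’s formatting untouched.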

    Pulling this into Puppet

    Now I have a working lens, I can manipulate my configuration file using the augeas resource type provided by Puppet. First off, I want to build my tomcat property type using Augeas.
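    The original definition was not preserved here; a sketch of what a tomcat::property defined type wrapping Puppet’s augeas resource might look like (the parameter names are my assumption):

```puppet
# Manage a single key in a Tomcat properties file via the custom lens.
define tomcat::property (
  $file,   # full path to the properties file
  $value,  # value to set for the key named by $name
) {
  augeas { "tomcat-property-${name}":
    lens    => 'Tomcatprops.lns',
    incl    => $file,
    changes => "set ${name} '${value}'",
  }
}
```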

    Now I have a custom definition of tomcat::property that I can implement in my application::instance type. My instance definition needs to be able to set several tomcat variables in the file, so now I can do the following:
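    In sketch form, with key names matching the properties-file layout described earlier (the file path and parameters are illustrative):

```puppet
# Each instance contributes its own block of keys to the shared file.
define application::instance (
  $address,
  $port,
) {
  tomcat::property { "node.${name}.address":
    file  => '/opt/tomcat/conf/application.properties',
    value => $address,
  }

  tomcat::property { "node.${name}.port":
    file  => '/opt/tomcat/conf/application.properties',
    value => $port,
  }
}

# Collect every exported instance from all nodes in the group.
Application::Instance <<| |>>
```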

    Here I’m defining my application::instance type, and after that I include a resource collector to apply all exported definitions of my instance type. Finally, I just need to actually define the instances that I want to configure. Remember, each host in the group needs to know about every other host, so for each node I can now create something like the following and have it export its resource for all other nodes to collect.
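    Something like the following per node would do it; by exporting against facts, every node advertises its own details and collects everyone else’s (the node name is illustrative):

```puppet
node 'tomcat01.example.com' {
  # Export this node's instance so every other node in the
  # group picks it up via the collector.
  @@application::instance { $::hostname:
    address => $::ipaddress,
    port    => '8080',
  }
}
```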

    Now, with a combination of exported virtual resources, custom definitions and augeas I have the solution.

    If you want to use my Tomcat Augeas module for yourself, you can download it here
