Using data schemas with Jerakia 0.5


In my previous posts I introduced a new data lookup tool called Jerakia. I also gave a talk about it at Config Management Camp in February.

This week saw the release of version 0.5.0. You can view the release notes here. In this post I am going to focus on one new feature in this release: data schemas.

What are data schemas?

In short, schemas provide a definition layer to control lookup behaviour for particular keys and namespaces. So, what do we mean by lookup behaviour? Jerakia is a hierarchy-based lookup tool and there are a couple of different ways that it can perform a search. The simplest form of lookup walks through the hierarchy and returns the first result found. A Jerakia lookup may also request a cascading lookup by setting the cascade option to true. For a cascading lookup, Jerakia continues to walk through the hierarchy, searches for all instances of the requested key and combines them into a hash or an array.

When performing a cascading lookup, the requestor can select the desired merge behaviour using the merge flag of the lookup. Supported options for this parameter are array, hash or deep_hash. With hash, hashes are merged and key conflicts at the first level of the hash are overwritten according to their priority, whereas deep_hash attempts to merge the hash at all nested levels.
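To make the difference between these lookup styles concrete, here is a minimal Python model of them. It is a sketch of the concepts only, not Jerakia's code: the hierarchy contents, the function names and the choice to build arrays by simple concatenation in priority order are all assumptions made for illustration.

```python
from copy import deepcopy

# Hypothetical two-level hierarchy, highest priority first.
HIERARCHY = [
    {"admins": ["alice"],
     "users": {"alice": {"shell": "/bin/zsh"}}},                 # e.g. host level
    {"admins": ["bob", "carol"],
     "users": {"alice": {"uid": 1001}, "bob": {"uid": 1002}}},   # e.g. common level
]


def first_found(key):
    """Non-cascading lookup: return the first value found in the hierarchy."""
    for level in HIERARCHY:
        if key in level:
            return level[key]
    return None


def deep_merge(low, high):
    """Recursively merge two dicts; values from `high` win on conflict."""
    result = deepcopy(low)
    for k, v in high.items():
        if isinstance(v, dict) and isinstance(result.get(k), dict):
            result[k] = deep_merge(result[k], v)
        else:
            result[k] = deepcopy(v)
    return result


def cascade(key, merge):
    """Cascading lookup: collect every instance of `key` and combine them."""
    found = [level[key] for level in HIERARCHY if key in level]
    if merge == "array":
        combined = []
        for value in found:
            combined.extend(value if isinstance(value, list) else [value])
        return combined
    if merge not in ("hash", "deep_hash"):
        raise ValueError(f"unsupported merge type: {merge}")
    result = {}
    # Apply the lowest priority first so higher levels overwrite it.
    for value in reversed(found):
        if merge == "hash":
            result.update(value)       # conflicts resolved at the top level only
        else:
            result = deep_merge(result, value)
    return result
```

With this data, a hash merge replaces the whole alice entry with the higher priority one, while deep_hash keeps both her uid and her shell.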

These concepts will be familiar to Hiera users as the legacy functions hiera_array() and hiera_hash().

Why use schemas?

The ability to do hash or array lookups is a useful and popular feature that has always existed in Hiera. For a long time the problem was where to declare that a particular key should be looked up this way. Initially people called hiera_hash() and hiera_array() directly in their modules. This has several drawbacks. Most notably, the incompatibility with Puppet's data binding feature meant hash and array lookups had to be dealt with separately, outside of the class's parameters. Hard-coding Hiera functions within modules is also not best practice, as it assumes that the implementor of the module is using Hiera in the first place.

Later versions of Puppet and Hiera have improved on this. The new lookup functionality in Puppet 4 provides a lookup() function that takes a lookup strategy as an argument. This is certainly nicer than the legacy Hiera functions: it is provider agnostic, so you can transparently swap out Hiera for a different data lookup tool such as Jerakia, which makes it more acceptable to use this function in a module. There is also the new data in modules feature, which allows a Puppet module to determine the lookup behaviour of the parameters its classes contain.

I think this approach is definitely going in the right direction, and it solves the problem of overriding behaviour for parameterised classes.

Jerakia takes a new approach and provides a new layer of logic called schemas.  When Jerakia receives a lookup request for a key, it first performs a search within the schema for the key and if found, will override lookup and merge behaviour based on what is defined in the schema.

The advantage of using schemas is that a user can download a module from the forge and override the lookup behaviour of the keys without modifying any of the Puppet code or adding anything Puppet specific to the data source.

How schemas work

Controlling lookup behaviour

When Jerakia receives a request for a lookup key, it first performs a lookup against the schema. It currently uses the built-in file datasource to perform this separate lookup, but reads from a different source of data than the main lookup. By default, Jerakia searches /var/lib/jerakia/schema for a JSON (or YAML) file with a name corresponding to the namespace of the request. Within this JSON document it searches for the key corresponding to the lookup key requested. The data returned can override the lookup behaviour. For example:

The above example will override a lookup for the key sysadmins in the namespace accounts (accounts::sysadmins) to be a cascading search that merges the results into an array, just like hiera_array().
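As a rough model of that override step, the Python sketch below applies a schema entry to the options of an incoming request. The field names cascade and merge mirror the lookup options described earlier, but the exact layout of the schema document, the in-memory SCHEMA_FILES table and the apply_schema helper are assumptions for illustration, not Jerakia's implementation.

```python
import json

# Hypothetical schema documents, keyed by namespace. In Jerakia these would
# be read from files named after the namespace; the JSON layout is assumed.
SCHEMA_FILES = {
    "accounts": json.loads('{"sysadmins": {"cascade": true, "merge": "array"}}'),
}


def apply_schema(namespace, key, request):
    """Return a copy of the request options with any schema overrides applied."""
    entry = SCHEMA_FILES.get(namespace, {}).get(key, {})
    options = dict(request)
    for option in ("cascade", "merge"):
        if option in entry:
            options[option] = entry[option]
    return options


# A plain first-match request for accounts::sysadmins becomes a cascading
# array lookup; keys without a schema entry are left alone.
plain = {"cascade": False, "merge": None}
overridden = apply_schema("accounts", "sysadmins", plain)
untouched = apply_schema("accounts", "groups", plain)
```

The caller never asks for a cascading lookup here; the schema alone decides it, which is the point of keeping this information out of the Puppet code.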

The big advantage here is that this data is separated from our actual configuration data, which could be in a YAML file structure, a database, a REST API endpoint, etc.

Using schema aliases

Another feature of schemas is the ability to create pseudo keys and namespaces that can be looked up and mapped to other keys within the data. Schemas can override the namespace and key part of a request on the fly. As a very hypothetical example, let’s say you have an array of domains in your data defined as webserver::domains, e.g.:

If you need the same data to populate the vhosts parameter of a class called apache you could simply alias this in the schema rather than declaring the data twice or performing lookups from within Puppet…

The above schema entry will mean that lookups for apache::vhosts and webserver::domains will return the same data set.
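The aliasing step can be modelled in a few lines of Python. This is a sketch under stated assumptions: the alias entry with namespace and key fields, the example domain names and the resolve and lookup helpers are all hypothetical and only illustrate how a request can be rewritten before the data is searched.

```python
# Hypothetical data set: only webserver::domains actually exists.
DATA = {
    ("webserver", "domains"): ["www.example.com", "blog.example.com"],
}

# Hypothetical schema for the "apache" namespace, aliasing vhosts to
# webserver::domains. The field names are assumptions.
SCHEMA = {
    "apache": {
        "vhosts": {"alias": {"namespace": "webserver", "key": "domains"}},
    },
}


def resolve(namespace, key):
    """Rewrite (namespace, key) if the schema defines an alias for it."""
    alias = SCHEMA.get(namespace, {}).get(key, {}).get("alias", {})
    return alias.get("namespace", namespace), alias.get("key", key)


def lookup(namespace, key):
    """Look up a key after applying any schema alias."""
    return DATA.get(resolve(namespace, key))
```

Here lookup("apache", "vhosts") and lookup("webserver", "domains") return the same list, even though the data is declared only once.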

You can also use a combination of aliases and lookup overrides to declare a pseudo key that looks up data in a different way.

Here we have created two pseudo keys, security::firewall_rules and security::all_firewall_rules, both of which alias to the firewalld::rich_rules data set but will be looked up in different ways.  The security namespace itself may not even exist in the actual data set.
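The Python sketch below models how two pseudo keys can point at the same data yet return different results. It assumes that security::firewall_rules uses the default first-match behaviour and that security::all_firewall_rules is the cascading array lookup; the source does not say which is which, and the schema layout, rule strings and lookup helper are likewise hypothetical.

```python
# Hypothetical hierarchy, highest priority first.
HIERARCHY = [
    {("firewalld", "rich_rules"): ["allow ssh from 10.0.0.0/8"]},
    {("firewalld", "rich_rules"): ["allow http", "allow https"]},
]

# Hypothetical schema for the "security" namespace; field names are assumed.
SCHEMA = {
    "security": {
        "firewall_rules": {
            "alias": {"namespace": "firewalld", "key": "rich_rules"},
        },
        "all_firewall_rules": {
            "alias": {"namespace": "firewalld", "key": "rich_rules"},
            "cascade": True,
            "merge": "array",
        },
    },
}


def lookup(namespace, key):
    """Apply alias and lookup overrides from the schema, then search."""
    entry = SCHEMA.get(namespace, {}).get(key, {})
    alias = entry.get("alias", {})
    target = (alias.get("namespace", namespace), alias.get("key", key))
    found = [level[target] for level in HIERARCHY if target in level]
    if not found:
        return None
    if entry.get("cascade") and entry.get("merge") == "array":
        # Cascading array lookup: combine every level, highest priority first.
        return [rule for value in found for rule in value]
    return found[0]  # default: first match wins
```

In this model the security namespace appears only in the schema, never in the data itself.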

Future plans for schemas

The current implementation of schemas is fairly basic. I see this as being quite a fundamental part of Jerakia in the future. It’s an area that could gain functionality such as sub-lookups, views and even light “stored procedure” type functions, adding power to data lookups whilst keeping the actual source of data in its purest form and so not stifling innovation in data source back ends.

Although currently limited to searching JSON or YAML files, schema searches are actually done with Jerakia lookups, the same functionality that does a regular lookup, so it should be trivial to allow users a lot more flexibility in how schema searches are done by using custom data sources and policies in future releases.

Want to know more?

Check out the Jerakia docs for how to configure the behaviour of schemas and more on how to use them.

Next up…

Puppet 4.0 delivered some great new functionality around data lookups, including environment data providers and the internal lookup functions that I feel will go really well with Jerakia.  I’m currently working on integration examples and a new environment data provider for Jerakia that will be available soon.




  1. Hi Craig, Thanks for nice explanation about Jerakia’s schemas. I found some parts a bit confusing though:

    1. In the text after the first example code (accounts.json) I think the names of the namespace and key are mixed up. i.e. accounts::sysadmin instead of sysadmin::accounts

    2. In the example code of aliases (apache.json) it should probably say key: “domains” not “vhosts”.
