Hierarchical Data Lookups with Ansible

A new lookup plugin for Ansible that uses Jerakia to do hierarchical data lookups right from your playbook.

Jerakia is an open source, highly flexible data lookup tool for performing hierarchical data lookups from numerous data sources. It’s aim is to be a stand alone tool that can integrate with a wide variety of tools. In this post I will examine how to integrate it with Ansible to be able to perform powerful hierarchical lookups right from your playbook.

Hierarchical lookups

I recently wrote an in-depth look at Hierarchical data lookups. What we mean by hierarchical lookups is essentially a key/value data lookup that traverses a hierarchy of queries until it finds an answer. This enables us to define data at a global level and then override the values at different points of our hierarchy depending on the scope (ansible facts) returned from the node, this model of lookup is particularly suited to infrastructure configuration. The hierarchy is configurable based on whatever is appropriate for your environment, you may for example define variables at the global level and then override them depending on the operational environment, location or role of the requestor.

The difference between this and traditional lookups, or dynamic inventories, is that you do not have to collate and organise the data itself against each host prior to using it. For example if you are overriding a setting based on environment and location you do not have to build a data set that defines which hosts are in which environment or location, that data is already available in the facts that Ansible gathers at runtime, and is used dynamically when performing a lookup against the hierarchy tree.

A simple Ansible playbook

Let’s start with a simple Ansible playbook to manage NTP. I’m not sure why everyone uses NTP as the go-to configuration management example, but who am I to argue with such a well established convention. In this example we’re going create a playbook that manages the servers defined in /etc/ntp.conf.

Here is our playbook:

And our corresponding template:

If I run this, all my hosts get an identical ntp.conf:

Overriding the behaviour

Depending on certain characteristics of the node we are configuring, we may want to configure different NTP servers. In this example I want to retain these defaults but easily override these values depending on the operating environment that a host is in, and further more to be able to further override that for a specific host if I counter a one off edge case.

Ansible facts

For this example, we are going to work with the ansible fact ansible_nodename and a custom fact called environment that is available on the nodes in facts.d;

Ansible lookups

Ansible has several sources of runtime data, the integration with Jerakia for performing hierarchical lookups is currently possible by a lookup plugin (ansible-jerakia). Lookup plugins can be used within Ansible playbooks to call out to an external data source, in this case Jerakia, to populate variables with data. With Jerakia, we pass on information about the node in the lookup request, this information is gathered dynamically from the nodes facts. We refer to this collection of information as the scope. Rather than saying lookup the value of this key, we are now saying Lookup the value of this key in the context of this scope

Configuring Jerakia

I’m going to assume you already have a Jerakia instance running, and for simplicity it is on localhost and the port is default.

When a lookup request for a key is received by Jerakia, it uses one or more lookups contained in a policy to search for the data, policy files are Ruby based and live in the configured policy.d folder. Let’s start with a fairly basic Jerakia policy that defines a global layer and then overrides on environment and the specific hostname of the machine. Something like

Using this configuration, when Jerakia receives a lookup request, it will first check to see if there is any matching data at the hostname level specific to that node, if not it will fall down to the next level and check if there is any matching data in the corresponding environment that the node belongs to before finally falling down to the “global” layer.

Finally, you should create a token so that Ansible can authenticate against the Jerakia API

Integrating Ansible with Jerakia

Once you’ve installed the lookup plugin into the correct location, you should create a jerakia.yaml file at the root of your playbook. In this file we specify the Jerakia server with authentication details, and also pass on the scope values we are going to need to perform lookups.

If you refer back to the Jerakia policy file above, you’ll see that in the scope section of our configuration here we are mapping Ansible facts to keys that will be available in the scope from within the Jerakia policy. Nested fact values can be referenced using the dot-notation as in the above example

Add some data to Jerakia

We’ll start off by adding our default NTP servers to Jerakia, at the global level so it applies to everything. We will create the key servers in the namespace ntp so under the data directory we need a directory called global and a YAML document inside that directory containing the keys within the ntp namespace.

We can now validate that this is correct by performing a Jerakia lookup on the command line

Look up data from the Ansible playbook

Now we have Jerakia running and the lookup plugin in place we can modify our playbook to lookup the value for the NTP servers array from Jerakia rather than hard coding it in the playbook. The Jerakia lookup plugin takes an argument of the namespace and the lookup key separated by a ‘/’.

If we run this playbook now, things should stay as they are, but the value of the NTP servers array is now coming from Jerakia.

Overriding data in the hierarchy

Now if we wish to keep these defaults for NTP servers across our estate, but we want to override these in the production environment we can do this simply by adding more data to the Jerakia hierarchy in the environment level. Jerakia will search for data within the environments directory under the sub-directory corresponding to the environment of the node. It will do this before hitting the global level, so any data defined here will win.

Now we can see on the command line that we’re able to get different results from Jerakia by feeding it a different environment

Now if I re-run my Ansible playbook from before, I should see that servers in the production environment (in this case ansible3) should be configured differently.

Summary

In our example we override based on environment and then hostname, but in your organisation this could be completely different, maybe it makes more sense to override on role, location, datacenter..etc. Building the right hierarchy to suit your infrastructure needs is crucially important.

This post touches the surface of Jerakia. It has many more advanced features, such as cascading lookups that traverse through a hierarchy and return a consolidated set of data, Vault integration for secrets management and a very powerful flexible configuration. Also, it’s pluggable so you can use the same interface to retrieve data from a variety of sources, including file based hierarchies, HTTP APIs and databases.

Jerakia is pretty established already but the integration with Ansible is still fairly new, any feedback on how to improve the integration would be greatly appreciated. More information on Jerakia can be found here and the Ansible lookup plugin can be downloaded from GitHub

Follow and share if you liked this

Understanding Hierarchical Data Lookups

An in-depth explanation of hierarchical data lookups and how it relates to infrastructure configuration

In this post I’m going to explain a concept called hierarchical data lookups. That’s a term that could mean different things in different contexts, so to be clear, although this is a pretty general concept I’m going to discuss hierarchical lookups how they relate to Jerakia, an open source data lookup tool. I’ll start by looking at what we mean by hierarchical lookups and then move on to how that is relevant to infrastructure management and other uses.

Defining hierarchical lookups

In essence, Jerakia is a tool that can be used to lookup a key value pair, that is to say, given a key it will give back the appropriate value.  This is certainly nothing special or new, but the crucial difference here is the way in which the data is looked up.  Rather than just querying a flat data source and returning the value for a requested key, when doing a hierarchical lookup we perform multiple queries against a configured hierarchy, transcending down to the next layer in the hierarchy until we find an answer.    The end result is that we can define key value pairs on a global basis but then override them under certain conditions based on the hierarchical resolution by placing that key value pair further up the hierarchy for a particular condition. Let’s look at a fairly simple example, you need to determine from a data lookup what currency you need to bill a user coming into your site.  You already have data that tells you which country and continent that your user is based in.  You determine that to start with you will bill everyone in USD regardless of where they come from.  So you store the key currency with a value of USD in a data source somewhere, and whenever a user starts a transaction, you look up that key, and they get billed in USD.

Now comes the fun part.  You decide that you would like to now start billing customers from European countries in EUR.  Since you already know the continent your user is coming from you could add another key to your data store and then use conditional logic within your code to determine which key to look up, but now we’re adding complexity within the code implementing conditional logic to determine how to resolve the correct value.  This is the very thing that hierarchical lookups aim to solve, to structure the data and perform the lookups in such a way that is transparent to the program requesting the data.

Lets add another layer of complexity, you’ve agreed to use EUR for all users based in Europe, but you must now account for the UK and Switzerland which deal in GBP and CHF respectively, and potentially more.  Now the demands for conditional logic on the program requesting the data are getting more complicated.  To avoid lots of very convoluted conditional logic in your code you could simply map every country in the world to a currency and look up one key, that would be the cleanest method right now.  But remember that we generally want to use USD for everyone and only care about changing this default under certain circumstances.  If we think about this carefully, we have a hierarchy of importance.  The country (in the case of UK or Switzerland), the continent in the case of Europe and then the rest of the world.  This is where a hierarchical lookup simplifies the management of this data.  The hierarchy we need to search here is quite simple;

When we’re dealing with storing data for hierarchical searches, we end up with a tiered hierarchy that looks something like this.  A lookup request to Jerakia contains two things, they key that is being looked up, in this case “currency”, and some data about the context of the request which Jerakia refers to as the scope.  In this instance the scope contains the country and continent of the user. The scope and the key are used together, so when we are talking about hierarchical lookups, rather than just saying “return the value for this key” we are saying “return the value for this key in the context of this scope”.  That’s the main distinction between a normal key value lookup and a hierarchical lookup.  If you’re thinking of ways to do this in a structured query language (SQL) or some other database API, you might be ok to solve this problem – but this is a stripped down example looking up one value, now imagine we throw in tax parameters, shipping costs, and other fun things into the mix – this becomes a complex beast – but not when we think of this as a simple hierarchy.

With a hierarchical data source we can declare a key value pair at the bottom level, in this case Worldwide.  We can set that to USD, at this point any lookup request for the currency will return USD.  But a hierarchical data source allows us to add a different value for the key “currency” at a different level of the hierarchy, for example we can add a value of EUR at the continent level that will only return that value if the continent is Europe.  We can then add separate entries right at the top of the hierarchy for the UK and Switzerland, for requests where the country meets that criteria.

From our program we are still making one lookup request for the data, but that data is looked up using a series of queries behind the scenes to resolve the right data.   Essentially the lookup will trigger up to three queries.  If one query doesn’t return an answer (because there is nothing specific configured in that level of the hierarchy) then it will fall back to the next level, and keep going until it hits an answer, eventually landing at the last level, Worldwide in our example.   So a typical lookup for the currency of a user would be handled as;

What is the value for currency for this specific country?
What is the value for currency for this specific continent?
What is the value for currency for everything worldwide?

Whichever level of the hierarchy responds first will win, meaning that if a user from China will get a value of USD – because we haven’t specified anything for Asia or China on the continent or country levels of the hierarchy so the lookup will fall through to our default set at the “worldwide” level.  However, at the continent level of the hierarchy we specified an override of EUR for requests where the continent of the requestor is Europe, so users from Germany, France and Spain would get EUR.  This wouldn’t be the case for the UK or Switzerland though because we’ve specifically overridden this at the country level, which is higher in the hierarchy so will win over the continent that the country belongs to.

So hierarchical lookups are generally about defining a value at the widest possible catchment (eg: worldwide) and moving up the hierarchy overriding that value at the right level.

What is key here is that rather than implementing three levels of conditional logic in our code, or mapping the lowest common denominator (country) one to one with currencies for every country in the world (remember in some cases we may not be able to identify the lowest common denominator) we have found a way to express the data in a simple way and provide one simple path to looking up the data.  Our program still makes one request for the key currency, the logic involved in resolving the correct value is completely transparent.

In this case, we had a scope (the country and continent of the requestor) and a hierarchy to search against that uses both elements of the scope and then falls back to a common catch all.

Applying this to infrastructure management

Jerakia is standalone and can be used for any number of applications that can make use of a hierarchical type of data lookup, but it was originally built with configuration management in mind.  Infrastructure data lends itself incredibly well to this model of data lookup.  Infrastructure data tends to consist of any number of configurable attributes that are used to drive your infrastructure.  These could be DNS resolvers, server hostnames, IP addresses, ports, API endpoints…. there is a ton of stuff that we configure on our infrastructures, but most of it is hierarchical.  Generally speaking a lot of infrastructure data starts off with a baseline default, for example, what DNS resolver to use.  That could be a default value thats used across the whole of your company and you add that as a key value pair to a datastore.  Then you find yourself having to override that value for systems configured in your development environment because that environment can’t connect to the production resolvers on your network, you then may deploy your production environment out to a second data centre and you need that location to be different.  But we are still dealing with simple hierarchies, so rather than programming conditionals to determine the resolution path of a DNS resolver we could build a simple hierarchy that best represents our infrastructure, such as;

When dealing with a hierarchy like this, a data lookup must give us a key to lookup and  contain a scope that tells us the hostname, environment and location of the request. Using the same principles as before our lookup will make up to 4 queries;

What is the DNS resolver for my particular hostname?
What is the DNS resolver for machines that are in my environment?
What is the DNS resolver for machines that are in my location?
What is the DNS resolver for everyone else?

Again, this is hierarchical search pattern that will stop at the first query to return an answer and return that value.  We can set our global parameters and then override them at the areas we care about.  We’ve even got a top level hierarchy entry for that one edge case special snowflake server that is different from everything else on the network, but the lookup method is identical and transparent to the application requesting the data.

Jerakia

I’ve tried to give a generic overview of hierarchical lookups, but in particular as they relate to Jerakia.  Jerakia has way more features that build on top of this principle, like cascading lookups which don’t stop at the first result and will build a combined data structure (HashMap or Array) from all levels of the hierarchy and return a unified result based on the route taken through the hierarchy, and I’ll cover those in a follow up post.  It’s also built to be extremely flexible and pluggable allowing you to source your data from pretty much anywhere and ships with an HTTP API meaning you can integrate Jerakia with any tool regardless of the underlying language.

Our focus has been very much in the configuration management space particularly integration with Puppet and Hiera, and more recently a lookup plugin for Ansible.  But Jerakia could be used for any number of applications where data needs to be organised hierarchically without introducing logic into the code.

It’s an open source project, please feel free to contribute or give feedback on the GitHub site

Follow and share if you liked this