Designing modules in a Puppet 4 world

Puppet 4.0 has been around for a while now, and most of it’s language features have been around even longer using the future parser option in later versions of the 3.x series. Good module design has been the subject of many talks, blog posts and IRC discussions for years and although best practice has evolved with experience, fundamentally things haven’t changed that much for a long time… Until now.

Puppet 4.x introduces some of the biggest changes to the Puppet language and module functionality in a single release. Some of these new features dramatically change how you should be designging Puppet module to leverage the the most power out of Puppet. After diving in deep with Puppet 4 recently I wanted to write about some things that I see as the most fundamental and beneficial changes in how modules are now written in the new age.

Type Casting

Is it a bird?, is it a plane?… no it’s a string. The issue of type casting has been a long standing irritation with Puppet. Theres a lot of insanity to be found with type casting (or lack thereof) within Puppet that I won’t go over here, but Stephen Johnson covered this really well in his talk at Puppet Camp Melbourne a while back. Puppet’s inability to natively enforce basic data types was confusing enough within the scope of Puppet itself, but when you add in ERB templates and functions these issues bleed over to the Ruby world which is much less forgiving of Puppets whimsical attitude to what a variable actually is and there have been countless problems as a result. Fortunately, Puppet 4.0 has finally come to address this shortcoming and this really is a great leap forward for the Puppet language.

Previously we would create a class with parameters, random variables that might be any number of types. And then, if we were being careful we could use the validate_* functions from stdlib to check that the provided values met the required type, or we could just blindly hope for the best. Now Puppet 4.0 has an in-built way of doing this by being able to define the types of variables that a parameterized class accepts. Eg:

For the most part I’ve found this to be an excellent addition, it’s much easier to define your data types when writing classes and seeing what types a class supports has become a whole lot more readable and removes the need to use external functions to validate your data. There still seems to be some quirky behaviour with it though that makes me wonder how solid these data types are under the hood, the difference between a string and an integer isn’t as well enforced as you might expect in other languages, take this example;

[Edit: see comments]. But quirks aside, using data types in your class will help with validation and readability of your module and you should always use them

Data

Back in the early days of Puppet managing your site specific data was a nightmare involving hard coded variables and elaborate coding patterns (a.k.a nasty hacks) that lead to non-reusable un-sharable modules, each organisation maintaining their own copy of modules to manage software. Then in 2011, Hiera was born and all that changed. The ability to maintain data separate from Puppet code gave way to a new generation of Puppet modules that could be shared and used without modifying the code and the Puppet Forge flourished as a result with dependable and maintainable modules. It wasn’t long before Hiera was incorporated officially into Puppet core and Puppet released the data binding features which automatically lookup class parameters, so by simply declaring include foo any parameters of the foo class will be automatically looked up from Hiera. This lead to a design pattern that became very popular with module authors to take advantage of the data binding features of hiera and give the module author some degree of flexibility with setting dynamic default values. The pattern is commonly known as the params pattern. In short, it works by a base class inheriting a class called params and setting all of the base classes parameter defaults to variables defined within the params class, eg:

This widely adopted pattern gives the implementor of the module the ability to override settings directly from hiera by setting foo::some_setting but also gives the module author more flexibility to dynamically set intelligent defaults. I don’t think it’s a terrible pattern, and it’s certainly the cleanest way of designing a module in Puppet 3, but it can get complicated with data nested in deep conditionals in your class. We already have a much better proven way of handling data in Puppet using Hiera, so why can’t we adopt this same approach with module defaults? This was an idea first proposed by R.I Pienaar, original author of Hiera, who went on to release a POC for it. The idea was solid and made a lot of sense, and now Puppet have adopted this approach natively in 4.3. Puppet modules can now ship with their own self-contained Hiera data in the module, which Puppet will use as a fall back if no other user-defined data is found (eg: your regular Hiera data). So using data in modules our class now looks like:

We can then enable a module-specific Hiera configuration to read the variable defaults from ./data

Class parameter defaults can now be easily read in a familiar Hiera layout

I’m really only touching on the surface of the new data in modules additions here, there are plenty more really cool features available, including using Puppet functions to provide data defaults. R.I Pienaar wrote an excellent article covering this in a bit more detail and the offical puppet documentation explains things in depth.

The main advantage of using data in modules over params.pp is the ability to store your module parameter defaults in an easy-to-read style and maintains a degree of consistency between how module data and other site data is handled and avoids the need for cumbersome in-code setting of data, such as the params.pp pattern. I love the fact I can download a third party module off the forge and easily look through all of the configured defaults in a nicely formatted YAML hierarchy rather than trawling through a mish mash of conditional logic trying to understand how a variable gets set. This feature is a major plus.

Iteration

The final enhancement to the Puppet language I want to focus on in this post is iteration. The idea of iteration in Puppet was something that historically I used to be against. People have been asking for loops in Puppet since it first got released, I always felt that it wasn’t necessary and detracted from the declarative principles of Puppet. I like to think that I was half right though, most people wanted iteration because they didn’t understand the declarative aspects of Puppet and if they thought differently about their problem they would realise that 99% of the time defined resources were a perfectly fitting solution. Puppet, and how people use it, has changed since I forged those opinions. In the new age of data separation where we try and avoid hard coding site specific data inside Puppet code, that data now lives in a structured format inside Hiera and we need a way to take that data model and manage Puppet resources from it.

The current well adopted method of doing this is to use the create_resources() function in Puppet. The idea is simple enough, the function takes a resource type as an argument followed by a hash where the keys represent resource titles and the values are nested hashes containing the attributes for those resources and voila, it creates Puppet resources in your catalog

Is a more dynamic, data driven way of declaring;

I have a love/hate relationship with create_resources(). It feels like a sticking plaster designed to hammer Puppet into doing something it fundamentally wasn’t designed to do, but on the other hand, in the absence of any other solution I couldn’t have survived without it. It’s also fairly restrictive solution, in the ideal world of data separation I should be able to model my data in the best possible way to represent the data, which may look quite different from how they are represented as Puppet resources. The create_resources() pattern offers no way to filter data or to munge the structure of a hash for example. Take the following example;

If I want to model the above modified data structure as user resources with the UID’s corresponding to the value of each element, I cannot do this with create_resources() directly managing the user resources since the function expects the hash to be representative of resource titles and attributes. I could parse this with a function to munge the data first, or I could write an elaborate but messy defined resource type to do it, but neither of these seems ideal. Examples like this prove that Puppet desperately needs more, and now (some would say FINALLY) it’s arrived in the form of iterators and loops natively included in the language. Using iteration, we now have a lot more flexibility in scenarios such as this and I can easily solve the above dilema with a very simple iterator;

Despite my initial reluctance about iteration some years ago, it’s obvious now that this is a huge improvement to the language and bridges the gap between modelling data and declaring resources. I don’t however agree with some camps that say defined resources are obsolete, I think they have a genuine place in the world. If you are managing a bunch of resources from a hash it is still going to be cleaner in some circumstances to have a defined resource type to model that behaviour rather than declaring lots of resources within an iterator loop, but having iterators gives authors the ability to chose which method best achieves their objectives in the cleanest way.

Summary

I have touched on three major changes that will change how people write modules going forward, and there are plenty more great features in 4.x worthy of discussion that I will go into in future posts. But I feel the above changes are the most relevant when it comes to module design patterns and they offer a huge improvement in the module quality and functionality. In a nutshell;

  • validate functions (for basic types) are dead. Validate your parameters using native types
  • params.pp is dead. Use data in modules
  • create_resources() is dead. Use iterators
Follow and share if you liked this