Designing Puppet – Roles and Profiles.

Update, Feb 15th.
Since writing this post some of the concepts have become quite popular and have generated quite a lot of comments and questions in the community. I recently did I talk at Puppet Camp Stockholm on this subject and hopefully I might have explained it a bit better there than I did below :-). The slides are available here and the video can be viewed here;
Introduction
So you’ve installed Puppet, downloaded some forge modules, probably written a few yourself too. So, now what? You start applying those module to your nodes and you’re well on your way to super-awesomeness of automated deployments. Fast forward a year or so and your infrastructure has grown considerably in size, your organisations business requirements have become diverse and complex and your architects have designed technical solutions to solve business problems with little regard for how they might actually be implemented. They look great in the diagrams but you’ve got to fit in to Puppet. From personal experience, this often leads to a spell of fighting with square pegs and round holes, and the if statement starts becoming your go-to guy because you just can’t do it any other way. You’re probably now thinking its time to tear down what you’ve got and re-factor. Time to think about higher level design models to ease the pain.
There is a lot of very useful guidance in the community surrounding Puppet design patterns for modules, managing configurable data and class structure but I still see people struggling with tying all the components of their Puppet manifests together. This seems to me to be an issue with a lack of higher level code base design. This post tries to explain one such design model that I refer to as “Roles/Profiles” that has worked quite well for me in solving some off the more common issues encountered when your infrastructure grows in size and complexity, and as such, the requirements of good code base design become paramount.
The design model laid out here is by no means my suggestion on how people should design Puppet, it’s an example of a model that I’ve used with success before. I’ve seen many varied designs, some good and some bad, this is just one of them – I’m very interested in hearing other design models too. The point of this post is to demonstrate the benefits of adding an abstraction layer before your modules
What are we trying to solve
I’ve spent a lot of time trying to come up with what I see as the most common design flaws in Puppet code bases. One source of problems is that users spend a lot of time designing great modules, then include those modules directly to the node. This may work but when dealing with large and complex infrastructures this becomes cumbersome and you end up with a lot of node level logic in your manifests.
Consider a network consisting of multiple different server types. They will all share some common configuration, some subsets of servers will also share configuration while other configuration will be applicable only to that server type. In this very simple example we have three server types. A development webserver (www1) that requires a local mysql instance and PHP logging set to debug, a live webserver (www2) that doesn’t use a local mysql, requires memcache and has standard PHP logging, and a mail server (smtp1). If you have a flat node/module relationship with no level of abstraction then your nodes file starts to look like this:
node www1 {
include networking
include users
include tomcat
include jdk
include mysql
include memcache
include apache
class { "php":
loglevel => "debug"
}
}
node www2 {
include networking
include users
include tomcat
include jdk
include memcache
include apache
include php
}
node smtp1 {
include networking
include users
include exim
}
Note: if you’re already thinking about ENC’s this will be covered later
As you can see, the networking and users modules are universal across all our boxes, Apache, Tomcat and JDK is used for all webservers, some webservers have mysql and PHP logging options vary depending on what type of webserver it is.
At this point most people try and simplify their manifests by using node inheritance. In this very simple example that might be sufficient, but it’s only workable up to a point. If you’re environment grows to hundreds or even thousands of servers, made up over 20 or 30 different types of server, some with shared attributes and subtle differences, spread out over multiple environments, you will likely end up with an unmanagable tangled web of node inheritance. Nodes also can inherit only one other node, which will be restrictive in some edge cases.
Adding higher level abstraction
One way I have found to minimise the complexity of node definitions and make handling nuances between different server types and edge case scenarios a lot easier is to add a layer (or in this case, two layers) of seperation between my nodes and the modules they end up calling. I refer to these as roles and profiles.
Consider for a moment how you would represent these servers if you weren’t writing a Puppet manifest. You wouldn’t say “www1 is a server that has mysql, tomcat, apache, PHP with debug logging, networking and users” on a high level network diagram. You would more likely say “www1 is a dev web server” so really this is all the information I want to be applying directly to my node.
So after analysing all our nodes we’ve come up with three distinct definitions of what a server can be. A development webserver, a live webserver and a mailserver. These are your server roles, they describe what the server represents in the real world. In this design model a node can only ever have one role, it cant be two things simultaneously. If your business now has an edge case for QA webservers to be the same as live servers, but incorporate some extra software for performance testing, then you’ve just defined another role, a QA Webserver.
Now we look at what a role should contain. If you were describing the role “Development webserver” you would likely say “A development webserver has a Tomcat application stack, a webserver and a local database server”. At this level we start defining profiles.
Unlike roles, which are named in a more human representation of the server function, a profile incorporates individual components to represent a logical technology stack. In the above example, the profile “Tomcat application stack” is made up of the Tomcat and JDK components, whereas the webserver profile is made up of the httpd, memcache and php components. In Puppet, these lower level components are represented by your modules.
Now our nodes definitions look a lot simpler and are representitive of their real world roles…
node www1 {
include role::www::dev
}
node www2 {
include role::www::live
}
node smtp1 {
include role::mailserver
}
Roles are simply collections of profiles that provide a sensible mapping between human logic and technology logic. In this scenario our roles may look something like:
class role {
include profile::base
}
class role::www inherits role {
# All WWW servers get tomcat
include profile::tomcat
}
class role::www::dev inherits role::www {
include profile::webserver::dev
include profile::database
}
class role::www::live inherits role::www {
include profile::webserver::live
}
class role::mailserver inherits role {
include profile::mailserver
}
Whether or not you choose to use inherited classes in the way I have done is up to you of course, some people stay clear of inheritence completely, others over use it. Personally I think it works for the purposes of laying out roles and profiles to minimise duplication.
The profiles included above would look something like the following
class profile::base {
include networking
include users
}
class profile::tomcat {
class { "jdk": }
class { "tomcat": }
}
class profile::webserver {
# Configuration for all webservers
class { "httpd": }
class { "php": }
class { "memcache": }
}
class profile::webserver::dev inherits profile::webserver {
Class["php"] {
loglevel => "debug"
}
}
class profile::webserver::live inherits profile::webserver {
# Any live webserver specific stuff here
}
class profile::database {
class { "mysql": }
}
class profile::mailserver {
class { "exim": }
}
In summary the “rules” surrounding my design can be simplified as;
- A node includes one role, and one only.
- A role includes one or more profiles to define the type of server
- A profile includes and manages modules to define a logical technical stack
- Modules manage resources
- Modules should only be responsible for managing aspects of the component they are written for
Let’s just clarify what we mean by “modules”
I’ve talked about profiles and roles like they are some special case and modules being something else. In reality, all of these classes can be, and should be modularised. I make a logical distinction between the profile and role modules, and everything else (e.g.: modules that provide resources).
Other useful stuff to do with profiles.
So far I’ve demonstrated using profiles as collections of modules, but it has other uses too. As a rule of thumb, I don’t define any resources directly from roles or profiles, that is the job for my modules. However, I do realise virtualised resources and occasionally do resource chaining in profiles which can solve problems that otherwise would have meant editing modules and other functionality that doesn’t quite fit in the scope of an individual module. Adding some of this functionality at the modular level will reduce the re-usability and portability of your module.
Hypothetically lets say I have a module, let’s call it foo for originalities sake. The foo module provides a service type called foo, in my implementation I have another module called mounts that declares some mount resource types. If I want all mount resource types to be initiated before the foo service is started as without the filesystems mounted the foo service will fail. I’ll go even further and say that foo is a Forge module that I really don’t want to (and shouldn’t have to) edit, so where do I put this configuration? This is where having the profiles level of abstraction is handy. The foo module is coded perfectly, it’s the use case determined from my own technology stack that is requiring that my mount points exists before the foo service, so since my stack is defined in the profile, this is where I should specify it. e.g.:
class profile::foo {
include mounts
include foo
Mount <| |> -> Service['foo']
}
It’s widely known that good modules are modules that you don’t need to edit. Quite often I see people reluctant to use Forge modules because their set up requires some peripheral set up or dependancies not included in the module. Modules exist to manage resources directly related to what they were written for. For example, someone may choose to edit a forge mysql module because their set up has dependancies on MMM being installed after MySQL (purely hypothetical). The mysql module is not the place to do this, mysql and mmm are separate entities and should be configured and contained within their own modules, tying the two together is something you’ve defined in your stack, so again, this is where you’re profiles come in…
class profile::database {
class { "mysql": }
class { "mmm": }
Package["mysql"] -> Package["mmm"]
}
This approach is also potentially helpful for those using Hiera. Although Hiera and Puppet are to become much more fused in Puppet 3.0, at the moment people writing forge modules have to make them work with Hiera or not, and people running Hiera have to edit the modules that aren’t enabled. Take a hypothetical module from the Forge called fooserver. This module exposes a paramaterized class that has an option for port, I want to source this variable from Hiera but the module doesn’t support it. I can add this functionality into the profile without the need for editing the module.
class profile::fooserver {
$fooport = hiera("fooserver_port")
class { "fooserver":
port => $fooport
}
}
What about using an ENC?
So you’re probaby wondering why I haven’t mentioned using an ENC (External Node Classifier). The examples above don’t use any kind of ENC, but the logic behind adding a layer of separation between your nodes and your modules is still the same. You could decide to use an ENC to determine which role to include to a node, or you could build/configure an ENC to perform all the logic and return the list of components (modules) to include. I prefer using an ENC in place of nodes definitions to determine what role to include and keep the actual roles and profiles logic within Puppet. My main reason for this is that I get far greater control of things such as resource chaining, class overrides and integration with things like Hiera at the profile level and this helps overcome some tricky edge cases and complex requirements.
Summary
None of the above is set in stone, what I hope I’ve demonstrated though is that adding a layer of abstraction in your Puppet code base design can have some significant benefits that will avoid pitfalls when you start dealing with extremely complex, diverse and large scale set ups. These include
- Reducing complexity of configuration at a node level
- Real-world terminology of roles improves “at-a-glance” visibility of what a server does
- Definition of logical technology stacks (profiles) gives greater flexibility for edge cases
- Profiles provide an area to add cross-module functionality such as resource chaining
- Modules can be granular and secular and tied together in profiles, thus reducing the need to edit modules directly
- Reduced code duplication
I use Hiera to handle all of my environment configuration data, which I won’t go into detail about in this post. So, at a high level my Puppet design can be represented as;
As I said previously, this is not a the way to design Puppet, but an example of one such way. The purpose of this post is to explore higher level code base design for larger and more complex implementations of Puppet, I would love to hear other design models that people have used either successfully or not and what problems it solved for you (or introduced :)) so please get in touch with your own examples.