Extending Puppet

Chapter 1. Puppet Essentials

There are moments in our professional life when we meet technologies that trigger an inner wow effect. We realize there's something special in them and we start to wonder how they can be useful for our current needs and, eventually, wider projects. Puppet, for me, has been one of these turning point technologies. I have reason to think that we might share a similar feeling.

If you are new to Puppet, you are probably starting from the wrong place; there are better-suited titles around to grasp its basic concepts.

This book won't dwell too much on the fundamentals, but don't despair: this chapter might help for a quick start. It provides the basic Puppet background needed to understand the rest of the contents and may also offer valuable information to more experienced users.

We are going to review the following topics:

The Puppet ecosystem, its components, history, and the basic concepts behind configuration management
How to install and configure Puppet, with its commands and paths, to understand where things are placed
The core components and concepts; terms such as manifests, resources, nodes, and classes will become familiar
The main language elements—variables, references, resources defaults, ordering, conditionals, comparison operators, virtual and exported resources
How Puppet stores the changes it makes and how to revert them

The contents of this chapter are quite dense, so take your time to review and assimilate them if they sound new or look too complex; the path towards Puppet awareness is never too easy.

The Puppet ecosystem

Puppet is a configuration management and automation tool; we use it to install, configure, and manage components of our servers.

Puppet was initially written in Ruby; some parts were rewritten in Clojure in version 4. Released under an open source license (Apache 2), it can run on any Linux distribution, many other UNIX variants (Solaris, *BSD, AIX, and Mac OS X), and Windows. Its development was started in 2005 by Luke Kanies as an alternative approach to the existing configuration management tools (most notably CFEngine and BladeLogic). The project has grown year after year; Kanies' own company, Reductive Labs, renamed Puppet Labs in 2010, has received a total of $45.5 million in various funding rounds (the investors include names such as VMware, Google, and Cisco).

Now, it is one of the top 100 fastest growing companies in the US. It employs more than 150 people and it has a solid business based on open source software, consisting of consulting services, training, certifications, and Puppet Enterprise. Puppet Enterprise is the commercial version that is based on the same open source Puppet codebase, but it provides an integrated stack with lots of tools, such as a web GUI that makes Puppet usage and administration easier, and more complete support for some major Linux distributions, Mac OS X, and Microsoft Windows Server.

The Puppet ecosystem features a vibrant, large, and active community, which gathers on the Puppet Users and Puppet Developers Google groups, on the crowded Freenode #puppet IRC channel, at the various Puppet Camps that are held multiple times a year all over the world, and at the annual PuppetConf, which is improving and getting bigger year after year.

Various software products are complementary to Puppet; some of them are developed by Puppet Labs:

Hiera: This is a key-value lookup tool that is the current tool of choice for storing data related to our Puppet infrastructure.
MCollective: This is an orchestration framework that allows parallel execution of tasks on multiple servers. It is a separate project by Puppet Labs, which works well with Puppet.
Facter: This is a required complementary tool; it is executed on each managed node and gathers local information in key/value pairs (facts), which are used by Puppet.
Geppetto: This is an IDE based on Eclipse that allows easier and assisted development of Puppet code.
Puppet Dashboard: This is an open source web console for Puppet.
PuppetDB: This is a powerful backend that can store all the data gathered and generated by Puppet.
Puppet Enterprise: This is the commercial solution to manage Puppet, MCollective, and PuppetDB via a web frontend.

The community has produced other tools and resources. The most notable ones are:

The Foreman: This is a systems lifecycle management tool that integrates perfectly with Puppet.
PuppetBoard: This is a web front end for PuppetDB.
Kermit: This is a web front end for Puppet and MCollective.
Modules: These are reusable components that allow management of any kind of application and software via Puppet.

Why configuration management matters

IT operations have changed drastically in the past few years. Virtualization, cloud, business needs, and emerging technologies have accelerated the pace of how systems are provisioned, configured, and managed.

The manual setup of a growing number of operating systems is no longer a sustainable option. At the same time, in-house custom solutions to automate the installation and the management of systems cannot scale in terms of required maintenance and development efforts.

For these reasons, configuration management tools such as Puppet, Chef, CFEngine, Rudder, Salt, and Ansible (to mention only the most known open source ones) are becoming increasingly popular in many infrastructures.

They enable infrastructure as code, which brings to systems management some of the practices that have been used in software development for decades, such as maintainability, code reusability, testability, and version control.

Once we can express the status of our infrastructure with versioned code, there are powerful benefits:

We can reproduce our setups in a consistent way: what is executed once can be executed any time, and the procedure to configure a server from scratch can be repeated without the risk of missing parts.
The log of our code commits reflects the history of changes to the infrastructure: who did what, when, and, if the commit comments are pertinent, why.
We can scale quickly; the configurations we made for a server can be applied to all the servers of the same kind.
We have aligned and coherent environments; our Development, Test, QA, Staging, and Production servers can share the same setup procedures and configurations.

With these kinds of tools, we can have a system provisioned from zero to production in a few minutes, or we can quickly propagate a configuration change over our whole infrastructure automatically.

Their power is huge and has to be handled with care: just as we can automate massive and parallelized setups and configurations of systems, we might automate distributed destruction.

With great power comes great responsibility.

Puppet components

Before diving into installation and configuration details, we need to clarify and explain some Puppet terminology to get the whole picture.

Puppet features a declarative Domain Specific Language (DSL), which expresses the desired state and properties of the managed resources.

Resources can be any component of a system, for example, packages to install, services to start, files to manage, users to create, and also custom and specific resources, such as MySQL grants, Apache virtual hosts, and so on.

Puppet code is written in manifests, which are simple text files with a .pp extension. Resources can be grouped in classes (do not think of them as classes in the OOP sense; they aren't). Classes and all the files needed to define the required configurations are generally placed in modules, which are directories structured in a standard way that are meant to manage specific applications or system features (there are modules to manage Apache, MySQL, sudo, sysctl, networking, and so on).

When Puppet is executed, it first runs facter, a companion application, which gathers a series of variables about the system (IP address, hostname, operating system, and MAC address), which are called facts and are sent to the Master.

Facts and user-defined variables can be used in manifests to manage how and what resources to provide to clients.

When the Master receives a connection, it looks in its manifests (starting from site.pp, which in versions before 4.0 is /etc/puppet/manifests/site.pp) for the resources that have to be applied for that client host, also called a node.

The Master parses all the DSL code and produces a catalog, which is sent back to the client (in PSON format, a JSON variant used in Puppet). The production of the catalog is often referred to as catalog compilation.

Once the client receives the catalog, it starts to apply all the resources declared there: packages are installed (or removed), services started, configuration files created or changed, and so on. The same catalog can be applied multiple times: if there are changes on a managed resource (for example, a manual modification of a configuration file), they are reverted to the state defined by Puppet; if the system's resources are already in the desired state, nothing happens.

This property is called idempotence and is at the root of the Puppet declarative model: since it defines the desired state of a system, it must operate in a way that ensures that this state is obtained whatever the starting conditions and however many times Puppet is applied.
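As a minimal sketch of what this means in practice, consider the following declaration. The path and the content are hypothetical and chosen only for illustration; the point is that the code describes an end state rather than a sequence of actions.

```puppet
# Hypothetical example: the path and content are illustrative only.
# The declaration states what the file should look like, not how to get there.
file { '/tmp/idempotence_demo':
  ensure  => file,
  content => "Managed by Puppet\n",
}
```

On a first run, Puppet would be expected to create the file; on later runs, it should change nothing unless the file has been removed or edited, in which case it is brought back to the declared state.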

Puppet can report the changes it makes on the system and audit the drift between the system's state and the desired state as defined in its catalog.

Installing and configuring Puppet

Puppet uses a client-server paradigm. Clients (also called agents) are installed on all the systems to be managed, and one or more servers (also called Masters) are installed on central machines from where we control the whole infrastructure.

We can find Puppet's packages for most recent operating systems, either in the default repositories or in other ones maintained by the distribution or its community (for example, EPEL for Red Hat derivatives).

Starting with Puppet version 4.0, Puppet Labs introduced Puppet Collections. These collections are repositories containing packages that are meant to work together. When using collections, all nodes in the infrastructure should use the same one. As a general rule, Puppet agents are compatible with newer versions of Puppet Master, but Puppet 4 breaks compatibility with previous versions.

To find the most appropriate packages for our infrastructure, we should use the Puppet Labs repositories.

The server package is called puppetserver, so to install it we can use these commands:

apt-get install puppetserver # On Debian derivatives
yum install puppetserver # On Red Hat derivatives

And similarly, to install the agent:

apt-get install puppet-agent # On Debian derivatives
yum install puppet-agent # On Red Hat derivatives

Note

To install Puppet on other operating systems, check out http://docs.puppetlabs.com/guides/installation.html.

In versions before 4.0, the agent package was called puppet, and the server package was called puppetmaster on Debian-based distributions and puppet-server on Red Hat derived distributions.

This is enough to start the services with the default configuration, but the commands are not installed in any of the usual standard paths for binaries. We can find them under /opt/puppetlabs/puppet/bin, which we need to add to our PATH environment variable if we want to use them without writing the full path.

In versions before 4.0, configuration files are placed in /etc/puppet; from 4.0 they are in /etc/puppetlabs/, together with the configuration of other Puppet Labs utilities such as MCollective. Inside the puppetlabs directory, we can find the puppet one, which contains the puppet.conf file; this file is used by both agents and server, and includes the parameters for some directories used at runtime and specific information for the agent, for example, the server to be used. The file is divided into [sections] and has an INI-like format. Here is its content just after installation:

[master]
vardir = /opt/puppetlabs/server/data/puppetserver
logdir = /var/log/puppetlabs/puppetserver
rundir = /var/run/puppetlabs/puppetserver
pidfile = /var/run/puppetlabs/puppetserver/puppetserver.pid
codedir = /etc/puppetlabs/code

[agent]
server = puppet

A very useful command to see all the current client configuration settings is:

puppet config print all

The server has additional configuration in the puppetserver/conf.d directory, in files in the HOCON format, a human-readable variation of JSON. Some of these files are as follows:

global.conf: This contains global settings; their defaults are usually fine
webserver.conf: This contains webserver settings, such as the port and the listening address
puppetserver.conf: This contains the settings used by the server
ca.conf: This contains the settings for the Certificate Authority service

The preceding configuration files sometimes refer to other files and settings, which are also important to know when we work with Puppet:

Logs: They are in /var/log/puppetlabs (but also in normal syslog files, with facility daemon), both for agents and servers
Puppet operational data: This is placed in /opt/puppetlabs/server/data/puppetserver
SSL certificates: They are stored in /etc/puppetlabs/puppet/ssl. By default, the agent tries to contact a server hostname called puppet, so either name our server puppet.$domain or provide the correct name in the server parameter.
Agent certificate name: When the agent communicates with the server, it presents itself with its certname (which is also the hostname placed in its SSL certificates). By default, the certname is the fully qualified domain name (FQDN) of the agent's system.
The catalog: This is the configuration fetched by the agent from the server. By default, the agent daemon requests it every 30 minutes. Puppet code is placed, by default, under /etc/puppetlabs/code.
SSL certificate requests: On the Master, we have to sign each client's certificate request (manually by default). If we can cope with the relevant security concerns, we may sign them automatically by adding their FQDNs (or rules matching them) to the autosign.conf file. For example, to automatically sign the certificates for node.example.com and for all nodes whose hostname is a subdomain of servers.example.com:
```
node.example.com
*.servers.example.com
```

However, we have to take into account that any machine can request a configuration with any FQDN, so this is potentially a security flaw.

Puppet in action

Client-server communication is done using REST-like API calls on an SSL socket; basically, it's all HTTPS traffic from clients to the server's port 8140/TCP.

The first time we execute Puppet on a node, its x509 certificates are created and placed in ssldir, and then the Puppet Master is contacted in order to retrieve the node's catalog.

On the Puppet Master, unless we have autosign enabled, we must manually sign the clients' certificates using the cert subcommand:

puppet cert list # List the unsigned clients certificates
puppet cert list --all # List all certificates
puppet cert sign <certname> # Sign the given certificate

Once the node's certificate has been recognized as valid and been signed, a trust relationship is created and a secure client-server communication can be established.

If we happen to recreate a new machine with an existing certname, we have to remove the certificate of the old client from the server:

puppet cert clean <certname> # Remove a signed certificate

At times, we may also need to remove the certificates on the client; a simple move command is safe enough:

mv /etc/puppetlabs/puppet/ssl /etc/puppetlabs/puppet/ssl.bak

After that, the whole directory will be recreated with new certificates when Puppet is run again (never do this on the server—it'll remove all client certificates previously signed and the server's certificate, whose public key has been copied to all clients).

A typical Puppet run is composed of different phases. It's important to know them in order to troubleshoot problems:

Execute Puppet on the client. On a root shell, run:
```
puppet agent -t
```
If pluginsync = true (default from Puppet 3.0), the client retrieves any extra plugins (facts, types, and providers) present in the modules on the Master's $modulepath. Client output is as follows:
```
Info: Retrieving pluginfacts
Info: Retrieving plugin
```
The client runs facter and sends its facts to the server. Client output is as follows:
```
Info: Loading facts in /var/lib/puppet/lib/facter/... [...]
```
The server looks for the client's certname in its nodes list.
The server compiles the catalog for the client using its facts. Server logs as follows:
```
Compiled catalog for <client> in environment production in 8.22 seconds
```
If there are syntax errors in the processed Puppet code, they are exposed here and the process terminates; otherwise, the server sends the catalog to the client in the PSON format. Client output is as follows:
```
Info: Caching catalog for <client>
```
The client receives the catalog and starts to apply it locally. If there are dependency loops, the catalog can't be applied and the whole run fails. Client output is as follows:
```
Info: Applying configuration version '1355353107'
```
All changes to the system are shown on stdout or in logs. If there are errors (in red or pink, depending on Puppet versions), they are relevant to specific resources but do not block the application of the other resources (unless they depend on the failed ones). At the end of the Puppet run, the client sends a report of what has been changed to the server. Client output is as follows:
```
Notice: Applied catalog in 13.78 seconds
```
The server sends the report to a report collector if enabled.

Resources

When dealing with Puppet's DSL, most of the time we use resources, as they are the single units of configuration that express the properties of objects on the system. A resource declaration is always composed of the following parts:

type: This includes package, service, file, user, mount, exec, and so on
title: This is how the resource is named and how it may be referred to in other parts of the code
attributes: There can be zero or more of them

The general syntax is:

type { 'title':
  attribute       => value,
  other_attribute => value,
}

Inside a catalog, for a given type, there can be only one title; there cannot be multiple resources of the same type with the same title, otherwise we get an error like this:

Error: Duplicate declaration: <Type>[<name>] is already declared in file <manifest_file> at line <line_number>; cannot redeclare on node <node_name>.
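
A minimal sketch of code that would be expected to trigger this error follows; the package name is reused from this chapter's examples and the two ensure values are arbitrary. Both declarations share the same type and the same title, so they cannot coexist in one catalog, even though their attributes differ.

```puppet
# First declaration of Package['openssh'].
package { 'openssh':
  ensure => present,
}

# Same type and same title again: catalog compilation is expected to fail
# with a duplicate declaration error, whatever the attributes are.
package { 'openssh':
  ensure => latest,
}
```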

Resources can be native (written in Ruby), or defined by users in Puppet DSL.

These are examples of common native resources; what they do should be quite obvious:

  file { 'motd':
    path    => '/etc/motd',
    content => "Tomorrow is another day\n",
  }

  package { 'openssh':
    ensure => present,
  }

  service { 'httpd':
    ensure => running, # Service must be running
    enable => true,    # Service must be enabled at boot time
  }

For inline documentation about a resource, use the describe subcommand, for example:

puppet describe file

Note

For a complete reference of the native resource types and their arguments check: http://docs.puppetlabs.com/references/latest/type.html

The resource abstraction layer

From the previous resource examples, we can deduce that the Puppet DSL allows us to concentrate on the types of objects (resources) to manage and doesn't bother us with how these resources may be applied on different operating systems.

This is one of Puppet's strong points: resources are abstracted from the underlying OS. We don't have to care about or specify how, for example, to install a package on Red Hat Linux, Debian, Solaris, or Mac OS; we just have to provide a valid package name. This is possible thanks to Puppet's Resource Abstraction Layer (RAL), which is engineered around the concept of types and providers.

Types, as we have seen, map to an object on the system. There are more than 50 native types in Puppet (some of them applicable only to specific OSes); the most commonly used are augeas, cron, exec, file, group, host, mount, package, service, and user. To have a look at their Ruby code, and learn how to make custom types, check these files:

ls -l $(facter rubysitedir)/puppet/type

For each type, there is at least one provider, which is the component that enables that type on a specific OS. For example, the package type is known for having a large number of providers that manage the installation of packages on many OSes, which are aix, appdmg, apple, aptitude, apt, aptrpm, blastwave, dpkg, fink, freebsd, gem, hpux, macports, msi, nim, openbsd, pacman, pip, pkgdmg, pkg, pkgutil, portage, ports, rpm, rug, sunfreeware, sun, up2date, urpmi, yum, and zypper.

We can find them here:

ls -l $(facter rubysitedir)/puppet/provider/package/

The Puppet executable offers a powerful subcommand to interrogate and operate with the RAL: puppet resource.

For a list of all the users present on the system, type:

puppet resource user

For a specific user, type:

puppet resource user root

Other examples that might give glimpses of the power of RAL to map systems' resources are:

puppet resource package
puppet resource mount
puppet resource host
puppet resource file /etc/hosts
puppet resource service

The output is in the Puppet DSL format; we can use it in our manifests to reproduce that resource wherever we want.

The puppet resource subcommand can also be used to modify the properties of a resource directly from the command line and, since it uses the Puppet RAL, we don't have to know how to do that on a specific OS. For example, to enable the httpd service:

puppet resource service httpd ensure=running enable=true

Nodes

We can place the preceding resources in our first manifest file (/etc/puppetlabs/code/environments/production/manifests/site.pp) or in files included from there, and they will be applied to all our Puppet managed nodes. This is okay for quick samples out of books, but in real life things are very different. We have hundreds of different resources to manage, and dozens, hundreds, or thousands of different systems to apply different logic and properties to.

To help organize our Puppet code, there are two different language elements: with node, we can confine resources to a given host and apply them only to it; with class, we can group different resources (or other classes), which generally have a common function or task.

Whatever is declared in a node definition is included only in the catalog compiled for that node. The general syntax is:

node $name [inherits $parent_node] {
  [ Puppet code, resources and classes applied to the node ]
}

$name is the certname of the client (by default its FQDN) or a regular expression. In versions before 4.0, a node can inherit whatever is defined in a parent node (node inheritance was removed in Puppet 4). Inside the curly braces, we can place any kind of Puppet code: resource declarations, class inclusions, and variable definitions. An example is given as follows:

node 'mysql.example.com' {
  package { 'mysql-server':
    ensure => present,
  }
  service { 'mysql':
    ensure => 'running',
  }
}

But generally, in nodes we just include classes, so a better real life example would be:

node 'mysql.example.com' {
  include common
  include mysql
}

The preceding include statements do what we might expect: they include all the resources declared in the referred class.
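
Since a node name can also be a regular expression, one definition can cover a whole group of hosts. The following is only a sketch: the hostname pattern and the apache class are hypothetical, and the default node (which applies to hosts that match no other definition) is a language feature not otherwise covered in this section.

```puppet
# Hypothetical pattern: matches web01.example.com, web02.example.com, and so on.
node /^web\d+\.example\.com$/ {
  include common
  include apache
}

# Applied to any node that matches no other node definition.
node default {
  include common
}
```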

Note that there are alternatives to the usage of the node statement; we can use an External Node Classifier (ENC) to define which variables and classes to assign to nodes, or we can have a nodeless setup, where the resources applied to nodes are defined in a case statement based on the hostname or a similar fact that identifies a node.
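
A nodeless setup could be sketched as follows. This is one possible arrangement rather than a recommended layout; the hostnames and the apache class are assumptions made for the example, and the hostname fact is the one reported by Facter.

```puppet
# site.pp without node statements: classes are selected from a fact.
# Hostnames and class names are hypothetical.
case $::hostname {
  'mysql': {
    include common
    include mysql
  }
  /^web\d+$/: {
    include common
    include apache
  }
  default: {
    include common
  }
}
```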

Classes and defines

A class can be defined (resources provided by the class are defined for later usage but are not yet included in the catalog) with this syntax:

class mysql {
  $mysql_service_name = $::osfamily ? {
    'RedHat' => 'mysqld',
    default  => 'mysql',
  }
  package { 'mysql-server':
    ensure => present,
  }
  service { 'mysql':
    name   => $mysql_service_name,
    ensure => 'running',
  }
  […]
}

Once defined, a class can be declared (the resources provided by the class are actually included in the catalog) in multiple ways:

Just by including it (we can include the same class many times, but it's evaluated only once):
```
include mysql
```
By requiring it, which makes all resources in the current scope require the included class:
```
require mysql
```
By containing it, which makes all resources requiring the parent class also require the contained class. In the next example, all resources in mysql and in mysql::service will be resolved before exec:
```
class mysql {
  contain mysql::service
  ...
}

include mysql
exec { 'revoke_default_grants.sh':
  require => Class['mysql'],
}
```
Using the parameterized style (available since Puppet 2.6), where we can optionally pass parameters to the class, if available (we can declare a class with this syntax only once for each node in our catalog):
```
class { 'mysql':
  root_password => 's3cr3t',
}
```

A parameterized class has a syntax like this:

class mysql (
  $root_password,
  $config_file_template = undef,
  ...
) {
  […]
}

Here, we can see the expected parameters defined between parentheses. Parameters with an assigned value have it as their default, as is the case here with undef for the $config_file_template parameter.

The declaration of a parameterized class has exactly the same syntax as a normal resource:

class { 'mysql':
  root_password => 's3cr3t',
}

Puppet 3.0 introduced a feature called data binding: if we don't pass a value for a given parameter, then, before falling back to the default value (if present), Puppet automatically looks up a Hiera key named $class::$parameter. For the root_password parameter of the preceding example, the key would be mysql::root_password.

This is an important feature that radically changes the approach of how to manage data in Puppet architectures. We will come back to this topic in the following chapters.
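
As a rough sketch of how data binding changes the way a class is declared, the parameterized class shown earlier can simply be included, leaving the values to Hiera. The comments describe the expected lookups; the Hiera data itself is assumed to exist and is not shown here.

```puppet
class mysql (
  $root_password,
  $config_file_template = undef
) {
  # Resources using the parameters would go here.
}

# No parameters are passed explicitly. With data binding, Puppet looks up
# the Hiera keys mysql::root_password and mysql::config_file_template
# before falling back to the defaults. Since $root_password has no default,
# compilation is expected to fail if Hiera provides no value for it.
include mysql
```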

Besides classes, Puppet also has defines, which can be considered classes that can be used multiple times on the same host (with a different title). Defines are also called defined types, since they are types that can be defined using Puppet DSL, contrary to the native types written in Ruby.

They have a syntax similar to this:

define mysql::user (
  $password,                # Mandatory parameter, no defaults set
  $host      = 'localhost', # Parameter with a default value
  [...]
) {
  # Here all the resources
}

They are used in a similar way:

mysql::user { 'al':
  password => 'secret',
}

Note that defines (also called user defined types, defined resource type, or definitions) like the preceding one, even if written in Puppet DSL, have exactly the same usage pattern as native types, written in Ruby (such as package, service, file, and so on).

In defines, besides the parameters explicitly exposed, there are two variables that are automatically set: $title (which is the title of the declared resource) and $name (which defaults to the value of $title and can be set to an alternative value).
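
The following sketch shows both variables in use. The greeting define and its message parameter are hypothetical names invented for this illustration; notify is a native type that just prints a message during the run.

```puppet
# Hypothetical define, used only to show $title and $name.
define greeting (
  $message = 'Hello'
) {
  # $title makes the inner resource title unique for each declaration;
  # $name has the same value as $title here.
  notify { "greeting_${title}":
    message => "${message} from ${name}",
  }
}

greeting { 'first': }

greeting { 'second':
  message => 'Good morning',
}
```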

Since a define can be declared more than once inside a catalog (with different titles), it's important to avoid declaring resources with a static title inside a define. For example, this is wrong:

define mysql::user ( ...) {
  exec { 'create_mysql_user':
    [ … ]
  }
}

This is because, when there are two different mysql::user declarations, it will generate an error like:

Duplicate definition: Exec[create_mysql_user] is already defined in file /etc/puppet/modules/mysql/manifests/user.pp at line 2; cannot redefine at /etc/puppet/modules/mysql/manifests/user.pp:2 on node test.example42.com

A correct version could use the $title variable, which is inherently different each time:

define mysql::user ( ...) {
  exec { "create_mysql_user_${title}":
    [ … ]
  }
}

Class inheritance

We have seen that, in Puppet, classes are just containers of resources that have nothing to do with Object Oriented Programming classes, so the meaning of class inheritance is somewhat limited to a few specific cases.

When using class inheritance, the parent class (puppet in the sample below) is always evaluated first, and all the variables and resource defaults it sets are available in the scope of the child class (puppet::server).

Moreover, the child class can override the arguments of a resource defined in the parent class:

class puppet {
  file { '/etc/puppet/puppet.conf':
    content => template('puppet/client/puppet.conf'),
  }
}
class puppet::server inherits puppet {
  File['/etc/puppet/puppet.conf'] {
    content => template('puppet/server/puppet.conf'),
  }
}

Note the syntax used; when declaring a resource, we use a syntax such as file { '/etc/puppet/puppet.conf': [...] }; when referring to it the syntax is File['/etc/puppet/puppet.conf'].

Even though it is possible, class inheritance is usually discouraged in Puppet style guides, except for some design patterns that we'll see later in the book.

Resource defaults

It is possible to set default argument values for a resource type in order to reduce code duplication. The general syntax to define a resource default is:

Type {
  argument => default_value,
}

Some common examples are:

Exec {
  path => '/sbin:/bin:/usr/sbin:/usr/bin',
}
File {
  mode  => '0644',
  owner => 'root',
  group => 'root',
}

Resource defaults can be overridden when declaring a specific resource of the same type.
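
A short sketch of such an override follows. The script path is hypothetical; the first file resource picks up every default, while the second replaces only the mode and keeps the default owner and group.

```puppet
File {
  owner => 'root',
  group => 'root',
  mode  => '0644',
}

# Uses all the defaults declared above.
file { '/etc/motd':
  content => "Tomorrow is another day\n",
}

# Hypothetical script: overrides mode, keeps the default owner and group.
file { '/usr/local/bin/backup.sh':
  ensure => file,
  mode   => '0755',
}
```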

It is worth noting that the area of effect of resource defaults might bring unexpected results. The general suggestion is as follows:

Place the global resource defaults in site.pp outside any node definition
Place the local resource defaults at the beginning of a class that uses them (mostly for clarity's sake, as they are parse-order independent)

We cannot expect a resource default defined in a class to be working in another class, unless it is a child class, with an inheritance relationship.

Resource references

In Puppet, any resource is uniquely identified by its type and its name. We cannot have two resources of the same type with the same name in a node's catalog.

We have seen that we declare resources with a syntax such as:

type { 'name':
  arguments => values,
}

When we need to reference them (typically when we define dependencies between resources) in our code, the syntax is (note the square brackets and the capital letter):

Type['name']

Some examples are as follows:

file { 'motd': ... }
apache::virtualhost { 'example42.com': .... }
exec { 'download_myapp': .... }

These examples are referenced, respectively, with the following code:

File['motd']
Apache::Virtualhost['example42.com']
Exec['download_myapp']
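
To show references at work in dependencies, here is a sketch that uses the require and subscribe metaparameters. The package and service names and the file content are assumptions for the example and may differ across operating systems.

```puppet
package { 'openssh':
  ensure => present,
}

# The file is managed only after the package has been installed.
file { 'sshd_config':
  ensure  => file,
  path    => '/etc/ssh/sshd_config',
  content => "PermitRootLogin no\n",
  require => Package['openssh'],
}

# The service is restarted whenever the referenced file changes.
service { 'sshd':
  ensure    => running,
  enable    => true,
  subscribe => File['sshd_config'],
}
```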

Variables, facts, and scopes

When writing our manifests, we can set and use variables; they help us in organizing which resources we want to apply, how they are parameterized, and how they change according to our logic, infrastructure, and needs.

They may have different sources:

Facter (variables, called facts, automatically generated on the Puppet client)
User-defined variables in Puppet code (variables defined using Puppet DSL)
User-defined variables from an ENC
User-defined variables on Hiera
Puppet's built-in variables

System's facts

When we install Puppet on a system, the facter package is installed as a dependency. Facter is executed on the client each time Puppet is run, and it collects a large set of key/value pairs that reflect many of the system's properties. They are called facts and provide valuable information such as the system's operatingsystem, operatingsystemrelease, osfamily, ipaddress, hostname, fqdn, and macaddress, to name just some of the most used ones.

All the facts gathered on the client are available as variables to the Puppet Master and can be used inside manifests to provide a catalog that fits the client.
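
As an illustration, a manifest might use facts as in the following sketch. The package names are assumptions for the example; the facts are referenced as top-scope variables, in the same style as the mysql class shown earlier.

```puppet
# Pick a package name from the osfamily fact; the names are illustrative.
$apache_package = $::osfamily ? {
  'RedHat' => 'httpd',
  'Debian' => 'apache2',
  default  => 'apache',
}

package { $apache_package:
  ensure => present,
}

# Facts can also be interpolated in strings.
notify { "Running on ${::operatingsystem} ${::operatingsystemrelease}": }
```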

We can see all the facts of our nodes, running locally:

facter -p

(The -p argument is the short version of --puppet; it also shows any custom facts that our modules add to the native ones.)

In Facter 1.x, only plain values were available; Facter 2.x introduces structured values, so any fact can contain arrays or hashes. For version 3.0, Facter is replaced by cFacter, a more efficient implementation in C++ that makes extensive use of structured data. In any case, it keeps the legacy keys, making these two queries equivalent:

$ facter ipaddress
1.2.3.4
$ facter networking.interfaces.eth0.ip
1.2.3.4

External facts

External facts, supported since Puppet 3.4/Facter 2.0.1, provide a way to add facts from arbitrary commands or text files.

These external facts can be added in different ways:

From modules, by placing them under facts.d inside the module root directory
In directories within nodes:
- In a directory specified by the --external-dir option
- On Linux and Mac OS X, in /etc/puppetlabs/facter/facts.d/ or /etc/facter/facts.d/
- On Windows, in C:\ProgramData\PuppetLabs\facter\facts.d\
- When running as a non-root user, in $HOME/.facter/facts.d/

Executable facts can be scripts in any language, or even binaries; the only requirement is that their output consists of lines in the key=value format, like:

key1=value1
key2=value2

Structured data facts have to be plain text files with an extension indicating their format: .txt for files containing key=value lines, .yaml for YAML files, and .json for JSON files.
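Since external facts are just files, Puppet itself can distribute them. The following is a sketch under some assumptions: the fact names role and datacenter and their values are hypothetical, and the facts.d directory is expected to exist already on the node:

```puppet
# A .txt external fact made of key=value lines
file { '/etc/puppetlabs/facter/facts.d/role.txt':
  ensure  => file,
  owner   => 'root',
  group   => 'root',
  mode    => '0644',
  content => "role=webserver\ndatacenter=dc1\n",
}
```

Facts are collected before the catalog is compiled, so a fact file deployed this way becomes available from the following Puppet run.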

User variables in Puppet DSL

Variable definition inside the Puppet DSL follows the general syntax: $variable = value.

Let's see some examples. Here the value is set as a string, a boolean, an array, or a hash:

$redis_package_name = 'redis'
$install_java = true
$dns_servers = [ '8.8.8.8' , '8.8.4.4' ]
$config_hash = { user => 'joe', group => 'admin' }

Starting with Puppet 3.5 when using the future parser, and by default from version 4.0, heredocs are also supported; they are a convenient way to define multiline strings:

$gitconfig = @("GITCONFIG")
  [user]
    name = ${git_name}
    email = ${email}
  | GITCONFIG
file { "${homedir}/.gitconfig":
  content => $gitconfig,
}

They have multiple options. In the previous example, we set GITCONFIG as the delimiter, the quotes indicate that variables in the text have to be interpolated, and the pipe character marks the indentation level.

Here, the value is the result of a function call (which may have strings and other data types or other variables as arguments):

$config_file_content = template('motd/motd.erb')

$dns_servers = hiera(name_servers)
$dns_servers_count = inline_template('<%= @dns_servers.length %>')

Here, the value is determined according to the value of another variable (here the $::osfamily fact is used), using the selector construct:

$mysql_service_name = $::osfamily ? {
  'RedHat' => 'mysqld',
  default  => 'mysql', 
}

A special value for a variable is undef (a null value similar to Ruby's nil), which basically removes any value from the variable (this can be useful in resources when we want to disable an existing attribute and make Puppet ignore it):

$config_file_source = undef
file { '/etc/motd':
  source  => $config_file_source,
  content => $config_file_content,
}

Note that we can't change the value assigned to a variable inside the same class (more precisely inside the same scope; we will review them later). Consider a code like the following:

$counter = '1'
$counter = $counter + 1

The preceding code will produce the following error:

Cannot reassign variable counter
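A simple way around this limitation, sketched here with a hypothetical variable name, is to assign the derived value to a new variable instead of reusing the old one:

```puppet
$counter = 1
# A new name is used because $counter cannot be reassigned in this scope
$next_counter = $counter + 1
notice("counter=${counter}, next_counter=${next_counter}")
```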

Type-checking

The new parser used by default in Puppet 4 has better support for data types, which includes optional type-checking. Each value in Puppet has a data type; for example, strings are of the type String, booleans are of the type Boolean, and types themselves have their own type, Type. Every time we declare a parameter, we can enforce its type:

class ntp (
    Boolean $enable = true,
    Array[String] $servers = [],
  ) { … }

We can also check types with expressions like this one:

$is_boolean =~ String

Or make selections by the type:

$enable_real = $enable ? {
  Boolean => $enable,
  String  => str2bool($enable),
  Numeric => num2bool($enable),
  default => fail('Illegal value for $enable parameter')
}

Types can have parameters: String[8] is a string at least 8 characters long, Array[String] is an array of strings, and Variant[Boolean, Enum['true', 'false']] is a composite type that matches any Boolean or any member of the enumeration formed by the strings true and false.
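The parameterized types described above can be combined in a class signature. This is only a sketch for Puppet 4: the class example_app, its parameters, and the default values are hypothetical:

```puppet
class example_app (
  Variant[Boolean, Enum['true', 'false']] $enable   = true,
  String[8]                               $password = 'changeme123',
  Array[String]                           $servers  = [],
) {
  # Normalize $enable to a real Boolean, selecting by type
  $enable_real = $enable ? {
    Boolean => $enable,
    default => $enable == 'true',
  }
  notice("enabled: ${enable_real}, servers: ${servers}")
}

# Compilation fails if a value does not match the declared type,
# for example a password shorter than 8 characters
class { 'example_app':
  enable  => 'false',
  servers => ['a.example.com'],
}
```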

User variables in an ENC

When an ENC is used for the classification of nodes, it returns the classes to include in the requested node and variables. All the variables provided by an ENC are at top scope (we can reference them with $::variablename all over our manifests).

User variables in Hiera

Another very popular and useful place to store user data (yes, variables) is Hiera; we will review it extensively in Chapter 2, Managing Puppet Data with Hiera, so let's just point out a few basic usage patterns here. We can use it to manage any kind of variable whose value can change according to custom logic in a hierarchical way. Inside manifests, we can look up a Hiera variable using the hiera() function. Some examples are as follows:

$dns = hiera(dnsservers)
class { 'resolver':
  dns_server => $dns,
}

The preceding example can also be written as:

class { 'resolver':
  dns_server => hiera(dnsservers),
}

In our Hiera YAML files, we would have something like this:

dnsservers:
  - 8.8.8.8
  - 8.8.4.4

If our Puppet Master uses Puppet version 3 or greater, then we can benefit from the Hiera automatic lookup for class parameters, that is, the ability to define values for any parameter exposed by the class in Hiera. The preceding example would become something like this:

include resolver

Then, in the Hiera YAML files:

resolver::dns_server:
  - 8.8.8.8
  - 8.8.4.4
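For the automatic lookup to work, the class only has to expose a parameter with the matching name. A minimal sketch of what such a resolver class could look like follows; the class body and the default value are assumptions made for illustration:

```puppet
class resolver (
  # Filled from the resolver::dns_server key in Hiera when present,
  # otherwise the default below is used
  $dns_server = ['127.0.0.1'],
) {
  file { '/etc/resolv.conf':
    ensure  => file,
    content => inline_template("<% @dns_server.each do |ns| %>nameserver <%= ns %>\n<% end %>"),
  }
}

include resolver
```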

Puppet built-in variables

A bunch of other variables are available and can be used in manifests or templates:

Variables set by the client (agent):
- $clientcert: This is the name of the node (certname setting in its puppet.conf, by default is the host's FQDN)
- $clientversion: This is the Puppet version on the agent
Variables set by the server (Master):
- $environment: This is a very important special variable, which defines the Puppet's environment of a node (for different environments the Puppet Master can serve manifests and modules from different paths)
- $servername, $serverip: These are, respectively, the Master's FQDN and IP address
- $serverversion: This is the Puppet version on the Master (it is always better to have Masters with a Puppet version equal to or newer than that of the clients)
- $settings::<setting_name>: This is any configuration setting from the Puppet Master's puppet.conf
Variables set by the parser during catalog compilation:
- $module_name: This is the name of the module that contains the current resource definition
- $caller_module_name: This is the name of the module that contains the current resource declaration
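As a quick way to inspect some of these values, we could print them during a run. This is just a sketch: the resource title is arbitrary, and which variables are populated may depend on how Puppet is run (agent/Master or puppet apply):

```puppet
notify { 'puppet_run_info':
  message => "Node ${::clientcert} (agent ${::clientversion}) in environment ${::environment}, compiled by ${::servername} (${::serverversion}), confdir ${::settings::confdir}",
}
```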

Variables scope

One of the parts where Puppet development can be misleading and not so intuitive is how variables are evaluated according to the place in the code where they are used.

Variables have to be declared before they can be used and this is parse order dependent, so, for this reason, Puppet language can't be considered completely declarative.

In Puppet, there are different scopes: partially isolated areas of code where variables and resource default values can be confined and accessed.

There are four types of scope, from general to local:

Top scope: This is any code defined outside nodes and classes (such as what is generally placed in /etc/puppet/manifests/site.pp)
Node scope: This is code defined inside node definitions
Class scope: This is code defined inside a class or define
Sub class scope: This is code defined in a class that inherits another class

We always write code within a scope, and we can directly access (that is, by just specifying their name, without the fully qualified name) only the variables defined in the same scope or in a parent or containing one. Top scope, node scope, and class variables can be accessed as follows:

Top scope variables can be accessed from anywhere
Node scope variables can be accessed in classes (used by the node), but not at the top scope
Class (also called local) variables are directly available, with their plain name, only from within the same class or define where they are set or in a child class

Variable values or resource default arguments defined at a more general level can be overridden at a local level (Puppet always uses the most local value).

It's possible to refer to variables outside a scope by specifying their fully qualified name, which contains the name of the class where the variables are defined. For example, $::apache::config_dir is a variable, called config_dir, defined in the apache class.
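The following sketch puts these rules together. All class and variable names are hypothetical, and the classes are defined in the same file only to keep the example compact:

```puppet
# Top scope: visible everywhere
$admin_email = 'root@example.com'

class apache {
  # Class scope: local to the apache class
  $config_dir = '/etc/httpd/conf.d'
  notice("apache: admin is ${admin_email}")
}

class apache_vhosts {
  include apache
  # $config_dir is not defined in this scope, so the fully
  # qualified name is needed to reach the value set in apache
  $vhost_file = "${::apache::config_dir}/vhosts.conf"
  notice("vhosts are configured in ${vhost_file}")
}

include apache_vhosts
```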

One important change introduced in Puppet 3.x is the enforcement of static scoping for variables; this means that the parent scope of a class can only be its parent class.

Earlier Puppet versions had dynamic scoping, where parent scopes were assigned both by inheritance (as in static scoping) and by simple declaration; that is, any class has the first scope where it has been declared as parent. This means that, since we can include classes multiple times, the order used by Puppet to parse our manifests may change the parent scope and therefore how a variable is evaluated.

This can obviously lead to all kinds of unexpected problems if we are not particularly careful about how classes are declared, with variables evaluated in different (parse order dependent) ways. The solution is Puppet 3's static scoping and the need to refer to out of scope variables by their fully qualified name.

Managing order and dependencies

The Puppet language is declarative and not procedural; it defines states: the order in which resources are written in manifests does not affect the order in which they are applied to reach the desired state.

Note

Saying that the Puppet language is declarative and not procedural is not entirely true: unlike resources, variable definitions are parse order dependent, so the order in which variables are defined is important. As a general rule, just set variables before using them, which sounds logical, but is procedural.

There are cases where we need to set some kind of ordering among resources, for example, we might want to manage a configuration file only after the relevant package has been installed, or have a service automatically restart when its configuration files change. Also, we may want to install packages only after we've configured our packaging systems (apt sources, yum repos, and so on) or install our application only after the whole system and the middleware has been configured.

To manage these cases, there are three different methods, which can coexist:

Use the meta parameters before, require, notify, and subscribe
Use the chaining arrow operators (corresponding, respectively, to the preceding meta parameters: ->, <-, ~>, <~)
Use run stages

In a typical package/service/configuration file example, we want the package to be installed first, then configure it, and then start the service, restarting it whenever the config file changes.

This can be expressed with meta parameters:

package { 'exim':
  before => File['exim.conf'],  
}
file { 'exim.conf':
  notify => Service['exim'],
}
service { 'exim': }

This is equivalent to this chaining arrows syntax:

package {'exim': } ->
file {'exim.conf': } ~>
service{'exim': }

However, the same ordering can be expressed using the alternative reverse meta parameters:

package { 'exim': }
file { 'exim.conf':
  require => Package['exim'],
}
service { 'exim':
  subscribe => File['exim.conf'], 
}

They can also be expressed like this:

service{'exim': } <~
file{'exim.conf': } <-
package{'exim': }
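Chaining arrows also work between resource references, which lets us keep the declarations and their ordering separate. In this sketch the file path and the module source URL are assumptions:

```puppet
package { 'exim':
  ensure => installed,
}

file { 'exim.conf':
  ensure => file,
  path   => '/etc/exim/exim.conf',
  source => 'puppet:///modules/exim/exim.conf',
}

service { 'exim':
  ensure => running,
  enable => true,
}

# Install the package, then manage the file; refresh the service on changes
Package['exim'] -> File['exim.conf'] ~> Service['exim']
```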

Run stages

Puppet 2.6 introduced the concept of run stages to help users manage the order of dependencies when applying groups of resources.

Puppet provides a default main stage; we can add any number of further stages, and define their ordering, with the stage resource type and the normal syntax we have seen:

stage { 'pre':
  before => Stage['main'],
}

This is equivalent to:

stage { 'pre': }
Stage['pre'] -> Stage['main']

We can assign any class to a defined stage with the stage meta parameter:

class { 'yum':
  stage => 'pre',
}

In this way, all the resources provided by the yum class are applied before all the other resources (in the default main stage).
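Stages can be placed after main as well. The following sketch defines one stage on each side of main; the repo_setup and smoke_tests classes are hypothetical placeholders:

```puppet
stage { 'pre':
  before => Stage['main'],
}

stage { 'post':
  require => Stage['main'],
}

class repo_setup {
  notify { 'package repositories configured': }
}

class smoke_tests {
  notify { 'all other resources have been applied': }
}

class { 'repo_setup':
  stage => 'pre',
}

class { 'smoke_tests':
  stage => 'post',
}
```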

At the beginning, the idea of stages seemed a good solution to better handle large sets of dependencies in Puppet. In reality, some drawbacks and the increased risk of dependency cycles make them less useful than expected. A rule of thumb is to use them only for simple classes (that don't include other classes) and only where it is really necessary (for example, to set up package management configurations at the beginning of a Puppet run or to deploy our application after all the other resources have been managed).

Iteration and lambdas

The Puppet language has historically been somewhat limited when it comes to iteration; it had no explicit support for it until version 4.0. The old way of iterating is through defined types. Any Puppet resource can have an array as its title, which is equivalent to declaring the same resource once for each element of the array.

This approach, although sometimes convenient and orthogonal with the rest of the language, has some limitations. First, only the title varies between each created resource, which limits the possibilities of the code in the iteration, and second, a defined type needs to be implemented just for the iteration; it can even happen that the type is defined far from the place where we want to iterate, thus over-complicating it and making it less readable. Here is an example:

define nginx::enable_site ($site = $title) {
  file { "/etc/nginx/sites-enabled/$site":
    ensure => link,
    target => "/etc/nginx/sites-available/$site",
  }
}
$sites = ['example.com', 'test.puppetlabs.com']
nginx::enable_site { $sites: }

In newer versions, the language includes support for lambda functions and some functions that accept these lambdas as parameters, allowing more explicit iterators, for example, to define resources:

$sites = ['example.com', 'test.puppetlabs.com']
$sites.each |String $site| {
  file { "/etc/nginx/sites-enabled/$site":
    ensure => link,
    target => "/etc/nginx/sites-available/$site",
  }
}

To transform data, for example, to select from a list the sites that start with "test", we can use code like this:

$test_sites = $sites.filter |$site| { $site =~ /^test\./ }
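Other iteration functions follow the same pattern. As a brief sketch for Puppet 4 (the variable names and values are illustrative), map builds a new array, and each can also walk a hash or expose the index of each array element:

```puppet
$sites = ['example.com', 'test.puppetlabs.com']

# map returns a new array with the result of the lambda for each element
$www_sites = $sites.map |String $site| { "www.${site}" }

# With two lambda parameters, each yields the index and the value
$sites.each |Integer $index, String $site| {
  notice("site ${index}: ${site}")
}

# On a hash, the two parameters receive the key and the value
$ports = { 'http' => 80, 'https' => 443 }
$ports.each |String $service, Integer $port| {
  notice("${service} listens on port ${port}")
}
```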

The in operator

The in operator checks whether a string is present in another string, an array, or in the keys of a hash. In Puppet 3 the comparison is case sensitive, while with the Puppet 4 parser string comparisons ignore case:

if '64' in $::architecture
if $monitor_tool in [ 'nagios' , 'icinga' , 'sensu' ]
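A complete sketch that also covers the hash case follows; the hash contents and the variable values are assumptions:

```puppet
$limits = { 'nofile' => 65536, 'nproc' => 4096 }

# With a hash, the in operator looks at the keys, not at the values
if 'nofile' in $limits {
  notice("nofile limit is ${limits['nofile']}")
}

$monitor_tool = 'icinga'
if $monitor_tool in [ 'nagios', 'icinga', 'sensu' ] {
  notice("${monitor_tool} is a supported monitoring tool")
}
```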

Expressions combinations

It's possible to combine multiple comparisons with and and or:

if ($::osfamily == 'RedHat') and ($::operatingsystemrelease == '5') { [ ... ] }
if ($::operatingsystem == 'Ubuntu') or ($::operatingsystem == 'Mint') { [ ...] }

Exported resources

When we need to provide information to a host about resources present in another host, things in Puppet become trickier. This can be needed, for example, for monitoring or backup solutions. For a long time, the only official solution has been to use exported resources: resources declared in the catalog of a node (based on its facts and variables), but applied (collected) on another node. Some alternative approaches are now possible with PuppetDB; we will review them in Chapter 3, Introducing PuppetDB.

Resources are declared with the special @@ notation, which marks them as exported so that they are not applied to the node where they are declared:

@@host { $::fqdn:
  ip  => $::ipaddress,
}
@@concat::fragment { "balance-fe-${::hostname}":
  target  => '/etc/haproxy/haproxy.cfg',
  content => "server ${::hostname} ${::ipaddress} maxconn 5000",
  tag     => "balance-fe",
}

Once a catalog containing exported resources has been applied on a node and stored by the Puppet Master, the exported resources can be collected with the <<| |>> operator, where it is possible to specify search queries:

Host <<| |>>
Concat::Fragment <<| tag == "balance-fe" |>>
Sshkey <<| |>>
Nagios_service <<| |>>
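A common pattern is to have every node export a resource and collect all of them wherever they are needed. The following sketch distributes SSH host keys; it assumes that storeconfigs is enabled, that the legacy sshrsakey fact is available, and the known_hosts tag is an arbitrary name:

```puppet
# Declared on every node: the resource is exported, not applied locally
@@sshkey { $::fqdn:
  ensure       => present,
  host_aliases => [ $::hostname, $::ipaddress ],
  type         => 'ssh-rsa',
  key          => $::sshrsakey,
  tag          => 'known_hosts',
}

# Declared on the nodes that need the keys: collects what has been
# exported by all the nodes, filtering by tag
Sshkey <<| tag == 'known_hosts' |>>
```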

In order to use exported resources, we need to enable the storeconfigs option on the Puppet Master and specify the backend to be used. For a long time, the only available backend was Rails' Active Record, which typically used MySQL for data persistence. This solution was the best for its time, but suffered from severe scaling limitations. Luckily, things have changed a lot with the introduction of PuppetDB, which is a fast and reliable storage solution for all the data generated by Puppet, including exported resources.

Virtual resources

Virtual resources define a desired state for a resource without adding it to the catalog. Like normal resources, they are applied only on the node where they are declared but, being virtual, we can choose to apply only a subset of the ones we have declared. Their usage syntax is similar to that of exported resources: we declare them with a single @ prefix (instead of the @@ used for exported resources), and we collect them with <| |> (instead of <<| |>>).

A useful and rather typical example involves user management.

We can declare all our users in a single class, included by all our nodes:

class my_users {
  @user { 'al': […] tag => 'admins' }
  @user { 'matt': […] tag => 'developers' }
  @user { 'joe': […] tag => 'admins' }
[ … ]
}

These users are actually not created on the system; we can decide which ones we actually want on a specific node with a syntax like this:

User <| tag == admins |>

It is equivalent to:

realize(User['al'] , User['joe'])

Note that the realize function needs to address resources with their name.
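Putting the pieces together, a self-contained sketch could look like this; the user attributes are reduced to the minimum and are assumptions:

```puppet
class my_users {
  @user { 'al':
    ensure => present,
    tag    => 'admins',
  }
  @user { 'matt':
    ensure => present,
    tag    => 'developers',
  }
  @user { 'joe':
    ensure => present,
    tag    => 'admins',
  }
}

include my_users

# Realize all the admins with a collector
User <| tag == 'admins' |>

# Realize a single user by name
realize(User['matt'])
```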

Modules

Modules are self-contained, distributable, and (ideally) reusable recipes to manage specific applications or system's elements.

They are basically just a directory with a predefined, standard structure that relies on naming conventions, rather than explicit configuration, for the classes, extensions, and files they provide.

The $modulepath configuration entry defines where modules are searched; this can be a list of colon separated directories.

Paths of a module and auto loading

Modules have a standard structure; for example, a MySQL module is laid out as follows:

mysql/            # Main module directory

mysql/manifests/  # Manifests directory. Puppet code here.
mysql/lib/        # Plugins directory. Ruby code here
mysql/templates/  # ERB Templates directory
mysql/files/      # Static files directory
mysql/spec/       # Puppet-rspec test directory
mysql/tests/      # Tests / Usage examples directory
mysql/facts.d/    # Directory for external facts

mysql/Modulefile  # Module's metadata descriptor

This layout enables useful conventions, which are widely used in the Puppet world; we must know them to understand where to look for files and classes.

For example, we can use modules and write the following code:

include mysql

Puppet will then automatically look for a class called mysql defined in the file $modulepath/mysql/manifests/init.pp.

The init.pp manifest is a special case that applies to classes that have the same name as the module. For sub classes, there's a similar convention that takes the subclass name into consideration:

include mysql::server

It then auto loads the $modulepath/mysql/manifests/server.pp file.

A similar scheme is also followed for defines or classes at lower levels:

mysql::conf { ...}

This define is searched in $modulepath/mysql/manifests/conf.pp. The same applies to classes nested at a deeper level:

include mysql::server::ha

It then looks for $modulepath/mysql/manifests/server/ha.pp.

It's generally recommended to follow these naming conventions that allow auto loading of classes and defines without the need to explicitly import the manifests that contain them.
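The mapping between names and files can be summarized with the following sketch. Each comment indicates the file where the code would live (in practice these are four separate files), and the class bodies are hypothetical:

```puppet
# File: $modulepath/mysql/manifests/init.pp
class mysql {
  include mysql::server
}

# File: $modulepath/mysql/manifests/server.pp
class mysql::server {
  package { 'mysql-server':
    ensure => installed,
  }
}

# File: $modulepath/mysql/manifests/conf.pp
define mysql::conf ($value) {
  notify { "mysql setting ${title} = ${value}": }
}

# File: $modulepath/mysql/manifests/server/ha.pp
class mysql::server::ha {
  mysql::conf { 'wsrep_on':
    value => 'ON',
  }
}
```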

Note

Note that, even if not considered good practice, we can currently define more than one class or define inside the same manifest as, when Puppet parses a manifest, it parses its whole contents.

Module's naming conventions apply also to the files that Puppet provides to clients.

We have seen that the file resource accepts two different and alternative arguments to manage the content of a file: source and content. Both of them have a naming convention when used inside modules.

Templates, typically parsed via the template or the epp functions with syntax like the one given here, are found in a place like $modulepath/mysql/templates/my.cnf.erb:

content => template('mysql/my.cnf.erb'),

This also applies to sub directories, so for example:

content => template('apache/vhost/vhost.conf.erb'),

It uses a template located in $modulepath/apache/templates/vhost/vhost.conf.erb.

A similar approach is followed with static files provided via the source argument:

source => 'puppet:///modules/mysql/my.cnf'

It serves a file placed in $modulepath/mysql/files/my.cnf. The same applies to subdirectories:

source => 'puppet:///modules/site/openssh/sshd_config'

This serves a file placed in $modulepath/site/files/openssh/sshd_config.

Notice the differences between template and source paths. Templates are resolved on the server, and they are always placed inside modules. Sources are retrieved by the client, and the modules element in the URL could be a different mount point if one is configured on the server.

Finally, the whole content of the lib subdirectory in a module has a standard scheme. Note that here we can place Ruby code that extends Puppet's functionality and is automatically redistributed from the Master to all clients (if the pluginsync configuration parameter is set to true; this is the default since Puppet 3 and is widely recommended in any setup):

mysql/lib/augeas/lenses/                # Custom Augeas lenses.
mysql/lib/facter/                       # Custom facts.
mysql/lib/puppet/type/                  # Custom types.
mysql/lib/puppet/provider/<type_name>/  # Custom providers.
mysql/lib/puppet/parser/functions/      # Custom functions.

Templates

Files provisioned by Puppet can be templates written in Ruby's ERB templating language or in the Embedded Puppet Template Syntax (EPP).

An ERB template can contain whatever text we need and, inside <% %> tags, interpolated variables or Ruby code. In a template, we can access all the Puppet variables (facts or user assigned) with the <%= tag:

# File managed by Puppet on <%= @fqdn %>
search <%= @domain %>

The @ prefix for variable names is highly recommended in all Puppet versions, and mandatory starting from 4.0.

To use out of scope variables, we can use the scope.lookupvar method:

path <%= scope.lookupvar('apache::vhost_dir') %>

This uses the variable's fully qualified name. If the variable is at top scope, we write:

path <%= scope.lookupvar('::fqdn') %>

Since Puppet 3, we can use this alternative syntax:

path <%= scope['apache::vhost_dir'] %>

In ERB templates, we can also use more elaborate Ruby code inside a <% opening tag, for example, to iterate over an array:

<% @dns_servers.each do |ns| %>
nameserver <%= ns %>
<% end %>

The <% tag can also be used to place a line of text only if some conditions are met:

<% if scope.lookupvar('puppet::db') == "puppetdb" -%>
  storeconfigs_backend = puppetdb
<% end -%>

Noticed the -%> ending tag here? When the dash is present, no newline is introduced in the generated file, as would happen if we had written <% end %>.

EPP templates are quite similar: they are also plain text files and they use the same tags for the embedded code. The main differences are that EPPs use Puppet code instead of Ruby, that they can receive type-checked parameters, and that they can directly access other variables, where in ERBs we'd have to use lookup functions.

The parameters definition is optional but if it's included it has to be in the beginning of the file:

<%- | Array[String] $dns_servers,
      String $search_domain | -%>

To use templates in Puppet code, we have to use the template function for ERBs or the epp function for EPPs; epp can receive a hash with the values of the arguments as a second argument:

file { '/etc/resolv.conf':
  content => epp('resolvconf/resolv.conf.epp', {
    'dns_servers'   => ['8.8.8.8', '8.8.4.4'],
    'search_domain' => 'example.com',
  }),
}
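To see how the parameters are consumed, here is a sketch that keeps the template inline so that it is self-contained: it uses the inline_epp function with a heredoc instead of a template file in a module. The template body is an assumption about what such a resolv.conf template could contain:

```puppet
# Unquoted heredoc tag: no interpolation happens here, so the
# $variables are evaluated later by the EPP engine
$resolv_template = @(END)
<%- | Array[String] $dns_servers,
      String        $search_domain | -%>
search <%= $search_domain %>
<% $dns_servers.each |$ns| { -%>
nameserver <%= $ns %>
<% } -%>
| END

file { '/etc/resolv.conf':
  ensure  => file,
  content => inline_epp($resolv_template, {
    'dns_servers'   => ['8.8.8.8', '8.8.4.4'],
    'search_domain' => 'example.com',
  }),
}
```

Passing a value of the wrong type, or omitting a parameter that has no default, makes the compilation fail.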
