The Puppet ecosystem
Puppet is a configuration management and automation tool; we use it to install, configure, and manage components of our servers.
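What follows is a minimal sketch of a Puppet manifest, just to give an idea of how this desired state is expressed: it declares a package to install, a configuration file to manage, and a service to keep running. The resource names used here (ntp, /etc/ntp.conf) are only illustrative and vary across platforms.

    package { 'ntp':
      ensure => installed,
    }

    file { '/etc/ntp.conf':
      ensure  => file,
      content => "server 0.pool.ntp.org\n",
      require => Package['ntp'],            # install the package before managing its config
    }

    service { 'ntp':
      ensure    => running,
      enable    => true,
      subscribe => File['/etc/ntp.conf'],   # restart the service when the file changes
    }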
Initially written in Ruby, Puppet had some of its components rewritten in Clojure with version 4. Released under an open source license (Apache 2), it can run on any Linux distribution, many other UNIX variants (Solaris, *BSD, AIX, and Mac OS X), and Windows. Its development was started in 2005 by Luke Kanies as an alternative approach to the existing configuration management tools (most notably CFEngine and BladeLogic). The project has grown year after year; Kanies' own company, Reductive Labs, renamed Puppet Labs in 2010, has received a total funding of $45.5 million in various funding rounds (investors include names such as VMware, Google, and Cisco).
Today, Puppet Labs is one of the top 100 fastest growing companies in the US. It employs more than 150 people and has a solid business based on open source software, consisting of consulting services, training, certifications, and Puppet Enterprise. Puppet Enterprise is the commercial version based on the same open source Puppet codebase, but it provides an integrated stack with many additional tools, such as a web GUI that makes Puppet usage and administration easier, and more complete support for some major Linux distributions, Mac OS X, and Microsoft Windows Server.
The Puppet ecosystem features a vibrant, large, and active community, which discusses on the Puppet Users and Puppet Developers Google groups, on the crowded #puppet IRC channel on Freenode, at the various Puppet Camps that are held multiple times a year all over the world, and at the annual PuppetConf, which gets bigger and better year after year.
Various software products are complementary to Puppet; some of them are developed by Puppet Labs:
- Hiera: This is a key-value lookup tool that is the current reference choice for storing data related to our Puppet infrastructure (a brief example follows this list).
- MCollective: This is an orchestration framework that allows the parallel execution of tasks on multiple servers. It is a separate project by Puppet Labs, which works well with Puppet.
- Facter: This is a required complementary tool; it is executed on each managed node and gathers local information in key/value pairs (facts), which are used by Puppet.
- Geppetto: This is an IDE, based on Eclipse, that allows easier and assisted development of Puppet code.
- Puppet Dashboard: This is an open source web console for Puppet.
- PuppetDB: This is a powerful backend that can store all the data gathered and generated by Puppet.
- Puppet Enterprise: This is the commercial solution to manage Puppet, MCollective, and PuppetDB via a web front end.
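As a quick illustration of how Facter and Hiera surface in day-to-day Puppet code, the following sketch uses a fact gathered by Facter and looks up a key from Hiera. The ntp_servers key, the data file name, and the use of the Puppet 4 lookup() function are assumptions made for the example.

    # Hiera data, for example in data/common.yaml:
    # ---
    # ntp_servers:
    #   - 0.pool.ntp.org
    #   - 1.pool.ntp.org

    # In a manifest, the facts gathered by Facter are available via the $facts hash,
    # and Hiera data can be retrieved with lookup():
    if $facts['os']['family'] == 'RedHat' {
      $servers = lookup('ntp_servers')
      notify { "NTP servers to use: ${servers}": }
    }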
The community has produced other tools and resources. The most notable ones are:
- The Foreman: This is a systems lifecycle management tool that integrates perfectly with Puppet.
- PuppetBoard: This is a web front end for PuppetDB.
- Kermit: This is a web front end for Puppet and MCollective.
- Modules: These are reusable components that allow the management of any kind of application and software via Puppet (a minimal usage example follows this list).
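As a minimal usage example (assuming the puppetlabs-ntp module from the Puppet Forge, with its servers parameter), a module can be installed from the command line and then used by declaring its main class:

    puppet module install puppetlabs-ntp

    # In a manifest:
    class { 'ntp':
      servers => ['0.pool.ntp.org', '1.pool.ntp.org'],
    }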
Why configuration management matters
IT operations have changed drastically in the past few years. Virtualization, cloud, business needs, and emerging technologies have accelerated the pace of how systems are provisioned, configured, and managed.
The manual setup of a growing number of operating systems is no longer a sustainable option. At the same time, in-house custom solutions to automate the installation and the management of systems cannot scale in terms of required maintenance and development efforts.
For these reasons, configuration management tools such as Puppet, Chef, CFEngine, Rudder, Salt, and Ansible (to mention only the best-known open source ones) are becoming increasingly popular in many infrastructures.
They promote an infrastructure as code approach, which brings to systems management some of the best practices that have been used in software development for decades, such as maintainability, code reusability, testability, and version control.
Once we can express the status of our infrastructure with versioned code, there are powerful benefits:
- We can reproduce our setups in a consistent way: what is executed once can be executed at any time, and the procedure to configure a server from scratch can be repeated without the risk of missing steps.
- Our commit log reflects the history of changes to the infrastructure: who did what, when, and, if commit comments are pertinent, why.
- We can scale quickly; the configurations we made for one server can be applied to all the servers of the same kind.
- We have aligned and coherent environments; our Development, Test, QA, Staging, and Production servers can share the same setup procedures and configurations.
With these kinds of tools, we can have a system provisioned from zero to production in a few minutes, or we can quickly propagate a configuration change over our whole infrastructure automatically.
Their power is huge and has to be handled with care: just as we can automate massive, parallel setup and configuration of systems, we might also automate distributed destruction.
With great power comes great responsibility.