Make your own configuration deployment system, part 1
Posted by rossen on Mon 30 Jun 2008 at 20:24
In this series of articles, I describe the steps to making a flexible configuration deployment system tailored to your needs. It can be as simple or as complete as you care to make it. And since you made it, you can understand it intimately.
If you have two or more machines to manage, you have probably noticed that they have certain similarities of configuration.
These similarities may include
- network configuration
- basic package list
- configurations of packages
- aliases and shortcuts
- internationalisation settings
You may have spent an enormous amount of time finding the ideal configuration for a piece of software and you would really regret losing your masterpiece in an unfortunate accident. Or you may need to rapidly deploy the same configuration change to a hundred machines. Or you may be simply tired of doing the same procedures every time you install a new machine.
A configuration deployment system can greatly reduce the amount of work necessary to manage two or more machines, but the time needed to learn the ins and outs of existing systems may be daunting. ISconf, FAI, cfengine, debconf+LDAP, Subversion, etc. all have their strong points, but if you are just getting started, they are probably overkill. One solution is to build your own system from scratch.
Essential components
The essential components of the system are:
- a configuration repository, i.e. what to deploy: the database of configuration files, data, package lists, scripts, and jobs
- a configuration transfer method, i.e. how to get the data to the clients
- a collection of deployment scripts, i.e. how to apply the data to the clients
Depending on your needs, you can use solutions that overlap the boundaries of these functional divisions, or you can keep them strictly separate, which allows you to substitute methods easily or build on them if the need arises.
Configuration repository
You have a wide choice available for a configuration repository. Here is a non-exhaustive list of possibilities:
- directories of files, one directory for each machine
- directories of files, organised by classes
- tarballs (or .deb or .rpm packages), one for each machine
- versioning systems like CVS or Subversion
- LDAP server
- SQL database
There is a choice of media too. You can use a network-connected server or some removable media like a floppy, USB key, or a CDROM.
Note that one is not limited to one configuration repository - you can have multiple repositories, but you will have to make decisions about their priorities and what to do if a repository fails.
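The priority and failure decisions can be written down in a few lines of shell. What follows is only a minimal sketch, not part of any existing tool: the function name first_available_repo and the example paths are hypothetical, and it assumes that each repository is a directory that is already mounted or otherwise locally accessible.

```shell
# Hypothetical helper: print the first repository directory that exists
# and is readable. Candidates are given in priority order, highest first.
#
# Example (placeholder paths): prefer a network share, fall back to a USB key
#   repo=$(first_available_repo /mnt/cfg /media/usb/cfg) || exit 1
first_available_repo() {
    for repo in "$@"; do
        if [ -d "$repo" ] && [ -r "$repo" ]; then
            printf '%s\n' "$repo"
            return 0
        fi
    done
    # Every candidate failed: report it and let the caller decide what to do.
    echo "no configuration repository available" >&2
    return 1
}
```

Returning a failure status when nothing is available leaves the "what to do if a repository fails" decision to the calling script, which can abort, retry later, or carry on with the configuration already in place.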
Configuration transfer method
You need a method to get your configuration from the repository to the client machines. This is somewhat determined by your choice of repository, but there is still some flexibility.
Here is yet another non-exhaustive list of common methods:
- direct copy from removable media
- direct copy from network-mounted share
- rsync or scp or SSH
- download from FTP or web server
- versioning system check-out (CVS, Subversion, etc.)
- transfer integrated into configuration-management software (cfengine)
And here are some more exotic methods for transferring configuration info:
- POP or IMAP
- LDAP query
- SQL query
- SNMP query
- DHCP query (somewhat limited)
- IRC download (think "botnet")
- peer-to-peer (like Bittorrent)
- DNS query (!)
Deployment scripts
After you get the configuration info to the client it must be used, but how? Again, you have a lot of flexibility.
- Config files can simply be copied into place automatically, or first manipulated in a local workspace to resolve configuration priorities coming from several repositories and then finally copied into place.
- Scripts can be used to automatically edit configuration files and registries using the new values of various parameters if a change is necessary.
- Little jobs to check/signal/reload/restart daemons can be triggered if the configuration changes.
- Old config files can be backed-up before being over-written.
- A configuration roll-back mechanism can be implemented.
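Several of these ideas (copy only on change, back up the old file, trigger a reload) fit in one small function. This is a rough sketch under stated assumptions rather than a finished tool: the name deploy_file, its argument order, and the timestamped backup naming are all made up for illustration.

```shell
# Hypothetical helper: deploy_file SRC DST BACKUP_DIR [RELOAD_CMD...]
# Copies SRC over DST only when the contents differ, keeps the old
# version under BACKUP_DIR, then runs the optional reload command.
deploy_file() {
    src=$1 dst=$2 backup_dir=$3
    shift 3
    # Nothing to do when the installed file already matches the source.
    if cmp -s "$src" "$dst" 2>/dev/null; then
        return 0
    fi
    # Back up the old file, if there is one, before overwriting it.
    if [ -e "$dst" ]; then
        mkdir -p "$backup_dir" || return 1
        cp -p "$dst" "$backup_dir/$(basename "$dst").$(date +%Y%m%d%H%M%S)" || return 1
    fi
    cp -p "$src" "$dst" || return 1
    # Run the reload command only if one was given.
    if [ "$#" -gt 0 ]; then
        "$@"
    fi
}
```

A call might look like "deploy_file /srv/cfg/site/etc/ssh/sshd_config /etc/ssh/sshd_config /var/backup/cfg /etc/init.d/ssh reload", where the paths and the init script are examples only. Because the reload command runs only after a real change, running the function repeatedly does not restart the daemon needlessly.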
Research and define your needs
One of the most thoroughly thought-out configuration systems is ISconf, found at www.isconf.org. ISconf is probably too complicated for a beginner and overkill for just a few systems, but the philosophy and history of the system are detailed at www.infrastructures.org and it is well worth the time to read over the paper "Bootstrapping an Infrastructure" at http://www.infrastructures.org/papers/bootstrap/bootstrap.html.
Since I usually use Debian or Ubuntu, my preferred installation/configuration system is FAI, "Fully Automatic Installation", http://www.informatik.uni-koeln.de/fai/.
One of the sub-systems used by FAI is cfengine, www.cfengine.org, a self-contained high-level scripting language and configuration deployment system itself.
Before you reach for your favorite scripting language, think about what you want your system to manage now and in the future. A few hours of reading and reflection at this point could save a few false starts and re-inventions of the wheel.
- contents of system config files only?
- file permissions and ownerships?
- user files too?
- changes are fully automatic or just advisory?
- push or pull?
- polled or instantaneous changes?
- logging?
- backups?
- roll-back capability?
- multiple sources?
- package management?
- multiple distributions?
- multiple OSes?
- how many sites?
- integration with present systems?
- preserve local admin changes?
- bullet-proof or hackware?
- cryptographically secured?
- management interface other than the command-line+vi?
- uploading of local changes?
- confirmation of changes?
Hints and warnings
Organise your deployment by following the checklist at http://www.infrastructures.org/bootstrap/checklist.shtml. The principle is to always assemble the lowest-level infrastructure first in order to save time assembling the rest.
Make sure that everything in your DNS is complete and perfectly correct. A misspelling of a machine name or a false address will cause all sorts of time-wasting mysteries.
Use NTP to make sure every machine knows precisely what time it is; otherwise, updates based on "make" or file time-stamps can fail in bizarre ways.
Decide on a method for dealing with local changes (AKA cowboy admins). You might consider strictly forbidding local changes to configuration, as Infrastructures.Org and FAI recommend.
Install integrit or some other file-system integrity checker and tune it so that configuration changes are obvious. That is, tune it to ignore files that are expected to change so that the reports are always tiny.
Simple examples
Here are some simple examples of configuration deployment systems. For small networks composed of a small number of more-or-less identical machines all on one site, these examples may be all that you need. The examples also illustrate how the functions of configuration repository, transfer, and deployment scripts can overlap.
Simple recursive copy
Assume that you have a directly-accessible repository directory /srv/cfg/site/etc. It contains only /etc files that are valid for every machine at your site, e.g. /etc/resolv.conf, /etc/hosts. To deploy these files, just copy them recursively into place using the GNU "cp" command and its "-a" or "--archive" option to preserve modification time, ownerships, and permissions:
cp -a -T /srv/cfg/site/etc /etc
There are a few problems with the above example. Firstly, the files will be copied every time the command is run even if the source and target files are already identical. Apart from being inefficient, this might cause file integrity systems (like integrit) to trigger a useless warning. Secondly, if modifications were made to the files in /etc but the repository was not updated, the changes will be wiped out without a backup. Nevertheless, if your needs are simple and you intend to manually run the command only on the rare occasions that there is a change, this may be all that you need.
Congratulations - you are done.
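If you would like to know beforehand which files a copy would actually change, a content comparison can tell you. The function below is a small sketch and not part of the method above: the name list_changed is invented, and it assumes file names without embedded newlines.

```shell
# Hypothetical helper: list_changed REPO_DIR TARGET_DIR
# Prints the path (relative to REPO_DIR) of every repository file whose
# contents differ from, or are missing in, TARGET_DIR.
# The body runs in a subshell, so the caller's working directory is untouched.
list_changed() (
    # Resolve the target to an absolute path before changing directory.
    target=$(cd "$2" && pwd) || exit 1
    cd "$1" || exit 1
    find . -type f | sort | while IFS= read -r f; do
        # cmp -s is silent and fails when the files differ or one is missing.
        cmp -s "$f" "$target/$f" 2>/dev/null || printf '%s\n' "${f#./}"
    done
)
```

With the layout from this example, "list_changed /srv/cfg/site/etc /etc" would print one line per file that differs, and nothing at all when /etc already matches the repository.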
Simple recursive update (based on file mod time) with backups
GNU cp has two options that are interesting: the "-u" or "--update" option that will copy a source file only if its modification time is newer than the target file and the "-b" or "--backup" option that makes a single or incrementally-numbered backup of the target file if a copy is done. Here is how they might be used:
cp -u -a -T --backup=numbered /srv/cfg/site/etc /etc
This method has problems too. You end up with /etc directories cluttered with backup files with names like "hosts.~4~" that need to be dealt with. And if one of your target files is touched, which changes the modification timestamp, the cp will not copy the source to the target since the target is newer. This is a problem if all machines are supposed to be always using the canonical configuration file from the repository. Local administrators might consider this problem to be a feature and not a bug.
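One way to deal with the clutter is to list the numbered backups first and then decide whether to archive or delete them. A minimal sketch, assuming the GNU cp naming scheme "name.~N~" described above (the function name list_numbered_backups is hypothetical):

```shell
# Hypothetical helper: list_numbered_backups DIR
# Prints every regular file under DIR whose name looks like a numbered
# GNU cp backup, for example "hosts.~4~".
list_numbered_backups() {
    find "$1" -type f -name '*.~[0-9]*~' | sort
}
```

Running "list_numbered_backups /etc" only reports the files; reviewing the list before removing anything is the safer habit.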
Simple recursive update (based on contents) with backups
Ideally, the updates should be based upon the files' contents, not their modification times. By default rsync will update only files with differing mod times or sizes, but it can be told to ignore these checks and look at file contents with the "-I" (or "--ignore-times") and "-c" (or "--checksum") options. In addition, one can specify a separate directory for keeping backed-up files:
rsync -I -c -a --backup --backup-dir=/var/backup /srv/cfg/site/etc/ /etc
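Before letting rsync overwrite anything, it can be reassuring to preview the changes. The sketch below wraps rsync's "-n" ("--dry-run") and "-i" ("--itemize-changes") options; the function name preview_deploy is an invention for this example, and the backup options are left out because a dry run writes nothing.

```shell
# Hypothetical helper: preview_deploy SRC DST
# Shows what a checksum-based rsync deployment would change, without
# modifying DST.
preview_deploy() {
    # -n (--dry-run) reports what would change without touching the target;
    # -i (--itemize-changes) prints one line per affected file;
    # -c (--checksum) compares contents; -a (--archive) keeps metadata.
    rsync -n -i -c -a "$1" "$2"
}
```

For the example above, "preview_deploy /srv/cfg/site/etc/ /etc" would list the files that the real command is about to replace.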
Simple recursive update from a remote repository with date-organised backups
Of course rsync has extra features that make it the ideal simple configuration deployment tool. It has remote file-transfer capabilities that can be used to solve the problem of access to the configuration repository if it is on another machine in your network instead of some locally-accessible media.
Assume that "cfg" is the name (or even better, a DNS alias) for the configuration repository machine and we want to save backups of local files that get replaced into directory hierarchies organised by date (and time, if you need). The configuration deployment commands could be:
bd=/var/backup/cfg/$(date '+%Y/%m/%d'); mkdir -p "$bd"
rsync -I -c -a --backup --backup-dir="$bd" root@cfg:/srv/cfg/site/etc/ /etc
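Date-organised backups also give you a crude roll-back mechanism, because rsync stores each replaced file in the backup directory under the same relative path that it had in the target. The following is a sketch only: the name rollback_from is hypothetical, and it assumes GNU cp and a backup directory produced by a command like the one above.

```shell
# Hypothetical helper: rollback_from BACKUP_DIR TARGET_DIR
# Copies every file saved under BACKUP_DIR back into TARGET_DIR,
# preserving ownership, permissions and timestamps (cp -a).
# Files that the deployment created from scratch have no backup and
# are left in place.
rollback_from() {
    if [ ! -d "$1" ]; then
        echo "no backup directory: $1" >&2
        return 1
    fi
    # The trailing "/." copies the contents of the backup directory
    # rather than the directory itself.
    cp -a "$1/." "$2/"
}
```

Undoing the deployment of 30 June 2008 would then be "rollback_from /var/backup/cfg/2008/06/30 /etc". If you deploy more than once a day, adding the time to the backup path keeps each run separately restorable.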
Simple recursive update from multiple remote repositories with date-organised backups
So far, we have only been retrieving site-wide /etc files. It is highly probable that we want to add useful files to /usr/local/{bin,sbin}, /root, and other directories. And we probably want to manage customisations that are valid only for a particular machine. The structure of our configuration repository on "cfg" might look like this:
/srv/cfg/site/
/srv/cfg/site/etc/
/srv/cfg/site/etc/hosts
/srv/cfg/site/etc/resolv.conf
...
/srv/cfg/host01/
/srv/cfg/host01/etc/
/srv/cfg/host01/etc/network/
/srv/cfg/host01/etc/network/interfaces
...
/srv/cfg/host02/
...
Here are the deployment commands to run on host01, host02, etc.:
bd=/var/backup/cfg/$(date '+%Y/%m/%d'); mkdir -p "$bd"
rsync -I -c -a --backup --backup-dir="$bd" root@cfg:/srv/cfg/site/ /
rsync -I -c -a --backup --backup-dir="$bd" root@cfg:/srv/cfg/$(hostname)/ /
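The order of the two rsync commands matters: the host-specific layer is applied last, so its files win over the site-wide ones. A hedged sketch of the same idea as a function follows. The name deploy_layers and the RSYNC and CFG_SERVER variables are assumptions introduced here, mainly so that the commands can be printed instead of executed.

```shell
# Hypothetical helper: deploy_layers HOST BACKUP_DIR
# Applies the site-wide layer first and the host layer second, so that
# host-specific files override site-wide ones. Stops at the first failure.
# Assumed overrides: RSYNC (command to run, default "rsync") and
# CFG_SERVER (repository machine, default "cfg").
deploy_layers() {
    host=$1 bd=$2
    for layer in site "$host"; do
        ${RSYNC:-rsync} -I -c -a --backup --backup-dir="$bd" \
            "root@${CFG_SERVER:-cfg}:/srv/cfg/$layer/" / || return 1
    done
}
```

On a client, a call such as deploy_layers "$(hostname)" "$bd" would replace the two rsync lines above. Setting RSYNC=echo first prints the commands that would be run, which is a cheap way to check the layering before touching a real machine.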
What next?
Part 2 of this series will probably deal with writing helper tools, for example a script to easily check files from the client into the configuration repository. If there is interest in this article, the direction of the series will be determined in part by any questions that are posed.
About this document
URL: http://www.rtfm-sarl.ch/articles/configuration-deployment-p1.txt
HTML-conversion: txt2html --titlefirst --noanchors --preformat_trigger_lines 1 configuration-deployment-p1.txt > configuration-deployment-p1.html
Title: Make your own configuration deployment system, part 1
Version: 2008-06-27-001
Author: Erik Rossen <rossen@rossen.ch>
Licence: Creative Commons Attribution-Share Alike 2.5 Switzerland, http://creativecommons.org/licenses/by-sa/2.5/ch/
Part of the reason for this is that I often find myself working for clients who are extremely conservative and it is difficult to convince them to start using an automatic configuration deployment system.
In almost all cases these people started building their infrastructures one PC at a time and that is how they are used to managing things. Often they have quite useful collections of scripts to do everything that they need, but they just lack the courage to make the leap to centralising their collection in order to rapidly and consistently deploy it.
Puppet looks fine and I might decide to switch to it in a few years. The point is that when I make the decision, the roll-out will be very rapid because the sites that I manage will already have a system (or many systems) for deployment in place.
Commence the deluge of Puppet fanboys (myself included). Shameless plug for my infrastructure management pages, too.
Three main points to add here:
- I suspect by the time a configuration management toolset has been developed from scratch (including package management, users, cron jobs, config files, and which services should be running or disabled), it'll be fairly complex. Not quite as complex as a premade configuration management system, but close.
- If one has prebuilt packages for a particular management system, doing simple things shouldn't be much more complex than in the from-scratch version. Example: Steve's Puppet intro. You trade off some hassle in key-signing for other hassles in getting rsync secured against unauthorized access and running without passwords. It may approximately even out.
- Small homogeneous infrastructures rarely remain small and homogeneous. One big advantage to Puppet, at least, is the resource abstraction library that lets you define platform-neutral resources in a platform-neutral language. That is, ssh on Debian, Redhat, Solaris, and other systems all operate basically the same way: the same type of configuration files (in different locations), a need to run a service (with differing names, and run via sysvinit, svcadm, or whatever), and so on. Puppet lets you change the specific behavior depending on the OS or other characteristics of the client system (example here).
The repository would look something like this.
-- inbound_mail
   |-- trunk
   |   |-- ldap
   |   |   |-- slapd.conf
   |   |-- scripts
   |   |   |-- count_msgs_in_queue.sh
   |   |-- sendmail
   |       |-- sendmail.mc
   |-- branches
       |-- dev
Deployment would mean running a script via ssh on the remote server that would check out the configuration files from SVN and reload the associated server daemon.
Any suggestions or reasons not to go with this kind of setup?
Have you heard of etckeeper [1][2]? It helps you keep track of /etc with the help of a VCS.
Thanks,
Paul
[1] http://kitenet.net/~joey/code/etckeeper/
[2] aptitude show etckeeper