Posted 23 July 2015
One of the cornerstones of my employer's (infrastructure) deployment systems is a Jenkins box performing 'pipeline' delivery. It uses a bunch of Jenkins plugins to provide the UI and job-interconnections required to make pipelines possible. We then use a separate pipeline for every Puppet module we write, another for (nearly) every Apache config, and more for various other miscellany, such as Drupal modules/themes, message broker configs and more.
Jenkins provides a pretty good way to make pipelines work. The actual server does very little real work - it's mostly just checking things in and out of an SCM (SVN or Git in our case), and occasionally running Mcollective to push things to boxes. This means at any given moment, our Jenkins box runs pretty idle - yet moving around the UI is really sluggish.
Looking around the Internet, the usual use-case for Jenkins is build boxes. There, you'd have a relatively small number of jobs, but each job does some resource-hungry work (e.g. compiling or packaging code). In that use-case, you can create Jenkins 'slaves' to off-load the hard work to other machines. We do actually do this for Puppet module integration testing, but not much more.
Our performance problems are due to having about 15,000 jobs (each with dozens of historic job reports) configured on one Jenkins instance. Jenkins does quite admirably with a system that large, but occasionally has to stat() each of the jobs (although not the history). No matter which way you look at it, a loop that performs a stat() 15,000+ times is never going to be especially quick. What's more, slaves are no use - all the jobs are still defined on the master, and most of those jobs don't use up much CPU to execute.
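To get a rough feel for what a loop like that costs, here's a minimal Python sketch that builds a throwaway directory of fake jobs and times one stat() per job. It's only an illustration: exactly which files Jenkins touches is an assumption on my part (config.xml is used as a stand-in), the helper names are made up, and a warm local disk will look far quicker than a busy server or network filesystem.

```python
import os
import tempfile
import time


def stat_all_jobs(jobs_dir):
    """stat() the config.xml of every job directory; return (count, seconds)."""
    start = time.perf_counter()
    count = 0
    with os.scandir(jobs_dir) as entries:
        for entry in entries:
            if not entry.is_dir():
                continue
            # ASSUMPTION: config.xml stands in for whatever Jenkins really stats.
            config = os.path.join(entry.path, "config.xml")
            try:
                os.stat(config)
            except FileNotFoundError:
                continue  # no config.xml, so not treated as a job
            count += 1
    return count, time.perf_counter() - start


def make_fake_jobs(root, n):
    """Create n hypothetical job directories, each holding a tiny config.xml."""
    for i in range(n):
        job_dir = os.path.join(root, f"job-{i:05d}")
        os.makedirs(job_dir)
        with open(os.path.join(job_dir, "config.xml"), "w") as fh:
            fh.write("<project/>\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_fake_jobs(root, 15000)
        count, seconds = stat_all_jobs(root)
        print(f"stat()ed {count} jobs in {seconds:.3f}s")
```

Pointing stat_all_jobs() at a copy of a real jobs directory gives a more honest number than the temporary one, since the cost depends heavily on the filesystem and on how much is already cached.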
We're working to split one box into two (or more later, if needs be). Our Puppet module pipelines make up about 80% of the total, so they're going on one box, and everything else will go on another. Hopefully that means all the miscellany gets really perky service. How the Puppet stuff pans out is anyone's guess. If it's still too slow, we could potentially split it again, although it's not clear how we'd do that in a meaningful way. Reading Jenkins data files isn't too hard, so maybe I'll delete whole pipelines if they haven't been used in a while. Jenkins itself doesn't really have a concept of a "pipeline" - it's just a bunch of related jobs, so it'll need some logic to actually remove the pipeline as a whole and not just random jobs inside it. As I say though, it doesn't look too hard if it indeed comes to needing such a thing.
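The pipeline clean-up idea could look something like the sketch below. It assumes the usual on-disk layout of one directory per job with a builds subdirectory inside it, and it assumes (purely for illustration) that the jobs of a pipeline share a name prefix up to the first underscore, which is a hypothetical naming convention and not something Jenkins provides. It only reports stale pipelines; it doesn't delete anything.

```python
import os
import time
from collections import defaultdict


def last_build_time(job_dir):
    """Newest mtime among entries in <job>/builds, or None if never built."""
    builds = os.path.join(job_dir, "builds")
    try:
        with os.scandir(builds) as entries:
            # Don't follow symlinks: dangling "last build" links must not raise.
            times = [e.stat(follow_symlinks=False).st_mtime for e in entries]
    except FileNotFoundError:
        return None
    return max(times, default=None)


def pipeline_of(job_name, separator="_"):
    """HYPOTHETICAL convention: pipeline name is the text before the separator."""
    return job_name.split(separator, 1)[0]


def stale_pipelines(jobs_dir, max_age_days, now=None):
    """Return {pipeline: [job names]} where no job was built within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    newest = {}                  # pipeline -> most recent build time of any job
    members = defaultdict(list)  # pipeline -> all job names in it
    with os.scandir(jobs_dir) as entries:
        for entry in entries:
            if not entry.is_dir():
                continue
            pipeline = pipeline_of(entry.name)
            members[pipeline].append(entry.name)
            built = last_build_time(entry.path)
            if built is not None:
                newest[pipeline] = max(newest.get(pipeline, 0.0), built)
    # A pipeline whose jobs have never been built counts as stale too.
    return {
        p: sorted(jobs)
        for p, jobs in members.items()
        if newest.get(p, 0.0) < cutoff
    }
```

Grouping first and judging staleness per pipeline is what stops it picking off random jobs: a pipeline is only listed if every job in it has been quiet for the whole period. Anything that actually removed the directories would also need Jenkins to reload its configuration from disk (or a restart) before the jobs vanished from the UI.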
Another requirement of this work is to keep Jenkins as 'seamless' as possible. Apparently people find it too hard to go to one URL for Puppet pipelines and another for anything else, so I devised a little HTML banner (using a plugin of the same name) on the top of every page on both boxes. It's got links to both boxes on it, so hopefully that means users can go to either box and navigate to the one they really want.
Lastly, since I've got some time to work on this, it seemed fitting to also give everything an upgrade. Jenkins core seems to get a new version about every fortnight, so it's unlikely we'll keep up to date with that schedule. However, getting as much up to date (as of now) as possible means it should be a bit easier to do another bulk update in the future. We don't allow in-place upgrades of the core or plugins (which is a feature Jenkins has), so it's server-defined upgrades or nothing (probably why we're a bit out of date!). I did get bored of downloading the Jenkins RPM and manually putting it into our local repo, so I made a Jenkins job to do that stuff for me - hopefully it'll be easy enough that we'll get around to it a little more regularly than we do now.
The actual big switch-over is planned for just over a week's time. Hopefully it'll go as planned...