Lately I’ve been investigating how Mozilla’s build and test system works (there are a number of pieces that tend to be tightly integrated, and I wanted to learn more). I put some questions to developer Ben Hearsum and he kindly obliged. I’ve included the questions and answers here in the hope that others will be able to learn from them as well.
There are two critical components to Mozilla’s build and test infrastructure: Buildbot and Tinderbox – I was wondering if you could tell me about their relationship and integration.
Ben: I’d like to break this down a little bit more. Tinderbox consists of two parts: Client and Server. The server is essentially just the Waterfall display. It sits on a server somewhere and reacts to incoming e-mail. The client is a set of scripts that knows how to do various interesting things (building, packaging, generating updates, etc.) with Mozilla products.
Historically, builds have been done in an infinite loop by the Tinderbox client, which reports back to the server at the start and end of each build. It’s a completely stateless system; the client sends out a specifically formatted e-mail, and the server acts on it. Because communicating with the Tinderbox server is so simple, we can use it as a display, with Buildbot driving the builds.
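To make the "specifically formatted e-mail" idea concrete, here is a minimal sketch in Python of a stateless report: one function composes a status message and another recovers the fields from the body alone, without any memory of earlier messages. The `tinderbox:` field names, the addresses, and both function names are illustrative assumptions; they are not taken from the interview and have not been checked against the Tinderbox source.

```python
from email.message import EmailMessage

PREFIX = "tinderbox: "   # assumed marker for machine-readable lines
END_MARKER = "tinderbox: END"


def build_status_mail(tree, build_name, status, build_date, log_text=""):
    """Compose a self-describing status report as a plain-text e-mail."""
    msg = EmailMessage()
    msg["From"] = "build-client@example.com"    # hypothetical address
    msg["To"] = "waterfall-server@example.com"  # hypothetical address
    msg["Subject"] = f"{tree} {build_name}: {status}"
    header = "\n".join([
        f"{PREFIX}tree: {tree}",
        f"{PREFIX}build: {build_name}",
        f"{PREFIX}status: {status}",
        f"{PREFIX}builddate: {build_date}",
        END_MARKER,
    ])
    # Everything after the END marker is treated as free-form build log.
    msg.set_content(header + "\n\n" + log_text)
    return msg


def parse_status_mail(body):
    """Recover the report fields from a message body, ignoring the log."""
    fields = {}
    for line in body.splitlines():
        if line.strip() == END_MARKER:
            break
        if line.startswith(PREFIX):
            key, _, value = line[len(PREFIX):].partition(": ")
            fields[key] = value
    return fields
```

Because every message carries the tree, build name, status, and date, the receiving side needs no session with the sender. That is also why another tool could plausibly stand in for the original client, as long as it produces messages in the same shape.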
That is the only direct communication between the two. In some cases Buildbot does some post-processing of logs to get Tinderbox to display things directly on the Waterfall (e.g., unit test pass/fail numbers).
At this point, Buildbot is responsible for driving almost all of the new build, unit test, and Talos infrastructure we bring up. Right now, we’ll be sticking with Tinderbox as the main developer frontend. In the future we may want to present developers with the Buildbot Waterfall instead, but we’ve got some feature parity to address before that can happen.
How are unit tests integrated into the Buildbot and Tinderbox systems? How are different types of tests handled?
Ben: With the exception of reporting back to the Tinderbox server, all of our unit test infrastructure is 100% Buildbot. (I’m not even sure the Tinderbox client can run unit tests, actually.) We’ve got custom Buildbot steps that run our various tests and parse the output. These classes deal with the different types of tests – how they run, how to parse the output they generate, etc.
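As a rough illustration of the "parse the output" half of such a step, the sketch below reduces a test log to pass/fail/skip counts. It is plain Python rather than the Buildbot step API, and the log format (`PASS | test_name` and so on), the `RESULT_LINE` pattern, and the `summarize_test_log` name are all hypothetical; the real test harnesses each have their own output formats.

```python
import re
from collections import Counter

# Hypothetical log format: one result per line, e.g. "PASS | test_name".
RESULT_LINE = re.compile(r"^(PASS|FAIL|SKIP)\s*\|\s*(.+)$")


def summarize_test_log(log_text):
    """Return (counts, failures) for a log in the assumed format above."""
    counts = Counter({"PASS": 0, "FAIL": 0, "SKIP": 0})
    failures = []
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if not match:
            continue  # skip build noise and any other non-result lines
        result, name = match.groups()
        counts[result] += 1
        if result == "FAIL":
            failures.append(name.strip())
    return dict(counts), failures
```

A separate parser per test type keeps the "how to run it" and "how to read its output" concerns together, which matches the one-class-per-test-type arrangement described above.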
Currently, unit test output is only available to the outside world via Tinderbox. It shows a quick pass/fail/skip for each test. Internally, we mostly watch the Buildbot Waterfall. For the curious, here’s what that looks like:
What if a developer wants to test out the implications of a patch before landing it and affecting all other developers?
Ben: The Try Server is exactly what you want here. You can submit a patch to it (or a set of HG repositories, if you’re testing Mozilla2 code) and have it compile, package, and upload a build for you. Recently, we added two new features: rudimentary Talos and a win32 symbol server. For instructions on how to use it, swing by the wiki page.
Buildbot appears to be used all throughout the Mozilla infrastructure – is its use applicable to other projects?
Ben: It’s true, we use Buildbot a lot here. Off the top of my head we use it for: Release Automation, Try Server, unit tests, Talos, misc. test infrastructure, l10n, and probably some things I’m unaware of. One of the great things about Buildbot is that it’s an active project with a healthy community. This is one of the reasons we want to move to it. It’s used by many different projects, including: Python, Twisted, KDE, WebKit, Subversion, OpenOffice, Gnome – to name a few.
Buildbot is built in a very extensible and customizable way. It’s relatively easy, even for a project like Mozilla (whose build system/process is quite odd), to start driving infrastructure with Buildbot.
Monitoring Tinderbox frequently serves as the heart of Mozilla development. What does this tool provide that makes it so important to developers?
Ben: Because developers still use the Tinderbox server as their source of information, there isn’t currently much direct benefit to them – other than us Release Engineering folk having more time to do interesting things.
In the future we may want to give developers direct access to the Buildbot waterfall. This is where we can make developers’ lives easier. Ideally (depending on what features get implemented) we’d want to provide the following:
- The ability to trigger a clobber build from the web interface (no more CLOBBER files).
- Lots of different ways to display data (“give me the latest build from each builder”, “give me a list of builds from builder X”, etc).
- Build status via IM/RSS/e-mail.
Specific to the Try Server:
- Ability to stop a running build (useful when one platform fails quickly – you can save time by canceling the others).
- Ability to submit a patch directly from the Buildbot Waterfall.
I want to thank Ben for taking the time to answer these questions for me. I’ve been really impressed with Mozilla’s continuous integration setup – especially the use of Buildbot. I suspect that I’d like to have something similar set up for jQuery (doing automated testing, etc.), but a lot of work is still required to make that happen. At least it doesn’t look that hard to get started, which is quite important.