People behind Debian: Colin Watson, the tireless man-db maintainer and a debian-installer developer

Colin Watson is not a high-profile Debian figure: you rarely see him on mailing lists, but he cares a lot about Debian, and you will see him in Debconf videos sharing many thoughtful comments. I have the pleasure of working with him on dpkg, as he maintains the package in Ubuntu, but he does many more interesting things. I also took the opportunity to ask some Ubuntu-specific questions since he’s worked for Canonical since the start. Read on.

My questions are in bold, the rest is by Colin.

Who are you?

Hi. I’m 32 years old, grew up in Belfast in Northern Ireland, but have been living in Cambridge, England, since I was 18. I’m married with a stepson and a daughter.

I became interested in Debian due to the critical mass of Debian work happening in Cambridge at the time (and perhaps more immediately because my roommate was running Debian: “hey, what’s that?”), started doing random bits of development in 2000, and joined as a developer in 2001 (a really exciting time, with lots of new people joining who became integral parts of the project). I’d only really been intending to do QA work and various bits of packaging around the edges, and maybe some work on the BTS, but then Fabrizio Polacco died and I took over man-db from him, and it sort of snowballed from there.

I graduated from university shortly before becoming a Debian developer. I worked for a web server company (Zeus), then a hardware cryptography company (nCipher), before moving to work for Canonical in 2004, since when I’ve been working full-time on Ubuntu. By this point, I suspect that going back to work in an office every day would be pretty tough.

What’s your biggest achievement within Debian or Ubuntu?

One thing I should say: I rarely start projects. Firstly I don’t think I’m very good at it, and secondly I much prefer coming on to an existing project and worrying away at all the broken bits, often after other people have got bored and wandered off to the next new and shiny thing. That’s probably why I ended up in the GNU/Linux distribution world in the first place, rather than doing lots of upstream development from the start – I like being able to polish things into a finished product that we can give to end users.

So, I’ve had my fingers in a lot of pies over the years, doing ongoing maintenance and fixing lots of bugs. I think the single project I’m most proud of would have to be my work on the Debian installer. I joined that team in early 2004 (a few months before Canonical started up) partly because I was a release assistant at the time and it was an obvious hot-spot, and partly because I thought it’d be a good idea to make sure it worked well on the shiny new G4 PowerBook I’d just treated myself to. I ended up as one of the powerpc d-i port maintainers for a while (no longer, as that machine is dead), but I’ve done a lot of core work as well: much of the work to put progress bars in front of absolutely everything that used to have piles of text output, rescue mode, the current kernel selection framework, a good deal of udev support, several significant debconf extensions, lots of os-prober work, and I think I can claim to be one of the few people who understands the partitioner almost top to bottom. :-)

d-i is the very first thing many of our users see, and has a huge range of uses, from simple desktop installs to massive corporate deployments; it’s unspeakably important that it works well, and it’s a testament to its design that it’s been able to trundle along without actually very much serious refactoring for the best part of five years now.

I have a soft spot for man-db too. It was my first major project in Debian, starting out from an embarrassingly broken state, and is now nice and stable to the point where I recently had time to spawn a useful generic library out of it (libpipeline).

What are your plans for Debian Wheezy?

d-i has a lot of code to deal with disks and partitions. Of course a lot of it is in the partitioner, and for that we use libparted so we don’t have to worry very much about the minutiae of device naming. But there are several other cases where we do need to care about naming, mainly before the partitioner when detecting disks, and after the partitioner when installing the boot loader. Back in etch, we introduced ‘list-devices’, which abstracted away the disk naming assumptions involved in hardware detection. In wheezy, I would like to take all the messy, duplicated, and error-prone code that handles disk naming in the boot loader installers, and design a simple interface to cover all of them. This has only got more important following the addition of the kFreeBSD and Hurd d-i ports in squeeze, but it bites us every time we notice that, say, CCISS arrays aren’t handled consistently, and it’s a pain to test all that duplicated code.

I’d also like to spread the use of libpipeline through C programs in the archive, which I think has potential to eliminate a class of security vulnerabilities in a much simpler way than was previously available.

If you could spend all your time on Debian, what would you work on?

I would love to systematically reduce the need for the current mass of boot loaders. There’s a significant cost to having so much variation across architectures here: it’s work that needs to be done in N different places, the wildly differing configuration means that d-i has to have huge piles of code to manage them all differently, and there are a bunch of strange arbitrary limitations on what you can do.

The reason I’m working on GRUB 2 is that, in my view, it’s the project with the best chance of centralising all this duplicated work into a single place, and making it easier to bring up new hardware in future (in a way that doesn’t compromise software freedom, as many proprietary boot loaders of the kind often found on phones do). Of course, with flexibility tends to come complexity, and some people have a natural objection to that and prefer something simpler. The things I don’t quite have time to do here are to figure out a coherent way to address the specific over-complexity problems people have with the configuration framework while still keeping the flexibility we need, and to do enough QA and porting work to be able to roll out GRUB 2 at installation time to all the Debian architectures it theoretically supports.

What’s the biggest problem of Debian?

Backbiting, and too much playing the man rather than the ball. With one or two honourable exceptions, I’ve largely stopped reading most Debian mailing lists since it just never seems a productive way to spend time compared to writing code and fixing bugs; and yet I’m conscious that they’re one of the primary means of communication for the project and I’m derelict in not taking part in them.

I do find it a bit frustrating that people are seen primarily in terms of their affiliations. I suppose it’s natural for people to see me as an “Ubuntu guy”, but I don’t really see myself that way: I’ve been working on Debian for nearly twice as long as I’ve been working on Ubuntu, and, while I care a great deal about both projects, I’ve put far more of my own personal time into Debian and I try to make sure that a decent number of the things I’m involved with there aren’t to do with work. Work/life separation is a good thing, not that I’m very good at it. Generally speaking, when I’m working on Debian, I’m doing so as a Debian developer, because I want Debian to be better. When that’s not the case, if it matters, I try to indicate it explicitly.

You’ve been working for Canonical since Ubuntu’s inception. If you were Mark Shuttleworth, is there something that you would have done differently?

We had many good intentions when we founded Ubuntu. We also had a huge amount of work to deliver, to the point where it wasn’t at all clear whether it would be possible (the warty release was named based on the expectations of it, after all, and came out much more usable than we’d dared to hope). In hindsight, it might have helped to be quieter about our good intentions, so that we could exceed expectations rather than in some cases failing to meet them. That might have set a very different tone early on.

(Personally, I’m happy I’m not Mark. The decisions in my office are much easier to take.)

It seems to me that the community part of Ubuntu is much more eager to cooperate with Debian than the corporate part. It’s probably just that more and more Canonical employees are not former Debian contributors. Do you also have this feeling? Are there processes in place to ensure everybody at Canonical is trying to do the right thing towards Debian cooperation?

Just to be clear, I’m wearing my own hat here—which, ironically, is a fedora—rather than a company hat.

It makes sense for Canonical to be taking on more non-Debian folks; after all, we can’t simply hire from the Debian community forever, and a variety of backgrounds is healthy. As you say, it may well be natural that Ubuntu developers who don’t work for Canonical are more likely to have a Debianish background, as it tends to take something significant to get people to switch to a very different family of GNU/Linux distributions, and changing jobs is one of the most obvious of those things.

Certainly, there was a definite sense among the early developers that we were all part of the Debian family and cared about the success of Debian as well. As Ubuntu has developed its own identity, people involved in it now tend to care primarily about the success of Ubuntu. At the same time, pragmatically, it’s still true that getting code changes into Debian is one of the most economical ways to land them; changes made in Debian or upstream land once and tend to stay in place, while changes made only in Ubuntu incur an ongoing merge overhead, which is not at all trivial.

In many ways it’s human nature to try to fulfil your immediate goals in the most direct way possible. If your goal is to deliver changes to Ubuntu users, then it’s natural to concentrate on that rather than looking at the bigger picture (which takes experience). Debian developers often fail to send changes upstream for much the same reason, although there’s more variation there because they’re normally working on Debian of their own volition and thus tend to have wider goals; the economics are more or less parallel though.

Thus, I think the best way to improve things is to make it the path of least resistance for Ubuntu developers to send changes to Debian. We’re already seeing how this works with the Ubuntu MOTU group; if you send a patch for review, or work on merging a package from Debian, very often the response includes “have you sent these changes to Debian?”. We’re working on both streamlining our code review through a regular patch pilot programme and requiring more code review for changes in general, so I think this will be a good opportunity to ask more people to work with Debian when they propose changes to Ubuntu.

For myself, this may be obvious, but I notice that I’m much better at getting changes into Debian when I already have commit access to the Debian package in question. All the work on improving collaborative maintenance in Debian can only help, for Ubuntu as well as for everyone else. It doesn’t make so much difference for large changes that require extensive discussion, but there are lots of small changes too.

Canonical is upstream of many software projects (unity, indicators infrastructure, etc.). Why isn’t that software immediately packaged in Debian? Do you think we can get this to change?

I’m not sure what the right approach is here, particularly as I haven’t been involved with much of that on the Ubuntu side. I suspect it would be helpful to look at this in a similar way to Ubuntu changes in general. It’s understandable that those developers have getting changes into Ubuntu as their first goal. And yet, having code in Debian offers a wider, and often technically adept, audience, and most developers like having their code reach a wider audience even if it’s not their first priority, particularly if that audience is likely to be able to help with finding problems and fixing bugs. It should be seen as something beneficial to both distributions.

The hardest problems will be with things that aren’t merely optional add-ons (which should generally be fairly non-controversial in Debian, given the breadth of the archive in general – the existence of things like bzr and germinate as Debian packages was never a hard question), but which require changes in established packages. For example, gnome-power-manager in Ubuntu is built with application indicator support, and that’s an important part of having a good indicator-based panel: a lot of the point of indicators is consistency. Since I do very little desktop work myself, I don’t know exactly what would be involved in making it possible to choose this system based on a Debian desktop, but I think it’s probably a bit more complicated than just making sure all the new packages exist in Debian too. Obviously you have to start somewhere.

Is there someone in Debian that you admire for his contributions?

Christian Perrier is absolutely tireless and has done superb things for the state of translations in Debian. And Russ Allbery, even aside from his fine ongoing work on policy, Lintian, and Kerberos, is a constant voice of sanity and calmness.

Release management is incredibly hard work, as I know from my own experience, and anyone who can sustain involvement in it for a long period is somebody pretty special. Steve Langasek and I got involved at about the same time but he outlasted me by quite a few years. He deserves some kind of medal for everything he’s done there.


Thank you to Colin for the time spent answering my questions. I hope you enjoyed reading his answers as much as I did.

Latest features of dpkg-dev: debian packaging tools

I’m attending the mini-Debconf Paris and I just gave a talk about the latest improvements to dpkg-dev, the package providing the basic tools used to build Debian packages. “Latest” is a bit of a stretch, since the talk covers the last 2-3 years of development.

My talk covered the following topics:

  • Support of symbols files by dpkg-shlibdeps, dpkg-gensymbols
  • Support of new source formats by dpkg-source
  • Supplementary options for dpkg-source
  • Cross distribution collaboration with dpkg-vendor
  • Custom compilation flags with dpkg-buildflags
  • Miscellaneous improvements to other tools

The slides are relatively verbose so that you can understand them even if you did not attend the talk.


The secret plan behind the “3.0 (quilt)” Debian source package format

While I have spent countless hours working on the new source format known as “3.0 (quilt)”, I’ve just realized that I have never blogged about its features and the reasons that led me to work on it. Let’s fix this.

The good old “1.0” format

Up to 2008, dpkg-source was only able to cope with a single source format (now named “1.0”). That format had been in use since the inception of the project. While it worked fine for most cases, it suffered from a number of limitations, mainly because it stored the Debian packaging files as a patch to apply on top of the upstream source tarball.

This patch can have two functions: creating the required files in the debian sub-directory and applying changes to the upstream sources. Over time, if the maintainer made several modifications to the upstream source code, they would end up entangled (and undocumented) in this single patch. In order to solve this problem, patch systems were created (dpatch, quilt, simple-patchsys, dbs, …) and many maintainers started using them. Each implementation is slightly different but the basic principle is always the same: store the upstream changes as multiple patches in the debian/patches/ directory and apply them at build-time (and remove them during cleanup).

Design goals for the new formats

When I started working on the new source package format, I set out to get rid of all the known limitations and to integrate a patch system in dpkg-source. I wanted to clear up the situation so that learning packaging requires learning only one patch system and does not require modifying debian/rules to use it. I picked quilt because it was popular, came with a large set of features, and was not suffering from NIH syndrome. This led to the “3.0 (quilt)” source format.

I also created “3.0 (native)” as a distinct format. “1.0” was able to generate two types of source packages (native and non-native) but I did not want to continue with this mistake of mixing both in a single format. The KISS principle dictated that the user should pick the format of his choice, put it in debian/source/format and be done with it. Now the build can rightfully fail when the requirements are not met instead of doing something unexpected as a fallback.
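To illustrate the "declare it and fail otherwise" idea, here is a minimal Python sketch of a strict checker for debian/source/format. It is not how dpkg-source is implemented: the function name `read_source_format`, the set `KNOWN_FORMATS` (limited to the formats named in this article), and the choice to raise on a missing file are all assumptions made for the example.

```python
from pathlib import Path

# Only the formats named in this article; an assumption, not dpkg-source's own list.
KNOWN_FORMATS = {"1.0", "3.0 (quilt)", "3.0 (native)"}


def read_source_format(package_dir):
    """Return the format declared in debian/source/format.

    Hypothetical helper: it fails loudly instead of guessing a fallback.
    """
    format_file = Path(package_dir) / "debian" / "source" / "format"
    if not format_file.is_file():
        raise FileNotFoundError(f"{format_file} is missing; declare a format explicitly")
    declared = format_file.read_text(encoding="utf-8").strip()
    if declared not in KNOWN_FORMATS:
        raise ValueError(f"unknown source format: {declared!r}")
    return declared
```

Calling `read_source_format(".")` from the top of an unpacked source tree would return a string such as "3.0 (quilt)", or raise if the declaration is absent or unrecognized.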

Features of “3.0 (quilt)”

This is the format that replaces the non-native variant of the 1.0 source format. The features below are specific to the new format and differentiate it from its ancestor:

  • Supports compression formats other than gzip: bzip2, lzma, xz.
  • Can use multiple upstream tarballs.
  • Can include binary files in the debian packaging.
  • Automatically replaces the “debian” directory present in the upstream tarball (no repacking required).
  • Creates a new quilt-managed patch in debian/patches/ when it finds changes to the upstream files.
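Since the patches live in debian/patches/ and are managed by quilt, their order comes from a quilt series file. The sketch below lists the patches in application order. The helper name `read_series` is hypothetical, and it assumes the usual quilt conventions: one patch per line, `#` comments, and optional arguments such as `-p1` after the file name.

```python
from pathlib import Path


def read_series(package_dir):
    """Return patch file names from debian/patches/series, in order.

    Hypothetical helper; returns an empty list when there is no series file.
    """
    series = Path(package_dir) / "debian" / "patches" / "series"
    if not series.is_file():
        return []
    patches = []
    for line in series.read_text(encoding="utf-8").splitlines():
        entry = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if entry:
            patches.append(entry.split()[0])   # ignore options such as -p1
    return patches
```

The result only tells you which patches are declared; applying them is still the job of dpkg-source or quilt.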

Features of “3.0 (native)”

This format is very similar to the native variant of the 1.0 source format except for two things:

  • it supports compression formats other than gzip: bzip2, lzma, xz.
  • it excludes by default a bunch of files that should usually not be part of the tarball (VCS specific files, vim backup files, etc.)
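The default exclusion can be pictured as a filter applied to the list of files before the tarball is built. The following Python sketch shows the idea only: the pattern list `ILLUSTRATIVE_IGNORES` is made up for this example and is not the list used by dpkg-source, and both function names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Illustrative patterns only (VCS directories, editor backup files);
# NOT the actual default list of dpkg-source.
ILLUSTRATIVE_IGNORES = (".git", ".svn", ".hg", "CVS", "*.swp", "*~")


def is_excluded(path, patterns=ILLUSTRATIVE_IGNORES):
    """Return True if any component of the path matches an ignore pattern."""
    parts = PurePosixPath(path).parts
    return any(fnmatchcase(part, pattern) for part in parts for pattern in patterns)


def filter_tarball_members(paths, patterns=ILLUSTRATIVE_IGNORES):
    """Keep only the paths that should end up in the source tarball."""
    return [path for path in paths if not is_excluded(path, patterns)]
```

Matching every path component, rather than only the file name, is what lets a single `.git` pattern drop everything stored below that directory.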

Timeline

Looking back at the history is interesting. This project already spans multiple years and is not really over until a majority of packages have switched to the new formats.

  • January 2008: the discussion on how to cope with patches sanely rages on debian-devel@lists.debian.org. My initial decisions are the result of this discussion.
  • March 2008: I have implemented the new formats and I request feedback. dpkg 1.14.17 (uploaded to experimental) is the first release supporting them.
  • April 2008: I ask ftpmasters to support the new source packages in #457345.
  • June 2008: Lenny freeze. dpkg is not supposed to change anymore. Several changes concerning the new source formats are still accepted in the following months given that this code is not yet used in production and must only be present so that lenny can cope with new source packages once squeeze starts using them.
  • February 2009: Lenny release.
  • March 2009: Work on squeeze has started, ftpmasters have done nothing to support new source formats, I submit a patch in #457345 to speed things up. I start a wiki page to track the project’s progress and to answer common questions of maintainers.
  • November 2009: After an ftpmaster sprint, it’s now possible to upload new source packages in unstable. This draws massive attention to the new format and some people start complaining about some design decisions. The implementation of “3.0 (quilt)” changes a lot during this month. dpkg in lenny is even updated to keep up with those changes.
  • March 2010: Up to now, I was planning to let dpkg-source build new source packages by default at some point in the future. After several rounds of discussions, I agree that it’s not the best course of action and decide instead to make debian/source/format mandatory. The maintainer must be explicit about the source format that s/he wants to use.
  • October 2010: The new source formats are relatively popular: a third of the source packages have already switched (see the graph). The squeeze freeze in August clearly stopped the trend; hopefully it will resume once squeeze is released.
  • June 2013: Project is finished?

As you can see this project is not over yet, although the most difficult part is already behind me. For my part, the biggest lesson is that you won’t ever get enough review until your work is used within unstable. So if you have a Debian project that impacts a lot of people, make sure to organize an official review process from the start. And specifying your project through a Debian Enhancement Proposal is probably the best way to achieve this.

If you appreciate the work that I put into this project, feel free to join Flattr and to flattr dpkg from time to time. Or check out my page “Support my work”.

Understanding Membership Structures in Debian and Ubuntu

Debian and Ubuntu have a set of official membership roles that can be granted to regular contributors. Those roles come with rights that enable the contributors to do their work and to participate in the project governance (elections and other official decision-making processes). It’s also a way for the distributions to acknowledge the work done: most contributors are proud of the status they reached.

The membership structure plays an important role in the development of a distribution: it defines the kind of contributors that are welcome in the project, it sets expectations of the project towards its contributors and defines their rights. In the end, this shapes the project’s ability to recruit new contributors to keep the project alive and kicking. This article introduces the existing statuses in Debian and Ubuntu, and defines the — sometimes confusing — jargon associated with them.

The Debian Case

Debian only has two types of official members: Debian Developers (DD) and Debian Maintainers (DM). The rights of the developers are codified in the Debian Constitution while those of the maintainers have been defined in a general resolution of 2007. The Debian Maintainer status is still mostly documented in a wiki page. The integration of this new status in Debian’s official processes has been slow to come largely because it was introduced — at that time — without enough negotiation with the involved parties. Nowadays, it is preferred that people get the DM status before applying for DD.

DM is a very limited role: maintainers can only upload packages that already have their name on them (either in the Maintainer or Uploaders field) and a specific flag (DM-Upload-Allowed: yes) that only Debian Developers can add. They have no other rights and limited access to Debian’s resources.
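The rule above can be written down as a small predicate. This Python sketch is only an illustration and not the code used by the Debian archive: the function name `dm_may_upload` is hypothetical, the package's source fields are assumed to be already parsed into a dict, and identifying the maintainer by e-mail address is an assumption of the example.

```python
import re


def dm_may_upload(source_fields, dm_email):
    """Return True if a Debian Maintainer may upload this package.

    source_fields: dict of fields from the source package (assumed input shape).
    dm_email: address identifying the maintainer (an assumption of this sketch).
    """
    # Condition 1: the flag that only a Debian Developer can add.
    if source_fields.get("DM-Upload-Allowed", "").strip().lower() != "yes":
        return False
    # Condition 2: the maintainer is listed in Maintainer or Uploaders.
    listed = ",".join(
        (source_fields.get("Maintainer", ""), source_fields.get("Uploaders", ""))
    )
    addresses = {address.lower() for address in re.findall(r"<([^<>]+)>", listed)}
    return dm_email.lower() in addresses
```

Both conditions must hold, which is why a maintainer listed in Uploaders still cannot upload until a Debian Developer has added the flag.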

Besides those official roles, there are also maintainers of packages that have no official status within Debian except that they are listed in the “Maintainer” field of the package. They are doing the maintenance work but all uploads are done by a Debian Developer after verification of the work done (this is called “sponsorship” and is the only way to start with official packaging work). Once the DD trusts the maintainer, the developer will typically ask the maintainer to apply for DM status in order to be relieved from the sponsorship work.

In the end, that makes three different kinds of package maintainers and a lot of confusion when you discuss membership issues… in particular when the New Maintainer process is the path that you follow to become a Debian Developer. Don’t be fooled by the names when reading Debian’s documentation!

The Ubuntu Case

Ubuntu had, from the start, an official Ubuntu Member status that includes all contributors: developers of course, but also documentation writers, artists, translators, etc. This status notably grants the right to vote in elections of the Community Council, the right to participate on Planet Ubuntu, and the @ubuntu.com email alias.

For developers, the situation is more complicated: the wiki page lists no fewer than five different statuses. Initially, developers were split between Ubuntu Core Developers and the MOTU (Masters Of The Universe). The latter were responsible for the universe/multiverse sections of the archive while the former also had upload rights for the main/restricted sections. But, inspired by the Debian Maintainers concept and facing concrete problems in terms of archive management, they changed their infrastructure to offer more fine-grained control on package uploads.

Ubuntu can now grant upload permissions on a package-per-package basis, but it can also delegate the right to grant upload permissions with the same granularity. This led to the new Per-Package Uploader status which is simply an Ubuntu Member with upload rights on a limited set of packages where they have a specific expertise. The more generic Ubuntu Developers status now encompasses members of various development teams that have been delegated the right to manage upload permissions on a (usually large) package set (the current teams are Ubuntu Desktop, Mythubuntu, Kubuntu, and Edubuntu). Those teams can define their own policy to add new members provided they follow the basic rules defined by the Developer Membership Board (see this wiki page).

Ubuntu Contributing Developer is an intermediate status for someone who is not yet ready for one of the other developer statuses but who has still shown enough commitment to be an Ubuntu Member.

All those statuses can be obtained in a similar way: you prepare a wiki page listing your past contributions, you collect testimonials from existing members that you have worked with, you add yourself in the agenda of the next meeting of the board (or council) that grants the status that you seek, and you attend the meeting. The members of the board will decide whether you are ready for the status (or not) based on what you provided in the wiki, based on your answers during the meeting (and on a mailing-list for developers), and based on what others have to say about you.

The most important boards are usually elected by the community while others are commonly appointed by the community council. Those governance bodies include Canonical employees but not as many as one would expect: two out of eight in the Developer Membership Board, two out of eight in the Community Council, but all six members of the Technical Board. The last figure, while not intended, is not surprising given the high expectations set on potential members of the technical board. Mark, as the founder, is the only person to have a permanent seat on both the Community Council and the Technical Board.

Comparison of the Statuses Between Debian and Ubuntu

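The following table summarizes the rights given to each developer role in the two projects. The abbreviations are DM (Debian Maintainer), DD (Debian Developer), UM (Ubuntu Member), PPU/UD (Per-Package Uploader / Ubuntu Developer), MOTU (Masters Of The Universe), and UCD (Ubuntu Core Developer).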

Rights | Debian: DM | Debian: DD | Ubuntu: UM | Ubuntu: PPU/UD | Ubuntu: MOTU | Ubuntu: UCD
Package maintenance via sponsorship | Y | N/A | Y | Y | Y | N/A
Official email alias | - | Y | Y | Y | Y | Y
Participate in votes for members | - | Y | Y | Y | Y | Y
Participate in votes for developers | - | Y | - | Y | Y | Y
Upload rights restricted to pre-approved packages | Y | - | - | Y | - | -
Upload rights restricted to a section of the archive | - | - | - | - | Y | -
Unlimited upload rights | - | Y | - | - | - | Y
Number of contributors (as of 2010-07-27) | 117 | 904 | 462 | 27 | 85 | 63

Please note that the numbers of contributors are not 100% accurate for Ubuntu. A contributor can have multiple statuses (direct membership to a launchpad group) granted over time (while gaining experience). The problem has been mostly avoided by calculating differences between number of members of the various groups but it’s not perfect and it can’t be: some MOTU are also PPU for packages in main and it’s legitimate (but I only counted them as MOTU and not as PPU). Another limitation is that members of some administrative teams are included indirectly in many teams and thus appear in the count while they should not.

Anyway, this simple table makes it obvious that Ubuntu’s structure offers a broader choice of statuses. They acknowledge the work of all contributors from the start while still giving the most critical rights only to those who have proven that they deserve them. Despite this difference, Debian still has a significant advantage in terms of number of developers. That number does not tell the whole story though: the Ubuntu contributors include many Canonical employees (e.g. 36 out of 63 core developers have a @canonical.com email registered on their launchpad account) that are likely to spend more time working on the distribution than the average Debian member. But even if comparing person-hours would be a challenging thought experiment, in practice it’s not of much interest if both projects continue to cooperate and if more and more of the contributions flow in both directions.

Debian is aware of the shortcomings of its structure. Changes to better accommodate non-packagers have been discussed several times already. The last efforts in that direction were unfortunately perceived as a solution ready to go rather than a proposal to be discussed, and the project got quickly buried by a general resolution (GR). Even if that resolution invited further discussion and a new proposal, the truth is that when someone’s initiative is “corrected” by way of GR, it usually kills any motivation to go forward.

Possible Evolution?

On the Ubuntu side, the infrastructural changes were completed recently and they don’t expect any further change in the near future. They do plan, however, to expand usage of those new features so that more teams benefit from the possibility to control upload rights on packages that are relevant to them, and so that more individual developers apply to become Per-Package Uploaders on packages that they know very well.

On the Debian side, a recent discussion on the debian-project list brought back the topic of the bad terminology and it was agreed that the “New Maintainer process” should be renamed to something else (“New Developer process” has been suggested). But Christoph Berg (Debian Account Manager and hence heavily involved in the New Maintainer Team) suggested that Debian would be better off implementing the long-awaited membership changes before trying to update all the documentation. It would certainly imply some more vocabulary updates. Later in the discussion, he confirmed that membership reform is at the top of the TODO list of the new maintainer team (just after the rewrite of the nm.debian.org website).

What can be expected from this reform? The following answers are my own guesses based on my experience of Debian, but the project hasn’t decided anything yet.

  • First of all: a new status for contributors that are not packagers. The tricky part will be defining the process to follow and the rights granted.

  • Changes to the technical implementation of the DM status. The current implementation does not allow giving upload rights to a single DM if two are listed in the Uploaders field of a package (and both might not have the same experience for that package). Furthermore, it suffers from annoying restrictions like the inability to upload new binary packages.

  • A change of the Debian constitution to integrate those new statuses is almost unavoidable.

  • Other more invasive changes have been proposed like replacing the NM process by a simple designation by other DDs, but it’s unlikely to happen. The NM process can already be greatly simplified by the application manager if the applicant can show good testimonials from other developers and if he has a track record of real contributions (e.g. as witnessed by changelog entries in Debian packages).

Almost two years have elapsed since the previous efforts in that direction; the new maintainer team has recruited new members and is in generally better shape. Hopefully, the next episode of this saga will have a better outcome.

This article was first published in Linux Weekly News. In a comment, Mark Shuttleworth tried to explain how the Ubuntu community is being set up.