Test Driven Development with CppUTest, now in Debian

I recently read Test Driven Development for Embedded C by James W. Grenning, published by Pragmatic Programmers.

I really enjoyed the book: while I was aware of the huge benefits of having a comprehensive test suite, I had never seriously studied the principles behind Test Driven Development (TDD), and this book is a good introduction to the topic. It also focuses on the C language and contains many examples of how to create tests, using all the possibilities that C offers, even for projects which have to interact with hardware or other unpredictable components (the key is to create many abstractions).

The author convincingly argues that developing code with TDD forces you to create a modular design that is easier to evolve when the underlying requirements change. He also highlights how the tests serve as reference documentation of the API.

James W. Grenning recommends CppUTest as his xUnit test framework of choice. When I wanted to try this test framework, I discovered that it was not available in Debian. I decided to package it because it has some interesting features not offered by the contenders (at least not to my knowledge). It’s now available in Debian and in Ubuntu.

First, it doesn’t require any explicit registration of tests and has a very lightweight syntax. The small downside is that CppUTest requires C++ for the tests. But C++ is largely compatible with C, so it doesn’t matter much as long as you have a C++ compiler for your target. On the contrary, variables and methods scoped to the test group make it easy to write clear tests. Here’s a short sample of test code:

extern "C" {
#include "timer.h"
#include "timefn.h"
}
 
#include "CppUTest/TestHarness.h"
 
static Time the_time;
static const int start_sec = 123;
static const int start_nsec = 456789000;
static const int delay_sec = 8;
static const int delay_nsec = 111111000; // start_nsec + delay_nsec < 10^9
 
TEST_GROUP(Timer)
{
    /* Class variables available to all tests in the group */
    Timer timer;
    Delay remaining;
 
    /* Standard setup/teardown methods of xUnit tests */
    void setup() {
        timer = timer_new();
        time_set(&the_time, start_sec, start_nsec);
        /* [...] */
    }
 
    void teardown() {
        timer_free(timer);
        /* [...] */
    }
 
    /* Helper functions specific to the test group */
    void start_timer_with_delay(long sec, long nsec)
    {
        timer_set_real_delay(timer, sec, nsec);
        timer_start(timer);
    }
 
    void ensure_remaining_is(long sec, long nsec)
    {
        CHECK_EQUAL(sec, delay_get_seconds(remaining));
        CHECK_EQUAL(nsec, delay_get_nanoseconds(remaining));
    }
};
 
TEST(Timer, NewIsNotStarted)
{
    CHECK(!timer->started);
}
/* [...] */
TEST(Timer, GetRemainingTimeWithNanosecondPrecision_ShiftOfSeconds)
{
    start_timer_with_delay(delay_sec, delay_nsec);
    time_set(&the_time, start_sec + delay_sec - 5, start_nsec + delay_nsec + 1000);
 
    remaining = timer_get_remaining_time(timer);
 
    ensure_remaining_is(4, 999999000);
}
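
The expected value in the last test follows from (seconds, nanoseconds) arithmetic with a borrow across the seconds boundary, which is what the “ShiftOfSeconds” suffix hints at. The following is only a sketch of that arithmetic: TimePoint, add_delay and remaining_until are hypothetical names introduced for illustration, since the real Time and Delay types from timer.h are not shown here.

```cpp
#include <cassert>

// Hypothetical stand-in for the (sec, nsec) pairs used by the sample test;
// the real Time and Delay types are not shown in this article.
struct TimePoint {
    long sec;
    long nsec;  // kept in the range [0, 1000000000)
};

const long NSEC_PER_SEC = 1000000000L;

// Add a delay to a time point, carrying nanosecond overflow into seconds.
TimePoint add_delay(TimePoint t, long sec, long nsec)
{
    TimePoint r = { t.sec + sec, t.nsec + nsec };
    if (r.nsec >= NSEC_PER_SEC) {
        r.nsec -= NSEC_PER_SEC;
        r.sec += 1;
    }
    return r;
}

// Compute deadline - now, borrowing one second when the nanosecond
// difference is negative. Assumes the deadline is not earlier than now.
TimePoint remaining_until(TimePoint deadline, TimePoint now)
{
    TimePoint r = { deadline.sec - now.sec, deadline.nsec - now.nsec };
    if (r.nsec < 0) {
        r.nsec += NSEC_PER_SEC;
        r.sec -= 1;
    }
    assert(r.sec >= 0);
    return r;
}
```

With a start of 123 s 456789000 ns and a delay of 8 s 111111000 ns, the deadline is 131 s 567900000 ns. The test then moves the clock to 126 s 567901000 ns, so the nanosecond difference is negative (-1000 ns) and one second is borrowed, which gives the 4 s 999999000 ns that the test checks.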

To run those tests, you just need this boilerplate code in a main.cpp:

#include "CppUTest/CommandLineTestRunner.h"
 
int main(int argc, char** argv)
{
   return CommandLineTestRunner::RunAllTests(argc, argv);
}
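
For readers wondering how a framework can find tests without explicit registration, one common C++ technique is to let a static object register each test from its constructor, before main() starts. The sketch below illustrates that principle only; it is not CppUTest’s implementation, and every name in it (MiniTest, MiniRegistrar, MINI_TEST, mini_run_all) is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical miniature of a self-registering test framework; this is
// not CppUTest's code, only an illustration of the principle.
struct MiniTest {
    std::string name;
    bool (*run)();
};

// A function-local static avoids static initialisation order problems.
std::vector<MiniTest>& mini_registry()
{
    static std::vector<MiniTest> tests;
    return tests;
}

// A static object of this type registers one test from its constructor,
// which runs before main().
struct MiniRegistrar {
    MiniRegistrar(const char* name, bool (*run)())
    {
        assert(run != nullptr);
        mini_registry().push_back(MiniTest{name, run});
    }
};

// Declares the test function and its registrar in one go.
#define MINI_TEST(test_name)                                \
    static bool mini_test_##test_name();                    \
    static MiniRegistrar mini_registrar_##test_name(        \
        #test_name, mini_test_##test_name);                 \
    static bool mini_test_##test_name()

MINI_TEST(AdditionWorks)
{
    return 1 + 1 == 2;
}

MINI_TEST(DeliberateFailure)
{
    return 2 + 2 == 5;
}

// Runs every registered test and returns the number of failures.
int mini_run_all()
{
    int failures = 0;
    for (const MiniTest& t : mini_registry()) {
        if (!t.run()) {
            ++failures;
        }
    }
    return failures;
}
```

Writing a new MINI_TEST block is enough to have it picked up by mini_run_all(), which is the property that keeps the main.cpp boilerplate so short.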

Another interesting feature is its integrated memory leak detection system. Any test that hasn’t released allocated memory at the end of the “teardown” process will be marked as failed.
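
The principle behind such a check can be sketched in a few lines: count the allocations that are still outstanding before “setup” and compare with the count after “teardown”. This is a simplified illustration under my own assumptions, not CppUTest’s detector, which is more elaborate; tracked_malloc, tracked_free and run_without_leak are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>

// Hypothetical allocation wrappers that keep a count of live blocks.
// A real detector would hook the normal allocation functions instead.
static long outstanding_allocations = 0;

void* tracked_malloc(std::size_t size)
{
    void* p = std::malloc(size);
    if (p != nullptr) {
        ++outstanding_allocations;
    }
    return p;
}

void tracked_free(void* p)
{
    if (p != nullptr) {
        assert(outstanding_allocations > 0);
        --outstanding_allocations;
        std::free(p);
    }
}

// Runs setup, test body and teardown, then reports whether every
// allocation made in between has been released.
bool run_without_leak(const std::function<void()>& setup,
                      const std::function<void()>& body,
                      const std::function<void()>& teardown)
{
    const long before = outstanding_allocations;
    setup();
    body();
    teardown();
    return outstanding_allocations == before;
}
```

A test whose teardown forgets to call tracked_free() on something allocated in setup makes run_without_leak() return false, which is the condition under which a framework would mark the test as failed.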

The upstream developers have made some unusual choices (static library only, installation in a private directory) but this will likely change with the switch to an automake and autoconf-based build system. I have reported the oddities that I found and I asked them to provide a pkg-config file to make it easier to compile and link unit tests that use CppUTest.

I have already used CppUTest to develop a small application running on an embedded Linux system. At some point, I might try to use CppUTest for dpkg development. I believe that it would be a good fit. dpkg is already C++ ready since dselect is written in C++ and reuses a good part of dpkg’s code.

In any case, if you like Test Driven Development and are writing C or C++ based applications, I invite you to try CppUTest.

People Behind Debian: Mark Shuttleworth, Ubuntu’s founder

I probably don’t have to introduce Mark Shuttleworth… he was already a Debian developer when he became a millionaire after selling Thawte to Verisign in 1999. Then in 2002 he became the first African (and first Debian developer) in space. Two years later, he found another grandiose project to pursue: bringing the Microsoft monopoly to an end with a new alternative operating system named Ubuntu (see bug #1).

I met Mark during DebConf 6 in Oaxtepec (Mexico); we were both trying to find ways to enhance the collaboration between Debian and Ubuntu. The least I can say is that Mark is opinionated, but any leader usually is, and in particular the self-appointed ones! :-)

Read on to discover his view on the Ubuntu-Debian relationship and much more.

Raphael: Who are you?

Mark: At heart I’m an explorer, inventor and strategist. Change in technology, society and business is what fascinates me, and I devote almost all of my time and wealth to the catalysis of change in a direction that I hope improves society and the environment.

I’m 38, studied information systems and finance at the University of Cape Town. My ‘heart’s home’ is Cape Town, and I’ve lived there and in Star City and in London, now I live in the Isle of Man with my girlfriend Claire and 14 precocious ducks. I joined Debian in around 1995 because I was helping to set up web servers for as many groups as possible, and I thought Debian’s approach to packaging was very sensible but there was no package for Apache. In those days, the NM process was a little easier ;-)

Raphael: What was your initial motivation when you decided to create Ubuntu 7 years ago?

Mark: Ubuntu is designed to fulfill a dream of change; a belief that the potential of free software was to have a profound impact on the economics of software as well as its technology. It’s obvious that the technology world is enormously influenced by Linux, GNU and the free software ecosystem, but the economics of software are still essentially unchanged.

Before Ubuntu, we had a two-tier world of Linux: there’s the community world (Debian, Fedora, Arch, Gentoo) where you support yourself, and the restricted, commercial world of RHEL and SLES/SLED. While the community distributions are wonderful in many regards, they don’t and can’t meet the needs of the whole of society; one can’t find them pre-installed, one can’t get certified and build a career around them, one can’t expect a school to deploy at scale a platform which is not blessed by a wide range of institutions. And the community distributions cannot create the institutions that would fix that.

Ubuntu brings those two worlds together, into one whole, with a commercial-grade release (inheriting the goodness of Debian) that is freely available but also backed by an institution.

The key to that dream is economics, and as always, a change in economics; it was clear to me that the flow of money around personal software would change from licensing (“buying Windows”) to services (“paying for your Ubuntu ONE storage”). If that change was coming, then there might be room for a truly free, free software distribution, with an institution that could make all the commitments needed to match the commercial Linux world. And that would be the achievement of a lifetime. So I decided to dedicate a chunk of my lifetime to the attempt, and found a number of wonderful people who shared that vision to help with the attempt.

It made sense to me to include Debian in that vision; I knew it well as both a user and insider, and believed that it would always be the most rigorous of the community distributions. I share Debian’s values and those values are compatible with those we set for Ubuntu.

“Debian would always be the most rigorous of the community distributions.”

Debian on its own, as an institution, could not be a partner for industry or enterprise. The bits are brilliant, but the design of an institution for independence implies making it difficult to be a decisive counterparty, or contractual provider. It would be essentially impossible to achieve the goals of pre-installation, certification and support for third-party hardware and software inside an institution that is designed for neutrality, impartiality and independence.

However, two complementary institutions could cover both sides of this coin.

So Ubuntu is the second half of a complete Debian-Ubuntu ecosystem. Debian’s strengths complement Ubuntu’s, Ubuntu can achieve things that Debian cannot (not because its members are not capable, but because the institution has chosen other priorities) and conversely, Debian delivers things which Ubuntu cannot, not because its members are not capable, but because it chooses other priorities as an institution.

Many people are starting to understand this: Ubuntu is Debian’s arrow, Debian is Ubuntu’s bow. Neither instrument is particularly useful on its own, except in a museum of anthropology ;)

“Ubuntu is Debian’s arrow, Debian is Ubuntu’s bow.”

So the worst and most frustrating attitude comes from those who think Debian and Ubuntu compete. If you care about Debian, and want it to compete on every level with Ubuntu, you are going to be rather miserable; you will want Debian to lose some of its best qualities and change some of its most important practices. However, if you see the Ubuntu-Debian ecosystem as a coherent whole, you will celebrate the strengths and accomplishments of both, and more importantly, work to make Debian a better Debian and Ubuntu a better Ubuntu, as opposed to wishing Ubuntu was more like Debian and vice versa.

Raphael: The Ubuntu-Debian relationship was rather hectic at the start, it took several years to “mature”. If you had to start over, would you do some things differently?

Mark: Yes, there are lessons learned, but none of them are fundamental. Some of the tension was based on human factors that cannot really be altered: some of the harshest DD critics of Canonical and Ubuntu are folk who applied for but were not selected for positions at Canonical. I can’t change that, and wouldn’t change that, and would understand the consequences are, emotionally, what they are.

Nevertheless, it would have been good to be wiser about the way people would react to some approaches. We famously went to DebConf 5 in Porto Alegre and hacked in a room at the conference. It had an open door, and many people popped a head in, but I think the not-a-cabal collection of people in there was intimidating and the story became one of exclusion. If we’d wanted to be exclusive, we would have gone somewhere else! So I would have worked harder to make that clear at the time if I’d known how many times that story would be used to paint Canonical in a bad light.

As for engagement with Debian, I think the situation is one of highs and lows. As a high, it is generally possible to collaborate with any given maintainer in Debian on a problem in which there is mutual interest. There are exceptions, but those exceptions are as problematic within Debian as between Debian and outsiders. As a low, it is impossible to collaborate with Debian as an institution, because of the design of the institution.

“It is generally possible to collaborate with any given maintainer […] [but] it is impossible to collaborate with Debian as an institution.”

In order to collaborate, two parties must make and keep commitments. So while one Debian developer and one Ubuntu developer can make personal commitments to each other, Debian cannot make commitments to Ubuntu, because there is no person or body that can make such commitments on behalf of the institution, on any sort of agile basis. A GR is not agile ;-). I don’t say this as a critique of Debian; remember, I think Debian has made some very important choices, one of those is the complete independence of its developers, which means they are under no obligation to follow a decision made by anyone else.

It’s also important to understand the difference between collaboration and teamwork. When two people have exactly the same goal and produce the same output, that’s just teamwork. When two people have different goals and produce different products, but still find ways to improve one another’s product, that’s collaboration.

So in order to have great collaboration between Ubuntu and Debian, we need to start with mutual recognition of the value and importance of the differences in our approach. When someone criticises Ubuntu because it exists, or because it does not do things the same way as Debian, or because it does not structure every process with the primary goal of improving Debian, it’s sad. The differences between us are valuable: Ubuntu can take Debian places Debian cannot go, and Debian’s debianness brings a whole raft of goodness for Ubuntu.

Raphael: What’s the biggest problem of Debian?

Mark: Internal tension about the vision and goals of Debian make it difficult to create a harmonious environment, which is compounded by an unwillingness to censure destructive behaviour.

Does Debian measure its success by the number of installs? The number of maintainers? The number of flamewars? The number of packages? The number of messages to mailing lists? The quality of Debian Policy? The quality of packages? The “freshness” of packages? The length and quality of maintenance of releases? The frequency or infrequency of releases? The breadth of derivatives?

Many of these metrics are in direct tension with one another; as a consequence, the fact that different DD’s prioritise all of these (and other goals) differently makes for… interesting debate. The sort of debate that goes on and on because there is no way to choose between the goals when everyone has different ones. You know the sort of debate I mean :-)

Raphael: Do you think that the Debian community improved in the last 7 years? If yes, do you think that the coopetition with Ubuntu partly explains it?

Mark: Yes, I think some of the areas that concern me have improved. Much of this is to do with time giving people the opportunity to consider a thought from different perspectives, perhaps with the benefit of maturity. Time also allows ideas to flow and of course introduces new people into the mix. There are plenty of DD’s now who became DD’s after Ubuntu existed, so it’s not as if this new supernova has suddenly gone off in their galactic neighbourhood. And many of them became DD’s because of Ubuntu. So at least from the perspective of the Ubuntu-Debian relationship, things are much healthier.

We could do much better. Now that we are on track for four consecutive Ubuntu LTS releases, on a two-year cadence, it’s clear we could collaborate beautifully if we shared a freeze date. Canonical offered to help with Squeeze on that basis, but institutional commitment phobia reared its head and scotched it. And with the proposal to put Debian’s first planned freeze exactly in the middle of Ubuntu’s LTS cycle, our alignment in interests will be at a minimum, not a maximum. Pure <facepalm />.

Raphael: What would you suggest to people (like me) who do not feel like joining Canonical and would like to be paid to work on improving Debian?

Mark: We share the problem; I would like to be paid to work on improving Ubuntu, but that’s also a long term dream ;-)

Raphael: What about using the earnings of the dormant Ubuntu Foundation to fund some Debian projects?

Mark: The Foundation is there in the event of Canonical’s failure to ensure that commitments, like LTS maintenance, are met. It will hopefully be dormant for good ;-)

Raphael: The crowdfunding campaign for the Debian Administrator’s Handbook is still going on and I briefly envisioned the possibility to create the Ubuntu Administrator’s Handbook. What do you think of this project?

Mark: Crowdfunding is a great match for free software and open content, so I hope this works out very well for you. I also think you’d find a bigger market for an Ubuntu book, not because Ubuntu is any more important than Debian but because it is likely to appeal to people who are more inclined to buy or download a book than to dive into the source.

Again, this is about understanding the difference in audiences, not judging the projects or the products.

Raphael: Is there someone in Debian that you admire for their contributions?

Mark: Zack is the best DPL since 1995; it’s an impossible job which he handles with grace and distinction. I hope praise from me doesn’t tarnish his reputation in the project!


Thank you to Mark for the time spent answering my questions. I hope you enjoyed reading his answers as I did.

Subscribe to my newsletter to get my monthly summary of the Debian/Ubuntu news and to not miss further interviews. You can also follow along on Identi.ca, Google+, Twitter and Facebook.

20 Things to Learn About APT With the Free Chapter of the Debian Administrator’s Handbook

We just released a sample chapter of the Debian Administrator’s Handbook. It covers the APT family of tools: apt-get, aptitude, synaptic, update-manager, etc.


Click here to get your free sample chapter

I’m sure you will enjoy it. There are many interesting things to learn:

  • How to customize the sources.list file
  • The various APT repositories that Debian offers (Security Updates, Stable Updates, Proposed Updates, Backports, Experimental, etc.)
  • How to select the best Debian mirror for you
  • How to find old package versions
  • How to install the same selection of packages on multiple computers
  • How to install and remove a package with a single command line
  • How to reinstall packages and how to install a specific version of a package
  • How to pass options to dpkg via APT
  • How to configure a proxy for APT
  • How to set priorities to various package sources (APT pinning)
  • How to safely mix packages from several distributions on a single system
  • How to use aptitude’s text-mode graphical interface
  • How to use the tracking of automatically installed packages to keep a clean system
  • How APT checks the authenticity of packages that it downloads
  • How to add supplementary GnuPG keys to APT’s trusted keyring
  • How to upgrade from one stable distribution to the next
  • How to handle problems after an upgrade
  • How to keep a system up-to-date
  • How to automate upgrades
  • How to find the package that you’re looking for

If you liked this chapter, click here to contribute a few euros towards the liberation of the whole book. That way you’ll get a copy of the ebook as soon as it’s available. Thank you!

I also invite you to share this sample chapter as widely as possible. We’re only at 40% of the liberation fund and there’s less than 2 weeks left. I hope this book extract will convince enough people that the book is going to be great, and that it really deserves to be liberated and bundled with Debian!

What about creating The Ubuntu Administrator’s Handbook?

I am currently running a crowdfunding campaign whose ultimate goal is to liberate the English translation of a French book that I have written. This book will be named The Debian Administrator’s Handbook because it has primarily been written for Debian.

Creating a new Ubuntu book based on The Debian Administrator’s Handbook

But since Ubuntu is based on Debian, a large part of its content applies equally well to Ubuntu. While I was discussing this with Mark Shuttleworth, he suggested that I reuse those parts to create a new book dedicated to Ubuntu. It would also cover the latest cloud technologies that Ubuntu has been delivering (since this is a topic that the current book does not cover).

This is something that I have been envisioning for a while and something that I would be ready to try if we manage to complete the liberation of the current book. This project would then bring a truly free book to the Ubuntu ecosystem.

Why? The official Ubuntu books are not really free

There’s a policy in place that ensures that official Ubuntu books use a free software/culture license and they are effectively available under the terms of a Creative Commons Share Alike license. But try to create a derivative book… you won’t find the “sources” (usually LaTeX or DocBook for most big books). You can only find a few PDF copies if you google for them. But a PDF is really not the preferred form for modifying such a book.

Those books are also not packaged. Ubuntu, much like Debian, deserves to have a good book embodying the values of free software that can be shipped together with its product.

When I speak of liberation of the book, I really mean it in the way that free software hackers are used to: a public Git repository containing the DocBook sources, the pictures and the .dia files for the various schemas.

Help Ubuntu by spreading the word

I understand that at this point this proposed Ubuntu book is really hypothetical (“vaporware” one could say) but we need to go step by step to make it a reality. And the first step is to ensure that we manage to liberate the Debian Administrator’s Handbook.

For this I am seeking the support of the Ubuntu community to promote the current fundraising campaign. If the prospect of the Ubuntu book is not enough to convince you, you’ll be glad to learn that I also commit to giving back to Ubuntu 15% of the money raised via the link below (once VAT has been subtracted).

Click here to go to the crowdfunding campaign page and pledge a few euros. Then share this article (or the link http://debian-handbook.info/go/ulule-ubuntu/) and convince others to participate.

At this point, the liberation target is entirely reachable with your help and the help of the community: the remaining 18 K€ needed in the liberation fund represent 720 persons giving 25 EUR each or 1800 persons giving 10 EUR each.

Thank you very much for your support and your help in this project!

Trying to make dpkg triggers more useful and less painful

Lately I have been working on the triggers feature of dpkg. I would like to share my plan and what I have done so far. I’ll first explain what triggers are, the current problems, and the work I did to try to improve the situation.

Introduction

Dpkg triggers are a neat feature of dpkg that packages can use to send/receive notifications to/from other installed packages. Those notifications take the form of a simple string.

This feature is heavily used to track changes of packaged files in a list of predefined directories, and to update other files based on this. For instance, man-db is watching the directories containing manual pages so that it can update its cache (in /var/cache/man/). install-info is updating the index of info pages when there have been changes in /usr/share/info. gnome-menus is updating its own copy of the menu hierarchy (with entries from /etc/gnome/menus.blacklist blacklisted) every time that a .desktop file is installed/updated/removed.

From a user’s perspective

You see triggers in action very often during upgrades (in fact too often, as we’ll see later):

Preparing to replace zim 0.52-1 (using .../archives/zim_0.52-1_all.deb) ...
Unpacking replacement zim ...
Processing triggers for shared-mime-info ...
Processing triggers for menu ...
Processing triggers for desktop-file-utils ...
Processing triggers for man-db ...
Processing triggers for hicolor-icon-theme ...
Processing triggers for python-support ...
Processing triggers for gnome-menus ...
Setting up zim (0.52-1) ...
Processing triggers for python-support ...
Processing triggers for menu ...

As you guessed, those “Processing triggers” lines correspond to the packages which received (one or more) trigger notifications and which are doing the corresponding task.

By default the triggers are processed at the end of the dpkg --unpack invocation, which is often too soon because APT will often call dpkg --unpack repeatedly during large upgrades. There are some options to ask APT to use dpkg’s --no-triggers option in order to defer the trigger processing until the end of the APT run. You can put this in /etc/apt/apt.conf.d/triggers:

// Trigger deferred
DPkg::NoTriggers "true";
PackageManager::Configure "smart";
DPkg::ConfigurePending "true";
DPkg::TriggersPending "true";

I have now asked the APT maintainers to use those options by default; I filed bug #626599 to track this. At the same time I fixed bug #526774 reported by the APT maintainers. This bug forced them to put a work-around in APT which resulted in running triggers sooner than expected.

(And while writing this article I filed bugs #628564 and #628574 because it was clearly not normal that the menu trigger was executed twice for the installation of a single package.)

From a packager’s perspective

The implementation of triggers has several consequences on the status that packages can have.

Let’s assume that the package A installs a file in a directory that is watched by package B (and that B is currently in the “installed” state). When A is unpacked, dpkg adds B to A’s “Triggers-Awaited” field and lists the activated trigger in B’s “Triggers-Pending” field. Package A is in “unpacked” state, but B has been changed to “triggers-pending”.

When A is configured, instead of going to the “installed” state, it will go to the “triggers-awaited” state. In that state the package is assumed to NOT fulfill dependencies. However, B—which is still in “triggers-pending” state—does fulfill dependencies.

A and B will switch to “installed” at the same time when the trigger has been processed.
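
The transitions described above can be summarised with a toy model. This is only a sketch of the states as described in this article, not dpkg’s actual code; the Package struct and the function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Package states named in the text above; a toy model, not dpkg's code.
enum class State {
    NotInstalled,
    Unpacked,
    TriggersAwaited,
    TriggersPending,
    Installed
};

struct Package {
    std::string name;
    State state;
};

// Per the text: triggers-awaited does NOT fulfill dependencies,
// while triggers-pending and installed do.
bool satisfies_dependencies(const Package& p)
{
    return p.state == State::TriggersPending || p.state == State::Installed;
}

// Unpacking A activates a file trigger that B is interested in.
void unpack_triggering(Package& a, Package& b)
{
    a.state = State::Unpacked;
    b.state = State::TriggersPending;
}

// Configuring A runs its postinst, but A still awaits B's trigger.
void configure_triggering(Package& a)
{
    assert(a.state == State::Unpacked);
    a.state = State::TriggersAwaited;
}

// Processing B's pending trigger moves both packages to installed.
void process_trigger(Package& a, Package& b)
{
    assert(b.state == State::TriggersPending);
    b.state = State::Installed;
    if (a.state == State::TriggersAwaited) {
        a.state = State::Installed;
    }
}
```

Between configure_triggering() and process_trigger(), A is the only package of the pair that fails satisfies_dependencies(), even though its own postinst has already run.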

The fact that the triggers-awaited status does not fulfill dependencies means that some common triggers like man-db have to be processed regularly just to be able to ensure dependencies are satisfied before running the postinst of other installed packages.

But a package which ships a manual page can certainly be considered as configured when its postinst has been run even if man-db has not yet updated its cache to know about the new/updated manual page.

When you activate a trigger with the dpkg-trigger command you have an option --no-await to avoid awaiting the trigger processing (and thus to go directly to installed state after the postinst has been run). But with file triggers or activate trigger directives, you do not have this option.

My proposal to improve the situation

This is the problem that I tried to solve during my last vacation. But before changing the inner working of triggers, I wrote a non-regression test suite for that feature (commit here) so I could hack with some confidence that I did not break everything.

The result has been presented on the debian-dpkg mailing list: see the discussion here. I added new directives that can be used in triggers files and that work exactly like the current triggers except that they do not put triggering packages in triggers-awaited status.

I believe the code to be mostly ready, but in its current form the patch brings zero benefits until all packages have been converted to use the trigger variants that do not require awaiting trigger processing (and the change requires a pre-dependency on dpkg to ensure we have the required dpkg that understands the new kind of trigger directives).

Remaining question

Thus I wonder whether I should change the default semantics of triggers. The packages which really provide crucial functionality to awaiting packages through triggers would then have to be updated to switch to the new directives.

If you’re a packager using triggers, you can thus help me by answering this question: do you know some triggers where it’s important that the awaiting packages are not considered as configured before the trigger processing? In most of the cases I checked, it’s important for the triggered package rather than for the triggering package.

In truth, a package in triggers-awaited status is usually in a good enough shape to be able to satisfy dependencies (i.e. requirements that other packages can have), but it would still be worth recording the fact that it’s not entirely configured yet because that might be true from the user’s point of view: for example if the menu trigger has not yet been processed, the software might not yet be visible in the application menu.

If you appreciate this kind of groundwork that benefits the whole Debian ecosystem, please consider supporting my work. Click here and take a look: there are many ways to contribute and to make a difference for me.

People behind Debian: Steve Langasek, release wizard

Steve Langasek has been contributing to Debian for more than a decade. He was a release manager for sarge and etch, and like many former release managers, he’s still involved in the Debian release team although as a release wizard (i.e. more of an advisory role than a day-to-day contributor). Oh, and he did the same with Ubuntu: in the picture on the left, he just announced the release of Ubuntu 10.04 from his Debian-branded laptop. ;-)

He has also been maintaining PAM in Debian for as long as I can remember and does a great job at that. He’s very knowledgeable and fully deserves his place within the Debian Technical Committee. I’m glad he still has the time to participate on several important Debian mailing lists because his contributions are always very useful.

I’m sure you’ll notice this just by reading his answers below. My questions are in bold, the rest is by Steve.

Who are you?

I’m 32 years old, have been running Linux since my first year in college back in ’96, and have been a Debian developer now for ten years. Along the way I’ve been involved in maintaining a variety of server packages, worked on the Alpha port for a while, did a stint as a release manager for a couple of years, and serve on the technical committee.

This year I’m also celebrating my ten year anniversary with my lovely wife Patty, who many know as an erstwhile front-desk volunteer at DebConf. God only knows why she puts up with my late-night hacking!

These days in my day job I’m a manager on the Ubuntu Platform team at Canonical, working to help make Ubuntu a daughter distribution that the Debian community can be proud of.

What’s your biggest achievement within Debian or Ubuntu?

There’s no doubt that my biggest achievement in Debian has been overseeing the release of two Debian releases as release manager.

On the other hand, the scope of a release is so huge, and it represents the output of so many developers working together, that it would be arrogant to claim the release itself as an achievement of my own. Also, sarge and etch have long since been rotated off of the mirrors so no one cares about them anymore. ;) For a more personal and lasting contribution in the distro itself, I’m very proud of writing pam-auth-update. It’s a small piece of code, but one that Debian was missing for a long time – it’s made a big difference to PAM module integration in packages!

What are your plans for Debian Wheezy?

My top priority for this cycle is to see multiarch through. We’re still not far enough along in Debian for most developers to see any difference… and once we are, the first thing people are going to see is a fair bit of breakage when we start breaking a lot of assumptions about paths that have been hard-coded upstream. But I’m still excited by the progress that is being made here. We should be able to ship wheezy without any ia32-libs package. We might even be able to get rid of all the biarch library packages, including those used by the toolchain itself. 54 packages in testing build-depend on gcc-multilib right now, in order to build 32-bit code to ship in the amd64 package; a bunch of those should go away with absolutely no reduction in functionality, saving us a bit of space in the archive and saving the maintainers a lot of complexity in their packages, while at the same time giving us much better support for cross-compilation than we’ve ever had before.

It’s a tall order, certainly, but the pieces are falling into place one by one.

My second priority is to get a policy in Debian around packages integrating upstart jobs. It would of course benefit Ubuntu to have many packages back in sync with Debian, but if all we wanted was to sync with Debian, we could mostly just make debhelper ignore upstart jobs in Debian, prefer them in Ubuntu, and call it good. I’m interested in making sure Debian also gets the benefits of being able to use upstart, because as Linux has become increasingly asynchronous (doing more in parallel at start up), the traditional sysvinit has not been able to keep up. There are all kinds of bugs now related to network startup, for instance, that we don’t have a good answer for in a sysvinit model but that we can fix with an event-based system.

Upstart has been around for a while now, but we’ve been slow to integrate it into Debian because it only works on Linux. It would be a shame if right after the first Debian GNU/kFreeBSD technology preview, packages all stopped working on kFreeBSD because they started to assume the availability of upstart! Unfortunately, having been so cautious we now have systemd on the scene, which not only doesn’t support non-Linux but seems to be in the process of trying to gobble up other, non-Linux-specific components of the desktop stack. So I have to wonder what the future holds for the free desktop on non-Linux kernels.

If you could spend all your time on Debian, what would you work on?

Well, based on my previous experiences when I did spend all my time on Debian, I think the answer here is QA / release work. :) Otherwise, I don’t know. My hands are full enough now with multiarch that it’s hard for me to see what the Next Thing would be.

You’re a member of the technical committee. In the interview with Bdale Garbee, I argued that it’s not working well. What’s your point of view on this topic?

Well, I feel a constant low level guilt about my own poor level of activity in the TC; but that doesn’t translate into a belief that the system is broken. This is, after all, the decision making body of last resort for technical disputes in Debian, and as such it should really be used sparingly. And if a reputation for glacial deliberation means more developers work out their disputes on their own rather than asking the TC to step in, I think that’s actually a healthy thing!

I do still wish we were more effective at resolving those issues that do come our way, but there’s no silver bullet for this. Though the funny thing is, I’ve noticed that the majority of issues that get referred to the TC nowadays never even need us to make a decision; a short conversation with the disputants is often enough to get them to come to an agreement.

What’s the biggest problem of Debian?

By and large, I think Debian is still doing a great job at what it’s best at — delivering a rock-solid distribution that users can rely on. If I would highlight one problem in Debian, though, it would be that I think we’re becoming less innovative as time goes on. Part of that comes from being such a large project that we’re bound to be more conservative as an institution; but even though the three pet Debian projects of mine that I mentioned above are fairly innovative (multiarch, pam-auth-update, upstart), each of these has landed first in Ubuntu rather than in Debian. Always with a clear intent of pushing back up into Debian, of course, but it just wasn’t possible to do this work within Debian for the first cut without much longer delays.

I worry that if Debian is no longer the place to try new things, we’re going to miss out on attracting contributions from the folks who are inspired to make Free Software better – and not simply to make it stable.

I’m not sure how to address this, though. Maybe improved conversations with derivatives such as (but not limited to) Ubuntu, about what crack of the day is being tried where and how that can be integrated into Debian once it’s proven to work? I don’t think that team-based maintenance or low-threshold NMUs do anything to address this, though, as the kinds of innovation that matter most are ones that require discussion and consensus-finding — not just routing around inactive maintainers.

Do you have wishes for Debian Wheezy?

Well, I’d like to see the armhf port get on its feet and become an official port. Over the lifetime of the arm and armel ports, the state of the art on ARM has changed quite a bit; it would be great to see Debian taking advantage of this richer platform, to let people make better use of their hardware via Debian.

As a former release manager, you’re now a “release wizard”. I guess you have seen on debian-devel that there are proposals to not freeze testing and to use another distribution starting as a snapshot of testing to finalize the new stable release. According to your experience, what needs to happen to make this possible?

Frankly, I’ve stayed out of that discussion because I don’t think what’s being asked for is possible. I think proponents of a freezeless release have seriously underestimated the amount of work required on the part of the release team to wrangle testing into a releasable product, and that anything that makes propagation of fixes into the pending release more time consuming will make Debian worse on the whole, not better.

If people really want to avoid long freezes for the Debian release, the best way they can help this happen is by making Debian more releasable on an ongoing basis, by helping to hold our packages to our shared standards for quality (i.e., by fixing RC bugs!). The biggest factor in long freezes for Debian is the slow rate at which we bring the RC bug count down during the freeze. Back in the sarge, etch days we used to have really great bug squashing parties that would get people together on weekends to hack through RC bugs by the dozens. I don’t see that happening as much anymore. I’d really like us to get back to that, but my few attempts at this so far since retiring as release manager have led me to think I’m really terrible at organizing parties of any kind. :)

On the other side, as seen at http://bugs.debian.org/release-critical/, the RC bug count for testing at the beginning of the release cycle keeps getting higher and higher. I’d love to know why that is so we can address it. I know we’ve gotten better at detecting some classes of RC bugs; that’s part of it, but I don’t think it explains the whole trend.

Is there someone in Debian that you admire for their contributions?

Wow, what kind of arrogant jerk would I be if I didn’t admire anyone in Debian for their contributions? Debian is and always has been an amazing community of top-notch developers; there are certainly too many I admire to list them all here. Joey Hess certainly makes the list, for his longstanding example of code speaking louder than words and for his ability to get to the heart of common problems and come up with elegant solutions. So does Russ Allbery, who by all accounts had his ability to feel anger in response to email burned out of him at a young age in a flame-related accident on Usenet. ;-) The list goes on, but here I think I have to follow Joey’s example and cut the words short.


Thank you to Steve for the time spent answering my questions. I hope you enjoyed reading his answers as I did. Subscribe to my newsletter to get my monthly summary of the Debian/Ubuntu news and to not miss further interviews. You can also follow along on Identi.ca, Twitter and Facebook.

People behind Debian: Michael Vogt, synaptic and APT developer

Michael and his daughter Marie

Michael has been around for more than 10 years and has always contributed to the APT software family. He’s the author of the first real graphical interface to APT—synaptic. Since then he created “software-center” as part of his work for Ubuntu. Being the most experienced APT developer, he’s naturally the coordinator of the APT team. Check out what he has to say about APT’s possible evolutions.

My questions are in bold, the rest is by Michael.

Who are you?

My name is Michael Vogt, I’m married and have two little daughters. We live in Germany (near Trier) and I work for Canonical as a software developer. I joined Debian as a developer in early 2000 and started to contribute to Ubuntu in 2004.

What’s your biggest achievement within Debian or Ubuntu?

I can not decide on a single one so I will just be a bit verbose.

From the very beginning I was interested in improving the package manager experience and the UI on top for our users. I’m proud of the work I did with synaptic. It was one of the earliest UIs on top of apt. Because of my work on synaptic I got into apt development as well and fixed bugs there and added new features. I still do most of the uploads here, but nowadays David Kalnischkies is the most active developer.

I also wrote a bunch of tools like gdebi, update-notifier, update-manager, unattended-upgrade and software-properties to make the update/install situation for the user easier to deal with. Most of the tools are written in python so I added a lot of improvements to python-apt along the way, including the initial high level “apt” interface and a bunch of missing low-level apt_pkg features. Julian Andres Klode made a big push in this area recently and thanks to his effort the bindings are fully complete now and have good documentation.

My most recent project is software-center. Its aim is to provide a UI strongly targeted for end-users. The goal of this project is to make finding and installing software easy and beautiful. We have a fantastic collection of software to offer and software-center tries to present it well (including screenshots, instant search results and soon ratings&reviews). This builds on great foundations like aptdaemon by Sebastian Heinlein, screenshots.debian.net by Christoph Haas, ddtp.debian.org by Michael Bramer, apt-xapian-index by Enrico Zini and many others (this is what I love about free software, it usually “adds”, rarely “takes away”).

What are your plans for Debian Wheezy?

For apt I would love to see a more pluggable architecture for the acquire system. It would be nice if apt-get update (and the frontends that use this from libapt) were able to download additional data (like debtags or additional index files that contain more end-user targeted information). I also want to add some scripts so that apt (optionally) creates btrfs snapshots on upgrade and provides some easy way to roll back in case of problems.

There is also some interesting work going on around making the apt problem resolver a more pluggable part. This way we should be able to do much faster development.

software-center will get ratings&reviews in the upstream branch, I really hope we can get that into Wheezy.

If you could spend all your time on Debian, what would you work on?

In that case I would start with a refactor of apt to make it more robust about ABI breaks. It would be possible to move much faster once this problem is solved (it’s not even hard, it just needs to be done). Then I would add a more complete testsuite.

Another important problem to tackle is to make maintainer scripts more declarative. I triaged a lot of upgrade bug reports (mostly in Ubuntu though) and a lot of them are caused by maintainer script failures. Worse is that depending on the error it’s really hard for the user to solve the problem. There is also a lot of code duplication. Having a central place that contains well tested code to do these jobs would be more robust. Triggers help us a lot here already, but I think there is still more room for improvement.

What’s the biggest problem of Debian?

That’s a hard question :) I mostly like Debian the way it is. What frustrated me in the past were flamewars that could have been avoided. To me being respectful to each other is important. I don’t like flames and insults because I like solving problems and fighting like this rarely helps that. The other attitude I don’t like is to blame people and complain instead of trying to help and be positive (the difference between “it sucks because it does not support $foo” and “it would be so helpful if we had $foo because it enables me to do $bar”).

For a long time, I had the feeling you were mostly alone working on APT and were just ensuring that it keeps working. Did you also have this feeling, and are things better nowadays?

I felt a bit alone sometimes :) That being said, there were great people like Eugene V. Lyubimkin and Otavio Salvador during my time who did a lot of good work (especially at release crunch times) and helped me with the maintenance (but got interested in areas other than apt later). And now we have the unstoppable David Kalnischkies and Julian Andres Klode.

Apt is too big for a single person, so I’m very happy that especially David is doing superb work on the day-to-day tasks and fixes (plus big projects like multiarch and the important but rather thankless testsuite work). We talk about apt stuff almost daily, doing code reviews and discussing bugs. This makes the development process much more fun and healthy. Julian Andres Klode is doing interesting work around making the resolver more pluggable and Christian Perrier is as tireless as always when it comes to merging translations.

I did a quick grep over the bzr log output (including all branch merges) and counted around 4300 total commits (including all revisions of branches merged). Of those, ~950 commits are from me, plus an additional ~500 merges. It was more than just ensuring that it keeps working, but I can see where this feeling comes from as I was never very verbose. Apt also was never my “only” project: I am involved in other upstream work like synaptic, update-manager, python-apt, etc. This naturally reduced the time available to hack on apt and to do the important day-to-day bug triage, responses to mailing list messages, etc.

On the python-apt side Julian Andres Klode did great work to improve the code and the documentation. It’s a really nice interface and if you need to do anything related to packages and love Python I encourage you to try it. It’s as simple as:

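import apt
cache = apt.Cache()  # load the APT package cache
cache["update-manager"].mark_install()  # mark the package for installation
cache.commit()  # apply the pending changes (requires root privileges)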

Of course you can do much more with it (update-manager, software-center and lots of other tools use it). With “pydoc apt” you can get a good overview.

The apt team always welcomes contributors. We have a mailing list and an IRC channel and it’s a great opportunity to solve real world problems. It does not matter if you want to help triage bugs or write documentation or write code, we welcome all contributors.

You’re also an Ubuntu developer employed by Canonical. Are you satisfied with the level of cooperation between both projects? What can we do to get Ubuntu to package new applications developed by Canonical directly in Debian?

Again a tricky question :) When it comes to cooperation there is always room for improvement. I think (with my Canonical hat on) we do a lot better than we did in the past. And it’s great to see the current DPL coming to Ubuntu events and talking about ways to improve the collaboration. One area where I feel Debian would benefit is being more positive about NMUs and shared source repositories (collab-maint and LowThresholdNmu are good steps here). The lower the cost of pushing a patch/fix (e.g. via direct commit or upload), the more of them there will be.

When it comes to getting packages into Debian I think the best solution is to have a person in Debian as a point of contact to help with that. Usually the amount of work is pretty small as the software will have a debian/* dir already with useful stuff in it. But it helps me a lot to have someone doing the Debian uploads, responding to the bugmail etc., even if the bugmail is just forwarded as upstream bugreports :) IMO it is a great opportunity especially for new packagers as they will not have to do a lot of packaging work to get those apps into Debian. This model works very well for me for e.g. gdebi (where Luca Falavigna is really helpful on the Debian side).

Is there someone in Debian that you admire for his contributions?

There are many people I admire. Probably too many to mention them all. I always find it hard to single out individual people because the project as a whole can be so proud of their achievements.

The first name that comes to my mind is Jason Gunthorpe (the original apt author), whom I’ve never met. The next is Daniel Burrows, whom I met and was inspired by. David Kalnischkies is doing great work on apt: he went from contributing his first (small) patch to being able to fix virtually any problem and adding big features like multiarch support in about a year. Sebastian Heinlein for aptdaemon.

Christian Perrier has always been one of my heroes because he cares so much about i18n. Christoph Haas for screenshots.debian.net, Michael Bramer for his work on Debian translated package descriptions.


Thank you to Michael for the time spent answering my questions. I hope you enjoyed reading his answers as I did.

State of the Debian-Ubuntu relationship

Debian welcoming contributions from derivatives

The relationship between Debian and Ubuntu has been the subject of many vigorous debates over the years, ever since Ubuntu’s launch in 2004. Six years later, the situation has improved and both projects are communicating better. The Natty Narwhal Ubuntu Developer Summit (UDS) featured—like all UDS for more than 2 years—a Debian Health Check session where current cooperation issues and projects are discussed. A few days after that session, Lucas Nussbaum gave a talk during the mini-Debconf Paris detailing the relationship between both projects, both at the technical and social level. He also shared some concerns for Debian’s future and gave his point of view on how Debian should address them. Both events give valuable insights on the current state of the relationship.

Lucas Nussbaum’s Debian-Ubuntu talk

Lucas started by introducing himself. He has been an Ubuntu developer since 2006 and a Debian developer since 2007. He has worked to improve the collaboration between both projects, notably by extending the Debian infrastructure to show Ubuntu-related information. He attended conferences for both projects (Debconf, UDS) and has friends in both communities. For all of these reasons, he believes himself to be qualified to speak on this topic.

Collaboration at the technical level

He then quickly explained the task of a distribution: taking upstream software, integrating it in standardized ways, doing quality assurance on the whole, delivering the result to users, and assuring some support afterward. He pointed out that in the case of Ubuntu, the distribution has one special upstream: Debian.

Indeed Ubuntu gets most of its software from Debian (89%), and only 7% are new packages coming from other upstream projects. The remaining 4% are of unknown origin: they are newer upstream releases of software available in Debian, but he was not able to find out whether the Debian packaging had been reused or not. Of all the packages imported from Debian, 17% have Ubuntu-specific changes. The reasons for those changes are varied: bugfixes, integration with Launchpad/Ubuntu One/etc., or toolchain changes. The above figures are based on Ubuntu Lucid (10.04) while excluding many Ubuntu-specific packages (language-pack-*, language-support-*, kde-l10n-*, *ubuntu*, *launchpad*).

The different agendas and the differences in philosophy (Debian often seeking perfect solutions to problems; Ubuntu accepting temporary suboptimal workarounds) also explain why so many packages are modified on the Ubuntu side. It’s simply not possible to always do the work in Debian first. But keeping changes in Ubuntu requires a lot of work since they merge with Debian unstable every 6 months. That’s why they have a strong incentive to push changes to upstream and/or to Debian.

There are 3 channels that Ubuntu uses to push changes to Debian: they file bug reports (between 250 and 400 during each Ubuntu release cycle), they interact directly with Debian maintainers (often the case when there’s a maintenance team), or they do nothing and hope that the Debian maintainer will pick up the patch directly from the Debian Package Tracking System (it relays information provided by patches.ubuntu.com).

Lucas pointed out that those changes are not the only thing that Debian should take back. Ubuntu has a huge user base resulting in lots of bug reports sitting in Launchpad, often without anyone taking care of them. Debian maintainers who already have enough bugs on their packages are obviously not interested in even more bugs, but those who are maintaining niche packages, with few reports, might be interested in the user feedback available in Launchpad. Even if some of the reports are Ubuntu-specific, many of them are advance warnings of problems that will affect Debian later on, when the toolchain catches up with Ubuntu’s aggressive updates. To make this easier for Debian maintainers, Lucas improved the Debian Package Tracking System so that they can easily get Ubuntu bug reports for their packages even without interacting with Launchpad.

Human feelings on both sides

Lucas witnessed a big evolution in the perception of Ubuntu on the Debian side. The initial climate was rather negative: there were feelings that Debian’s work was being stolen, claims of giving back that did not match the observations of the Debian maintainers, and problems with specific Canonical employees that reflected badly on Ubuntu as a whole. These days most Debian developers find something positive in Ubuntu: it brings a lot of new users to Linux, it provides something that works for their friends and family, it brings new developers to Debian, and it serves as a technological playground for Debian.

On the Ubuntu side, the culture has changed as well. Debian is no longer so scary for Ubuntu contributors and contributing to Debian is The Right Thing to do. More and more Ubuntu developers are getting involved in Debian as well. But at the package level there’s not always much to contribute, as many bugfixes are only temporary workarounds. And while Ubuntu’s community follows this philosophy, Canonical is a for-profit company that contributes back mainly when it has compelling reasons to do so.

Consequences for Debian

In Lucas’s eyes, the success of Ubuntu creates new problems. For many new users Linux is a synonym for Ubuntu, and since much innovation happens in Ubuntu first, Debian is overshadowed by its most popular derivative. He goes as far as saying that because of that “Debian becomes less relevant”.

He went on to say that Debian needs to be relevant because the project defends important values that Ubuntu does not. And it needs to stay as an independent partner that filters what comes out of Ubuntu, ensuring that quality prevails in the long term.

Fixing this problem is difficult, and the answer should not be to undermine Ubuntu. On the contrary, more cooperation is needed. If Debian developers are involved sooner in Ubuntu’s projects, Debian will automatically get more credit. And if Ubuntu does more work in Debian, their work can be showcased sooner in the Debian context as well.

The other solution that Lucas proposed is that Debian needs to communicate on why it’s better than Ubuntu. Debian might not be better for everybody but there are many reasons why one could prefer Debian over Ubuntu. He listed some of them: “Debian has better values” since it’s a volunteer-based project where decisions are made publicly and it has advocated the free software philosophy since 1993. On the other hand, Ubuntu is under control of Canonical where some decisions are imposed, it advocates some proprietary web services (Ubuntu One), the installer recommends adding proprietary software, and copyright assignments are required to contribute to Canonical projects.

Debian is also better in terms of quality because every package has a maintainer who is often an expert in the field of the package. As a derivative, Ubuntu does not have the resources to do the same and instead most packages are maintained on a best effort basis by a limited set of developers who can’t know everything about all packages.

In conclusion, Lucas explained that Debian can neither ignore Ubuntu nor fight it. Instead it should consider Ubuntu as “a chance” and should “leverage it to get back in the center of the FLOSS ecosystem”.

The Debian health check UDS session

While this session has existed for some time, it’s only the second time that a Debian Project Leader was present at UDS to discuss collaboration issues. During UDS-M (the previous summit), this increased involvement from Debian was a nice surprise to many. Stefano Zacchiroli—the Debian leader—collected and shared the feedback of Debian developers and the session ended up being very productive. Six months later is a good time to look back and verify if decisions made during UDS-M (see blueprint) have been followed through.

Progress has been made

On the Debian side, Stefano set up a Derivatives Front Desk so that derivative distributions (not just Ubuntu) have a clear point of contact when they are trying to cooperate but don’t know where to start. It’s also a good place to share experiences among the various derivatives. In parallel, a #debian-ubuntu channel has been started on OFTC (the IRC network used by Debian). With more than 50 regulars coming from both distributions, it’s a good place for quick queries when you need advice on how to interact with the distribution that you’re not familiar with.

Ubuntu has updated its documentation to prominently feature how to cooperate with Debian. For example, the sponsorship process documentation explains how to forward patches both to the upstream developers and to Debian. It also recommends ensuring that the patch is not Ubuntu-specific and gives some explanation on how to do it (which includes checking against a list of common packaging changes made by Ubuntu). The Debian Derivative Front Desk is mentioned as a fallback when the Debian maintainer is unresponsive.

While organizing Ubuntu Developer Week, Ubuntu now reaches out to Debian developers and tries to have sessions on “working with Debian”. Launchpad has also been extended to provide a list of bugs with attached patches and that information has been integrated in the Debian Package Tracking system by Lucas Nussbaum.

Still some work to do

Some of the work items have not been completed yet: many Debian maintainers would like a simpler way to issue a sync request (a process used to inject a package from Debian into Ubuntu). There’s a requestsync command line tool provided by the ubuntu-dev-tools package (which is available in Debian) but it’s not yet usable because Launchpad doesn’t know the GPG keys of Debian maintainers.

Another issue concerns packages which are first introduced in Ubuntu. Most of them have no reason to be Ubuntu-specific and should also end up in Debian. It has thus been suggested that people packaging new software for Ubuntu also upload them to Debian. They could however immediately file a request for adoption (RFA) to find another Debian maintainer if they don’t plan to maintain it in the long term. If Ubuntu doesn’t make this effort, it can take a long time until someone decides to reintegrate the Ubuntu package into Debian just because nobody knows about it. This represents an important shift in the Ubuntu process and it’s not certain that it’s going to work out. As with any important policy change, it can take several years until people are used to it.

Both issues have been rescheduled for this release cycle, so they’re still on the agenda.

This time the UDS session was probably less interesting than the previous one. Stefano explained once more what Debian considers good collaboration practices: teams with members from both distributions, and forwarding of bugs if they have been well triaged and are known to apply to Debian. He also invited Ubuntu to discuss big changes with Debian before implementing them.

An interesting suggestion that came up was that some Ubuntu developers could participate in Debcamp (one week hack-together before Debconf) to work with some Debian developers, go through Ubuntu patches, and merge the interesting bits. This would nicely complement Ubuntu’s increased presence at Debconf: for the first time, community management team member Jorge Castro was at DebConf 10 giving a talk on collaboration between Debian and Ubuntu.

There was also some brainstorming on how to identify packages where the collaboration is failing. A growing number of Ubuntu revisions (identified for example by a version like 1.0-1ubuntu62) could indicate that no synchronization was made with Debian, but it would also identify packages which are badly maintained on the Debian side. If Ubuntu consistently has a newer upstream version compared to Debian, it can also indicate a problem: maybe the person maintaining the package for Ubuntu would be better off doing the same work in Debian directly since the maintainer is lagging or not doing their work. Unfortunately this doesn’t hold true for all packages since many GNOME packages are newer in Ubuntu but are actively maintained on both sides.
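The revision-count heuristic mentioned above can be sketched in a few lines of Python. This is only an illustration: the function names, the threshold of 10 and the sample package names are assumptions made for the example, not something that was proposed during the session.

```python
import re

# Matches the Ubuntu revision counter in versions such as "1.0-1ubuntu62".
_UBUNTU_REV = re.compile(r"ubuntu(\d+)")


def ubuntu_revision(version):
    """Return the Ubuntu revision number, or None if the version has none."""
    match = _UBUNTU_REV.search(version)
    return int(match.group(1)) if match else None


def flag_diverged(packages, threshold=10):
    """Return the sorted names of packages whose Ubuntu revision count
    reaches the (arbitrary, illustrative) threshold."""
    flagged = []
    for name, version in packages.items():
        revision = ubuntu_revision(version)
        if revision is not None and revision >= threshold:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical sample data: package name -> version in Ubuntu.
sample = {"foo": "1.0-1ubuntu62", "bar": "2.3-4", "baz": "0.9-2ubuntu3"}
print(flag_diverged(sample))
```

As the text notes, such a list would only be a starting point: a high count can just as well point to a package that is badly maintained on the Debian side.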

Few of those discussions led to concrete decisions. It seems most proponents are reasonably satisfied with the current situation. Of course, one can always do better and Jono Bacon is going to ensure that all Canonical teams working on Ubuntu are aware of how to properly cooperate with Debian. The goal is to avoid heavy package modifications without coordination.

Conclusion

The Debian-Ubuntu relationship used to be a hot topic, but that’s no longer the case thanks to regular efforts made on both sides. Conflicts between individuals still happen, but there are multiple places where they can be reported and discussed (#debian-ubuntu channel, Derivatives Front Desk at derivatives@debian.org on the Debian side or debian@ubuntu.com on the Ubuntu side). Documentation and infrastructure are in place to make it easier for volunteers to do the right thing.

Despite all those process improvements, the best results still come out when people build personal relationships by discussing what they are doing. It often leads to tight cooperation, up to commit rights to the source repositories. Regular contacts help build a real sense of cooperation that no automated process can ever hope to achieve.

This article was first published in Linux Weekly News.

Go2Linux interviewed me: the biggest problem of Debian

Guillermo Garron of Go2Linux very much enjoys the “People behind Debian” interviews that I conduct. That’s why he interviewed me (with somewhat similar questions) and published the result on his blog.

Click here to read the full interview. I speak of my Debian projects, of the Ubuntu-Debian relationship, and more.

The question that I would like to highlight is “What’s the biggest problem of Debian?”. I answered this:

Our project identity is somewhat minimalistic. It revolves around the social contract and the Debian Free Software Guidelines. Both documents answer the question of what we’re doing but we lack a clear answer to the question of how we’re supposed to work towards our goals. It would be great if Debian could agree on some principles concerning topics like goal setting, collaboration, team work, politeness, respect. We could then advertise those and build on them while recruiting volunteers.

PS: The interview also hit LinuxToday.

How to find the right Debian packages: high-level search interface

The Debian archive is known to be one of the largest software collections available in the free software world. With more than 16,000 source packages and 30,000 binary packages, users sometimes have trouble finding packages that are relevant to them. Debian developer Enrico Zini has been working on infrastructure to solve this problem. During the recent mini-debconf Paris, Enrico gave a talk presenting what he has been working on in the last few years, which “hasn’t gotten yet the attention it deserves”.

Enrico is known in the Debian community for the introduction of debtags, a system used to classify all packages using facets. Each facet describes a specific kind of property: type of user-interface, programming language it’s written in, type of document manipulated, purpose of the software, etc. His most recent work builds on that. It is available in Debian and Ubuntu in the apt-xapian-index package. Its purpose is to allow advanced queries over the database of available packages.

Users of apt-xapian-index

He started by presenting some early users of the infrastructure. The most widely known is Ubuntu’s software center. Its search feature provides results almost instantly thanks to apt-xapian-index. But it is a very simple interface that doesn’t exploit many of the advanced features provided by apt-xapian-index.

Another early adopter, making use of some more advanced features, is GoPlay!. It’s a graphical user interface to find games. It makes use of debtags to classify games so that you can browse, for example, all 3D action/arcade games related to cars. GoPlay has even been extended to be a more generic debtags based package browser and the package now also provides GoLearn!, GoAdmin!, GoNet!, GoOffice!, GoSafe!, and GoWeb!.

Fuss-launcher is an application launcher and not a package browser, but by using apt-xapian-index, it’s able to reuse information provided at the package level to make it easier to find installed applications. Package descriptions tend to be more verbose than those embedded in .desktop files. Enrico also showed another nice feature to the audience: if you drag a document onto its window, it will show you a list of applications that can open it.

Last but not least, apt-xapian-index provides a command line search tool that is vastly superior to the traditional apt-cache search: it’s axi-cache search (axi stands for apt-xapian-index). Enrico compared the output of a search on the letter “r”. While apt-cache spits out a seemingly endless list of packages containing this letter somewhere in the description, axi-cache only listed packages related to GNU R. He also demonstrated the contextual tab completion. It makes it easy to use debtags and to refine your search. Once you have typed a first keyword, the tab-completion for the second one only contains keywords or debtags that are actually able to provide more restrictive results. Advanced queries with logical operations (AND, OR, NOT, XOR) are also supported.
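To make the logical operations and the contextual completion more concrete, here is a minimal sketch in plain Python. It is a toy model, not the way axi-cache or Xapian work internally: the package-to-term assignments, the helper names (build_index, term, q_and, q_or, q_not, q_xor, refinements) and the choice to model NOT as “a AND NOT b” are all assumptions made for illustration.

```python
# Toy model only: the term assignments below are invented for illustration
# and do not reflect the real content or layout of apt-xapian-index.
PACKAGES = {
    "r-base": {"statistical", "computing", "devel::lang:r"},
    "r-cran-mass": {"statistical", "library", "devel::lang:r"},
    "gnumeric": {"spreadsheet", "statistical", "office::spreadsheet"},
    "racket": {"programming", "language", "devel::interpreter"},
}


def build_index(packages):
    """Map each term (keyword or debtag) to the set of packages carrying it."""
    index = {}
    for name, terms in packages.items():
        for t in terms:
            index.setdefault(t, set()).add(name)
    return index


INDEX = build_index(PACKAGES)


def term(t):
    """Return the set of packages matching a single term."""
    return INDEX.get(t, set())


def q_and(a, b):
    return a & b


def q_or(a, b):
    return a | b


def q_not(a, b):
    # Modelled here as "a AND NOT b" (an assumption of this sketch).
    return a - b


def q_xor(a, b):
    return a ^ b


def refinements(results):
    """Terms that would narrow `results` without emptying them.

    This mimics the idea behind the contextual completion: only offer
    terms that give a strictly smaller, non-empty result set.
    """
    return sorted(
        t for t, pkgs in INDEX.items()
        if 0 < len(results & pkgs) < len(results)
    )
```

With this toy data, q_and(term("statistical"), term("devel::lang:r")) keeps only the two R packages, and refinements(term("statistical")) leaves out both “statistical” itself (it would not narrow anything) and terms such as “programming” (they would leave no result at all).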

Features of the backend

Enrico then dived into the internals. Xapian’s search engine is at the root of this infrastructure. He likes it because it’s a simple library (i.e. no daemon) and it has nice Python bindings. While apt-xapian-index’s core work is to index the descriptions of all the packages, it actually stores much more and can be easily extended with plugins (written in Python).

For instance, the information stored encompasses:

  • words appearing in the description of the packages (including the translated descriptions if the user uses a non-English locale);
  • their origin;
  • their section;
  • their size and installed size;
  • the time they were first seen;
  • icons, categories, descriptions from the .desktop files they contain (through app-install-data);
  • aliases for names of some popular applications that are not available on Linux (for instance “excel” maps to the debtag office::spreadsheet).

He already has plans to store more: adding popularity contest data (see wishlist bugs #602180 and #602182) will make it possible to sort query results in a useful way. The most widely used applications are good choices when it comes to community support, and they are likely of better quality due to the larger user base. Adding timestamps of the last installation/upgrade/removal will make it easier to pinpoint a regression to a specific package update.

The generated index is world-readable and can be used from any application provided it can use the Xapian library—which is written in C++ but has bindings for Perl, Python, PHP, Java, Tcl, C#, and Ruby.

Call for experimentation

Enrico believes that many useful applications have yet to be invented on top of apt-xapian-index’s features. He’s calling for experimentation and asking for new ideas. The only practical limit that he has encountered is the size of the index: currently it varies between 50 MB (Debian unstable without translation) and 70 MB (Debian stable/testing/unstable with one translation). He would like it to not grow over 100 MB since it’s installed by default (due to aptitude recommending it) and he’s not comfortable with the idea of using more than 20% of the disk footprint of a basic install just for this service. That’s why the index was configured to not store the position of the terms: it’s thus not possible to find packages whose description contains the word “statistical” immediately followed by the word “computing”. You can however find those which have both terms somewhere in their description.
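The trade-off between index size and phrase search can be illustrated with a small sketch. It is a toy model in plain Python and says nothing about how Xapian stores its data: the two descriptions are invented, and the function names (index_without_positions, index_with_positions, both_terms, phrase) are hypothetical.

```python
# Toy illustration: the descriptions are invented and the data structures
# are not those of Xapian or apt-xapian-index.
DESCRIPTIONS = {
    "pkg-a": "environment for statistical computing and graphics",
    "pkg-b": "computing cluster tools with statistical reports",
}


def index_without_positions(descriptions):
    """term -> set of packages: enough to know which terms a package has."""
    index = {}
    for pkg, text in descriptions.items():
        for word in text.split():
            index.setdefault(word, set()).add(pkg)
    return index


def index_with_positions(descriptions):
    """term -> {package: [word offsets]}: bigger, but keeps word order."""
    index = {}
    for pkg, text in descriptions.items():
        for offset, word in enumerate(text.split()):
            index.setdefault(word, {}).setdefault(pkg, []).append(offset)
    return index


def both_terms(index, first, second):
    """Packages containing both terms anywhere (no positions needed)."""
    return index.get(first, set()) & index.get(second, set())


def phrase(pos_index, first, second):
    """Packages where `second` immediately follows `first`."""
    hits = set()
    second_postings = pos_index.get(second, {})
    for pkg, offsets in pos_index.get(first, {}).items():
        following = set(second_postings.get(pkg, []))
        if any(offset + 1 in following for offset in offsets):
            hits.add(pkg)
    return hits
```

Without positions, both packages match “statistical” and “computing”; only the position-aware variant can tell that the two words are adjacent in pkg-a alone, at the price of storing one offset per word occurrence.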

Enrico wondered if apt-xapian-index offers too much freedom. That could explain why few people experimented with it despite his numerous blog posts with code samples and information on how to get started using it. But it’s not difficult to imagine use cases for this data. It could be used to extend tools like rc-alert or wnpp-alert, for example. They provide a long list of Debian packages that are looking for some help and are installed on the machine. With apt-xapian-index, it would be possible to restrict the results to the set of packages written in a specific programming language or for a particular desktop environment.
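As a rough idea of what such an extension could look like, here is a sketch that filters a list of packages by debtags. Everything in it is hypothetical: the package names, the tag assignments and the restrict helper are made up, and a real tool would obtain the list from rc-alert or wnpp-alert and the tags from the index rather than from hard-coded dictionaries.

```python
# Hypothetical data: stand-ins for the output of rc-alert/wnpp-alert and for
# the debtags that a real tool would look up in the index.
NEEDS_HELP = ["pkg-one", "pkg-two", "pkg-three"]

TAGS = {
    "pkg-one": {"implemented-in::python", "uitoolkit::gtk"},
    "pkg-two": {"implemented-in::c", "suite::kde"},
    "pkg-three": {"implemented-in::python", "suite::kde"},
}


def restrict(packages, tags, wanted):
    """Keep the packages that carry every tag in `wanted`, preserving order."""
    wanted = set(wanted)
    return [pkg for pkg in packages if wanted <= tags.get(pkg, set())]


if __name__ == "__main__":
    # Packages needing help that are written in Python.
    print(restrict(NEEDS_HELP, TAGS, ["implemented-in::python"]))
```

Passing several tags narrows the list further, for example to Python packages that also belong to a given desktop suite.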

The more likely explanation is that too few people know about the tool. There are many more itches to scratch where apt-xapian-index’s features could be very useful, and my guess is that Enrico’s wishes will eventually come true.

This article was first published in Linux Weekly News. Click here to subscribe to my newsletter and keep up with the Debian/Ubuntu news thanks to my monthly digest.