Ubuntu – Page 8

How to find the right Debian packages: high-level search interface

November 29, 2010 by Raphaël Hertzog

The Debian archive is known to be one of the largest software collections available in the free software world. With more than 16,000 source packages and 30,000 binary packages, users sometimes have trouble finding packages that are relevant to them. Debian developer Enrico Zini has been working on infrastructure to solve this problem. During the recent mini-debconf Paris, Enrico gave a talk presenting what he has been working on in the last few years, which “hasn’t gotten yet the attention it deserves”.

Enrico is known in the Debian community for the introduction of debtags, a system used to classify all packages using facets. Each facet describes a specific kind of property: type of user-interface, programming language it’s written in, type of document manipulated, purpose of the software, etc. His most recent work builds on that. It is available in Debian and Ubuntu in the apt-xapian-index package. Its purpose is to allow advanced queries over the database of available packages.

Users of apt-xapian-index

He started by presenting some early users of the infrastructure. The most widely know is Ubuntu’s software center. Its search feature provides results almost instantly thanks to apt-xapian-index. But it is a very simple interface that doesn’t exploit many of the advanced features provided by the apt-xapian-index.

Another early adopter, making use of some more advanced features, is GoPlay!. It’s a graphical user interface to find games. It makes use of debtags to classify games so that you can browse, for example, all 3D action/arcade games related to cars. GoPlay has even been extended to be a more generic debtags based package browser and the package now also provides GoLearn!, GoAdmin!, GoNet!, GoOffice!, GoSafe!, and GoWeb!.

Fuss-launcher is an application launcher and not a package browser, but by using apt-xapian-index, it’s able to reuse information provided at the package level to make it easier to find installed applications. Package descriptions tend to be more verbose than those embedded in .desktop files. Enrico also showed another nice feature to the audience: if you drag a document onto its window, it will show you a list of applications that can open it.

Last but not least, apt-xapian-index provides a command line search tool that is vastly superior to the traditional apt-cache search: it’s axi-cache search (axi stands for apt-xapian-index). Enrico compared the output of a search on the letter “r”. While apt-cache spits out an infinite list of packages containing this letter somewhere in the description, axi-cache only listed packages related to GNU R. He also demonstrated the contextual tab completion. It makes it easy to use debtags and to refine your search. Once you have typed a first keyword, the tab-completion for the second one only contains keywords or debtags that are actually able to provide more restrictive results. Advanced queries with logical operations (AND, OR, NOT, XOR) are also supported.

Features of the backend

Enrico then dived into the internals. Xapian’s search engine is at the root of this infrastructure. He likes it because it’s a simple library (i.e. no daemon) and it has nice Python bindings. While apt-xapian-index’s core work is to index the descriptions of all the packages, it actually stores much more and can be easily extended with plugins (written in Python).

For instance, the information stored encompasses:

words appearing in the description of the packages (including the translated descriptions if the user uses a non-English locale);
their origin;
their section;
their size and installed size;
the time they have been first seen;
icons, categories, descriptions from the .desktop files they contain (through app-install-data);
aliases for names of some popular applications that are not available on Linux (for instance “excel” maps to the debtag office::spreadsheet).

He already has plans to store more: adding popularity contest data (see wishlist bugs #602180 and #602182) will make it possible to sort query results in a useful way. The most widely used applications are good choices when it comes to community support, and they are likely of better quality due to the larger user base. Adding timestamps of the last installation/upgrade/removal, will make it easier to pin-point a regression to a specific package update.

The generated index is world-readable and can be used from any application provided it can use the Xapian library—which is written in C++ but has bindings for Perl, Python, PHP, Java, Tcl, C#, and Ruby.

Call for experimentation

Enrico believes that many useful applications have yet to be invented on top of apt-xapian-index’s features. He’s calling for experimentation and asking for new ideas. The only practical limit that he has encountered is the size of the index: currently it varies between 50 Mb (Debian unstable without translation) and 70 Mb (Debian stable/testing/unstable with one translation). He would like it to not grow over 100 Mb since it’s installed by default (due to aptitude recommending it) and he’s not comfortable with the idea of using more than 20% of the disk footprint of a basic install just for this service. That’s why the index was configured to not store the position of the terms: it’s thus not possible to find out packages whose description contains the word “statistical” immediately followed by the word “computing”. You can however find those which have both terms somewhere in their description.

Enrico wondered if apt-xapian-index offers too much freedom. That could explain why few people experimented with it despite his numerous blog posts with code samples and information on how to get started using it. But it’s not difficult to imagine use cases for this data. It could be used to extend tools like rc-alert or wnpp-alert, for example. They provide a long list of Debian packages that are looking for some help and are installed on the machine. With apt-xapian-index, it would be possible to restrict the results to the set of packages written in a specific programming language or for a particular desktop environment.

The more likely explanation is that too few people know about the tool. There are many more itches to scratch where apt-xapian-index’s features could be very useful, and my guess is that Enrico’s wishes will eventually come true.

This article was first published in Linux Weekly News. Click here to subscribe to my newsletter and keep up with the Debian/Ubuntu news thanks to my monthly digest.

People behind Debian: Colin Watson, the tireless man-db maintainer and a debian-installer developer

November 25, 2010 by Raphaël Hertzog

Colin Watson is not a high-profile Debian figure, you rarely see him on mailing lists but he cares a lot about Debian and you will see him on Debconf videos sharing many thoughtful comments. I have the pleasure to work with him on dpkg as he maintains the package in Ubuntu, but he does a lot of more interesting things. I also took the opportunity to ask some Ubuntu specific questions since he’s worked for Canonical since the start. Read on.

My questions are in bold, the rest is by Colin.

Who are you?

Hi. I’m 32 years old, grew up in Belfast in Northern Ireland, but have been living in Cambridge, England, since I was 18. I’m married with a stepson and a daughter.

I became interested in Debian due to the critical mass of Debian work happening in Cambridge at the time (and perhaps more immediately because my roommate was running Debian: “hey, what’s that?”), started doing random bits of development in 2000, and joined as a developer in 2001 (a really exciting time, with lots of new people joining who became integral parts of the project). I’d only really been intending to do QA work and various bits of packaging around the edges, and maybe some work on the BTS, but then Fabrizio Polacco died and I took over man-db from him, and it sort of snowballed from there.

I graduated from university shortly before becoming a Debian developer. I worked for a web server company (Zeus), then a hardware cryptography company (nCipher), before moving to work for Canonical in 2004, since when I’ve been working full-time on Ubuntu. By this point, I suspect that going back to work in an office every day would be pretty tough.

What’s your biggest achievement within Debian or Ubuntu?

One thing I should say: I rarely start projects. Firstly I don’t think I’m very good at it, and secondly I much prefer coming on to an existing project and worrying away at all the broken bits, often after other people have got bored and wandered off to the next new and shiny thing. That’s probably why I ended up in the GNU/Linux distribution world in the first place, rather than doing lots of upstream development from the start – I like being able to polish things into a finished product that we can give to end users.

So, I’ve had my fingers in a lot of pies over the years, doing ongoing maintenance and fixing lots of bugs. I think the single project I’m most proud of would have to be my work on the Debian installer. I joined that team in early 2004 (a few months before Canonical started up) partly because I was a release assistant at the time and it was an obvious hot-spot, and partly because I thought it’d be a good idea to make sure it worked well on the shiny new G4 PowerBook I’d just treated myself to. I ended up as one of the powerpc d-i port maintainers for a while (no longer, as that machine is dead), but I’ve done a lot of core work as well: much of the work to put progress bars in front of absolutely everything that used to have piles of text output, rescue mode, the current kernel selection framework, a good deal of udev support, several significant debconf extensions, lots of os-prober work, and I think I can claim to be one of the few people who understands the partitioner almost top to bottom. 🙂

d-i is the very first thing many of our users see, and has a huge range of uses, from simple desktop installs to massive corporate deployments; it’s unspeakably important that it works well, and it’s a testament to its design that it’s been able to trundle along without actually very much serious refactoring for the best part of five years now.

I have a soft spot for man-db too. It was my first major project in Debian, starting out from an embarrassingly broken state, and is now nice and stable to the point where I recently had time to spawn a useful generic library out of it (libpipeline).

What are your plans for Debian Wheezy?

d-i has a lot of code to deal with disks and partitions. Of course a lot of it is in the partitioner, and for that we use libparted so we don’t have to worry very much about the minutiae of device naming. But there are several other cases where we do need to care about naming, mainly before the partitioner when detecting disks, and after the partitioner when installing the boot loader. Back in etch, we introduced ‘list-devices’, which abstracted away the disk naming assumptions involved in hardware detection. In wheezy, I would like to take all the messy, duplicated, and error-prone code that handles disk naming in the boot loader installers, and design a simple interface to cover all of them. This has only got more important following the addition of the kFreeBSD and Hurd d-i ports in squeeze, but it bites us every time we notice that, say, CCISS arrays aren’t handled consistently, and it’s a pain to test all that duplicated code.

I’d also like to spread the use of libpipeline through C programs in the archive, which I think has potential to eliminate a class of security vulnerabilities in a much simpler way than was previously available.

If you could spend all your time on Debian, what would you work on?

I would love to systematically reduce the need for the current mass of boot loaders. There’s a significant cost to having so much variation across architectures here: it’s work that needs to be done in N different places, the wildly differing configuration means that d-i has to have huge piles of code to manage them all differently, and there are a bunch of strange arbitrary limitations on what you can do.

The reason I’m working on GRUB 2 is that, in my view, it’s the project with the best chance of centralising all this duplicated work into a single place, and making it easier to bring up new hardware in future (in a way that doesn’t compromise software freedom, as many proprietary boot loaders of the kind often found on phones do). Of course, with flexibility tends to come complexity, and some people have a natural objection to that and prefer something simpler. The things I don’t quite have time to do here are to figure out a coherent way to address the specific over-complexity problems people have with the configuration framework while still keeping the flexibility we need, and to do enough QA and porting work to be able to roll out GRUB 2 at installation time to all the Debian architectures it theoretically supports.

What’s the biggest problem of Debian?

Backbiting, and too much playing the man rather than the ball. With one or two honourable exceptions, I’ve largely stopped reading most Debian mailing lists since it just never seems a productive way to spend time compared to writing code and fixing bugs; and yet I’m conscious that they’re one of the primary means of communication for the project and I’m derelict in not taking part in them.

I do find it a bit frustrating that people are seen primarily in terms of their affiliations. I suppose it’s natural for people to see me as an “Ubuntu guy”, but I don’t really see myself that way: I’ve been working on Debian for nearly twice as long as I’ve been working on Ubuntu, and, while I care a great deal about both projects, I’ve put far more of my own personal time into Debian and I try to make sure that a decent number of the things I’m involved with there aren’t to do with work. Work/life separation is a good thing, not that I’m very good at it. Generally speaking, when I’m working on Debian, I’m doing so as a Debian developer, because I want Debian to be better. When that’s not the case, if it matters, I try to indicate it explicitly.

You’re working for Canonical since Ubuntu’s inception. If you were Mark Shuttleworth, is there something that you would have done differently?

We had many good intentions when we founded Ubuntu. We also had a huge amount of work to deliver, to the point where it wasn’t at all clear whether it would be possible (the warty release was named based on the expectations of it, after all, and came out much more usable than we’d dared to hope). In hindsight, it might have helped to be quieter about our good intentions, so that we could exceed expectations rather than in some cases failing to meet them. That might have set a very different tone early on.

(Personally, I’m happy I’m not Mark. The decisions in my office are much easier to take.)

It seems to me that the community part of Ubuntu is much more eager to cooperate with Debian than the corporate part. It’s probably just that more and more Canonical employees are not former Debian contributors. Do you also have this feeling? Are there processes in place to ensure everybody at Canonical is trying to do the right thing towards Debian cooperation?

Just to be clear, I’m wearing my own hat here—which, ironically, is a fedora—rather than a company hat.

It makes sense for Canonical to be taking on more non-Debian folks; after all, we can’t simply hire from the Debian community forever, and a variety of backgrounds is healthy. As you say, it may well be natural that Ubuntu developers who don’t work for Canonical are more likely to have a Debianish background, as it tends to take something significant to get people to switch to a very different family of GNU/Linux distributions, and changing jobs is one of the most obvious of those things.

Certainly, there was a definite sense among the early developers that we were all part of the Debian family and cared about the success of Debian as well. As Ubuntu has developed its own identity, people involved in it now tend to care primarily about the success of Ubuntu. At the same time, pragmatically, it’s still true that getting code changes into Debian is one of the most economical ways to land them; changes made in Debian or upstream land once and tend to stay in place, while changes made only in Ubuntu incur an ongoing merge overhead, which is not at all trivial.

In many ways it’s human nature to try to fulfil your immediate goals in the most direct way possible. If your goal is to deliver changes to Ubuntu users, then it’s natural to concentrate on that rather than looking at the bigger picture (which takes experience). Debian developers often fail to send changes upstream for much the same reason, although there’s more variation there because they’re normally working on Debian of their own volition and thus tend to have wider goals; the economics are more or less parallel though.

Thus, I think the best way to improve things is to make it the path of least resistance for Ubuntu developers to send changes to Debian. We’re already seeing how this works with the Ubuntu MOTU group; if you send a patch for review, or work on merging a package from Debian, very often the response includes “have you sent these changes to Debian?”. We’re working on both streamlining our code review through a regular patch pilot programme and requiring more code review for changes in general, so I think this will be a good opportunity to ask more people to work with Debian when they propose changes to Ubuntu.

For myself, this may be obvious, but I notice that I’m much better at getting changes into Debian when I already have commit access to the Debian package in question. All the work on improving collaborative maintenance in Debian can only help, for Ubuntu as well as for everyone else. It doesn’t make so much difference for large changes that require extensive discussion, but there are lots of small changes too.

Canonical is upstream of many software projects (unity, indicators infrastructure, etc.). Why aren’t those software immediately packaged in Debian? Do you think we can get this to change?

I’m not sure what the right approach is here, particularly as I haven’t been involved with much of that on the Ubuntu side. I suspect it would be helpful to look at this in a similar way to Ubuntu changes in general. It’s understandable that those developers have getting changes into Ubuntu as their first goal. And yet, having code in Debian offers a wider, and often technically adept, audience, and most developers like having their code reach a wider audience even if it’s not their first priority, particularly if that audience is likely to be able to help with finding problems and fixing bugs. It should be seen as something beneficial to both distributions.

The hardest problems will be with things that aren’t merely optional add-ons (which should generally be fairly non-controversial in Debian, given the breadth of the archive in general – the existence of things like bzr and germinate as Debian packages was never a hard question), but which require changes in established packages. For example, gnome-power-manager in Ubuntu is built with application indicator support, and that’s an important part of having a good indicator-based panel: a lot of the point of indicators is consistency. Since I do very little desktop work myself, I don’t know exactly what would be involved in making it possible to choose this system based on a Debian desktop, but I think it’s probably a bit more complicated than just making sure all the new packages exist in Debian too. Obviously you have to start somewhere.

Is there someone in Debian that you admire for his contributions?

Christian Perrier is absolutely tireless and has done superb things for the state of translations in Debian. And Russ Allbery, even aside from his fine ongoing work on policy, Lintian, and Kerberos, is a constant voice of sanity and calmness.

Release management is incredibly hard work, as I know from my own experience, and anyone who can sustain involvement in it for a long period is somebody pretty special. Steve Langasek and I got involved at about the same time but he outlasted me by quite a few years. He deserves some kind of medal for everything he’s done there.

Thank you to Colin for the time spent answering my questions. I hope you enjoyed reading his answers as I did. Subscribe to my newsletter to get my monthly summary of the Debian/Ubuntu news and to not miss further interviews. You can also follow along on Identi.ca, Twitter and Facebook.

How Ubuntu builds up on Debian

November 22, 2010 by Raphaël Hertzog

I have been asked how Ubuntu relates to Debian, and how packages flow from one to the other. So here’s my attempt at clarifying the whole picture.

Where do the packages come from?

Most packages are created by Debian contributors and they are uploaded in Debian unstable (or Debian experimental). New packages are reviewed by the Debian ftpmasters before being accepted in the official archive. The packages are held in the NEW queue until the review is over, and the time spent there varies between a few hours and a few months (usually they are processed within one week or two).

Ubuntu imports all the official Debian packages, but they also add some packages of their own. About 7% of the Ubuntu packages are third-party software that have been packaged for Ubuntu but not for Debian.

What are the changes made by Ubuntu?

From all the source packages coming from Debian, 17% have additional changes made by Ubuntu. Many of them are part of the “main” repository, which is actively maintained by Canonical and Ubuntu core developers. The “universe” repository is usually closer to the official Debian packages.

Many of the changes made by Ubuntu are the results of the decisions taken during the Ubuntu Developer Summit in order to reach specific goals: provide a better user interface, offer faster boot times, become a better platform for third-party software developers, offer a good integration with their online services (Launchpad, Ubuntu One), etc. Other changes are simply the result of fixing bugs reported by Ubuntu users.

Note that even non-modified source packages will result in different binary packages for Ubuntu. That’s because Ubuntu has made changes to the build environment. They only support Intel-based computers with a 686-class (or newer) CPU, they enable some compiler options that Debian doesn’t, etc. And all binary packages are modified by a program called pkgbinarymangler.

Ubuntu’s release cycle and the relation with Debian

Ubuntu releases every 6 months (that’s what time based releases is about). Debian has a very different schedule. How does Ubuntu manage to reuse Debian’s work?

Ubuntu imports packages from Debian unstable (even experimental sometimes) to get the newest packages. If the Ubuntu package already has Ubuntu-specific changes, they merge their changes in the updated Debian package. Otherwise the Debian package is simply grabbed and rebuilt in Ubuntu. This works well because Debian unstable is much more usable than the name suggests. And this process only goes on during the first 2 months of the cycle (until the Debian Import Freeze), so there’s plenty of time afterward to fix the biggest problems.

In the third and fourth month, it’s still possible to pick updated packages from Debian but it must be requested by a developer, it won’t be done automatically. At the end of the fourth month, the feature freeze is put in place.

The 2 months left are dedicated to bug fixing and polishing the distribution. There are various sub-freezes that happen in this period, you can check the Natty release schedule as an example. Picking updated packages from Debian is now the exception, it will only be allowed if the update on the Debian side is a bug-fix only release.

Credits: some figures taken from a talk of Lucas Nussbaum, they were collected based on the packages available in the Lucid Lynx release of Ubuntu.

Click here to subscribe to my newsletter and get my monthly update on what’s going on in Debian and Ubuntu.

4 tips to maintain a “3.0 (quilt)” Debian source package in a VCS

November 18, 2010 by Raphaël Hertzog

Most Debian packages are managed with a version control system (VCS) like git, subversion, bazaar or mercurial. The particularities of the 3.0 (quilt) source format are not without consequences in terms of integration with the VCS. I’ll give you some tips to have a smoother experience.

All the samples given in the article assume that you use git as version control system.

1. Add .pc to the VCS ignore list

.pc is the directory used by quilt to store its internal data (list of applied patches, backup of modified files). It’s also created by dpkg-source so that quilt knows that the patches are in debian/patches (and not in patches which is the default directory used by quilt). For that reason, the directory is kept even if you unapply all the patches.

However you don’t want to store this directory in your repository, so it’s best to put it in the VCS ignore list. With git you simply do:

$ echo ".pc" >>.gitignore
$ git add .gitignore
$ git commit -m "Ignore quilt dir"

The .gitignore file is ignored by dpkg-source, so you’re not adding any noise to the generated source package.

2. Unapply patches after the build

If you store upstream sources with non-applied patches (most people do), and if you don’t build packages in a temporary build directory, then you probably want to unapply the patches after the build so that your repository is again in a clean status.

This is now the default since dpkg-source will unapply any patch that it had to apply by itself. Thus if you start the build with a clean tree, you’ll end up with a clean tree.

But you can still force dpkg-source to unapply patches by adding “unapply-patches” to debian/source/local-options:

$ echo "unapply-patches" >>debian/source/local-options
$ git add debian/source/local-options
$ git commit -m "Unapply patches after build"

svn-buildpackage always builds in a temporary directory so the repository is left exactly like it was before the build, this option is thus useless. git-buildpackage can also be told to build in a temporary directory with --git-export-dir=../build-area/ (the directory ../build-area/ is the one used by svn-buildpackage, so this option makes git-buildpackage behave like svn-buildpackage in that respect).

3. Manage your quilt patches as a git branch

Instead of using quilt to manage the Debian-specific patches, it’s possible to use git itself. git-buildpackage comes with gbp-pq (“Git-BuildPackage Patch Queue”): it can export the quilt serie in a git branch that you can manipulate like you want. Each commit represents a patch, so you want to rebase that branch to edit intermediary commits. Check out the upstream documentation of this tool to learn how to work with it.

There’s an alternative tool as well: git-dpm. Its website explains the principle very well. It’s a more complicated than gbp-pq but it has the advantage of keeping the history of all branches used to generate the quilt series of all Debian releases. You might want to read a review made by Sam Hartman, it explains the limits of this tool.

4. Document how to review the changes

One of the main benefit of this new source format is that it’s easy to review changes because upstream changes are kept as separate patches properly documented (ideally using the DEP-3 format). With the tools above, the commit message becomes the patch header. Thus it’s important to write meaningful commit messages.

This works well as long as your workflow considers the Debian patches as a branch that you rebase on top of the upstream sources at each release. Some maintainers don’t like this workflow and prefer to have the Debian changes applied directly in the packaging branch. They switch to a new upstream version by merging it in their packaging branch. In that case, it’s difficult to generate a quilt serie out of the VCS. Instead, you should instruct dpkg-source to store all the changes in a single patch (which is then similar to the good old .diff.gz) and document in the header of that patch how the changes can be better reviewed, for example in the VCS web interface. You do the former with the --single-debian-patch option and the latter by writing the header in debian/source/patch-header:

$ echo "single-debian-patch" >> debian/source/local-options
$ cat >debian/source/patch-header <<END
This patch contains all the Debian-specific
changes mixed together. To review them
separately, please inspect the VCS history
at http://git.debian.org/?=collab-maint/foo.git

END

Subscribe to this blog by RSS, by email or on Facebook.

« Previous Page
1
…
6
7
8
9
10
…
12
Next Page »