Deciphering one of dpkg’s weirdest errors: short read on buffer copy

As a Debian/Ubuntu user, you’re likely to be exposed at some point to an error reported by dpkg. In a series of articles, I’ll explain some of the errors that you might encounter.

Some error messages can be confusing at times. Most of the error strings do not appear very often and developers thus tend to use very terse description of the underlying problem. In other cases the architecture of the software makes it difficult to pin-point the real problem because the part that displays the error is several layers above the one that generated the initial error.

This is for example the case with this error of dpkg:

Unpacking replacement xulrunner-1.9.2 ...
dpkg-deb (subprocess): data: internal gzip read error: '<fd:0>: too many length or distance symbols'
dpkg-deb: error: subprocess <decompress> returned error exit status 2
dpkg: error processing /var/cache/apt/archives/xulrunner-1.9.2_1.9.2.17+build3+nobinonly-0ubuntu1_amd64.deb (--unpack):
 short read on buffer copy for backend dpkg-deb during `./usr/lib/xulrunner-1.9.2.17/components/libdbusservice.so'

First, the decompression layer discovers something unexpected in the data read in the .deb file and dpkg-deb outputs the error message coming from zlib (“too many length or distance symbols”). This causes the premature end of dpkg-deb --fsys-tarfile that dpkg had executed to extract the .data.tar archive from the deb file. In turn, dpkg informs us that dpkg-deb did not send all the data that were announced (and hence the “short read” in the error message) and that were meant to be part of the file ‘/usr/lib/xulrunner-1.9.2.17/components/libdbusservice.so’.

That’s all nice but it doesn’t help you much in general. What you must understand from the above is that the .deb file is corrupted (sometimes just truncated). In theory it should not happen since APT verifies the checksums of files when they are downloaded. But computers are not infallible and even if the downloaded data was good, it can have been corrupted when stored on disk (for example cheap SSD disks are known to not last very well).

Try removing the file (usually with apt-get clean since it’s stored in APT’s cache) and let APT download it again. Chances are that it will work on the second try. Otherwise consider doing a memory and HDD check as something is probably broken in your computer.

Join my free newsletter and learn more tips for users. Or click here to support my work on dpkg with Flattr, consider subscribing for a few months.

apt-get, aptitude, … pick the right Debian package manager for you

This is a frequently asked question: “What package manager shall I use?”. And my answer is “the one that suits your needs”. In my case, I even use different package managers depending on what I’m trying to do.

APT vs dpkg, which one is the package manager?

In the Debian world, we’re usually thinking of APT-based software when we’re referring to a “package manager”. But in truth, the real package manager is dpkg. It’s the low-level tool that takes a .deb file and extracts its content on the disk, or that takes the name of a package to remove the associated files, etc.

APT is better known because it’s the part of the packaging infrastructure that matters to the user. APT makes collection of software available to the user and does the dirty work of downloading all the required packages and installing them by calling dpkg in the correct order to respect the dependencies.

But APT is not a simple program, it’s a library and several different APT frontends have been developed on top of that library. The most widely known is apt-get since it’s the oldest one, and it’s provided by APT itself.

Graphical APT front-ends

update-manager is a simple frontend useful to install security updates and other trivial daily upgrades (if you’re using testing or sid). It’s the one that you get when you click in the desktop notification that tells you that updates are available. In cases, where the upgrade is too complicated for update-manager, it will suggest to run synaptic which is full featured package manager. You can browse the list on installed/available packages in numerous ways, you can mark packages for installation/upgrade/removal/purge and then run in one go all the recorded actions.

software-center aims to be an easy to use application installer, it will hide most of the packaging details and will only present installed/available applications (as defined by a .desktop file). It’s very user friendly and has been developed by Ubuntu.

Of the graphical front-ends, I use mainly synaptic and only when I’m reviewing what I have installed to trim the system down.

Console-based GUI APT front-ends

In this category, I’ll cite only aptitude. Run without parameter, it will start a powerful console-based GUI. Much like synaptic, you can have multiple views of the installed/available packages and mark packages for installation/upgrade/removal/purge before executing everything at once.

Command-line based package managers and APT front-ends

This is where the well known apt-get fits, but there are several other alternatives: aptitude, cupt, wajig. Wajig and cupt are special cases as they don’t use libapt: the former wraps several tools including apt-get, and the latter is a (partial) APT reimplementation (versions 1.x were in Perl, 2.x are now is C++).

You’re welcome to try them out and find out which one you prefer, but I have never felt the need to use something else than apt-get and aptitude.

apt-get or aptitude?

First I want to make it clear that you can use both and mix them without problems. It used to be annoying when apt-get did not track which packages were automatically installed while aptitude did, but now that both packages share this list, there’s no reason to avoid switching back and forth.

I would recommend apt-get for the big upgrades (i.e. dist-upgrade from one stable to the next) because it will always find quickly a relatively good solution while aptitude can find several convoluted solutions (or none) and it’s difficult to decide which one should be used.

On the opposite for regular upgrades in unstable (or testing), I would recommend “aptitude safe-upgrade“. It does a better job than apt-get at keeping on hold packages which are temporarily broken due to some not yet finished changes while still installing new packages when required. With aptitude it’s also possible to tweak dynamically the suggested operations while apt-get doesn’t allow this. And aptitude’s command line is probably more consistent: with apt-get you have to switch between apt-get and apt-cache depending on the operation that you want to do, aptitude on the other hand does everything by itself.

Take some time to read their respective documentation and to try them.

Click here to subscribe to my free newsletter and get my monthly analysis on what’s going on in Debian and Ubuntu. Or just follow along via the RSS feed, Identi.ca, Twitter or Facebook.

Official Debian/Ubuntu packages for Dropbox

Dropbox is a popular service to synchronize files between multiple computers. The service is entirely proprietary but the company is Linux friendly and provides Linux binaries ready to use. They even provide Ubuntu packages that wrap the dropbox client and provide integration with Nautilus.

A bit of story

Unfortunately for Debian users, those packages do not work on Debian due to a dependency that can’t be satisfied (because Ubuntu introduced an epoch on the version of their nautilus package that Debian doesn’t have). This was even reported in a Squeeze review in Linux Weekly News.

At some point, Ivan Borzenkov introduced a dropbox package to Debian but it was not based on the above package, instead it packaged directly the proprietary binaries. This was a bad decision because the binaries bundle a set of LGPL libraries and nobody from Debian wanted to do the required work to provide the corresponding source code. So the package got dropped (see bug #610300).

More recently several persons filed ITP (Intent To Package) bugs stating their willingness to re-introduce dropbox in Debian, but after many months lingering in the bug tracking system (see #544499, #613788), they have been turned back to RFP (Request For Package) because they changed their minds.

What I did

Being a dropbox user myself (despite the recent proof that data stored on dropbox is not 100% private), I offered sponsorship to the volunteers who wanted to package dropbox. But it turns out this was not enough to motivate someone to complete the task.

In the mean time I was still using the old dropbox package that was removed (it used to be downloadable from snapshot.debian.org).

While this was good enough for me, it’s clearly not OK in the long term and way too difficult for the majority of users. So this week-end I spent some hours to create a proper package.

It’s loosely based on the package provided by Dropbox but I upgraded the packaging and changed the way it works. I patched the dropbox wrapper to provide a “dropbox update” command that downloads and updates the proprietary binaries. They are now stored in /var/lib/dropbox instead of having a copy in each user’s home directory (~/.dropbox-dist/). This update command is run by the postinst so that installing the package immediately downloads the proprietary binaries.

Get the Dropbox packages for Debian

The package nautilus-dropbox has been uploaded to Debian unstable, it’s currently in the NEW queue but will shortly reach the mirrors. Then you will be able to You can install the package with a simple apt-get install nautilus-dropbox (provided that you activated the non-free section since that’s where the package is hosted, it can’t be part of Debian since it requires the proprietary binaries to be useful).

In the mean time you can get the packages from the links below:

  • For Debian Squeeze: i386 amd64 (now in backports.debian.org)
  • For Debian unstable/wheezy: i386 amd64 (now in wheezy/sid)
  • For Debian unstable with GNOME3 from experimental: i386 amd64 (now in experimental)

Get the Dropbox packages for Ubuntu

I have setup a PPA for nautilus-dropbox. Feel free to use it in place of the upstream packages.

$ sudo add-apt-repository ppa:hertzog/nautilus-dropbox
$ sudo apt-get update
$ sudo apt-get install nautilus-dropbox

It also includes packages for oneiric, but in theory once the package is accepted into Debian, it should appear in oneiric shortly after.

Package maintenance

I have done the initial packaging work but I don’t really want to maintain it in the long term. I have more than enough to do with dpkg and my other packages. So if you are interested in maintaining this package, please get in touch with me. You should know a bit of python since there are Debian-specific patches of the upstream code. The package is maintained in a git repository.

Feedback

If you have encountered a problem with one of those packages, feel free to leave a comment.

If you’re an happy user of the above packages, click here to find out how you can thank me.

I hope you enjoy the packages!

Trying to make dpkg triggers more useful and less painful

Lately I have been working on the triggers feature of dpkg. I would like to share my plan and what I have done so far. I’ll first explain what triggers are, the current problems, and the work I did to try to improve the situation.

Introduction

Dpkg triggers are a neat feature of dpkg that package can use to send/receive notifications to/from other installed packages. Those notifications take the form a simple string.

This feature is heavily used to track changes of packaged files in a list of predefined directories, and to update other files based on this. For instance, man-db is watching the directories containing manual pages so that it can update its cache (in /var/cache/man/). install-info is updating the index of info pages when there have been changes in /usr/share/info. gnome-menus is updating its own copy of the menu hierarchy (with entries from /etc/gnome/menus.blacklist blacklisted) every time that a .desktop file is installed/updated/removed.

From a user’s perspective

You see triggers in action very often during upgrades (in fact too often as we’ll see it later):

Preparing to replace zim 0.52-1 (using .../archives/zim_0.52-1_all.deb) ...
Unpacking replacement zim ...
Processing triggers for shared-mime-info ...
Processing triggers for menu ...
Processing triggers for desktop-file-utils ...
Processing triggers for man-db ...
Processing triggers for hicolor-icon-theme ...
Processing triggers for python-support ...
Processing triggers for gnome-menus ...
Setting up zim (0.52-1) ...
Processing triggers for python-support ...
Processing triggers for menu ...

As you guessed it, those “Processing triggers” lines correspond to the packages which received (one or more) trigger notifications and which are doing the corresponding task.

By default the triggers are processed at the end of the dpkg --unpack invocation which is often too soon because APT will often call dpkg --unpack repeatedly during important upgrades. There are some options to ask APT to use dpkg’s --no-triggers option in order to defer the trigger processing at the end of the APT run. You can put this in /etc/apt/apt.conf.d/triggers:

// Trigger deferred
DPkg::NoTriggers "true";
PackageManager::Configure "smart";
DPkg::ConfigurePending "true";
DPkg::TriggersPending "true";

I have now asked APT maintainers to use those options by default, I filed bug #626599 to track this. At the same time I fixed bug #526774 reported by APT maintainers. This bug forced them to put a work-around in APT which resulted in running triggers sooner than expected.

(And while writing this article I filed bug #628564 and #628574 because it was clearly not normal that the menu triggers was executed twice for the installation of a single package)

From a packager’s perspective

The implementation of triggers has several consequences on the status that packages can have.

Let’s assume that the package A installs a file in a directory that is watched by package B (and that B is currently in the “installed” state). When A is unpacked, dpkg adds B to its “Triggers-Awaited” field and lists the activated trigger in B’s “Triggers-Pending” field. Package A is in “unpacked” state, but B has been changed to “triggers-pending”.

When A is configured, instead of going to the “installed” state, it will go to the “triggers-awaited” state. In that state the package is assumed to NOT fulfill dependencies. However, B—which is still in “triggers-pending” state—does fulfill dependencies.

A and B will switch to “installed” at the same time when the trigger has been processed.

The fact that the triggers-awaited status does not fulfill dependencies means that some common triggers like man-db have to be processed regularly just to be able to ensure dependencies are satisfied before running the postinst of other installed packages.

But a package which ships a manual page can certainly be considered as configured when its postinst has been run even if man-db has not yet updated its cache to know about the new/updated manual page.

When you activate a trigger with the dpkg-trigger command you have an option --no-await to avoid awaiting the trigger processing (and thus to go directly to installed state after the postinst has been run). But with file triggers or activate trigger directives, you do not have this option.

My proposal to improve the situation

This is the problem that I tried to solve during my last vacation. But before changing the inner working of triggers, I wrote a non-regression test suite for that feature (commit here) so I could hack with some confidence that I did not break everything.

The result has been presented on the debian-dpkg mailling list: see the discussion here. I added new directives that can be used in triggers files that work exactly like the current triggers except that they do not put triggering packages in trigger-awaited status.

I believe the code to be mostly ready, but in its current form the patch brings zero benefits until all packages have been converted to use the trigger variants that do not require awaiting trigger processing (and the change requires a pre-dependency on dpkg to ensure we have the required dpkg that understands the new kind of trigger directives).

Remaining question

Thus I wonder if I should not change the default semantic of triggers. The packages which really provide crucial functionality to awaiting packages through triggers would then have to be updated to switch to the new directives.

If you’re a packager using triggers, you can thus help me by answering this question: do you know some triggers where it’s important that the awaiting packages are not considered as configured before the trigger processing? In most of the cases I checked, it’s important for the triggered package rather than for the triggering package.

In truth, a package in triggers-awaited status is usually in a good enough shape to be able to satisfy dependencies (i.e. requirements that other packages can have), but it would still be worth to record the fact that it’s not entirely configured yet because it might be true from the user’s point of view: for example if the menu trigger has not yet been processed, the software might not yet be visible in the application menu.

If you appreciate this kind of groundwork that benefits to the whole Debian ecosystem, please consider supporting my work. Click here and give it a look, there are many ways to contribute and to make a difference for me.

People behind Debian: Steve Langasek, release wizard

Steve Langasek has been contributing to Debian for more than a decade. He was a release manager for sarge and etch, and like many former release managers, he’s still involved in the Debian release team although as a release wizard (i.e. more of an advisory role than a day-to-day contributor). Oh, and he did the same with Ubuntu: on the picture on the left, he just announced the release of Ubuntu 10.04 from his Debian-branded laptop. ;-)

He has also been maintaining PAM in Debian for as long as can I remember and does a great job at that. He’s very knowledgeable and fully deserves his place within the Debian Technical Committee. I’m glad he still has the time to participate on several important Debian mailing lists because his contributions are always very useful.

I’m sure you’ll notice this just by reading his answers below. My questions are in bold, the rest is by Steve.

Who are you?

I’m 32 years old, have been running Linux since my first year in college back in ’96, and have been a Debian developer now for ten years. Along the way I’ve been involved in maintaining a variety of server packages, worked on the Alpha port for a while, did a stint as a release manager for a couple of years, and serve on the technical committee.

This year I’m also celebrating my ten year anniversary with my lovely wife Patty, who many know as an erstwhile front-desk volunteer at DebConf. God only knows why she puts up with my late-night hacking!

These days in my day job I’m a manager on the Ubuntu Platform team at Canonical, working to help make Ubuntu a daughter distribution that the Debian community can be proud of.

What’s your biggest achievement within Debian or Ubuntu?

There’s no doubt that my biggest achievement in Debian has been overseeing the release of two Debian releases as release manager.

On the other hand, the scope of a release is so huge, and it represents the output of so many developers working together, that it would be arrogant to claim the release itself as an achievement of my own. Also, sarge and etch have long since been rotated off of the mirrors so no one cares about them anymore. ;) For a more personal and lasting contribution in the distro itself, I’m very proud of writing pam-auth-update. It’s a small piece of code, but one that Debian was missing for a long time – it’s made a big difference to PAM module integration in packages!

What are your plans for Debian Wheezy?

My top priority for this cycle is to see multiarch through. We’re still not far enough along in Debian for most developers to see any difference… and once we are, the first thing people are going to see is a fair bit of breakage when we start breaking a lot of assumptions about paths that have been hard-coded upstream. But I’m still excited by the progress that is being made here. We should be able to ship wheezy without any ia32-libs package. We might even be able to get rid of all the biarch library packages, including those used by the toolchain itself. 54 packages in testing build-depend on gcc-multilib right now, in order to build 32-bit code to ship in the amd64 package; a bunch of those should go away with absolutely no reduction in functionality, saving us a bit of space in the archive and saving the maintainers a lot of complexity in their packages, while at the same time giving us much better support for cross-compilation than we’ve ever had before.

It’s a tall order, certainly, but the pieces are falling into place one by one.

My second priority is to get a policy in Debian around packages integrating upstart jobs. It would of course benefit Ubuntu to have many packages back in sync with Debian, but if all we wanted was to sync with Debian, we could mostly just make debhelper ignore upstart jobs in Debian, prefer them in Ubuntu, and call it good. I’m interested in making sure Debian also gets the benefits of being able to use upstart, because as Linux has become increasingly asynchronous (doing more in parallel at start up), the traditional sysvinit has not been able to keep up. There are all kinds of bugs now related to network startup, for instance, that we don’t have a good answer for in a sysvinit model but that we can fix with an event-based system.

Upstart has been around for a while now, but we’ve been slow to integrate it into Debian because it only works on Linux. It would be a shame if right after the first Debian GNU/kFreeBSD technology preview, packages all stopped working on kFreeBSD because they started to assume the availability of upstart! Unfortunately, having been so cautious we now have systemd on the scene, which not only doesn’t support non-Linux but seems to be in the process of trying to gobble up other, non-Linux-specific components of the desktop stack. So I have to wonder what the future holds for the free desktop on non-Linux kernels.

If you could spend all your time on Debian, what would you work on?

Well, based on my previous experiences when I did spend all my time on Debian, I think the answer here is QA / release work. :) Otherwise, I don’t know. My hands are full enough now with multiarch that it’s hard for me to see what the Next Thing would be.

You’re a member of the technical committee. In the interview of Bdale Garbee, I have argued that it’s not working well. What’s your point of view on this topic?

Well, I feel a constant low level guilt about my own poor level of activity in the TC; but that doesn’t translate into a belief that the system is broken. This is, after all, the decision making body of last resort for technical disputes in Debian, and as such it should really be used sparingly. And if a reputation for glacial deliberation means more developers work out their disputes on their own rather than asking the TC to step in, I think that’s actually a healthy thing!

I do still wish we were more effective at resolving those issues that do come our way, but there’s no silver bullet for this. Though the funny thing is, I’ve noticed that the majority of issues that get referred to the TC nowadays never even need us to make a decision; a short conversation with the disputants is often enough to get them to come to an agreement.

What’s the biggest problem of Debian?

By and large, I think Debian is still doing a great job at what it’s best at — delivering a rock-solid distribution that users can rely on. If I would highlight one problem in Debian, though, it would be that I think we’re becoming less innovative as time goes on. Part of that comes from being such a large project that we’re bound to be more conservative as an institution; but even though the three pet Debian projects of mine that I mentioned above are fairly innovative (multiarch, pam-auth-update, upstart), each of these has landed first in Ubuntu rather than in Debian. Always with a clear intent of pushing back up into Debian, of course, but it just wasn’t possible to do this work within Debian for the first cut without much longer delays.

I worry that if Debian is no longer the place to try new things, that we’re going to miss out on attracting contributions from the folks who are inspired to make Free Software better – and not simply to make it stable.

I’m not sure how to address this, though. Maybe improved conversations with derivatives such as (but not limited to) Ubuntu, about what crack of the day is being tried where and how that can be integrated into Debian once it’s proven to work? I don’t think that team-based maintenance or low-threshold NMUs do anything to address this, though, as the kinds of innovation that matter most are ones that require discussion and consensus-finding — not just routing around inactive maintainers.

Do you have wishes for Debian Wheezy?

Well, I’d like to see the armhf port get on its feet and become an official port. Over the lifetime of the arm and armel ports, the state of the art on ARM has changed quite a bit; it would be great to see Debian taking advantage of this richer platform, to let people make better use of their hardware via Debian.

As a former release manager, you’re now a “release wizard”. I guess you have seen it on debian-devel, there are proposals to not freeze testing and to use another distribution starting as a snapshot of testing to finalize the new stable release. According to your experience, what needs to happen to make this possible?

Frankly, I’ve stayed out of that discussion because I don’t think what’s being asked for is possible. I think proponents of a freezeless release have seriously underestimated the amount of work required on the part of the release team to wrangle testing into a releasable product, and that anything that makes propagation of fixes into the pending release more time consuming will make Debian worse on the whole, not better.

If people really want to avoid long freezes for the Debian release, the best way they can help this happen is by making Debian more releasable on an ongoing basis, by helping to hold our packages to our shared standards for quality (i.e., by fixing RC bugs!). The biggest factor in long freezes for Debian is the slow rate at which we bring the RC bug count down during the freeze. Back in the sarge, etch days we used to have really great bug squashing parties that would get people together on weekends to hack through RC bugs by the dozens. I don’t see that happening as much anymore. I’d really like us to get back to that, but my few attempts at this so far since retiring as release manager have led me to think I’m really terrible at organizing parties of any kind. :)

On the other side, as seen at http://bugs.debian.org/release-critical/, the RC bug count for testing at the beginning of the release cycle keeps getting higher and higher. I’d love to know why that is so we can address it. I know we’ve gotten better at detecting some classes of RC bugs; that’s part of it, but I don’t think it explains the whole trend.

Is there someone in Debian that you admire for their contributions?

Wow, what kind of arrogant jerk would I be if I didn’t admire anyone in Debian for their contributions? Debian is and always has been an amazing community of top-notch developers; there are certainly too many I admire to list them all here. Joey Hess certainly makes the list, for his longstanding example of code speaking louder than words and for his ability to get to the heart of common problems and come up with elegant solutions. So does Russ Allbery, who by all accounts had his ability to feel anger in response to email burned out of him at a young age in a flame-related accident on Usenet. ;-) The list goes on, but here I think I have to follow Joey’s example and cut the words short.


Thank you to Steve for the time spent answering my questions. I hope you enjoyed reading his answers as I did. Subscribe to my newsletter to get my monthly summary of the Debian/Ubuntu news and to not miss further interviews. You can also follow along on Identi.ca, Twitter and Facebook.

Debian Cleanup Tip #6: Remove automatically installed packages that are no longer needed

Last week we learned how to identify cruft on your Debian system. This week, for the last article in this series, we’ll learn more about automatically installed packages and how to get rid of them when you don’t need them any longer.

APT tracks automatically installed packages

When you install a new package with apt-get/aptitude/synaptic, it’s very common to end up installing many more packages: those are the dependencies of the installed package. Here’s an example:

$ sudo apt-get install pino
[...]
The following extra packages will be installed:
  libdbusmenu-glib1 libgee2 libindicate4 libnotify1 notification-daemon
The following NEW packages will be installed:
  libdbusmenu-glib1 libgee2 libindicate4 libnotify1 notification-daemon pino
0 upgraded, 6 newly installed, 0 to remove and 0 not upgraded.
Need to get 478 kB of archives.
After this operation, 2531 kB of additional disk space will be used.
Do you want to continue [Y/n]? 

After this installation, the 5 “extra packages” will be marked as “automatically installed”. What does this mean? It means that you have not explicitly requested their installation and that the system should be free to remove them as soon as they are no longer needed.

You can verify that this is effectively the case with “apt-mark showauto” (it returns a list of the automatically installed packages).

$ apt-mark showauto |grep libdbusmenu
libdbusmenu-glib1
$ apt-mark showauto |grep pino
$

Aptitude shows this information with the “A” letter in its interactive interface and in the “aptitude search” output. “aptitude show” has a dedicated field for this:

$ aptitude show libdbusmenu-glib1
Package: libdbusmenu-glib1               
New: yes
State: installed
Automatically installed: yes
Version: 0.3.7-1
[...]

In Synaptic, it’s not very visible but once you have selected an installed package, you can verify in the “Package” menu whether “Automatically installed” is checked or not.

APT tells you which packages are no longer needed

Over time, some of those automatically installed packages become unnecessary because the packages that depended on them no longer do. It might be that they are using a newer version of the same library, or they switched to use something else, or they are able to do the task themselves.

Whatever the reason, the original dependency has vanished and the automatically installed package is no longer needed on the system.

Aptitude will automatically remove those unneeded packages the next time you run it but apt-get and synaptic do not.

Apt-get will inform you that some packages are no longer needed and will even tell you how you can get rid of them:

$ sudo apt-get remove pino
[...]
The following packages were automatically installed and are no longer required:
  notification-daemon libdbusmenu-glib1 libnotify1 libgee2 libindicate4
Use 'apt-get autoremove' to remove them.
The following packages will be REMOVED:
  pino
0 upgraded, 0 newly installed, 1 to remove and 219 not upgraded.
After this operation, 1225 kB disk space will be freed.
Do you want to continue [Y/n]? 
[...]
$ sudo apt-get autoremove
[...]
The following packages will be REMOVED:
  libdbusmenu-glib1 libgee2 libindicate4 libnotify1 notification-daemon
0 upgraded, 0 newly installed, 5 to remove and 219 not upgraded.
After this operation, 1307 kB disk space will be freed.
Do you want to continue [Y/n]? 
[...]

Synaptic will show you the packages that can be removed in a new section name “Installed (auto removable)” if you select the “Status” button in the bottom-left pane.

It’s thus a good habit to get rid of those unneeded package from time to time.

Use this feature to trim down your system

While APT usually sets the “Automatically installed” flag, you can also set it manually. It’s a very simple way to tell the system “I don’t need this package directly, feel free to remove it if nothing else requires it”.

# With apt-get
$ sudo apt-mark markauto libxml-simple-perl
# Or with aptitude
$ sudo aptitude markauto libxml-simple-perl

You can also do it in the interactive interface of aptitude with the key “M” (and “m” for unmarking). To do it in Synaptic, you have to use the menu entry “Package > Automatically installed”.

Many users like to have a minimal set of packages installed but they don’t really know which packages are really important and trying to remove every package to look what happens is cumbersome.

Thanks to this feature, you don’t try removing packages but you flag them as automatically installed. There is few risks in doing so when it concerns libraries (including python/perl modules). If the package is not indirectly needed by one of your important packages, it will be removed by apt-get autoremove, otherwise it’s kept for as long as it’s needed.

I would suggest to not mark as such packages of priority higher or equal to important to avoid nasty surprises (although I say this to not be blamed in case you remove too much, in theory the system should not remove essential components and all dependencies should be complete).

Also be aware of the consequences when you mark “task” packages like “gnome” as automatically installed… it will suggest you to remove your whole desktop. If you want to trim down the default desktop, you should “unmark” the desktop packages that you want to keep:

$ sudo apt-mark unmarkauto gnome-session gnome-panel

Do you want to read more tutorials like this one? Click here to subscribe to my free newsletter, you can opt to receive future articles by email.

Debian Cleanup Tip #5: identify cruft that can be removed from your Debian system

Last week we learned how to identify and restore packages whose files have been corrupted. This time we’ll concentrate ourselves on the non-packaged files…

Non-packaged files

They are files which are not provided by a Debian package, or in other words, files where dpkg --search finds no associated package:

$ dpkg --search /srv/cvs
dpkg-query: no path found matching pattern /srv/cvs

You always have such files on your system, at least all your own files in /home. But many daemons also create files as part of their work (and they are usually stored in /var): internal files for a database server, mail spool for a mail server, etc. Those are normal and you want to leave them alone.

But you might have non-packaged files in /usr and that should not be the case if you install everything from packages. It would thus be useful to be able to list those files in order to detect a software that has been manually installed.

Manually installed software is not a good idea

Such an installation might cause troubles for example by taking precedence over the same software provided in a Debian package. Over time the local installation will not be upgraded while the packaged one will.

The other packages which depend on this software will believe they have the latest version since their dependency is satisfied but in fact they are using the older version since it takes precedence.

So you want to get rid of those? Let’s see how we can find them.

Use cruft to identify non-packaged files

As I explained above, there are many non-packaged files that are legitimate and that you don’t want to remove. That’s why cruft does something more elaborated than a scan of the filesystem and a check of dpkg’s database.

It provides a way for packages to say which files they might legitimately create during run-time and that cruft should not report. And it knows of many such files. But it’s far from exhaustive and definitely not up-to-date.

So you should always take its output with suspicion and consider twice where the file came from. Do not trust it blindly to remove the files… you have been warned.

How to use cruft

You should give it a list of directories to ignore to reduce the noise in the output, for example like this:

$ sudo cruft -d / -r report --ignore /home --ignore /var --ignore /tmp
$ less report
cruft report: mercredi 23 février 2011, 15:45:34 (UTC+0100)

---- missing: ALTERNATIVES ----
        /etc/alternatives/cli-csc.1.gz
        /usr/share/man/man1/cli-csc.1.gz
---- missing: dpkg ----
        /etc/xdg/autostart/gnome-power-manager.desktop
        /usr/lib/libpython2.6_d.so.1.0-gdb.py
        /usr/share/fonts/X11/100dpi
        /usr/share/fonts/X11/75dpi
---- unexplained: / ----
        /boot
        /dev
        /etc/.java
        /etc/.java/.systemPrefs
[...]
        /usr/lib/pymodules/python2.6
        /usr/lib/pymodules/python2.6/.path
        /usr/lib/pymodules/python2.6/Brlapi-0.5.5.egg-info
[...]

Note that it doesn’t traverse filesystems so if your /usr is on another partition than /, you will need to use the option -d "/ /usr" to have it scan both.

Analyze the report

Now you can quietly go through the report that has been generated and decide which files need to be removed or not. The report also contains missing files (files which should exist according to the dpkg database but which are not there) but the bulk of the listing will be in the “unexplained” section: files which are not part of any package (and whose presence is not explained by any other explain script that packages can ship).

Again take this with great suspicion, and you should rather not delete a file if you don’t know it got there in the first place. For instance, on my system it lists many files below /usr/lib/pymodules/ and those are legitimate: they come from Debian packages but they are copied there dynamically from /usr/{lib,share}/pyshared in order to support multiple python versions. If you remove those files, you effectively break your system.

You will also find many .pyc files created by python packages, they are a byte-compiled version of the corresponding .py file. Removing them breaks nothing but you loose a bit of performance.

On the opposite, most of the files below /usr/local/ are likely the result of some manual software installation and those should be safe to remove (if you know that you are not using the corresponding software).

Conclusion: useful but needs work

In summary, you can use cruft to identify non-packaged files and maybe learn a bit more about what got manually installed on the system, but it requires some patience to go through the report as many of the files reported are false positives.

Yes, cruft badly needs supplementary volunteers to cope with the many ways packages legitimately generate non-packaged files. It’s not even complicated work: the package is mostly in shell and in Perl, and /usr/share/doc/cruft/README.gz explains how it all works.

Do you want to read more tutorials like this one? Click here to subscribe to my free newsletter, you can opt to receive future articles by email.

Debian Cleanup Tip #4: find broken packages and reinstall them

Last week, we learned to get rid of third-party packages, now we’re going one step further: we’ll verify if the files of the installed packages are still exactly like they were when they got installed.

If you’re a tinkerer and hand-edit some files for some quick tests, or if you tend to re-install newer versions of some packages from the sources, you might have overwritten some packaged files and it would be good to be able to detect this (and remedy to the problem). debsums is the tool that makes it possible.

Use debsums to identify modified files

I often use debsums when I take over the maintenance of a Debian server because I want to verify which files have been modified by the former administrator.

Without any argument, debsums is very verbose, it will list every installed file (except configuration files) and tells whether it’s unmodified (“OK”) or not (“FAILED”).

$ sudo debsums
/usr/bin/a2ps                                               OK
[...]

With the --all option, it will verify all files including configuration files. With --config it will verify only the configuration files.

With the --changed option, debsums will only list modified files among those inspected. The following invocation will thus list all files which have been modified on the system and which are not configuration files.

$ sudo debsums --changed
/usr/lib/perl5/AptPkg/Config.pm
/usr/lib/perl5/AptPkg.pm
[...]

Find out the package affected and reinstall it

debsums told me that /usr/lib/perl5/AptPkg.pm was modified. Indeed I remember having manually installed a modified version of that perl module for a quick test.

I find out the affected package with dpkg --search /usr/lib/perl5/AptPkg.pm: it’s libapt-pkg-perl.

Now I just have to reinstall this package to overwrite the modified files with the original ones:

$ sudo aptitude reinstall libapt-pkg-perl
[...]
# Or with apt-get
$ sudo apt-get --reinstall install libapt-pkg-perl
[...]

You might have to repeat the process until debsums no longer reports any modified file.

Do you want to read more tutorials like this one? Click here to subscribe to my free newsletter, you can opt to receive future articles by email.

Debian Cleanup Tip #3: get rid of third-party packages

Last week, we learned how to get rid of obsolete packages. This time, we’re going to learn how to bring back your computer to a state close to a “pure” Ubuntu/Debian installation.

Thanks to the power of APT, it’s easy to add new external repositories and install supplementary software. Unfortunately some of those are not very well maintained. They might contain crappy packages or they might simply not be updated. An external package which was initially working well, can become a burden on system maintenance because it will be interfering with regular updates (for example by requiring a package that should be removed in newer versions of the system).

So my goal for today is to teach you how to identify the packages on your system that are not coming from Debian or Ubuntu. So that you can go through them from time to time and keep only those that you really need. Obsolete packages are a subset of those, but I’ll leave them alone. We took care of them last week.

Each (well-formed) APT repository comes with a “Release” file describing it (example). They provide some values that can be used by APT to identify packages contained in the repository. All official Debian repositories are documented with Origin=Debian (and Origin=Ubuntu for Ubuntu). You can verify the origin value associated to each repository (if any) in the output of apt-cache policy:

[...]
 500 http://ftp.debian.org/debian/ lenny/main i386 Packages
     release v=5.0.8,o=Debian,a=stable,n=lenny,l=Debian,c=main
     origin ftp.debian.org
[...]

From there on, we can simply ask aptitude to compute a list of packages which are both installed and not available in an official Debian repository:

$ aptitude search '?narrow(?installed, !?origin(Debian))!?obsolete'
or
$ aptitude search '~S ~i !~ODebian !~o'

You can replace “search” with “purge” or “remove” if you want to get rid of all the packages listed. But you’re more likely to want to remove only a subset of carefully chosen packages… you’re probably still using some of the software that you installed from external repositories.

With synaptic, you can also browse the content of each repository. Click on the “Origin” button and you have a list of repositories. You can go through the non-Debian repositories and look which packages are installed and up-to-date.

But you can do better, you can create a custom view. Click on the menu entry “Settings > Filter”. Click on “New” to create a new filter and name it “External packages”. Unselect everything in the “Status” tab and keep only “Installed”.

Go in the “Properties” tab and here add a new entry “Origin” “Excludes” “ftp.debian.org”. In fact you must replace “ftp.debian.org” with the hostname of your Debian/Ubuntu mirror. The one that appears on the “origin” line in the output of apt-cache policy (see the excerpt quoted above in this article).

Note that the term “Origin” is used to refer to two different things, a field in the release file but also the name of the host for an APT repository. It’s a bit confusing if you don’t pay attention.

Close the filters window with OK. You now have a new listing of “External packages” under the “Custom Filters” screen. You can see which packages are installed and up-to-date and decide whether you really want to keep it. If the package is also provided by Debian/Ubuntu and you want to go back to the version provided by your distribution, you can use the “Package > Force version…” menu entry.

Click here to subscribe to my free newsletter and get my monthly analysis on what’s going on in Debian and Ubuntu. Or just follow along via the RSS feed, Identi.ca, Twitter or Facebook.

Debian Cleanup Tip #2: Get rid of obsolete packages

Last week, we learned to remove useless configuration files. This week, we’re going to take care of obsolete packages.

An obsolete package is a package who is no longer provided by any of the APT repositories listed in /etc/apt/source.lists (and /etc/apt/sources.list.d/). There can be multiple reasons why a package is no longer available in the repository (or at least not under the same name) :

  • the upstream author stopped maintaining the software a long time ago, nobody else took over and the Debian maintainer preferred to remove the package from Debian. Usually there are alternatives in the Debian archive.
  • the package was orphaned in Debian since a long time, nobody took over and it had very few users. The Debian QA team might have asked its removal.
  • the latest version of the software might have been packaged under a new package name. Either because the amount of changes was so important that it was preferred to not upgrade automatically to the latest version (it has been the case with request-tracker and nagios, they both embed a version number in their package names), or simply because the maintainer wants to let the user install several versions at the same time (that’s the case for example with the Linux kernel, the python interpreter and many libraries).
  • the software has been renamed, the maintainer renamed the packages and kept transitional packages under the old name for one release. Then the transitional packages have been removed.

In any case, it’s never a good idea to keep obsolete packages around: they do not benefit from security updates and they might cause problems during upgrades if they depend on other packages that should be removed to complete the upgrade.

You could blindly remove them with aptitude purge ~o (or aptitude purge ?obsolete) but you might want to first verify what those package are. There might be some packages that you have manually installed, that are not part of any current APT repository, and that you want to keep around nevertheless (I have skype, dropbox and a few personal packages for example). You can get the list with aptitude search ?obsolete

With the graphical package manager (Synaptic), you can find the list of obsolete packages by clicking on the “Status” button and selecting “Installed (local or obsolete)”. You can then go through the list and decide for each package whether you want to keep it or not.

Follow me on Identi.ca, Twitter and Facebook. Or subscribe to this blog by RSS or by email.