Debian Cleanup Tip #5: identify cruft that can be removed from your Debian system

Last week we learned how to identify and restore packages whose files have been corrupted. This time we’ll concentrate ourselves on the non-packaged files…

Non-packaged files

They are files which are not provided by a Debian package, or in other words, files where dpkg --search finds no associated package:

$ dpkg --search /srv/cvs
dpkg-query: no path found matching pattern /srv/cvs

You always have such files on your system, at least all your own files in /home. But many daemons also create files as part of their work (and they are usually stored in /var): internal files for a database server, mail spool for a mail server, etc. Those are normal and you want to leave them alone.

But you might have non-packaged files in /usr and that should not be the case if you install everything from packages. It would thus be useful to be able to list those files in order to detect a software that has been manually installed.

Manually installed software is not a good idea

Such an installation might cause troubles for example by taking precedence over the same software provided in a Debian package. Over time the local installation will not be upgraded while the packaged one will.

The other packages which depend on this software will believe they have the latest version since their dependency is satisfied but in fact they are using the older version since it takes precedence.

So you want to get rid of those? Let’s see how we can find them.

Use cruft to identify non-packaged files

As I explained above, there are many non-packaged files that are legitimate and that you don’t want to remove. That’s why cruft does something more elaborated than a scan of the filesystem and a check of dpkg’s database.

It provides a way for packages to say which files they might legitimately create during run-time and that cruft should not report. And it knows of many such files. But it’s far from exhaustive and definitely not up-to-date.

So you should always take its output with suspicion and consider twice where the file came from. Do not trust it blindly to remove the files… you have been warned.

How to use cruft

You should give it a list of directories to ignore to reduce the noise in the output, for example like this:

$ sudo cruft -d / -r report --ignore /home --ignore /var --ignore /tmp
$ less report
cruft report: mercredi 23 février 2011, 15:45:34 (UTC+0100)

---- missing: ALTERNATIVES ----
        /etc/alternatives/cli-csc.1.gz
        /usr/share/man/man1/cli-csc.1.gz
---- missing: dpkg ----
        /etc/xdg/autostart/gnome-power-manager.desktop
        /usr/lib/libpython2.6_d.so.1.0-gdb.py
        /usr/share/fonts/X11/100dpi
        /usr/share/fonts/X11/75dpi
---- unexplained: / ----
        /boot
        /dev
        /etc/.java
        /etc/.java/.systemPrefs
[...]
        /usr/lib/pymodules/python2.6
        /usr/lib/pymodules/python2.6/.path
        /usr/lib/pymodules/python2.6/Brlapi-0.5.5.egg-info
[...]

Note that it doesn’t traverse filesystems so if your /usr is on another partition than /, you will need to use the option -d "/ /usr" to have it scan both.

Analyze the report

Now you can quietly go through the report that has been generated and decide which files need to be removed or not. The report also contains missing files (files which should exist according to the dpkg database but which are not there) but the bulk of the listing will be in the “unexplained” section: files which are not part of any package (and whose presence is not explained by any other explain script that packages can ship).

Again take this with great suspicion, and you should rather not delete a file if you don’t know it got there in the first place. For instance, on my system it lists many files below /usr/lib/pymodules/ and those are legitimate: they come from Debian packages but they are copied there dynamically from /usr/{lib,share}/pyshared in order to support multiple python versions. If you remove those files, you effectively break your system.

You will also find many .pyc files created by python packages, they are a byte-compiled version of the corresponding .py file. Removing them breaks nothing but you loose a bit of performance.

On the opposite, most of the files below /usr/local/ are likely the result of some manual software installation and those should be safe to remove (if you know that you are not using the corresponding software).

Conclusion: useful but needs work

In summary, you can use cruft to identify non-packaged files and maybe learn a bit more about what got manually installed on the system, but it requires some patience to go through the report as many of the files reported are false positives.

Yes, cruft badly needs supplementary volunteers to cope with the many ways packages legitimately generate non-packaged files. It’s not even complicated work: the package is mostly in shell and in Perl, and /usr/share/doc/cruft/README.gz explains how it all works.

Do you want to read more tutorials like this one? Click here to subscribe to my free newsletter, you can opt to receive future articles by email.

Debian Cleanup Tip #4: find broken packages and reinstall them

Last week, we learned to get rid of third-party packages, now we’re going one step further: we’ll verify if the files of the installed packages are still exactly like they were when they got installed.

If you’re a tinkerer and hand-edit some files for some quick tests, or if you tend to re-install newer versions of some packages from the sources, you might have overwritten some packaged files and it would be good to be able to detect this (and remedy to the problem). debsums is the tool that makes it possible.

Use debsums to identify modified files

I often use debsums when I take over the maintenance of a Debian server because I want to verify which files have been modified by the former administrator.

Without any argument, debsums is very verbose, it will list every installed file (except configuration files) and tells whether it’s unmodified (“OK”) or not (“FAILED”).

$ sudo debsums
/usr/bin/a2ps                                               OK
[...]

With the --all option, it will verify all files including configuration files. With --config it will verify only the configuration files.

With the --changed option, debsums will only list modified files among those inspected. The following invocation will thus list all files which have been modified on the system and which are not configuration files.

$ sudo debsums --changed
/usr/lib/perl5/AptPkg/Config.pm
/usr/lib/perl5/AptPkg.pm
[...]

Find out the package affected and reinstall it

debsums told me that /usr/lib/perl5/AptPkg.pm was modified. Indeed I remember having manually installed a modified version of that perl module for a quick test.

I find out the affected package with dpkg --search /usr/lib/perl5/AptPkg.pm: it’s libapt-pkg-perl.

Now I just have to reinstall this package to overwrite the modified files with the original ones:

$ sudo aptitude reinstall libapt-pkg-perl
[...]
# Or with apt-get
$ sudo apt-get --reinstall install libapt-pkg-perl
[...]

You might have to repeat the process until debsums no longer reports any modified file.

Do you want to read more tutorials like this one? Click here to subscribe to my free newsletter, you can opt to receive future articles by email.

Debian Cleanup Tip #3: get rid of third-party packages

Last week, we learned how to get rid of obsolete packages. This time, we’re going to learn how to bring back your computer to a state close to a “pure” Ubuntu/Debian installation.

Thanks to the power of APT, it’s easy to add new external repositories and install supplementary software. Unfortunately some of those are not very well maintained. They might contain crappy packages or they might simply not be updated. An external package which was initially working well, can become a burden on system maintenance because it will be interfering with regular updates (for example by requiring a package that should be removed in newer versions of the system).

So my goal for today is to teach you how to identify the packages on your system that are not coming from Debian or Ubuntu. So that you can go through them from time to time and keep only those that you really need. Obsolete packages are a subset of those, but I’ll leave them alone. We took care of them last week.

Each (well-formed) APT repository comes with a “Release” file describing it (example). They provide some values that can be used by APT to identify packages contained in the repository. All official Debian repositories are documented with Origin=Debian (and Origin=Ubuntu for Ubuntu). You can verify the origin value associated to each repository (if any) in the output of apt-cache policy:

[...]
 500 http://ftp.debian.org/debian/ lenny/main i386 Packages
     release v=5.0.8,o=Debian,a=stable,n=lenny,l=Debian,c=main
     origin ftp.debian.org
[...]

From there on, we can simply ask aptitude to compute a list of packages which are both installed and not available in an official Debian repository:

$ aptitude search '?narrow(?installed, !?origin(Debian))!?obsolete'
or
$ aptitude search '~S ~i !~ODebian !~o'

You can replace “search” with “purge” or “remove” if you want to get rid of all the packages listed. But you’re more likely to want to remove only a subset of carefully chosen packages… you’re probably still using some of the software that you installed from external repositories.

With synaptic, you can also browse the content of each repository. Click on the “Origin” button and you have a list of repositories. You can go through the non-Debian repositories and look which packages are installed and up-to-date.

But you can do better, you can create a custom view. Click on the menu entry “Settings > Filter”. Click on “New” to create a new filter and name it “External packages”. Unselect everything in the “Status” tab and keep only “Installed”.

Go in the “Properties” tab and here add a new entry “Origin” “Excludes” “ftp.debian.org”. In fact you must replace “ftp.debian.org” with the hostname of your Debian/Ubuntu mirror. The one that appears on the “origin” line in the output of apt-cache policy (see the excerpt quoted above in this article).

Note that the term “Origin” is used to refer to two different things, a field in the release file but also the name of the host for an APT repository. It’s a bit confusing if you don’t pay attention.

Close the filters window with OK. You now have a new listing of “External packages” under the “Custom Filters” screen. You can see which packages are installed and up-to-date and decide whether you really want to keep it. If the package is also provided by Debian/Ubuntu and you want to go back to the version provided by your distribution, you can use the “Package > Force version…” menu entry.

Click here to subscribe to my free newsletter and get my monthly analysis on what’s going on in Debian and Ubuntu. Or just follow along via the RSS feed, Identi.ca, Twitter or Facebook.

Debian Cleanup Tip #2: Get rid of obsolete packages

Last week, we learned to remove useless configuration files. This week, we’re going to take care of obsolete packages.

An obsolete package is a package who is no longer provided by any of the APT repositories listed in /etc/apt/source.lists (and /etc/apt/sources.list.d/). There can be multiple reasons why a package is no longer available in the repository (or at least not under the same name) :

  • the upstream author stopped maintaining the software a long time ago, nobody else took over and the Debian maintainer preferred to remove the package from Debian. Usually there are alternatives in the Debian archive.
  • the package was orphaned in Debian since a long time, nobody took over and it had very few users. The Debian QA team might have asked its removal.
  • the latest version of the software might have been packaged under a new package name. Either because the amount of changes was so important that it was preferred to not upgrade automatically to the latest version (it has been the case with request-tracker and nagios, they both embed a version number in their package names), or simply because the maintainer wants to let the user install several versions at the same time (that’s the case for example with the Linux kernel, the python interpreter and many libraries).
  • the software has been renamed, the maintainer renamed the packages and kept transitional packages under the old name for one release. Then the transitional packages have been removed.

In any case, it’s never a good idea to keep obsolete packages around: they do not benefit from security updates and they might cause problems during upgrades if they depend on other packages that should be removed to complete the upgrade.

You could blindly remove them with aptitude purge ~o (or aptitude purge ?obsolete) but you might want to first verify what those package are. There might be some packages that you have manually installed, that are not part of any current APT repository, and that you want to keep around nevertheless (I have skype, dropbox and a few personal packages for example). You can get the list with aptitude search ?obsolete

With the graphical package manager (Synaptic), you can find the list of obsolete packages by clicking on the “Status” button and selecting “Installed (local or obsolete)”. You can then go through the list and decide for each package whether you want to keep it or not.

Follow me on Identi.ca, Twitter and Facebook. Or subscribe to this blog by RSS or by email.

Debian Cleanup Tip #1: Get rid of useless configuration files

If you like to keep your place clean, you probably want to do the same with your computer. I’m going to show you a few tips over the next 4 weeks so that you can keep your Debian/Ubuntu system free of dust!

Over time the set of packages that is installed on your system changes, either because you install and remove stuff, or because the distribution evolved (and you upgraded your system to the latest version).

But the Debian packaging system is designed to keep configuration files when a package is removed. That way if you reinstall it, you won’t have to redo the configuration. That’s a nice feature but what if you will never reinstall those packages?

Then those configuration files become clutter that you would rather get rid of. In some cases, those files lying around might have unwanted side-effects (recent example: it can block the switch to a dependency-based boot sequence because obsolete init scripts without the required dependencies are still present).

The solution is to “purge” all packages which are in the “config-files” state. With aptitude you can do aptitude purge ~c (or aptitude purge ?config-files). Replace “purge” by “search” if you only want to see a list of the affected packages.

If you want a machine-friendly list of the packages in that state, you could use one of those commands (and then pass the result to apt-get if you don’t have aptitude available):

$ grep-status -n -sPackage -FStatus config-files
[...]
$ dpkg-query -f '${Package} ${Status}\n' -W | grep config-files$ | cut -d" " -f1
[...]

Note that grep-status is part of the dctrl-tools package.

Of course you can also use graphical package managers, like Synaptic. Click on the “Status” button on the bottom left, then on “Not installed (residual config)” and you have a list of packages that you can purge. You can select them all, right click and pick “Mark for Complete Removal”. See the screenshot below. The last step is to click on “Apply” to get the packages purged.

Synaptic purging residul config files

Do you want to read more tutorials like this one? Click here to subscribe to my free newsletter, you can opt to receive future articles by email.

5 reasons why Debian Unstable does not deserve its name

Debian Unstable (also known as sid) is one of the 3 distributions that Debian provides (along with Stable and Testing).

It’s not conceived as a product for end-users, instead it’s the place where contributors are uploading newer packages. Daily. Yes that means that Unstable is a quickly moving target and it’s not for everybody. But you can use it and your computer won’t explode.

1. It contains mainly stable versions of the software

Yes, you read it right. Unstable is not full of development versions of the various software. It happens on some software but then it’s usually a conscious decision of the maintainer who believes that this specific version is already better than the previous one.

The packages in sid are supposed to migrate to testing, the place where the next Debian stable release is prepared. So maintainers are advised to only upload stuff that is of release quality, the rest should be uploaded to experimental instead.

2. It doesn’t break badly every other day

Breakages happen but they are not a big deal usually. It has been long time since I could not reboot my computer after an upgrade or since the graphical interface was no longer working. The kind of breakages that you have is that one software stops working, or triggers an annoying bug, or that a few packages are uninstallable.

In most cases, you can save yourself by downgrading to the version available in Testing. Or by finding a work-around in the bug tracking system. Or by not upgrading because you have apt-listbugs installed and you have been warned about the problem.

3. It’s the basis of other distributions

If Debian Unstable was really so bad, it would not be a good basis to build a derivative distribution, isn’t it? But Ubuntu and SiduxAptosid (to name only two) are based on Debian Sid.

4. It’s not inherently less secure than Stable or Testing

High impact security vulnerabilities will usually be quickly fixed in Stable and Unstable. The stable upload is done by the security team while the unstable one is made by the maintainer. Testing will usually get the fix through the package uploaded to Unstable, so testing users get security updates with a delay.

For less serious vulnerabilities, it’s entirely possible that stable does not get any update at all. In that case, unstable/testing users are better served since they will get the fix with the next upstream version anyway.

Of course, it happens that maintainers are busy or that something falls through the cracks, but there are other people watching RC bugs who will fix this if the maintainer doesn’t react at all.

5. I use it on my main computer

And many other people do the same. And you can do the same if you meet the criteria below:

  • you can work on the command-line (enough to downgrade a problematic package, to edit configuration files, etc.);
  • you know how to work with APT and multiple distributions in /etc/apt/sources.list;
  • you are able to read/write English so that you can read/file bug reports when needed;
  • you have another computer connected to the Internet that you can use to lookup documentation (or the bug tracking system, or the support mailing lists) when your usual computer is off-line for a reason that you don’t understand.

If you feel you are not ready for the jump, click here to subscribe to this blog (or here via the RSS feed), I’ll surely teach some of the required skills in future articles.

PS: All that said, if you have a working sid installation, do not upgrade it just before an important presentation, or before a trip. It will always break at the most annoying time. Unless you like to live dangerously, of course.

Howto to rebuild Debian packages

Being able to rebuild an existing Debian package is a very useful skill. It’s a prerequisite for many tasks that an admin might want to perform at some point: enable a feature that is disabled in the official Debian package, rebuild a source package for another suite (for example build a Debian Testing package for use on Debian Stable, we call that backporting), include a bug fix that upstream developers prepared, etc. Discover the 4 steps to rebuild a Debian package.

1. Download the source package

The preferred way to download source packages is to use APT. It can download them from the source repositories that you have configured in /etc/apt/sources.list, for example:

deb-src http://ftp.debian.org/debian unstable main contrib non-free
deb-src http://ftp.debian.org/debian testing main contrib non-free
deb-src http://ftp.debian.org/debian stable main contrib non-free

Note that the lines start with “deb-src” instead of the usual “deb”. This tells APT that we are interested in the source packages and not in the binary packages.

After an apt-get update you can use apt-get source publican to retrieve the latest version of the source package “publican”. You can also indicate the distribution where the source package must be fetched with the syntax “package/distribution“. apt-get source publican/testing will grab the source package publican in the testing distribution and extract it in the current directory (with dpkg-source -x, thus you need to have installed the dpkg-dev package).

$ apt-get source publican/testing
Reading package lists... Done
Building dependency tree
Reading state information... Done
NOTICE: 'publican' packaging is maintained in the 'Git' version control system at:
git://git.debian.org/collab-maint/publican.git
Need to get 727 kB of source archives.
Get:1 http://nas/debian/ squeeze/main publican 2.1-2 (dsc) [2253 B]
Get:2 http://nas/debian/ squeeze/main publican 2.1-2 (tar) [720 kB]
Get:3 http://nas/debian/ squeeze/main publican 2.1-2 (diff) [4728 B]
Fetched 727 kB in 0s (2970 kB/s)
dpkg-source: info: extracting publican in publican-2.1
dpkg-source: info: unpacking publican_2.1.orig.tar.gz
dpkg-source: info: unpacking publican_2.1-2.debian.tar.gz
$ ls -dF publican*
publican-2.1/                 publican_2.1-2.dsc
publican_2.1-2.debian.tar.gz  publican_2.1.orig.tar.gz

If you don’t want to use APT, or if the source package is not hosted in an APT source repository, you can download a complete source package with dget -u dsc-url where dsc-url is the URL of the .dsc file representing the source package. dget is provided by the devscripts package. Note that the -u option means that the origin of the source package is not verified before extraction.

2. Install the build-dependencies

Again APT can do the grunt work for you, you just have to use apt-get build-dep foo to install the build-dependencies for the last version of the source package foo. It supports the same syntactic sugar than apt-get source so that you can run apt-get build-dep publican/testing to install the build-dependencies required to build the testing version of the publican source package.

If you can’t use APT for this, enter the directory where the source package has been unpacked and run dpkg-checkbuilddeps. It will spit out a list of unmet build dependencies (if there are any, otherwise it will print nothing and you can go ahead safely). With a bit of copy and paste and a “apt-get install” invocation, you’ll install the required packages in a few seconds.

3. Do whatever changes you need

I won’t detail this step since it depends on your specific goal with the rebuild. You might have to edit debian/rules, or to apply a patch.

But one thing is sure, if you have made any change or have recompiled the package in a different environment, you should really change its version number. You can do this with “dch --local foo” (again from the devscripts package), replace “foo” by a short name identifying you as the supplier of the updated version. It will update debian/changelog and invite you to write a small entry documenting your change.

4. Build the package

The last step is also the simplest one now that everything is in place. You must be in the directory of the unpacked source package.
Now run either “debuild -us -uc” (recommended, requires the devscripts package) or directly “dpkg-buildpackage -us -uc”. The “-us -uc” options avoid the signature step in the build process that would generate a (harmless) failure at the end if you have no GPG key matching the name entered in the top entry of the Debian changelog.

$ cd publican-2.1
$ debuild -us -uc
 dpkg-buildpackage -rfakeroot -D -us -uc
dpkg-buildpackage: export CFLAGS from dpkg-buildflags (origin: vendor): -g -O2
dpkg-buildpackage: export CPPFLAGS from dpkg-buildflags (origin: vendor):
dpkg-buildpackage: export CXXFLAGS from dpkg-buildflags (origin: vendor): -g -O2
dpkg-buildpackage: export FFLAGS from dpkg-buildflags (origin: vendor): -g -O2
dpkg-buildpackage: export LDFLAGS from dpkg-buildflags (origin: vendor):
dpkg-buildpackage: source package publican
dpkg-buildpackage: source version 2.1-2rh1
dpkg-buildpackage: source changed by Raphaël Hertzog 
 dpkg-source --before-build publican-2.1
dpkg-buildpackage: host architecture i386
[...]
dpkg-deb: building package `publican' in `../publican_2.1-2rh1_all.deb'.
 dpkg-genchanges  >../publican_2.1-2rh1_i386.changes
dpkg-genchanges: not including original source code in upload
 dpkg-source --after-build publican-2.1
dpkg-buildpackage: binary and diff upload (original source NOT included)
Now running lintian...
Finished running lintian.

The build is over, the updated source and binary packages have been generated in the parent directory.

$ cd ..
$ ls -dF publican*
publican-2.1/                    publican_2.1-2rh1.dsc
publican_2.1-2.debian.tar.gz     publican_2.1-2rh1_i386.changes
publican_2.1-2.dsc               publican_2.1-2rh1_source.changes
publican_2.1-2rh1_all.deb        publican_2.1.orig.tar.gz
publican_2.1-2rh1.debian.tar.gz

Do you want to read more tutorials like this one? Click here to subscribe to my free newsletter, you can opt to receive future articles by email.

How Ubuntu builds up on Debian

I have been asked how Ubuntu relates to Debian, and how packages flow from one to the other. So here’s my attempt at clarifying the whole picture.

Where do the packages come from?

Most packages are created by Debian contributors and they are uploaded in Debian unstable (or Debian experimental). New packages are reviewed by the Debian ftpmasters before being accepted in the official archive. The packages are held in the NEW queue until the review is over, and the time spent there varies between a few hours and a few months (usually they are processed within one week or two).

Ubuntu imports all the official Debian packages, but they also add some packages of their own. About 7% of the Ubuntu packages are third-party software that have been packaged for Ubuntu but not for Debian.

What are the changes made by Ubuntu?

From all the source packages coming from Debian, 17% have additional changes made by Ubuntu. Many of them are part of the “main” repository, which is actively maintained by Canonical and Ubuntu core developers. The “universe” repository is usually closer to the official Debian packages.

Many of the changes made by Ubuntu are the results of the decisions taken during the Ubuntu Developer Summit in order to reach specific goals: provide a better user interface, offer faster boot times, become a better platform for third-party software developers, offer a good integration with their online services (Launchpad, Ubuntu One), etc. Other changes are simply the result of fixing bugs reported by Ubuntu users.

Note that even non-modified source packages will result in different binary packages for Ubuntu. That’s because Ubuntu has made changes to the build environment. They only support Intel-based computers with a 686-class (or newer) CPU, they enable some compiler options that Debian doesn’t, etc. And all binary packages are modified by a program called pkgbinarymangler.

Ubuntu’s release cycle and the relation with Debian

Ubuntu releases every 6 months (that’s what time based releases is about). Debian has a very different schedule. How does Ubuntu manage to reuse Debian’s work?

Ubuntu imports packages from Debian unstable (even experimental sometimes) to get the newest packages. If the Ubuntu package already has Ubuntu-specific changes, they merge their changes in the updated Debian package. Otherwise the Debian package is simply grabbed and rebuilt in Ubuntu. This works well because Debian unstable is much more usable than the name suggests. And this process only goes on during the first 2 months of the cycle (until the Debian Import Freeze), so there’s plenty of time afterward to fix the biggest problems.

In the third and fourth month, it’s still possible to pick updated packages from Debian but it must be requested by a developer, it won’t be done automatically. At the end of the fourth month, the feature freeze is put in place.

The 2 months left are dedicated to bug fixing and polishing the distribution. There are various sub-freezes that happen in this period, you can check the Natty release schedule as an example. Picking updated packages from Debian is now the exception, it will only be allowed if the update on the Debian side is a bug-fix only release.

Credits: some figures taken from a talk of Lucas Nussbaum, they were collected based on the packages available in the Lucid Lynx release of Ubuntu.

Click here to subscribe to my newsletter and get my monthly update on what’s going on in Debian and Ubuntu.

Save disk space by excluding useless files with dpkg

Most packages contain files that you don’t need: for example translations in languages that you don’t understand, or documentation that you don’t read. Wouldn’t it be nice if you could get rid of them and save a few megabytes? Good news: since dpkg 1.15.8 you can!

dpkg has two options --path-include=glob-pattern and --path-exclude=glob-pattern that control what files are installed or not. The pattern work the same than what you’re used to on the shell (see the glob(7) manual page).

Passing those options on the command-line would be impractical, so the best way to use them is to put them in a file in /etc/dpkg/dpkg.cfg.d/. Beware, the order of the options does matter: when a file matches several options, the last one makes the decision.

A typical usage is to first exclude a directory and then to re-include parts of that directory that you want to keep. For example if you want to drop gettext translations and translated manual pages except French, you could put this in /etc/dpkg/dpkg.cfg.d/excludes:

# Drop locales except French
path-exclude=/usr/share/locale/*
path-include=/usr/share/locale/fr/*
path-include=/usr/share/locale/locale.alias

# Drop translated manual pages except French
path-exclude=/usr/share/man/*
path-include=/usr/share/man/man[1-9]/*
path-include=/usr/share/man/fr*/*

Note that the files will vanish progressively every time that a package is upgraded. If you want to save space immediately, you have to reinstall the packages present in your system. aptitude reinstall or apt-get --reinstall install might help. In theory with aptitude you can even do aptitude reinstall ~i but it tends to not work because one package is not available (either because it was installed manually or because the installed version has been superseded by a newer version on the mirror).

Found it useful? Click here to see how you can encourage me to provide more articles like this one.

5 reasons why a Debian package is more than a simple file archive

Folder with gearsYou’re probably manipulating Debian packages everyday, but do you know what those files are? This article will show you their bowels… Surely they are more than file archives otherwise we would just use TAR archives (you know those files ending with .tar.gz). Let’s have a look!

1. It’s two TAR file archives in an AR file archive!

A .deb file is actually an archive using the AR format, you can manipulate it with the ar command. This archive contains 3 files, you can check it yourself, download any .deb file and run “ar t” on it:

$ ar t gwibber_2.31.91-1_all.deb
debian-binary
control.tar.gz
data.tar.gz

debian-binary is a text file indicating the version of the format of the .deb file, the current version is “2.0″.

$ ar p gwibber_2.31.91-1_all.deb debian-binary
2.0

data.tar.gz contains the real files of the package, the content of that archive gets installed in your root directory when you run “dpkg --unpack“.

But the most interesting part—which truly makes .deb files more than a file archive—is the last file. control.tar.gz contains meta-information used by the package manager. What are they?

$ ar p gwibber_2.31.91-1_all.deb control.tar.gz | tar tzf -
./
./postinst
./prerm
./preinst
./postrm
./conffiles
./md5sums
./control

2. It contains meta-information defining the package and its relationships

The control file within the control.tar.gz archive is the most fundamental file. It contains basic information about the package like its name, its version, its description, the architecture it runs on, who is maintaining it and so on. It also contains dependency fields so that the package manager can ensure that everything needed by the package is installed before-hand. If you want to learn more about those fields, you can check Binary control files in the Debian Policy.

Those information end up in /var/lib/dpkg/status once the package is installed.

3. It contains maintainer scripts so that everything can just work out of the box

At various steps of the installation/upgrade/removal process, dpkg is executing the maintainer scripts provided by the package:

  • postinst: after installation
  • preinst: before installation
  • postrm: after removal
  • prerm: before removal

Note that this description is largely simplified. In fact the scripts are executed on many other occasions with different parameters. There’s an entire chapter of the Debian Policy dedicated to this topic. But you might find this wiki page easier to grasp: http://wiki.debian.org/MaintainerScripts.

While this looks scary, it’s a very important feature. It’s required to cope with non-backwards compatible upgrades, to provide automatic configuration, to create system users on the fly, etc.

4. Configuration files are special files

Unpacking a file archive overwrites the previous version of the files. This is the desired behavior when you upgrade a package, except for configuration files. You prefer not to loose your customizations, don’t you?

That’s why packages can list configuration files in the conffiles file provided by control.tar.gz. That way dpkg will deal with them in a special way.

5. You can always add new meta-information

And in fact many tools already exploit the possibility to provide supplementary files in control.tar.gz:

  • debsums use the md5sums file to ensure no files were accidentally modified
  • dpkg-shlibdeps uses shlibs and symbols files to generate dependencies on libraries
  • debconf uses config scripts to collect configuration information from the user

Once installed, those files are kept by dpkg in /var/lib/dpkg/info/package.* along with maintainer scripts.

If you want to read more articles like this one, click here to subscribe to my free newsletter. You can also follow me on Identi.ca, Twitter and Facebook.