Save disk space by excluding useless files with dpkg

Most packages contain files that you don’t need: for example translations in languages that you don’t understand, or documentation that you don’t read. Wouldn’t it be nice if you could get rid of them and save a few megabytes? Good news: since dpkg 1.15.8 you can!

dpkg has two options --path-include=glob-pattern and --path-exclude=glob-pattern that control what files are installed or not. The pattern work the same than what you’re used to on the shell (see the glob(7) manual page).

Passing those options on the command-line would be impractical, so the best way to use them is to put them in a file in /etc/dpkg/dpkg.cfg.d/. Beware, the order of the options does matter: when a file matches several options, the last one makes the decision.

A typical usage is to first exclude a directory and then to re-include parts of that directory that you want to keep. For example if you want to drop gettext translations and translated manual pages except French, you could put this in /etc/dpkg/dpkg.cfg.d/excludes:

# Drop locales except French
path-exclude=/usr/share/locale/*
path-include=/usr/share/locale/fr/*
path-include=/usr/share/locale/locale.alias

# Drop translated manual pages except French
path-exclude=/usr/share/man/*
path-include=/usr/share/man/man[1-9]/*
path-include=/usr/share/man/fr*/*

Note that the files will vanish progressively every time that a package is upgraded. If you want to save space immediately, you have to reinstall the packages present in your system. aptitude reinstall or apt-get --reinstall install might help. In theory with aptitude you can even do aptitude reinstall ~i but it tends to not work because one package is not available (either because it was installed manually or because the installed version has been superseded by a newer version on the mirror).

Found it useful? Click here to see how you can encourage me to provide more articles like this one.

Managing distribution-specific patches with a common source package

In the comments of the article explaining how to generate different dependencies on Debian and Ubuntu with a common source package, I got asked if it was possible to apply a patch only in some distribution. And indeed it is.

The source package format 3.0 (quilt) has a neat feature for this. Instead of unconditionally using debian/patches/series to look up patches, dpkg-source first tries to use debian/patches/vendor.series (where vendor is ubuntu, debian, etc.). Note that dpkg-source does not stack patches from multiple series file, it uses a single series file, the first that exists.

So what’s the best way to use this? Debian should always provide debian/patches/series, they are supposed to provide the default set of patches to use. Any derivative cooperating with Debian can maintain their own series files within the common VCS repository used for package maintenance. They can drop Debian-specific patches (say branding patches for example), and they can add their own on top of the remaining Debian patches.

It’s worth noting that it’s the job of the maintainers to keep both series files in sync when needed. dpkg-source offers no way to have stacked series files (or dependencies between them).

If you want to use quilt to edit an alternate series file, you can temporarily set the QUILT_SERIES environment variable to “vendor.series”. Just make sure to start from a clean state, i.e. no patches applied. Otherwise quilt will be confused by the sudden mismatch between the series file and its internal data (stored in the .pc directory).

Found it useful? Click here to see how you can encourage me to provide more articles like this one.

Correctly renaming a conffile in Debian package maintainer scripts

After having dealt with the removal of obsolete conffiles, I’ll now explain what you should do when a configuration file managed by dpkg must be renamed.

The problem

Let’s suppose that version 1.2 of the software stopped providing /etc/foo.conf. Instead it provides /etc/bar.conf because the configuration file got renamed. If you do nothing special, the new conffile will be installed with the default configuration, and the old one will stay around. Any customization made by the administrator are lost in the process (in fact they are not lost, they are still in foo.conf but they are unused).

Of course, you could do mv /etc/foo.conf /etc/bar.conf in the pre-installation script. But that’s not satisfactory: it will generate a spurious conffile prompt that the end-user will not understand.

The solution

In the preinst script, you have to verify if the old conffile has been modified by the administrator. If yes, you want to keep the file around. Otherwise you know you will be able to ditch it once the upgrade is over, and you rename it to /etc/foo.conf.dpkg-remove to remember this fact.

In the postinst script, you remove /etc/foo.conf.dpkg-remove. If the old conffile (/etc/foo.conf) still exists, it’s because it was modified by the administrator. You make a backup of the new conffile in /etc/bar.conf.dpkg-dist and rename the old one into /etc/bar.conf.

In the postrm, when called to abort an upgrade, you move /etc/foo.conf.dpkg-remove back to its original name.

In practice, use dpkg-maintscript-helper

dpkg-maintscript-helper can automate all those tasks. You just have to put the following snippet in the maintainer scripts (postinst, postrm, preinst):

if dpkg-maintscript-helper supports mv_conffile 2>/dev/null; then
    dpkg-maintscript-helper mv_conffile /etc/foo.conf /etc/bar.conf 1.1-3 -- "$@"
fi

In this example, I assumed that version 1.1-3 was the last version of the package that contained /etc/foo.conf (i.e. the last version released before 1.2-1 was packaged).

You can avoid the preliminary test if you pre-depend on “dpkg (>= 1.15.7.2)” or if enough time has passed to assume that everybody has a newer version anyway. You can learn all the details in dpkg-maintscript-helper’s manual page.

Debhelper support

Debhelper makes it easy to inject those commands in the maintainer scripts. You just have to provide debian/*.maintscript files. See dh_installdeb’s manual page for details.

Found it useful? Be sure to not miss other packaging tips (or lessons), click here to subscribe to my free newsletter and get new articles by email (just check “Send me blog updates”).

The right way to remove an obsolete conffile in a Debian package

A conffile is a configuration file managed by dpkg, I’m sure you remember the introductory article about conffiles. When your package stops providing a conffile, the file stays on disk and it’s recorded as obsolete by the package manager. It’s only removed during purge. If you want the file to go away, you have to remove it yourself within your package’s configuration scripts. You will now learn how to do this right.

When is that needed?

dpkg errs on the side of safety by not removing the file until purge but in most cases it’s best to remove it sooner so as to not confuse the user. In some cases, it’s even required because keeping the file could break the software (for example if the file is in a .d configuration directory, and if it contains directives that are either no longer supported by the new version or in conflict with other new configuration files).

What’s complicated in “rm”?

So you want to remove the conffile. Adding an “rm” command in debian/postinst sounds easy. Except it’s not the right thing to do. The conffile might contain customizations made by the administrator and you don’t want to wipe those. Instead you want to keep the file around so that he can get his changes back and do whatever is required with those.

The correct action is thus to move the file away in the prerm, to ensure it doesn’t disturb the new version. At the same time, you need to verify whether the conffile has been modified by the administrator and remember it for later. In the postinst, you need to remove the file if it’s unmodified, or keep it under a different name that doesn’t interfere with the software. In many cases adding a simple .dpkg-bak suffix is enough. For instance, run-parts ignore files that contain a dot, and many other software are configured to only include files with a certain extension—say *.conf. In the postrm, you have to remove the obsolete conffiles that were kept due to local changes and you should also restore the original conffile in case the upgrade obsoleting the conffile is aborted.

Automating everything with dpkg-maintscript-helper

Phewww… that’s a lot of things to do for a seemingly simple task. Fortunately everything can be automated with dpkg-maintscript-helper. Let’s assume you want to remove /etc/foo/conf.d/bar because it’s obsolete and you’re going to prepare a new version 1.2-1 with the appropriate code to remove the file on upgrade. You just have to put this snippet in the 3 relevant scripts (preinst, postinst, postrm):

if dpkg-maintscript-helper supports rm_conffile 2>/dev/null; then
    dpkg-maintscript-helper rm_conffile /etc/foo/conf.d/bar 1.2-1 -- "$@"
fi

You can avoid the preliminary test if you pre-depend on “dpkg (>= 1.15.7.2)” or if enough time has passed to assume that everybody has a newer version anyway. You can learn all the details in dpkg-maintscript-helper’s manual page.

Debhelper integration

debhelper makes it easy to inject those commands for you. You can provide debian/*.maintscript files. See dh_installdeb’s manual page for details.

I hope you found this article helpful. You can follow me on Identi.ca, Twitter and Facebook.

How to generate different dependencies on Debian and Ubuntu with a common source package

There are situations in which a given package needs to have different dependencies in Debian and in Ubuntu. Despite this difference it’s possible to keep a single source package that will build both variants of the package. Continue reading to discover a step-by-step explanation.

1. When is that needed?

While it is possible to have different dependencies depending on the distribution on which the package is built, it should be usually avoided when possible. This infrastructure should only be used as a last resort when there are no better alternatives.

Here are some examples of when it might be needed:

  • Ubuntu has some packages that Debian does not have (or vice-versa), and the resulting package would benefit from having them installed.
  • The package names differ between Ubuntu and Debian and it’s on purpose (i.e. the difference is a justified choice and not a mistake because both distributions failed to coordinate).
  • The packages are built differently in both distributions, and the run time dependencies are not the same due to this. Maybe some associated patches are only applied in Ubuntu.

2. Using substitution variables in debian/control

The dependency that varies between both distributions can’t be hardcoded in debian/control, instead you should put a substitution variable (substvar) that will be replaced at build time by dpkg-gencontrol. You can name it ${dist:Depends} for example:

[...]
Depends: bzip2, ${shlibs:Depends}, ${misc:Depends}, ${dist:Depends}
[...]

Note that you typically already have other substitution variables (${shlibs:Depends} comes from dpkg-shlibdeps and ${misc:Depends} from debhelper and its dh_* scripts).

3. dpkg-gencontrol needs a -V option

The value used to replace this new variable needs to be communicated to dpkg-gencontrol. You can use the -V option for this, the syntax would be something like this:

dpkg-gencontrol [...] -Vdist:Depends="foo (>= 2), bar"

If you use debhelper, you have to pass the option to dh_gencontrol after two dashes (--):

dh_gencontrol -- -Vdist:Depends="foo (>= 2), bar"

If you use CDBS, you can set the DEB_DH_GENCONTROL_ARGS_ALL make variable:

include /usr/share/cdbs/1/rules/debhelper.mk
DEB_DH_GENCONTROL_ARGS_ALL = -- -Vdist:Depends="foo (>= 2), bar"

The value given to dpkg-gencontrol is static in all those examples, now let’s see how we can use give a different value depending on the distribution that we’re targetting.

4. Using dpkg-vendor in debian/rules

dpkg-vendor is a small tool (provided by the dpkg-dev package) that parses the /etc/dpkg/origins/default file (provided by the base-files package) to know the current distribution and its ancestry. It can be used in debian/rules to adjust the behavior depending on the current distribution. You can check its man-page to learn about the various options supported but we’re only going to use --derives-from <vendor> in this sample. With this option the script exits with zero if the current distribution is or derives from the indicated distribution, or with 1 otherwise.

Now combining all together, we can use dpkg-vendor to dynamically define the content of the substitution variable in debian/rules. Let’s suppose that you want ${dist:Depends} to be “foo (>= 2)” on Ubuntu (and its derivatives) and “bar” everywhere else. Using debhelper’s 7 tiny rules file, this example could be:

ifeq ($(shell dpkg-vendor --derives-from Ubuntu && echo yes),yes)
	SUBSTVARS = -Vdist:Depends="foo (>= 2)"
else
	SUBSTVARS = -Vdist:Depends="bar"
endif

%:
	dh $@

override_dh_gencontrol:
	dh_gencontrol -- $(SUBSTVARS)

If you use CDBS, it could be this:

include /usr/share/cdbs/1/rules/debhelper.mk
ifeq ($(shell dpkg-vendor --derives-from Ubuntu && echo yes),yes)
	DEB_DH_GENCONTROL_ARGS_ALL = -- -Vdist:Depends="foo (>= 2)"
else
	DEB_DH_GENCONTROL_ARGS_ALL = -- -Vdist:Depends="bar"
endif

Do you want to read more tutorials like this one? Click here to subscribe to my free newsletter, you can opt to receive future articles by email.

How to create Debian packages with alternative compression methods

While gzip is the standard Unix tool when it comes to compression, there are other tools available and some of them are performing better than gzip in terms of compression ratio. This article will explain where you can make use of them in your Debian packaging work.

In the source package

A source package is composed of multiple files. The .dsc file is always uncompressed and it’s fine since it’s a small textual file. The upstream tarballs can be compressed with gzip (orig.tar.gz), bzip2 (orig.tar.bz2), lzma (orig.tar.lzma) or xz (orig.tar.xz), so choose the one that you want if upstream provides the tarball compressed with multiple tools. Put it at the right place and dpkg-source will automatically use it. Note however that packages using source format “1.0” are restricted to gzip, and the main Debian archive currently only allows gzip and bzip2 (xz might be allowed later) even if the source format “3.0 (quilt)” supports all of them.

The debian packaging files are provided either in a .diff.gz file for source format “1.0” (again only gzip is supported) or in a .debian.tar file for source format “3.0 (quilt)”. The latter tarball can be compressed with the tool of your choice, you just have to tell dpkg-source which one to use (see below, note that gzip is the default).

In a native package, dpkg-source must generate the main tarball and you can instruct it to use another tool than gzip with the --compression option. That option is usually put in debian/source/options:

# Use bzip2 instead of gzip
compression = "bzip2"
compression-level = 9

For “3.0 (quilt)” source packages, this option is not very useful as the debian tarball that gets compressed is usually not very large. But some maintainers like to use the same compression tool for the upstream tarball and the debian tarball, so you can use this option to harmonize both.

In native packages, it’s much more interesting: for instance the size of dpkg’s source package has been reduced of 30% by switching to bzip2, saving 2Mb of disk.

In the binary packages

.deb files also contain compressed tar archives and by default they use gzip as well:

$ ar t dpkg_1.15.9_i386.deb 
debian-binary
control.tar.gz
data.tar.gz

data.tar.gz is the archive that contains all the files to be installed and it’s the one that you can compress with another tool if you want. Again this is mostly interesting for (very) large packages where the size difference clearly justifies deviating from the default compression tool. Try it out and see how many megabytes you can shove. Another aspect that you must keep in mind is that those alternative tools might use important amount of memory to do their job, both for compression and decompression. So if your package is meant to be installed on embedded platforms, or if you want to build your package on low-end hardware with few memory, you might want to stick with gzip.

Now how do you change the compression tool? Easy, dpkg-deb supports a -Z option, so you just have to pass “-Zbzip2″ for example. You can also pass “-z6″ for example to change the compression level to 6 (it’s interesting because a lower compression level might require less memory depending on the tool used). The dpkg-deb invocation is typically hidden behind the call to dh_builddeb in your debian/rules so you have to replace that invocation with “dh_builddeb -- -Zbzip2“.

If you are using a debhelper 7 tiny rules files, you have to add an override like in this example:

%:
	dh $@

override_dh_builddeb:
	dh_builddeb -- -Zbzip2

If you are using CDBS, you have to set the variable DEB_DH_BUILDDEB_ARGS:

include /usr/share/cdbs/1/rules/debhelper.mk
[...]
DEB_DH_BUILDDEB_ARGS = -- -Zbzip2

I hope you found this article helpful. Follow me on identi.ca or on twitter.

How to customize dpkg-source’s behaviour in your Debian source package

dpkg-source is the program that generates the Debian source package when a new package version is built. It offers many interesting command-line options but they are often not used because people don’t know how to ensure that they are used every time the package is built. Let’s fill that gap!

It is possible to forward some options to dpkg-source by typing them on the dpkg-buildpackage command line but you’d have to remember to type them every time. You could create a shell alias to avoid typing them but then you can’t have different options for different packages. Not very practical.

The proper solution has been implemented last year (in dpkg 1.15.5). It is now possible to put options in debian/source/options. Any long option (those starting with “--“) can be put in that file, one option per line with the leading “--” stripped.

Here’s an example:

# Bzip2 compression for debian.tar
compression = "bzip2"
compression-level = 7
# Do not generate diff for changes in config.(sub|guess)
extend-diff-ignore = "(^|/)config.(sub|guess)$"

Notice that spaces around the equal sign are possible contrary on the command line. You can use quotes around the value but it’s not required.

The debian/source/options file is part of the source package so if someone else grabs the resulting source package and rebuilds everything, they will use the options that you defined in that file.

You can also use debian/source/local-options but this time the file will not be included in the resulting source package. This is interesting for options that you want to use when you build from the VCS (Version Control Repository, aka git/svn/bzr/etc.) but that people downloading the resulting source package should not have. Some options (like --unapply-patches) are only allowed in that file to ensure a consistent experience for users of source packages.

You can learn more about the existing options in the dpkg-source manual page. Read it, I’m sure you’ll learn something. Did you know that you can tell dpkg-source to abort if you have upstream changes not managed by an existing patch in debian/patches? It’s --abort-on-upstream-changes and it’s only allowed in debian/source/local-options.

Be sure to subscribe to the RSS feed or to the email newsletter to not miss useful documentation for Debian contributors!

How to use multiple upstream tarballs in Debian source packages?

Since the introduction of the “3.0 (quilt)” source format, it is now possible to integrate multiple upstream tarballs in Debian source packages. This article will show you how to do the same with your own package shall you need it. It’s quite useful to easily integrate supplementary plugins, translations, or documentation that the upstream developers are providing in separate tarballs.

Step by step explanation

We’ll take the spamassassin source package as an example. The upstream version is 3.3.1. The main upstream tarball is named as usual (spamassassin_3.3.1.orig.tar.gz) and contains the top directory of our source package. We already have a debian directory because the package is not new.

Upstream provides spamassassin rules in a separate tarball named Mail-SpamAssassin-rules-3.3.1.r901671.tgz. We grab it, rename it to spamassassin_3.3.1.orig-pkgrules.tar.gz and put it next to the main tarball. The “pkgrules” part is the component name that we choose to identify the tarball, it’s also the name of the directory in which it will be extracted inside the source package. For now that directory doesn’t exist yet so we must create it.

$ mv Mail-SpamAssassin-rules-3.3.1.r901671.tgz spamassassin_3.3.1.orig-pkgrules.tar.gz
$ cd spamassassin-3.3.1
$ mkdir pkgrules
$ tar -C pkgrules -zxf ../spamassassin_3.3.1.orig-pkgrules.tar.gz

This is already enough, the next time that you will build the source package, the supplementary tarball will be automatically integrated in the generated source package.

$ dpkg-buildpackage -S
[...]
 dpkg-source -b spamassassin-3.3.1
dpkg-source: info: using source format `3.0 (quilt)'
dpkg-source: info: building spamassassin using existing ./spamassassin_3.3.1.orig-pkgrules.tar.gz ./spamassassin_3.3.1.orig.tar.gz
dpkg-source: info: building spamassassin in spamassassin_3.3.1-1.debian.tar.gz
dpkg-source: info: building spamassassin in spamassassin_3.3.1-1.dsc

The supplementary tarball is now part of the source package but we’re not making anything useful out of it. We have to modify debian/rules (or debian/spamassin.install) to install the new files in the binary package.

A special case: bundling related software

In very rare cases, you might want to create a bundle of several software (small perl modules for example) and you don’t have any main tarball, you only have several small tarballs. Rename all the tarballs following the same logic as above and when building the source package you can ask dpkg-source to create an empty (and fake) main archive for you with the option --create-empty-orig:

$ dpkg-buildpackage -S --source-option=--create-empty-orig

Use with care as the version number you give to the bundle is what users will see and it’s likely unrelated to the version number of each individual software.

Common mistakes

Forgetting to extract the supplementary tarball

If you forget to extract the content of the supplementary tarball in the pkgrules directory, dpkg-source will emit lots of warnings about those files being deleted. In fact, you did not delete them but you only forgot to create them in the first place.

dpkg-source: warning: ignoring deletion of directory pkgrules
dpkg-source: warning: ignoring deletion of file pkgrules/20_fake_helo_tests.cf
dpkg-source: warning: ignoring deletion of file pkgrules/60_shortcircuit.cf
[...]

Using a bad version number for the supplementary tarball

Sometimes the supplementary tarball has a version of its own that does not match the upstream version. You must still name the file in a way that matches the upstream version of the main tarball otherwise it will not be picked up by dpkg-source and it will generate a new patch in debian/patches/ containing the whole new directory.

It’s possible to encode the version number of the supplementary tarball in the component name (in our example above we could have picked “pkgrules-r901671″ as component name) but this means that the name of the associated directory will regularly change and you must adapt your packaging rules to cope with this.

However this last trick has the benefit of being able to update the additional tarball without bumping the upstream version. A sourceful upload of a new revision of the package will be accepted by the archive: the main tarball is ignored since it’s unchanged but the supplementary tarball is taken since it’s a new file for the archive (it has a different filename).

Be sure to move away old versions of the additional tarball when you do that if you don’t want to upload several versions of the same tarball by mistake!

Misextracting the supplementary tarball

dpkg-source is very smart when it extracts the supplementary tarball and you should be as well when you manually extract it.

If the tarball contains only a single top-level directory, that directory is extracted, renamed to match the component name and moved in the source package directory.
If the tarball contains several top-level files or directories, then the target directory is first created and the content of the archive is directly extracted into that directory.

Here are commands to install the files in both cases (we’re already in the source package directory):

$ mkdir pkgrules
# Archive contains a single top-level directory
$ tar -C pkgrules --strip-components=1 -zxf ../spamassassin_3.3.1.orig-pkgrules.tar.gz
# Archive contains many top-level entries
$ tar -C pkgrules -zxf ../spamassassin_3.3.1.orig-pkgrules.tar.gz