5 reasons why a Debian package is more than a simple file archive

Folder with gearsYou’re probably manipulating Debian packages everyday, but do you know what those files are? This article will show you their bowels… Surely they are more than file archives otherwise we would just use TAR archives (you know those files ending with .tar.gz). Let’s have a look!

1. It’s two TAR file archives in an AR file archive!

A .deb file is actually an archive using the AR format, you can manipulate it with the ar command. This archive contains 3 files, you can check it yourself, download any .deb file and run “ar t” on it:

$ ar t gwibber_2.31.91-1_all.deb
debian-binary
control.tar.gz
data.tar.gz

debian-binary is a text file indicating the version of the format of the .deb file, the current version is “2.0″.

$ ar p gwibber_2.31.91-1_all.deb debian-binary
2.0

data.tar.gz contains the real files of the package, the content of that archive gets installed in your root directory when you run “dpkg --unpack“.

But the most interesting part—which truly makes .deb files more than a file archive—is the last file. control.tar.gz contains meta-information used by the package manager. What are they?

$ ar p gwibber_2.31.91-1_all.deb control.tar.gz | tar tzf -
./
./postinst
./prerm
./preinst
./postrm
./conffiles
./md5sums
./control

2. It contains meta-information defining the package and its relationships

The control file within the control.tar.gz archive is the most fundamental file. It contains basic information about the package like its name, its version, its description, the architecture it runs on, who is maintaining it and so on. It also contains dependency fields so that the package manager can ensure that everything needed by the package is installed before-hand. If you want to learn more about those fields, you can check Binary control files in the Debian Policy.

Those information end up in /var/lib/dpkg/status once the package is installed.

3. It contains maintainer scripts so that everything can just work out of the box

At various steps of the installation/upgrade/removal process, dpkg is executing the maintainer scripts provided by the package:

  • postinst: after installation
  • preinst: before installation
  • postrm: after removal
  • prerm: before removal

Note that this description is largely simplified. In fact the scripts are executed on many other occasions with different parameters. There’s an entire chapter of the Debian Policy dedicated to this topic. But you might find this wiki page easier to grasp: http://wiki.debian.org/MaintainerScripts.

While this looks scary, it’s a very important feature. It’s required to cope with non-backwards compatible upgrades, to provide automatic configuration, to create system users on the fly, etc.

4. Configuration files are special files

Unpacking a file archive overwrites the previous version of the files. This is the desired behavior when you upgrade a package, except for configuration files. You prefer not to loose your customizations, don’t you?

That’s why packages can list configuration files in the conffiles file provided by control.tar.gz. That way dpkg will deal with them in a special way.

5. You can always add new meta-information

And in fact many tools already exploit the possibility to provide supplementary files in control.tar.gz:

  • debsums use the md5sums file to ensure no files were accidentally modified
  • dpkg-shlibdeps uses shlibs and symbols files to generate dependencies on libraries
  • debconf uses config scripts to collect configuration information from the user

Once installed, those files are kept by dpkg in /var/lib/dpkg/info/package.* along with maintainer scripts.

If you want to read more articles like this one, click here to subscribe to my free newsletter. You can also follow me on Identi.ca, Twitter and Facebook.