5 reasons why a Debian package is more than a simple file archive

You’re probably manipulating Debian packages every day, but do you know what those files are? This article will show you their innards… Surely they are more than file archives, otherwise we would just use TAR archives (you know, those files ending with .tar.gz). Let’s have a look!

1. It’s two TAR file archives in an AR file archive!

A .deb file is actually an archive using the AR format, so you can manipulate it with the ar command. This archive contains 3 files. You can check it yourself: download any .deb file and run “ar t” on it:

$ ar t gwibber_2.31.91-1_all.deb
debian-binary
control.tar.gz
data.tar.gz

debian-binary is a text file indicating the version of the .deb format; the current version is “2.0”.

$ ar p gwibber_2.31.91-1_all.deb debian-binary
2.0

data.tar.gz contains the real files of the package; the content of that archive gets installed in your root directory when you run “dpkg --unpack”.

But the most interesting part, the one that truly makes .deb files more than a file archive, is the remaining file. control.tar.gz contains meta-information used by the package manager. What is in it?

$ ar p gwibber_2.31.91-1_all.deb control.tar.gz | tar tzf -
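If you don’t have a .deb at hand, the layout described above can be reproduced with nothing but tar and ar. The following is only a sketch: the package name “hello-sketch” and all field values are made up, the result is a toy meant for inspection rather than installation, and it assumes gzip-compressed members as in this article.

```shell
# Build a toy package in a scratch directory; "hello-sketch" and every
# value below are invented for illustration.
work=$(mktemp -d)
mkdir -p "$work/control" "$work/data/usr/share/doc/hello-sketch"

printf '%s\n' \
    'Package: hello-sketch' \
    'Version: 1.0-1' \
    'Architecture: all' \
    'Maintainer: Jane Doe <jane@example.org>' \
    'Description: toy package used to look at the .deb layout' \
    > "$work/control/control"
printf 'Just a placeholder file.\n' \
    > "$work/data/usr/share/doc/hello-sketch/README"
printf '2.0\n' > "$work/debian-binary"

# The two inner archives are ordinary gzip-compressed tarballs.
tar -czf "$work/control.tar.gz" -C "$work/control" .
tar -czf "$work/data.tar.gz" -C "$work/data" .

# The outer container is a plain ar archive; debian-binary goes first.
(cd "$work" && ar rc hello-sketch_1.0-1_all.deb \
    debian-binary control.tar.gz data.tar.gz)

deb="$work/hello-sketch_1.0-1_all.deb"
ar t "$deb"                                         # the three members
ar p "$deb" debian-binary                           # format version
ar p "$deb" control.tar.gz | tar -xzOf - ./control  # print the control file
ar p "$deb" data.tar.gz | tar -tzf -                # files that would be unpacked
```

For real work, “dpkg-deb --build” is the supported way to create a package, and “dpkg-deb -x” and “dpkg-deb -e” extract the data and control parts for you. Recent packages may also compress the inner archives with xz or zstd instead of gzip, so the member names can differ.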

2. It contains meta-information defining the package and its relationships

The control file within the control.tar.gz archive is the most fundamental file. It contains basic information about the package like its name, its version, its description, the architecture it runs on, who is maintaining it and so on. It also contains dependency fields so that the package manager can ensure that everything needed by the package is installed beforehand. If you want to learn more about those fields, you can check Binary control files in the Debian Policy.

This information ends up in /var/lib/dpkg/status once the package is installed.
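To give an idea of what such a file looks like, here is a small sketch. The control file below is invented (package name, maintainer and dependencies are all placeholders), and control_field is a hypothetical helper written for this example, not a dpkg tool.

```shell
# A made-up control file; every value is illustrative.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Package: hello-sketch
Version: 1.0-1
Architecture: all
Maintainer: Jane Doe <jane@example.org>
Depends: python3 (>= 3.8), libc6
Description: tiny example package
 Continuation lines of the long description start with a space.
EOF

# Hypothetical helper: print the first line of one field's value.
# Field names are matched case-insensitively.
control_field() {
    awk -v want="$1" -F': ' '
        tolower($1) == tolower(want) { sub(/^[^:]*: */, ""); print; exit }
    ' "$2"
}

control_field Version "$sample"
control_field Depends "$sample"
```

On a Debian system you don’t need such a helper: “dpkg -s PACKAGE” shows the entry of an installed package, and “dpkg-deb -f FILE.deb FIELD” reads a field straight from a .deb file.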

3. It contains maintainer scripts so that everything can just work out of the box

At various steps of the installation/upgrade/removal process, dpkg executes the maintainer scripts provided by the package:

  • preinst: before installation
  • postinst: after installation
  • prerm: before removal
  • postrm: after removal

Note that this description is greatly simplified. In fact the scripts are executed on many other occasions with different parameters. There’s an entire chapter of the Debian Policy dedicated to this topic. But you might find this wiki page easier to grasp: http://wiki.debian.org/MaintainerScripts.

While this looks scary, it’s a very important feature. It’s required to cope with non-backwards compatible upgrades, to provide automatic configuration, to create system users on the fly, etc.
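As an illustration, a postinst that creates a system user could look roughly like the sketch below. The package and user name “hello-sketch” are hypothetical, and the script is written to a temporary file so that you can syntax-check it without installing anything.

```shell
# Write a postinst sketch to a temporary file and check its syntax.
postinst=$(mktemp)
cat > "$postinst" <<'EOF'
#!/bin/sh
# postinst sketch for a hypothetical package "hello-sketch".
set -e

case "$1" in
    configure)
        # $2 holds the previously configured version; it is empty on a
        # fresh installation.
        if ! getent passwd hello-sketch >/dev/null; then
            adduser --system --group --no-create-home hello-sketch
        fi
        ;;
    abort-upgrade|abort-remove|abort-deconfigure)
        # Called when an earlier step failed; nothing to undo here.
        ;;
    *)
        echo "postinst called with unknown argument '$1'" >&2
        exit 1
        ;;
esac

exit 0
EOF

sh -n "$postinst" && echo "syntax ok"
```

The case statement is there because dpkg calls the same script with different first arguments depending on what is happening. Only the “configure” branch does real work in this sketch.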

4. Configuration files are special files

Unpacking a file archive overwrites the previous version of the files. This is the desired behavior when you upgrade a package, except for configuration files. You prefer not to lose your customizations, don’t you?

That’s why packages can list configuration files in the conffiles file provided by control.tar.gz. That way dpkg will deal with them in a special way: it keeps your local changes, and it only asks you what to do when both you and the package have modified the file.
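The special treatment boils down to comparing three versions of the file: the one shipped by the old package, the one shipped by the new package, and the one currently on disk. The function below is a simplified model of that decision, written for this article. conffile_action is a hypothetical name, the arguments stand for checksums, and dpkg’s real logic handles more cases (removed files, forced options and so on).

```shell
# Simplified model of the conffile decision during an upgrade.
# Arguments: checksum shipped by the old package, checksum shipped by
# the new package, checksum of the file currently on disk.
conffile_action() {
    old=$1
    new=$2
    cur=$3
    if [ "$new" = "$old" ]; then
        echo keep      # the package did not change the file
    elif [ "$cur" = "$old" ]; then
        echo replace   # no local changes: install the new version
    elif [ "$cur" = "$new" ]; then
        echo keep      # local file already matches the new version
    else
        echo prompt    # both sides changed: ask the administrator
    fi
}

conffile_action aaa aaa bbb
conffile_action aaa ccc aaa
conffile_action aaa ccc bbb
```

The three calls cover a locally modified file that the package left alone, an untouched file that the package updated, and a file changed on both sides.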

5. You can always add new meta-information

And in fact many tools already exploit the possibility to provide supplementary files in control.tar.gz:

  • debsums uses the md5sums file to ensure no files were accidentally modified
  • dpkg-shlibdeps uses shlibs and symbols files to generate dependencies on libraries
  • debconf uses config scripts to collect configuration information from the user

Once installed, those files are kept by dpkg in /var/lib/dpkg/info/package.* along with maintainer scripts.
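To show the principle behind the md5sums file, here is a sketch that records a checksum for a file in a scratch directory and then detects a modification with plain md5sum. It only imitates what debsums does; the file name is made up and nothing under /var/lib/dpkg is touched.

```shell
# Fake installation root containing one "installed" file.
root=$(mktemp -d)
mkdir -p "$root/usr/bin"
printf '#!/bin/sh\necho hello\n' > "$root/usr/bin/hello-sketch"

# Record checksums with paths relative to the root, one file per line.
(cd "$root" && md5sum usr/bin/hello-sketch > md5sums)

# The file is untouched, so verification succeeds.
(cd "$root" && md5sum -c md5sums >/dev/null 2>&1) && echo "unmodified"

# Simulate an accidental modification and verify again.
echo '# local hack' >> "$root/usr/bin/hello-sketch"
(cd "$root" && md5sum -c md5sums >/dev/null 2>&1) || echo "modified"
```

On a real system the recorded checksums of a package live next to its other metadata, and “dpkg-query --control-list PACKAGE” tells you which of those files a given package provides.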



  1. marc says

    Maybe this is a dumb question, but why is .deb using ‘ar’ for the main file and ‘tar’ for embedded archives? Obviously, there must be a good reason, but I can’t guess it 🙂

    • says

      I don’t know for sure but I guess it’s because ar is more lightweight than tar. We don’t really need permissions/owners/etc. on the debian-binary, control.tar.gz and data.tar.gz files.

      • says

        From what I gathered the difference is that the “tar” format is binary and uses block sizes of >1024 bytes, while “ar” has a less portable plain text format header of just 59 bytes. This way the outer shell of a .deb archive is smaller and doesn’t need to be compressed.

  2. says

    There is one caveat to “manipulating” the .deb file with ar, though: please be aware that the GNU version of ar produces a .deb package that isn’t compatible with the archive tools. GNU ar adds a / after the filename inside the archive, which dpkg-deb doesn’t.

    So actually inspecting would be proper – manipulating it will get you into trouble. 😉

    • says

      So even if dpkg-deb does not add trailing / to generated .debs it has supported ar archives with them since 1999 (commit id 16c0f50ed6826cd064510101c60ba98a582759dd). I’m not sure what you refer to with “archive tools”, but anything not supporting trailing slashes while not using dpkg-deb is buggy IMO. The current .deb format supported by dpkg is documented in detail in deb(5). The only problematic case is dpkg-split which didn’t support trailing / until recently, fixed since 1.15.6 (commit id e5c584abd37b59ba4d7cda44f7bad7c98dbd075b).

      So although not encouraged, being able to handle (including creating) .deb archives with generic Unix tools is something that should be supported, and it was one of the reasons for the format being designed that way back then.

  3. Tony Palma says

    I have a question, dpkg-shlibdeps uses shlibs and symbols files for a single package or uses shlibs files or symbols files for a single package?

    • says

      Tony, I’m not sure that I understand your question. dpkg-shlibdeps prefers symbols files over shlibs files when there’s one. But then it can use both while generating the dependencies for a given binary since it might use multiple libraries: for example one with only shlibs, and one with symbols+shlibs.

  4. Morgan says

    I wish the Debian build system was as useful as Arch Linux’s PKGBUILD.

    For example, if I want a later version of a package in Arch I just change the version number in the PKGBUILD file, get the new md5 info and build the package.

    There is no way that will work in the debian build system – you cannot get a later version by just spending a minute or 2 …..

    I also hate the way that you can’t define your CFLAGS system wide so that every package you make uses them (often in deb packages the CFLAGS are built into the debian rules/control script)

    • says

      Morgan, in many cases, you can easily get the next upstream version with only a few commands:

      $ wget ... -O foo_2.0.orig.tar.gz
      $ tar zxf foo_2.0.orig.tar.gz
      $ cp -a foo-1.0/debian foo-2.0/
      $ cd foo-2.0 && dch -v 2.0-1 "New upstream release"
      $ debuild

    • toots says

      Debian packages are pre-compiled binary packages, ARCH packages are built from source on your system, like macports or gentoo.

      This is an essential distinction and I don’t think you can really compare them the way you do. Each of them has its advantages and issues.

      For instance, with ARCH linux, how do you deal with new libraries’ ABI/API? Say you bump the version and md5sums in PKGBUILD and rebuild but it turns out that the binary has different symbols…

      This is just one example. Not to argue in favor of one or another, again, just to show the differences 🙂

      • morgan says

        Actually Arch uses pre-compiled binary files also – unlike Gentoo. So the maintainers will be recompiling packages for new libraries – this does occasionally break things (although I have fewer issues in Arch overall compared to other distros).

        You ‘can’ easily compile a version (your CFLAGS/newer version, etc) if you wish – adding patches is so easy…

        One bad thing about Arch is they tend to ignore old versions of libraries (although it’s so easy to make a package with them…)

        I have even become a maintainer of an AUR package (latest Nvidia driver for latest realtime kernel) (http://aur.archlinux.org/packages.php?ID=12132)
        – I would not have a clue how to do the same in Debian – in Arch it’s just so easy.

  5. Limaunion says

    Hi! Thanks for your article! I’ve been using .deb packages for a long time but never knew about these kinds of details. Do you know if there are plans to improve package managers like aptitude? What improvements can we expect from a user perspective? Thanks.

    • says

      There are always improvements planned but it’s difficult to know what will come next. The bug trackers of apt/aptitude are full of wishlist entries. But the best way to know is to ask the developers… I might do this with an official interview. Subscribe to this blog and you’ll have your answers at some point. 🙂

  6. says

    Please continue with this kind of article. It’s pretty easy to understand!
    Maybe, someday, I will become a Debian Developer. I’m so excited. 🙂
    Thank you very very very much!

    • says

      Thank you for your feedback, I will definitely continue. Including posts that are relevant for people who want to start contributing to Debian. I hope you will pursue your goal of becoming a regular contributor. 🙂

  7. Zhichang Yu says

    I have a question. I notice that multiple runs of dpkg-buildpackage generate different .deb packages (different MD5 digests). Do you know why?

    • says

      Zhichang, because the contents differ. 🙂

      Tar archives embed timestamps of the files. And many files are generated during the build (all those that end up in the control.tar.gz for example) so at least the timestamps differ. And even the gzip compression layer embeds a timestamp by default (see the -n option in the manual page).
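
The gzip part of that answer is easy to try out. The sketch below compresses the same content twice with different modification times, once with the default settings and once with -n. It assumes GNU gzip, and the file names are arbitrary.

```shell
work=$(mktemp -d)
printf 'same content\n' > "$work/file"

# First run: the file carries one modification time.
touch -t 202001010000 "$work/file"
gzip -c  "$work/file" > "$work/default-1.gz"
gzip -nc "$work/file" > "$work/n-1.gz"

# Second run: identical content, different modification time.
touch -t 202101010000 "$work/file"
gzip -c  "$work/file" > "$work/default-2.gz"
gzip -nc "$work/file" > "$work/n-2.gz"

# By default gzip stores the timestamp, so the two outputs differ.
cmp -s "$work/default-1.gz" "$work/default-2.gz" || echo "default output differs"
# With -n neither name nor timestamp is stored, so they match.
cmp -s "$work/n-1.gz" "$work/n-2.gz" && echo "gzip -n output is identical"
```

This only covers the compression layer. The timestamps stored inside the tar archives themselves are a separate source of differences.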