apt-get install debian-wizard

Insider infos, master your Debian/Ubuntu distribution

5 reasons why a Debian package is more than a simple file archive

November 8, 2010 by Raphaël Hertzog

[Image: folder with gears]

You’re probably manipulating Debian packages every day, but do you know what those files are? This article will show you their innards. Surely they are more than file archives, otherwise we would just use TAR archives (you know, those files ending with .tar.gz). Let’s have a look!

1. It’s two TAR file archives in an AR file archive!

A .deb file is actually an archive in the AR format, which you can manipulate with the ar command. This archive contains 3 files. You can check it yourself: download any .deb file and run “ar t” on it:

$ ar t gwibber_2.31.91-1_all.deb
debian-binary
control.tar.gz
data.tar.gz

debian-binary is a text file indicating the version of the .deb format; the current version is “2.0”.

$ ar p gwibber_2.31.91-1_all.deb debian-binary
2.0

data.tar.gz contains the real files of the package; the content of that archive gets installed in your root directory when you run “dpkg --unpack”.

But the most interesting part, the one that truly makes .deb files more than a file archive, is the remaining file. control.tar.gz contains the meta-information used by the package manager. What does it hold?

$ ar p gwibber_2.31.91-1_all.deb control.tar.gz | tar tzf -
./
./postinst
./prerm
./preinst
./postrm
./conffiles
./md5sums
./control
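
To see this three-member layout without downloading anything, here is a minimal sketch that builds a toy package by hand with tar and ar and then inspects it. The package name hello-toy, its files and the control values are made up for illustration, and the result is only meant to be looked at, not installed.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Payload: the files that would be unpacked under /
mkdir -p data/usr/share/doc/hello-toy
echo "Hello from a toy package" > data/usr/share/doc/hello-toy/README

# Metadata: a minimal control file (all values are illustrative)
mkdir control
cat > control/control <<'EOF'
Package: hello-toy
Version: 1.0-1
Architecture: all
Maintainer: Jane Doe <jane@example.org>
Description: toy package built by hand
 Only meant to illustrate the layout of a .deb file.
EOF

# The three members, added in the same order as in the listing above
echo "2.0" > debian-binary
tar -czf control.tar.gz -C control .
tar -czf data.tar.gz -C data .
ar rc hello-toy_1.0-1_all.deb debian-binary control.tar.gz data.tar.gz

# Inspect the result the same way as a real package
members=$(ar t hello-toy_1.0-1_all.deb)
echo "$members"
ar p hello-toy_1.0-1_all.deb control.tar.gz | tar -tzf -
```

Real packages are produced by dpkg-deb, which also takes care of file ownership and of the remaining control files; this sketch only reproduces the container structure.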

2. It contains meta-information defining the package and its relationships

The control file within the control.tar.gz archive is the most fundamental file. It contains basic information about the package like its name, its version, its description, the architecture it runs on, who is maintaining it and so on. It also contains dependency fields so that the package manager can ensure that everything needed by the package is installed beforehand. If you want to learn more about those fields, you can check Binary control files in the Debian Policy.

This information ends up in /var/lib/dpkg/status once the package is installed.
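
Since the control file is plain “Field: value” text, individual fields can be read with generic tools. The following is a minimal sketch on a made-up control file; the get_field helper is hypothetical and ignores details such as continuation lines and case-insensitive field names.

```shell
#!/bin/sh
set -e
control=$(mktemp)

# A made-up control file, with illustrative values only
cat > "$control" <<'EOF'
Package: hello-toy
Version: 1.0-1
Architecture: all
Depends: libc6 (>= 2.7), python (>= 2.5)
Description: toy package
 Extended description line.
EOF

# Print the value of one field (first match only)
get_field() {
    sed -n "s/^$1: //p" "$control" | head -n 1
}

version=$(get_field Version)
depends=$(get_field Depends)
echo "Version: $version"
echo "Depends: $depends"
```

On a real package, “dpkg-deb -f package.deb Version” reads a field directly from the .deb file, and “dpkg -s package” shows the corresponding entry of an installed package.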

3. It contains maintainer scripts so that everything can just work out of the box

At various steps of the installation/upgrade/removal process, dpkg executes the maintainer scripts provided by the package:

  • postinst: after installation
  • preinst: before installation
  • postrm: after removal
  • prerm: before removal

Note that this description is greatly simplified. In fact the scripts are executed on many other occasions with different parameters. There’s an entire chapter of the Debian Policy dedicated to this topic. But you might find this wiki page easier to grasp: http://wiki.debian.org/MaintainerScripts.

While this looks scary, it’s a very important feature. It’s required to cope with backward-incompatible upgrades, to provide automatic configuration, to create system users on the fly, etc.
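
To give an idea of what such a script looks like, here is a minimal sketch of the usual postinst structure. The logic is wrapped in a function so that it can be tried without dpkg, and the messages are placeholders; a real script would run its configuration steps instead. It assumes the common convention that dpkg calls the script with “configure” followed by the previously configured version, which is empty on a fresh installation.

```shell
#!/bin/sh
set -e

# Hypothetical postinst logic; dpkg would call the real script as:
#   postinst configure <previously-configured-version>
postinst() {
    case "$1" in
        configure)
            if [ -z "$2" ]; then
                echo "fresh install"
            else
                echo "upgrade from $2"
            fi
            ;;
        abort-upgrade|abort-remove|abort-deconfigure)
            # Called when an earlier step failed and dpkg is unwinding
            echo "rollback: $1"
            ;;
        *)
            echo "postinst called with unknown argument '$1'" >&2
            return 1
            ;;
    esac
}

fresh=$(postinst configure)
upgrade=$(postinst configure 1.0-1)
echo "$fresh"
echo "$upgrade"
```

Branching on the arguments is what lets a single script behave differently on first installation, on upgrade and when an operation is rolled back.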

4. Configuration files are special files

Unpacking a file archive overwrites the previous version of the files. This is the desired behavior when you upgrade a package, except for configuration files. You prefer not to lose your customizations, don’t you?

That’s why packages can list configuration files in the conffiles file provided by control.tar.gz. That way dpkg will deal with them in a special way.
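
The special treatment boils down to a three-way comparison between the file shipped by the old version, the file currently installed and the file shipped by the new version. The sketch below illustrates that decision with made-up files and a hypothetical decide helper; it compares contents directly with cmp, whereas dpkg works with recorded checksums and handles more cases (deleted files, forced options and so on).

```shell
#!/bin/sh
set -e
dir=$(mktemp -d)

# decide OLD_SHIPPED INSTALLED NEW_SHIPPED
decide() {
    if cmp -s "$1" "$2"; then
        echo "install new version"   # no local changes to lose
    elif cmp -s "$1" "$3"; then
        echo "keep local version"    # the package did not change the file
    else
        echo "ask the user"          # both sides changed
    fi
}

# Made-up configuration files
echo "port=80"   > "$dir/old"
echo "port=8080" > "$dir/local"
echo "port=80"   > "$dir/new_same"
echo "port=443"  > "$dir/new_changed"

untouched=$(decide "$dir/old" "$dir/old" "$dir/new_changed")
kept=$(decide "$dir/old" "$dir/local" "$dir/new_same")
conflict=$(decide "$dir/old" "$dir/local" "$dir/new_changed")
echo "$untouched"
echo "$kept"
echo "$conflict"
```

Only the last case interrupts the upgrade with a question, which is why customizations survive most upgrades silently.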

5. You can always add new meta-information

And in fact many tools already exploit the possibility to provide supplementary files in control.tar.gz:

  • debsums uses the md5sums file to ensure no files were accidentally modified
  • dpkg-shlibdeps uses shlibs and symbols files to generate dependencies on libraries
  • debconf uses config scripts to collect configuration information from the user

Once installed, those files are kept by dpkg in /var/lib/dpkg/info/package.* along with maintainer scripts.
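
As an illustration of how such a supplementary file gets used, the sketch below mimics the kind of check debsums performs: it records a checksum in an md5sums file and verifies it again after the file has been modified. The directory and file are made up, and the check is done with plain “md5sum -c”, not with debsums itself.

```shell
#!/bin/sh
set -e
root=$(mktemp -d)

# A made-up installed file
mkdir -p "$root/usr/share/doc/hello-toy"
echo "Hello" > "$root/usr/share/doc/hello-toy/README"

# md5sums holds one "checksum  relative/path" pair per line
(cd "$root" && md5sum usr/share/doc/hello-toy/README > md5sums)

# Unmodified file: the verification succeeds
if (cd "$root" && md5sum -c md5sums >/dev/null 2>&1); then
    before=ok
else
    before=changed
fi

# Simulate an accidental modification, then verify again
echo "tampered" >> "$root/usr/share/doc/hello-toy/README"
if (cd "$root" && md5sum -c md5sums >/dev/null 2>&1); then
    after=ok
else
    after=changed
fi

echo "before: $before, after: $after"
```

Note that such checksums only detect accidental changes; anyone able to modify the file can usually update the md5sums file as well.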

Comments

  1. marc says

    November 8, 2010 at 11:11 am

    Maybe this is a dumb question, but why is .deb using ‘ar’ for the main file and ‘tar’ for embedded archives? Obviously, there must be a good reason, but I can’t guess it 🙂

    • Raphaël Hertzog says

      November 8, 2010 at 3:40 pm

      I don’t know for sure but I guess it’s because ar is more lightweight than tar. We don’t really need permissions/owners/etc. on the debian-binary, control.tar.gz and data.tar.gz files.

      • mario says

        November 8, 2010 at 8:09 pm

        From what I gathered the difference is that the “tar” format is binary and uses block sizes of >1024 bytes, while “ar” has a less portable plain text format header of just 59 bytes. This way the outer shell of a .deb archive is smaller and doesn’t need to be compressed.

  2. Gerfried Fuchs says

    November 8, 2010 at 11:54 am

    There is one thing to note about “manipulating” the .deb file with ar, though: please be aware that the GNU version of ar produces a .deb package that isn’t compatible with the archive tools. The GNU ar tool adds a / after the filename inside the archive, which dpkg-build doesn’t.

    So actually inspecting would be proper – manipulating it will get you into trouble. 😉

    • Guillem Jover says

      November 16, 2010 at 9:40 pm

      So even if dpkg-deb does not add trailing / to generated .debs it has supported ar archives with them since 1999 (commit id 16c0f50ed6826cd064510101c60ba98a582759dd). I’m not sure what you refer to with “archive tools”, but anything not supporting trailing slashes while not using dpkg-deb is buggy IMO. The current .deb format supported by dpkg is documented in detail in deb(5). The only problematic case is dpkg-split which didn’t support trailing / until recently, fixed since 1.15.6 (commit id e5c584abd37b59ba4d7cda44f7bad7c98dbd075b).

      So although not encouraged, being able to handle (including creating) .deb archives with generic Unix tools is something that should be supported, and it was one of the reasons for the format being designed that way back then.

  3. Tony Palma says

    November 8, 2010 at 1:51 pm

    I have a question: does dpkg-shlibdeps use both shlibs and symbols files for a single package, or does it use either shlibs files or symbols files?

    • Raphaël Hertzog says

      November 8, 2010 at 3:42 pm

      Tony, I’m not sure that I understand your question. dpkg-shlibdeps prefers symbols files over shlibs files when there’s one. But then it can use both while generating the dependencies for a given binary since it might use multiple libraries: for example one with only shlibs, and one with symbols+shlibs.

  4. Morgan says

    November 8, 2010 at 1:57 pm

    I wish the debian build system was as useful as Arch Linux’s PKGBUILD.

    For example, if I want a later version of a package in Arch I just change the version number in the PKGBUILD file, get the new md5 info and build the package.

    There is no way that will work in the debian build system – you cannot get a later version by just spending a minute or 2 …..

    I also hate the way that you can’t define your CFLAGS system wide so that every package you make uses them (often in deb packages the CFLAGS are built into the debian rules/control script)

    • Raphaël Hertzog says

      November 8, 2010 at 3:47 pm

      Morgan, in many cases, you can easily get the next upstream version with only a few commands:

      $ wget ... -O foo_2.0.orig.tar.gz
      $ tar zxf foo_2.0.orig.tar.gz
      $ cp -a foo-1.0/debian foo-2.0/
      $ cd foo-2.0 && dch -v 2.0-1 "New upstream release"
      $ debuild
      
    • toots says

      November 11, 2010 at 10:05 pm

      Debian packages are pre-compiled binary packages, ARCH packages are built from source on your system, like macports or gentoo.

      This is an essential distinction and I don’t think you can really compare them the way you do. Each of them has its advantages and issues.

      For instance, with ARCH linux, how do you deal with new libraries’ ABI/API? Say you bump the version and md5sums in PKGBUILD and rebuild but it turns out that the binary has different symbols..

      This is just one example. Not to argue in favor of one or another, again, just to show the differences 🙂

      • morgan says

        November 12, 2010 at 11:21 am

        Actually Arch uses pre-compiled binary files also – unlike Gentoo.. So the maintainers will be recompiling packages for new libraries – this does occasionally break things (although I have fewer issues in Arch overall compared to other distros)

        You ‘can’ easily compile a version (your CFLAGS/newer version, etc) if you wish – adding patches is so easy…

        One bad thing about Arch is they tend to ignore old versions of libraries (although it’s so easy to make a package with them…)

        I have even become a maintainer of an AUR package (latest Nvidia driver for latest realtime kernel) (http://aur.archlinux.org/packages.php?ID=12132)
        – I would not have a clue how to do the same in Debian – in Arch it’s just so easy.

  5. trapDoor says

    November 8, 2010 at 2:27 pm

    Do want that folder-with-gears icon, best in svg format! Thank you so much in advance.

    • Raphaël Hertzog says

      November 8, 2010 at 3:43 pm

      Trapdoor, it’s not a free picture unfortunately, I bought it on istockphoto.com.

      • trapDoor says

        November 8, 2010 at 3:50 pm

        That sucks 🙁

      • psi says

        November 8, 2010 at 5:24 pm

        The gears seem to be deadlocked (larger nearer gear covers it a little, but looks like those three in the next plane touch).

        • trapDoor says

          November 8, 2010 at 7:13 pm

          I would try to fix that if I had it in vectors 🙂 Assuming that each gear is a separate object and that the invisible parts (covered by other objects) are not actually missing.

  6. Limaunion says

    November 8, 2010 at 7:45 pm

    Hi! Thanks for your article! I’ve been using .deb packages for a long time but never knew about this kind of detail. Do you know if there are plans to improve the package managers like aptitude? What improvements can we expect from a user perspective? Thanks.

    • Raphaël Hertzog says

      November 8, 2010 at 10:03 pm

      There are always improvements planned but it’s difficult to know what will come next. The bug trackers of apt/aptitude are full of wishlist entries. But the best way to know is to ask the developers… I might do this with an official interview. Subscribe to this blog and you’ll have your answers at some point. 🙂

  7. Willian says

    November 9, 2010 at 12:14 pm

    Please continue with this kind of article. It’s pretty easy to understand!
    Maybe, someday, I will become a Debian Developer. I’m so excited. 🙂
    Thank you very very very much!

    • Raphaël Hertzog says

      November 9, 2010 at 12:21 pm

      Thank you for your feedback, I will definitely continue. Including posts that are relevant for people who want to start contributing to Debian. I hope you will pursue your goal of becoming a regular contributor. 🙂

  8. Zhichang Yu says

    November 19, 2010 at 8:59 am

    I have a question. I notice that multiple runs of dpkg-buildpackage generate different .deb packages (different MD5 digests). Do you know why?

    • Raphaël Hertzog says

      November 19, 2010 at 10:04 am

      Zhichang, because the contents differ. 🙂

      Tar archives embed timestamps of the files. And many files are generated during the build (all those that end up in the control.tar.gz for example) so at least the timestamps differ. And even the gzip compression layer embeds a timestamp by default (see the -n option in the manual page).

Copyright © 2005-2021 Raphaël Hertzog