You’re probably manipulating Debian packages every day, but do you know what those files are? This article will show you their innards… Surely they are more than plain file archives, otherwise we would just use TAR archives (you know, those files ending with .tar.gz). Let’s have a look!
1. It’s two TAR file archives in an AR file archive!
A .deb file is actually an archive in the AR format, and you can manipulate it with the ar command. This archive contains 3 files. You can check it yourself: download any .deb file and run “ar t” on it:
$ ar t gwibber_2.31.91-1_all.deb
debian-binary
control.tar.gz
data.tar.gz
debian-binary is a text file indicating the version of the .deb format; the current version is “2.0”.
$ ar p gwibber_2.31.91-1_all.deb debian-binary
2.0
data.tar.gz contains the real files of the package. The content of that archive gets installed in your root directory when you run “dpkg --unpack”.
But the most interesting part, the one that truly makes .deb files more than a file archive, is the remaining file. control.tar.gz contains meta-information used by the package manager. What is in there?
$ ar p gwibber_2.31.91-1_all.deb control.tar.gz | tar tzf -
./
./postinst
./prerm
./preinst
./postrm
./conffiles
./md5sums
./control
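If you are curious about what “ar t” has to do, the AR container is simple enough to read by hand. Here is a minimal sketch in Python, assuming the common ar layout (an 8-byte global header followed by members that each carry a 60-byte text header). The function name list_ar_members is mine, and the sketch ignores the long-name tables that some ar variants use, so treat it as an illustration rather than a replacement for ar or dpkg-deb.

```python
import io

AR_MAGIC = b"!<arch>\n"   # global header that starts every ar archive
HEADER_SIZE = 60          # each member is preceded by a 60-byte text header


def list_ar_members(blob):
    """Return a list of (name, content) pairs for the members of an ar archive."""
    stream = io.BytesIO(blob)
    if stream.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    members = []
    while True:
        header = stream.read(HEADER_SIZE)
        if not header:
            break  # clean end of archive
        if len(header) < HEADER_SIZE or header[58:60] != b"`\n":
            raise ValueError("corrupt member header")
        # Bytes 0-15 hold the name, bytes 48-57 the size as decimal text.
        name = header[0:16].decode("ascii").rstrip(" ")
        # Some ar implementations append "/" to member names; drop it.
        name = name.rstrip("/")
        size = int(header[48:58].decode("ascii"))
        content = stream.read(size)
        if size % 2:
            stream.read(1)  # member data is padded to an even length
        members.append((name, content))
    return members
```

Reading a .deb file in binary mode and passing its bytes to list_ar_members should give you the same three names that “ar t” prints, along with their raw content.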
2. It contains meta-information defining the package and its relationships
The control file within the control.tar.gz archive is the most fundamental file. It contains basic information about the package: its name, its version, its description, the architecture it runs on, who maintains it, and so on. It also contains dependency fields so that the package manager can ensure that everything needed by the package is installed beforehand. If you want to learn more about those fields, you can check “Binary control files” in the Debian Policy.
That information ends up in /var/lib/dpkg/status once the package is installed.
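To give an idea of what such a file looks like to a program, here is a rough Python sketch that parses a control stanza into a dictionary and splits a dependency field into its alternatives. The sample content (package hello-example, its maintainer and its dependencies) is entirely made up for illustration, and the helper names parse_control and split_depends are my own. Real tools handle more cases than this.

```python
# Hypothetical control file content, invented for this example.
SAMPLE_CONTROL = """\
Package: hello-example
Version: 1.0-1
Architecture: all
Maintainer: Jane Doe <jane@example.org>
Depends: python3 (>= 3.8), libfoo1 | libfoo2
Description: short summary of the package
 Longer description, continued on lines
 that start with a space.
"""


def parse_control(text):
    """Parse one control stanza into a dict mapping field name to value."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines around the stanza
        if line[0] in " \t":
            # Continuation line: it belongs to the previous field.
            if current is None:
                raise ValueError("continuation line before any field")
            fields[current] += "\n" + line.strip()
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError("malformed line: %r" % line)
            current = name.strip()
            fields[current] = value.strip()
    return fields


def split_depends(value):
    """Split a dependency field into groups of alternatives.

    Commas separate requirements, "|" separates alternatives.
    """
    groups = []
    for clause in value.split(","):
        groups.append([alt.strip() for alt in clause.split("|")])
    return groups
```

With the sample above, split_depends(parse_control(SAMPLE_CONTROL)["Depends"]) yields one group for python3 and one group with the two libfoo alternatives.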
3. It contains maintainer scripts so that everything can just work out of the box
At various steps of the installation/upgrade/removal process, dpkg executes the maintainer scripts provided by the package:
- postinst: after installation
- preinst: before installation
- postrm: after removal
- prerm: before removal
Note that this description is greatly simplified. In fact, the scripts are executed on many other occasions with different parameters. There’s an entire chapter of the Debian Policy dedicated to this topic. But you might find this wiki page easier to grasp: http://wiki.debian.org/MaintainerScripts.
While this looks scary, it’s a very important feature. It’s required to cope with backward-incompatible upgrades, to provide automatic configuration, to create system users on the fly, etc.
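To make the ordering a bit more concrete, here is a small Python sketch that lists the script calls for the error-free path of a few common operations, as I understand them from the Debian Policy. It is only a model: the function name script_sequence and the “(unpack files)” style placeholders are mine, and every error-handling call (abort-upgrade, failed-upgrade and friends) is left out.

```python
def script_sequence(action, old_version="", new_version=""):
    """Return the maintainer script calls for a simplified, error-free run."""
    if action == "install":
        return ["preinst install", "(unpack files)", "postinst configure"]
    if action == "upgrade":
        return [
            # Scripts of the version being replaced are marked "old".
            "old prerm upgrade %s" % new_version,
            "new preinst upgrade %s" % old_version,
            "(unpack files)",
            "old postrm upgrade %s" % new_version,
            "new postinst configure %s" % old_version,
        ]
    if action == "remove":
        return ["prerm remove", "(remove files)", "postrm remove"]
    if action == "purge":
        # Purging an installed package removes it first.
        return script_sequence("remove") + ["(remove conffiles)", "postrm purge"]
    raise ValueError("unknown action: %r" % action)
```

Calling script_sequence("upgrade", "1.0-1", "1.1-1") shows how the old and the new versions of the scripts take turns during an upgrade.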
4. Configuration files are special files
Unpacking a file archive overwrites the previous version of the files. This is the desired behavior when you upgrade a package, except for configuration files. You prefer not to lose your customizations, don’t you?
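That’s why packages can list configuration files in the conffiles file provided by control.tar.gz. That way dpkg will deal with them in a special way: instead of silently overwriting your changes during an upgrade, it asks you what to do when both you and the package have modified the file.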
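The idea behind this special treatment can be sketched as a comparison of three checksums: the file as shipped by the installed version, the file as shipped by the new version, and the file currently on disk. The Python below is a simplified model of that idea and not dpkg’s actual code. The names conffile_action and md5_hex and the returned labels are my own, and cases such as a conffile deleted by the administrator are ignored.

```python
import hashlib


def md5_hex(content):
    """Return the MD5 checksum of some bytes as a hex string."""
    return hashlib.md5(content).hexdigest()


def conffile_action(old_shipped, new_shipped, on_disk):
    """Decide what to do with one conffile during an upgrade.

    All three arguments are checksums: the file as shipped by the installed
    version, as shipped by the new version, and as currently found on disk.
    """
    if new_shipped == old_shipped:
        return "keep"          # the package did not change the file
    if on_disk == old_shipped:
        return "install-new"   # no local edits, safe to replace
    if on_disk == new_shipped:
        return "keep"          # local file already matches the new default
    return "ask-user"          # both sides changed: let the admin choose
```

Only the last case requires a question to the user, which is why most upgrades go through without any prompt.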
5. You can always add new meta-information
And in fact many tools already exploit the possibility to provide supplementary files in control.tar.gz:
- debsums uses the md5sums file to ensure no files were accidentally modified
- dpkg-shlibdeps uses shlibs and symbols files to generate dependencies on libraries
- debconf uses config scripts to collect configuration information from the user
Once installed, those files are kept by dpkg in /var/lib/dpkg/info/package.* along with maintainer scripts.
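As an example of how such a supplementary file can be used, here is a rough Python sketch of the kind of check that a tool like debsums performs. It assumes the usual md5sums layout, one line per file with the checksum followed by a path relative to the root directory. The function name check_md5sums and the “missing”/“changed” labels are mine, and the real debsums does a lot more.

```python
import hashlib
import os


def check_md5sums(md5sums_text, root="/"):
    """Return (path, problem) pairs for files that are missing or modified."""
    problems = []
    for line in md5sums_text.splitlines():
        if not line.strip():
            continue
        # Each line holds the checksum, whitespace, then a relative path.
        digest, path = line.split(None, 1)
        full_path = os.path.join(root, path.lstrip("/"))
        try:
            with open(full_path, "rb") as fh:
                actual = hashlib.md5(fh.read()).hexdigest()
        except OSError:
            problems.append((path, "missing"))
            continue
        if actual != digest:
            problems.append((path, "changed"))
    return problems
```

On an installed system you would feed it the content of the package’s md5sums file from /var/lib/dpkg/info/ and leave root at its default.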
marc says
Maybe this is a dumb question, but why is .deb using ‘ar’ for the main file and ‘tar’ for the embedded archives? Obviously, there must be a good reason, but I can’t guess it 🙂
Raphaël Hertzog says
I don’t know for sure but I guess it’s because ar is more lightweight than tar. We don’t really need permissions/owners/etc. on the debian-binary, control.tar.gz and data.tar.gz files.
mario says
From what I gathered the difference is that the “tar” format is binary and uses block sizes of >1024 bytes, while “ar” has a less portable plain text format header of just 59 bytes. This way the outer shell of a .deb archive is smaller and doesn’t need to be compressed.
Gerfried Fuchs says
There is one thing to note about “manipulating” the .deb file with ar, though: please be aware that the GNU version of ar produces a .deb package that isn’t compatible with the archive tools. The GNU ar tool adds a / after the filename inside the archive, which dpkg-build doesn’t.
So actually inspecting would be proper – manipulating it will get you into trouble. 😉
Guillem Jover says
So even if dpkg-deb does not add trailing / to generated .debs it has supported ar archives with them since 1999 (commit id 16c0f50ed6826cd064510101c60ba98a582759dd). I’m not sure what you refer to with “archive tools”, but anything not supporting trailing slashes while not using dpkg-deb is buggy IMO. The current .deb format supported by dpkg is documented in detail in deb(5). The only problematic case is dpkg-split which didn’t support trailing / until recently, fixed since 1.15.6 (commit id e5c584abd37b59ba4d7cda44f7bad7c98dbd075b).
So although not encouraged, being able to handle (including creating) .deb archives with generic Unix tools is something that should be supported, and it was one of the reasons for the format being designed that way back then.
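To illustrate how little is needed to produce that container with generic tools, here is a sketch in Python that assembles a .deb-shaped ar archive in memory, writing member names without a trailing slash. The helper names (ar_member, tar_gz, build_deb) are made up for this example, and the result is only meant for looking at the structure: it makes no attempt to be a policy-compliant package.

```python
import io
import tarfile


def ar_member(name, data):
    """Return one ar member: a 60-byte text header followed by padded data."""
    if len(name) > 16:
        raise ValueError("name too long for a plain ar header")
    # Fields: name, mtime, uid, gid, mode, size, then the 2-byte end marker.
    header = "%-16s%-12d%-6d%-6d%-8s%-10d`\n" % (name, 0, 0, 0, "100644", len(data))
    padding = b"\n" if len(data) % 2 else b""
    return header.encode("ascii") + data + padding


def tar_gz(files):
    """Pack a {path: bytes} mapping into a gzip-compressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, content in files.items():
            info = tarfile.TarInfo(path)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def build_deb(control_text, data_files):
    """Assemble the three members in the order a .deb expects."""
    control = tar_gz({"./control": control_text.encode("utf-8")})
    data = tar_gz(data_files)
    return (
        b"!<arch>\n"
        + ar_member("debian-binary", b"2.0\n")
        + ar_member("control.tar.gz", control)
        + ar_member("data.tar.gz", data)
    )
```

Writing the returned bytes to a file and running “ar t” on it should list the same three members as a real package.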
Tony Palma says
I have a question: does dpkg-shlibdeps use both shlibs and symbols files for a single package, or does it use either shlibs files or symbols files?
Raphaël Hertzog says
Tony, I’m not sure that I understand your question. dpkg-shlibdeps prefers symbols files over shlibs files when there’s one. But then it can use both while generating the dependencies for a given binary since it might use multiple libraries: for example one with only shlibs, and one with symbols+shlibs.
Morgan says
I wish the Debian build system was as useful as Arch Linux’s PKGBUILD.
For example, if I want a later version of a package in Arch I just change the version number in the PKGBUILD file, get the new md5 info and build the package.
There is no way that will work in the debian build system – you cannot get a later version by just spending a minute or 2 …..
I also hate the way that you can’t define your CFLAGS system wide so that every package you make uses them (often in deb packages the CFLAGS are built into the debian rules/control script)
Raphaël Hertzog says
Morgan, in many cases, you can easily get the next upstream version with only a few commands:
toots says
Debian packages are pre-compiled binary packages, Arch packages are built from source on your system, like MacPorts or Gentoo.
This is an essential distinction and I don’t think you can really compare them the way you do. Each of them has its advantages and issues.
For instance, with Arch Linux, how do you deal with new libraries’ ABI/API? Say you bump the version and md5sums in the PKGBUILD and rebuild, but it turns out that the binary has different symbols...
This is just one example. Not to argue in favor of one or another, again, just to show the differences 🙂
morgan says
Actually Arch uses pre-compiled binary files also – unlike Gentoo. So the maintainers will be recompiling packages for new libraries – this does occasionally break things (although I have fewer issues in Arch overall compared to other distros)
You ‘can’ easily compile a version (your CFLAGS/newer version, etc) if you wish – adding patches is so easy…
One bad thing about Arch is they tend to ignore old versions of libraries (although it’s so easy to make a package with them…)
I have even become a maintainer of an AUR package (latest Nvidia driver for latest realtime kernel) (http://aur.archlinux.org/packages.php?ID=12132)
– I would not have a clue how to do the same in Debian – in Arch it’s just so easy.
trapDoor says
Do want that folder-with-gears icon, best in svg format! Thank you so much in advance.
Raphaël Hertzog says
Trapdoor, it’s not a free picture unfortunately, I bought it on istockphoto.com.
trapDoor says
That sucks 🙁
psi says
The gears seem to be deadlocked (larger nearer gear covers it a little, but looks like those three in the next plane touch).
trapDoor says
I would try to fix that if I had it in vectors 🙂 Assuming that each gear is a separate object and that the invisible parts (covered by other objects) are not actually missing.
Limaunion says
Hi! thanks for your article! I’ve been using .deb packages for a long time but never knew about this kind of details. Do you know if there’re plans in order to improve the package managers like aptitude? what improvements from a user perspective can we expect? Thanks.
Raphaël Hertzog says
There are always improvements planned but it’s difficult to know what will come next. The bug trackers of apt/aptitude are full of wishlist entries. But the best way to know is to ask the developers… I might do this with an official interview. Subscribe to this blog and you’ll have your answers at some point. 🙂
Willian says
Please continue with this kind of article. It’s pretty easy to understand!
Maybe, someday, I will become a Debian Developer. I’m so excited. 🙂
Thank you very very very much!
Raphaël Hertzog says
Thank you for your feedback, I will definitely continue. Including posts that are relevant for people who want to start contributing to Debian. I hope you will pursue your goal of becoming a regular contributor. 🙂
Zhichang Yu says
I have a question. I notice that multiple runs of dpkg-buildpackage generate different .deb packages (different MD5 digests). Do you know why?
Raphaël Hertzog says
Zhichang, because the contents differ. 🙂
Tar archives embed timestamps of the files. And many files are generated during the build (all those that end up in the control.tar.gz, for example), so at least the timestamps differ. And even the gzip compression layer embeds a timestamp by default (see the -n option in the manual page).
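The effect of those two timestamps is easy to reproduce outside of any package build. The Python sketch below builds the same small .tar.gz twice with a fixed timestamp and once with a different one. The function name make_tar_gz and the sample file are made up for the demonstration, and it says nothing about how dpkg-deb itself writes its archives.

```python
import gzip
import io
import tarfile


def make_tar_gz(files, mtime):
    """Build a .tar.gz from a {path: bytes} mapping using one fixed timestamp."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tar:
        for path in sorted(files):
            info = tarfile.TarInfo(path)
            info.size = len(files[path])
            info.mtime = mtime  # timestamp stored in the tar header
            tar.addfile(info, io.BytesIO(files[path]))
    out = io.BytesIO()
    # The gzip header carries its own timestamp, set here explicitly.
    with gzip.GzipFile(fileobj=out, mode="wb", mtime=mtime) as gz:
        gz.write(raw.getvalue())
    return out.getvalue()


FILES = {"./control": b"Package: hello-example\n"}  # hypothetical content
SAME_A = make_tar_gz(FILES, mtime=0)
SAME_B = make_tar_gz(FILES, mtime=0)
OTHER = make_tar_gz(FILES, mtime=1)
```

SAME_A and SAME_B are byte-for-byte identical, while OTHER differs even though the file content is the same, which is what happens between two builds run at different times.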