Debian Cleanup Tip #5: identify cruft that can be removed from your Debian system

Last week we learned how to identify and restore packages whose files have been corrupted. This time we’ll concentrate on the non-packaged files…

Non-packaged files

These are files that are not provided by any Debian package; in other words, files for which dpkg --search finds no associated package:

$ dpkg --search /srv/cvs
dpkg-query: no path found matching pattern /srv/cvs

You always have such files on your system, at least all your own files in /home. But many daemons also create files as part of their work (and they are usually stored in /var): internal files for a database server, mail spool for a mail server, etc. Those are normal and you want to leave them alone.

But you might also have non-packaged files in /usr, and that should not be the case if you install everything from packages. It would thus be useful to list those files in order to detect software that has been installed manually.
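As a rough illustration, such a listing can be approximated by comparing what find sees on disk with the file lists that dpkg keeps under /var/lib/dpkg/info. This is only a sketch: the function name list_unpackaged is made up for this example, it ignores diversions, symlinked directories and file names containing newlines, and it knows nothing about files that packages legitimately create at run-time.

```shell
# list_unpackaged DIR [INFO_DIR]
# Print regular files under DIR that appear in no dpkg file list.
# INFO_DIR defaults to /var/lib/dpkg/info, where dpkg keeps one
# <package>.list file per installed package.
# (Hypothetical helper, not part of dpkg or cruft.)
list_unpackaged() {
    dir=$1
    info_dir=${2:-/var/lib/dpkg/info}
    known=$(mktemp) || return 1
    found=$(mktemp) || { rm -f "$known"; return 1; }

    # Every path that some installed package claims.
    cat "$info_dir"/*.list 2>/dev/null | LC_ALL=C sort -u > "$known"
    # Every regular file actually present; -xdev stays on one filesystem.
    find "$dir" -xdev -type f 2>/dev/null | LC_ALL=C sort > "$found"

    # Keep only lines unique to the second list: on disk but unclaimed.
    LC_ALL=C comm -13 "$known" "$found"
    rm -f "$known" "$found"
}
```

For example, `list_unpackaged /usr | less` lets you page through the candidates below /usr. Expect false positives; treat the output as a starting point only.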

Manually installed software is not a good idea

Such an installation might cause trouble, for example by taking precedence over the same software provided in a Debian package. Over time, the local installation will not be upgraded while the packaged one will.

The other packages which depend on this software will believe they have the latest version since their dependency is satisfied but in fact they are using the older version since it takes precedence.
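To see whether this is happening on your system, you can look for commands in /usr/local/bin that have a namesake in /usr/bin. A minimal sketch, assuming the usual Debian PATH where /usr/local/bin comes before /usr/bin (shadowed_commands is a hypothetical helper name, and both directories can be overridden):

```shell
# shadowed_commands [LOCAL_DIR] [SYSTEM_DIR]
# Print the names of executables in LOCAL_DIR that also exist in SYSTEM_DIR.
# Defaults assume /usr/local/bin is searched before /usr/bin in PATH.
shadowed_commands() {
    local_dir=${1:-/usr/local/bin}
    system_dir=${2:-/usr/bin}
    for f in "$local_dir"/*; do
        name=${f##*/}
        # Skip the unexpanded glob (empty directory), directories
        # and anything that is not executable.
        [ -x "$f" ] && [ ! -d "$f" ] || continue
        # Same name available in the system directory: it is shadowed.
        [ -x "$system_dir/$name" ] && printf '%s\n' "$name"
    done
    return 0
}
```

Each name it prints can then be checked with dpkg --search /usr/bin/NAME to find the package whose program is being hidden.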

So you want to get rid of those? Let’s see how we can find them.

Use cruft to identify non-packaged files

As I explained above, there are many non-packaged files that are legitimate and that you don’t want to remove. That’s why cruft does something more elaborate than a scan of the filesystem and a check of dpkg’s database.

It provides a way for packages to say which files they might legitimately create during run-time and that cruft should not report. And it knows of many such files. But it’s far from exhaustive and definitely not up-to-date.

So you should always treat its output with suspicion and think twice about where each file came from. Do not trust it blindly to remove the files… you have been warned.

How to use cruft

You should give it a list of directories to ignore to reduce the noise in the output, for example like this:

$ sudo cruft -d / -r report --ignore /home --ignore /var --ignore /tmp
$ less report
cruft report: mercredi 23 février 2011, 15:45:34 (UTC+0100)

---- missing: ALTERNATIVES ----
        /etc/alternatives/cli-csc.1.gz
        /usr/share/man/man1/cli-csc.1.gz
---- missing: dpkg ----
        /etc/xdg/autostart/gnome-power-manager.desktop
        /usr/lib/libpython2.6_d.so.1.0-gdb.py
        /usr/share/fonts/X11/100dpi
        /usr/share/fonts/X11/75dpi
---- unexplained: / ----
        /boot
        /dev
        /etc/.java
        /etc/.java/.systemPrefs
[...]
        /usr/lib/pymodules/python2.6
        /usr/lib/pymodules/python2.6/.path
        /usr/lib/pymodules/python2.6/Brlapi-0.5.5.egg-info
[...]

Note that it doesn’t cross filesystem boundaries, so if your /usr is on a different partition from /, you will need to use the option -d "/ /usr" to have it scan both.
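If you are not sure how your system is partitioned, df can tell you. The sketch below builds the value for -d by comparing the mount point that df -P reports for / and for each directory you pass in. The helper names same_filesystem and cruft_dirs are invented for this example, and mount points containing spaces are not handled.

```shell
# same_filesystem PATH1 PATH2
# Succeed when df reports the same mount point for both paths.
same_filesystem() {
    # In "df -P" output the last field of line 2 is the mount point.
    fs1=$(df -P "$1" 2>/dev/null | awk 'NR == 2 { print $NF }')
    fs2=$(df -P "$2" 2>/dev/null | awk 'NR == 2 { print $NF }')
    [ -n "$fs1" ] && [ "$fs1" = "$fs2" ]
}

# cruft_dirs DIR...
# Print "/" followed by each existing DIR that is on another filesystem.
cruft_dirs() {
    dirs=/
    for d in "$@"; do
        [ -d "$d" ] || continue
        same_filesystem / "$d" || dirs="$dirs $d"
    done
    printf '%s\n' "$dirs"
}
```

You could then run something like `sudo cruft -d "$(cruft_dirs /usr)" -r report --ignore /home --ignore /var --ignore /tmp`.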

Analyze the report

Now you can calmly go through the generated report and decide which files should be removed. The report also contains missing files (files which should exist according to the dpkg database but which are not there), but the bulk of the listing will be in the “unexplained” section: files which are not part of any package (and whose presence is not explained by any of the explain scripts that packages can ship).
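Since the report is plain text, a single section can be pulled out with a few lines of awk. This sketch assumes the report keeps the layout of the sample above (a "---- name ----" header followed by indented paths); cruft_section is a hypothetical helper, not something cruft ships.

```shell
# cruft_section REPORT NAME
# Print the paths listed under the header that starts with "---- NAME".
cruft_section() {
    awk -v name="$2" '
        /^---- / {
            # A header line opens a new section; remember if it is ours.
            wanted = (index($0, "---- " name) == 1)
            next
        }
        wanted && NF {
            sub(/^[ \t]+/, "")   # strip the indentation used in the report
            print
        }
    ' "$1"
}
```

For instance, `cruft_section report unexplained > unexplained.txt` gives you one path per line, which is easier to filter than the full report.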

Again, take this with great suspicion: you should not delete a file if you don’t know how it got there in the first place. For instance, on my system it lists many files below /usr/lib/pymodules/ and those are legitimate: they come from Debian packages but they are copied there dynamically from /usr/{lib,share}/pyshared in order to support multiple Python versions. If you remove those files, you effectively break your system.

You will also find many .pyc files created by Python packages; they are byte-compiled versions of the corresponding .py files. Removing them breaks nothing, but you lose a bit of performance.

Conversely, most of the files below /usr/local/ are likely the result of some manual software installation and those should be safe to remove (if you know that you are not using the corresponding software).
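These rules of thumb can be turned into a first sorting pass over the list of unexplained paths. The sketch below only labels each path; it removes nothing. The function name triage_paths and the four labels are invented for this example, and the rules reflect only the cases described above.

```shell
# triage_paths: read one path per line on stdin and print
# "<label><TAB><path>" for each. Nothing is deleted.
triage_paths() {
    while IFS= read -r path; do
        case $path in
            /usr/lib/pymodules/*) label=keep ;;      # copied there from pyshared
            *.pyc)                label=bytecode ;;  # byte-compiled Python file
            /usr/local/*)         label=local ;;     # likely a manual installation
            *)                    label=review ;;    # unknown origin, investigate
        esac
        printf '%s\t%s\n' "$label" "$path"
    done
}
```

Piping the unexplained paths through `triage_paths | sort` groups them by label, so you can concentrate on the "local" and "review" entries.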

Conclusion: useful but needs work

In summary, you can use cruft to identify non-packaged files and maybe learn a bit more about what got manually installed on the system, but it requires some patience to go through the report as many of the files reported are false positives.

Yes, cruft badly needs more volunteers to keep up with the many ways packages legitimately generate non-packaged files. It’s not even complicated work: the package is mostly written in shell and Perl, and /usr/share/doc/cruft/README.gz explains how it all works.


February 2011 wrap up

February has again been a busy month for me. Here’s a quick summary of what I did:

Multi-Arch work

I have spent many days implementing and refining dpkg’s Multi-Arch support with Guillem Jover (dpkg co-maintainer) and Steve Langasek (beta-tester of my code ;-)). Early testers can try what’s in my latest pu/multiarch/snapshot/* branch in my personal git repository.

A Debian DVD shop

I’m always exploring new options to fund my Debian work (besides direct donations) and this month, with the Debian Squeeze release, I saw an opportunity in selling Debian DVDs. Nobody provides DVDs with firmware included, and quite a few people would like to avoid the SpaceFun theme. So I built unofficial Debian DVDs that integrate the firmware and that install a system with the old theme (MoreBlue Orbit). Click here to learn more about my unofficial DVDs.

On my blog

In my “People behind Debian” series, I interviewed Mike Hommey (Iceweasel maintainer) and Maximilian Attems (member of the kernel team).

I started a “Debian Cleanup Tip” series and have already published 4 installments.

For contributors, I wrote two articles. The first gives a set of (suggested) best practices for sponsoring Debian packages; I also adapted it as a patch for the Developers Reference. In the second article, I shared some personal advice for people who are considering participating in Debian mailing lists: 7 mistakes to avoid when participating to Debian mailing lists.


7 mistakes to avoid when participating to Debian mailing lists

You’re eager to start contributing to Debian, and your first action is to subscribe to some high-profile mailing lists (like debian-devel and debian-project) to get a feel for the community. You read the mails for a few days and then you realize that you could participate in the discussions. It’s a simple first step after all. True enough.

That said, it’s not as easy as it looks. There are many mistakes that you should avoid:

  1. Don’t fall into the trap where your mailing list participation is your sole contribution to Debian. If you want people to give credit to your messages, you should already be doing something else for Debian.
  2. Don’t post more than once a day in a given thread. There are many people subscribed, and you should leave room for others to express their point of view. You can always follow up a day later and reply to several messages at once if you believe you still have something new to add to the discussion.
  3. Don’t reply to off-topic threads. Someone asked a simple question and someone else pointed out that the message was off-topic. Don’t reply, or if you really need to, do it on the correct list or with a private response.
  4. Don’t ask questions unless they help move the discussion forward. Development lists are not here to fill the gaps in your knowledge. We already have debian-mentors for this. Furthermore, there’s no better way to learn than to find the answers to your questions yourself. 🙂
  5. Don’t believe your opinion is so important. We’re all very opinionated, and discussions that consist only of contradicting opinions tend to go nowhere. So don’t give your opinion unless you can back it up with new facts or a different experience.
  6. Don’t participate in every thread. There are surely some topics where you are more knowledgeable than others: participate where you add the most value and leave the other threads to the other experts (and learn by reading them).
  7. Don’t hide your identity. In Debian we like to know each other. Use your real name and not some anonymous nickname. You need to be able to stand behind your words, otherwise you’re not credible.

I have myself been guilty of several of those when I started… I invite you to follow my recommendations to ensure our mailing lists remain pleasant to read and an effective place for discussion.


Discover my Debian DVD shop

After a private launch (with discounted prices) for my newsletter subscribers, it’s now time to open my Debian DVD shop to the public.

I did not want to become yet another DVD reseller, so my DVDs are different and better. Here’s why you want to get one (or more):

  1. it’s easier to install Debian with my DVDs since they provide all the (non-free) firmware that has been stripped from the official images and that you’re otherwise supposed to provide on a USB key;
  2. the installed system features the former theme (MoreBlue Orbit) and not SpaceFun (although you can reactivate SpaceFun easily if you prefer it);
  3. 100% of the profits are reinvested into Debian (90% to fund my Debian work, 10% given back to Debian to fund work meetings);
  4. they come in a beautiful DVD case and despite this they are not expensive (between $3.49 and $5.49).

Click here to learn more about my DVD offer.

PS: Click here and join my newsletter to not miss other opportunities.