Last week we learned how to identify and restore packages whose files have been corrupted. This time we’ll concentrate on non-packaged files.
Non-packaged files
These are files that are not provided by any Debian package; in other words, files for which dpkg --search finds no owning package:
$ dpkg --search /srv/cvs
dpkg-query: no path found matching pattern /srv/cvs
Every system has such files, if only your own files in /home. But many daemons also create files as part of their work (usually stored in /var): the internal files of a database server, the mail spool of a mail server, etc. Those are normal and you want to leave them alone.
You might also have non-packaged files in /usr, which should not happen if you install everything from packages. It would thus be useful to list those files in order to detect software that has been installed manually.
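The dpkg --search check shown above can be looped over a whole directory to produce such a list. What follows is a minimal sketch, assuming bash and relying on dpkg --search exiting with a non-zero status when no package owns the path; the function name list_unpackaged is hypothetical. It runs dpkg once per file, so expect it to be slow on all of /usr, and unlike the tool described below it knows nothing about files that packages legitimately create at run time.

```shell
# list_unpackaged: hypothetical helper name, not part of dpkg.
# Prints every regular file under the given directory that dpkg does not
# associate with any installed package.
# Caveat: dpkg --search treats names containing *, ?, [ or \ as patterns.
list_unpackaged() {
    local dir="$1"
    # -xdev keeps find on the filesystem that holds $dir
    find "$dir" -xdev -type f -print |
    while IFS= read -r path; do
        # a non-zero exit status means no package owns this path
        if ! dpkg --search "$path" >/dev/null 2>&1; then
            printf '%s\n' "$path"
        fi
    done
}
```

For example, list_unpackaged /usr/local | less gives a first overview of what was installed by hand.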
Manually installed software is not a good idea
Such an installation can cause trouble, for example by taking precedence over the same software provided by a Debian package. Over time, the packaged version gets upgraded while the local installation does not.
Other packages that depend on this software believe they get the latest version, since their dependency is satisfied, but they actually use the older local copy because it takes precedence.
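For commands, this precedence comes from the search path: Debian’s default PATH lists /usr/local/bin before /usr/bin. bash’s builtin type -a already shows every match for a command name; the sketch below (shadow_check is a hypothetical name) does the same walk over $PATH and additionally warns when the copy that actually runs lives under /usr/local while other copies exist further down the PATH. The LOCAL_PREFIX variable is an assumption added only to make the prefix configurable.

```shell
# shadow_check: hypothetical helper name.
# Prints every executable named $1 found along $PATH, in search order, and
# warns on stderr when the first match (the one that runs) is under
# ${LOCAL_PREFIX:-/usr/local} and hides other copies.
shadow_check() {
    local name="$1" dir first="" count=0
    local IFS=:
    for dir in $PATH; do
        # skip empty PATH entries (they stand for the current directory)
        [ -n "$dir" ] || continue
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            count=$((count + 1))
            if [ "$count" -eq 1 ]; then
                first="$dir/$name"
            fi
        fi
    done
    case "$first" in
        "${LOCAL_PREFIX:-/usr/local}"/*)
            if [ "$count" -gt 1 ]; then
                printf 'warning: %s hides %d other match(es) later in PATH\n' \
                    "$first" "$((count - 1))" >&2
            fi
            ;;
    esac
}
```

Running shadow_check python, for instance, would flag a hand-built interpreter in /usr/local/bin that hides the packaged one.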
So you want to get rid of those? Let’s see how we can find them.
Use cruft to identify non-packaged files
As I explained above, there are many non-packaged files that are legitimate and that you don’t want to remove. That’s why cruft does something more elaborate than a simple scan of the filesystem checked against dpkg’s database.
It provides a way for packages to declare which files they might legitimately create at run time, so that cruft does not report them, and it already knows about many such files. But that knowledge is far from exhaustive and definitely not up to date.
So always treat its output with suspicion and think twice about where a file came from. Do not blindly trust it to decide which files to remove… you have been warned.
How to use cruft
Give it a list of directories to ignore in order to reduce the noise in the output, for example:
$ sudo cruft -d / -r report --ignore /home --ignore /var --ignore /tmp
$ less report
cruft report: Wednesday 23 February 2011, 15:45:34 (UTC+0100)
---- missing: ALTERNATIVES ----
/etc/alternatives/cli-csc.1.gz
/usr/share/man/man1/cli-csc.1.gz
---- missing: dpkg ----
/etc/xdg/autostart/gnome-power-manager.desktop
/usr/lib/libpython2.6_d.so.1.0-gdb.py
/usr/share/fonts/X11/100dpi
/usr/share/fonts/X11/75dpi
---- unexplained: / ----
/boot
/dev
/etc/.java
/etc/.java/.systemPrefs
[...]
/usr/lib/pymodules/python2.6
/usr/lib/pymodules/python2.6/.path
/usr/lib/pymodules/python2.6/Brlapi-0.5.5.egg-info
[...]
Note that cruft does not cross filesystem boundaries: if your /usr is on a different partition than /, you need to pass -d "/ /usr" to have it scan both.
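If you are not sure which directories live on their own partition, a small helper can build the -d argument for you. This is a sketch assuming GNU coreutils, whose stat -c %d prints the device number of the filesystem holding a path; scan_roots is a hypothetical name. It only checks the directories you pass, so give it candidate mount points such as /usr or /opt rather than their subdirectories.

```shell
# scan_roots: hypothetical helper name.
# Prints "/" followed by every given directory that sits on a different
# filesystem than "/", space-separated, suitable for cruft's -d option.
scan_roots() {
    local root_dev dir roots="/"
    # device number of the filesystem that holds /
    root_dev=$(stat -c %d /)
    for dir in "$@"; do
        # a different device number means a separate filesystem
        if [ -d "$dir" ] && [ "$(stat -c %d "$dir")" != "$root_dev" ]; then
            roots="$roots $dir"
        fi
    done
    printf '%s\n' "$roots"
}
```

Usage: sudo cruft -d "$(scan_roots /usr /opt)" -r report --ignore /home --ignore /var --ignore /tmp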
Analyze the report
Now you can calmly go through the generated report and decide which files need to be removed. The report also lists missing files (files that should exist according to the dpkg database but are not there), but the bulk of it will be in the “unexplained” section: files that are not part of any package (and whose presence is not explained by any of the explain scripts that packages can ship).
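Since the report is plain text with one path per line and sections introduced by header lines such as ---- unexplained: / ----, a few lines of awk are enough to pull out a single section. This sketch assumes exactly that layout, as in the report excerpt above; report_section is a hypothetical name.

```shell
# report_section: hypothetical helper name.
# Prints the entries of every section whose header starts with
# "---- KIND:" (e.g. KIND=unexplained or KIND=missing); any other
# "---- ... ----" header line ends the section.
report_section() {
    local kind="$1" report="$2"
    awk -v kind="$kind" '
        # header line: decide whether the following entries are wanted
        /^---- .* ----$/ { inside = (index($0, "---- " kind ":") == 1); next }
        # print non-empty lines inside a wanted section
        inside && NF     { print }
    ' "$report"
}
```

For example, report_section unexplained report | less lets you page through only the unexplained files.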
Again, take this with great suspicion, and avoid deleting a file if you don’t know how it got there in the first place. For instance, on my system it lists many files below /usr/lib/pymodules/, and those are legitimate: they come from Debian packages but are copied there dynamically from /usr/{lib,share}/pyshared in order to support multiple Python versions. If you remove those files, you effectively break your system.
You will also find many .pyc files created by Python packages; they are byte-compiled versions of the corresponding .py files. Removing them breaks nothing, but you lose a bit of performance.
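To cut down on these two kinds of noise before reviewing the rest, you can filter them out of a list of paths. A minimal sketch, assuming the Python 2 layout of that era where foo.pyc sits right next to foo.py (Python 3’s __pycache__ directories are not handled); filter_python_noise is a hypothetical name. A .pyc without its .py is kept, since nothing explains it.

```shell
# filter_python_noise: hypothetical helper name.
# Reads paths (one per line) on stdin and drops entries below
# /usr/lib/pymodules/ as well as .pyc files whose .py source exists
# next to them; everything else is printed for manual review.
filter_python_noise() {
    local path
    while IFS= read -r path; do
        case "$path" in
            /usr/lib/pymodules/*) continue ;;
            # ${path%c} turns foo.pyc into foo.py
            *.pyc) [ -e "${path%c}" ] && continue ;;
        esac
        printf '%s\n' "$path"
    done
}
```

Feed it the paths from the “unexplained” section, one per line, and review what remains.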
By contrast, most files below /usr/local/ are likely the result of a manual software installation, and those should be safe to remove (provided you know you are not using the corresponding software).
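To work out which software a file below /usr/local belongs to, it often helps to see when it was written: files put in place by the same make install usually share a date. A sketch assuming GNU find (for -printf); local_files_by_date is a hypothetical name.

```shell
# local_files_by_date: hypothetical helper name.
# Lists regular files below a directory (default /usr/local), each
# prefixed by its modification date (YYYY-MM-DD), sorted oldest first.
local_files_by_date() {
    local dir="${1:-/usr/local}"
    # %TY-%Tm-%Td: year, month and day of the last modification (GNU find)
    find "$dir" -xdev -type f -printf '%TY-%Tm-%Td %p\n' | sort
}
```

Runs of files sharing one date are good candidates for a single manual installation.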
Conclusion: useful but needs work
In summary, you can use cruft to identify non-packaged files and maybe learn a bit more about what got manually installed on the system, but it requires some patience to go through the report as many of the files reported are false positives.
Yes, cruft badly needs more volunteers to cope with the many ways packages legitimately generate non-packaged files. It’s not even complicated work: the package is mostly written in shell and Perl, and /usr/share/doc/cruft/README.gz explains how it all works.
syko says
Thank you so much for this series.
I was doing great with my Ubuntu 10.10 until Step 5. Maybe I didn’t exclude enough (I used your example), but my “unexplained” section was huge, and this is a pretty fresh install. Oh well, back to researching all that output.
p.s. Thanks also for offering the usable Debian DVD. Mine is on its way. 😉
Kevin Benko says
Here’s an issue that might be of some concern:
I have some packages that I build from source via Subversion/git, the most common ones being the E-17 (Enlightenment) window manager and WINE. Now, I know that these packages are available through the Debian repositories, but the most recent versions are only available by building them myself.
So, I’ve got some libraries hanging around that apt doesn’t even know about.
Fortunately, there’s the “make uninstall” command, but some other projects out there might not be so kind as to include an uninstall option in their sources, so we’ll need some generalized method that is, hopefully, more efficient than source-diving through the makefile to clean up after this type of package.