Toying yet more with compressing rdiff-backup repositories via hard-linking. It turns out someone has already done it. My hard-link-finding script is different from the (packaged) one he's using, but it seems to follow the same basic idea (I use filecmp, though, rather than reading the files myself, and I don't check owner/group/mode differences, as rdiff-backup seems to manage those externally).
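For the curious, the core of the approach looks something like this. This is a minimal sketch, not my actual script: the function name `hardlink_duplicates` is made up for illustration. It groups files by size first (cheap), byte-compares candidates with `filecmp.cmp(shallow=False)`, and deliberately ignores owner/group/mode, on the assumption that rdiff-backup tracks those in its own metadata.

```python
import filecmp
import os
from collections import defaultdict

def hardlink_duplicates(root):
    """Replace byte-identical files under root with hard links to one copy.

    Returns the number of bytes reclaimed. Ownership and permissions are
    ignored, since rdiff-backup stores those in its own metadata files.
    """
    # Pass 1: bucket regular files by size; only same-size files can match.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # leave symlinks alone
            by_size[os.path.getsize(path)].append(path)

    # Pass 2: within each size bucket, do a full byte-by-byte comparison.
    saved = 0
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue
        keeper = paths[0]
        for candidate in paths[1:]:
            if os.path.samefile(keeper, candidate):
                continue  # already the same inode
            if filecmp.cmp(keeper, candidate, shallow=False):
                # Link via a temp name, then atomically swap it into place,
                # so the candidate path is never missing mid-operation.
                tmp = candidate + ".hardlink-tmp"
                os.link(keeper, tmp)
                os.replace(tmp, candidate)
                saved += size
    return saved
```

A real version would also want to skip files across filesystem boundaries (hard links can't span devices) and probably rdiff-backup's own `rdiff-backup-data` directory.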
Of course, having done all that and actually run the compression... it's disappointing. I see a solid handful of MBs disappear, but since I'm already excluding the source-code directories (bzr/svn/cvs checkouts), the savings are rather uninspiring. There are hundreds of MBs of duplicated files in the check-outs (4 or 5 checkouts of the same set of dozens of images, for instance), but the virtualenv packages don't seem to add up to much duplication (39MBs for the OpenGL and one client's work-spaces). Basically, while the virtualenvs are hundreds of MBs, most of that space, it turns out, is actually in the custom sources, not the dependencies.
Experimenting still further, I tried excluding Firefox cache files... and discovered that the "url classifier" files are twice the size... maybe I need to exclude them too... or maybe I need to just go to the dratted computer store, buy a 2TB drive for the server, and stop wasting precious hours on optimizing away a few MBs of storage-space. This kind of saving would likely only be useful if you were doing full-system imaging of very similar machines (or something like that), and I don't have any need for that these days.