Capturing and replaying from easy_install

Have some time before a client meeting this afternoon... so here's the issue I've been turning over in my mind since it was brought up at the last PyGTA.  We've run into the same issue at multiple clients and while the "right" solution would likely be to make easy_install/distutils record the information, I've spent way too much of my life inside the distutils code-base already (we used to have custom distutils code in PyOpenGL).

So here's the problem/use-case:

You run easy_install to install TurboGears (or any other huge multi-package system) into a virtualenv, and it downloads 45 (or so) eggs.  You run setup.py on your TG application... another 4 (or so) eggs.  You hand the package to someone else, they run the setup... and it doesn't work.  Some dependency changed version ever so slightly, and you now get to track down which one and fix it.

Buildout, PoachEggs and the like *seem* to want you to manually specify all 49 packages (with the idea that you do it once and then everyone shares the joy).  Problem is, I'm a lazy cuss.  I may have issues with easy_install, but it's the closest thing we've got to a package manager for Python eggs at the moment, so how do I just ask easy_install to do "exactly the same install" as it just did on some other machine?

If you're me, you hack up a dumb script that processes the output of easy_install looking for "Downloading " lines.  You make the script download those files into a directory (and record an index file so it knows the order), then add an option to the same script to replay the install with extremely restrictive easy_install options (i.e. no hosts, no dependency resolution, etcetera).
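The recording half of that idea can be sketched in a few lines of Python. This is not the actual recordeggs code, just a minimal illustration: the function names are made up here, and it assumes easy_install reports each fetch on its own line as "Downloading <url>".

```python
import re

# Assumption: each fetch is announced on its own line as "Downloading <url>".
# Adjust the pattern if your easy_install prints something different.
DOWNLOAD_LINE = re.compile(r"^Downloading\s+(\S+)")


def find_downloads(output):
    """Return the downloaded URLs in the order easy_install reported them."""
    urls = []
    for line in output.splitlines():
        match = DOWNLOAD_LINE.match(line.strip())
        if match and match.group(1) not in urls:
            urls.append(match.group(1))
    return urls


def index_entries(urls):
    """Pair each URL with the local filename it would be saved under."""
    entries = []
    for url in urls:
        # Drop any "#md5=..." fragment or query string before taking the basename.
        path = url.split("#", 1)[0].split("?", 1)[0]
        filename = path.rstrip("/").rsplit("/", 1)[-1]
        entries.append((filename, url))
    return entries
```

Writing the filenames out one per line, in this order, would give you the index file; fetching each URL into the target directory is the only part left out.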

Obviously this falls into the "expedient hack" rather than "properly engineered" pile, but it does look like it would work.  It generates an editable file you can tweak, just as you might tweak a "recipe"-based system's configuration, but it automates the initial generation of that recipe.
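For the replay side, here is a hedged sketch of how an index file could be turned into locked-down easy_install invocations. The index format (one filename per line, "#" starting a comment) and the function names are assumptions for illustration, not necessarily what recordeggs writes. The two options used are real easy_install flags: --allow-hosts=None blocks all network access and --no-deps turns off dependency resolution.

```python
import os


def read_index(text):
    """Parse a hypothetical index: one filename per line, '#' starts a comment."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            names.append(line)
    return names


def replay_commands(source_dir, names, easy_install="easy_install"):
    """Build one easy_install command per recorded file, in recorded order."""
    commands = []
    for name in names:
        commands.append([
            easy_install,
            "--allow-hosts=None",  # refuse to contact any host
            "--no-deps",           # install only this file, resolve nothing
            os.path.join(source_dir, name),
        ])
    return commands
```

Each command list can be handed to subprocess.check_call in turn. Because the index is plain text, deleting or reordering a line is all the "tweaking" the replay needs.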

The project is on LaunchPad, as is the code (bzr branch lp:recordeggs) and the downloads are in PyPI.

To use it:

easy_install -i http://www.turbogears.org/2.0/downloads/2.0.1/index tg.devtools 2>&1 | recordeggs.py ../egg-sources
recordeggs -p ../egg-sources

The "redirect stderr to stdout" stuff is one of those things I always need to look up, so maybe I should just run easy_install as a captive process, but for now this seems to work.

[Update] usage is now:

recordeggs ../egg-sources [easy_install options/args]
recordeggs -p ../egg-sources

which feels a lot more natural, but I haven't yet got the pipes aligned so you could see the easy_install output as it processes (at the moment it kicks all the output out at the end of the run).
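One way the pipes could be lined up, sketched with the standard subprocess module: merge stderr into stdout (the same effect as the shell's 2>&1) and read the pipe line by line, echoing each line as it arrives while also keeping it for later parsing. The function name is made up for this example, and this is not what recordeggs currently does.

```python
import subprocess
import sys


def run_and_capture(argv, echo=sys.stdout):
    """Run argv, echo each output line as it arrives, and return all lines."""
    process = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # same effect as the shell's 2>&1
        universal_newlines=True,   # decode to text, normalise newlines
        bufsize=1,                 # line-buffered on our side of the pipe
    )
    lines = []
    for line in process.stdout:
        echo.write(line)
        echo.flush()
        lines.append(line.rstrip("\n"))
    process.stdout.close()
    return process.wait(), lines
```

Something like run_and_capture(["easy_install", "tg.devtools"]) would then show progress live and hand back the lines to scan for "Downloading ". One caveat: the child process may buffer its own output when it is writing to a pipe rather than a terminal, so lines can still arrive in bursts.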

Comments

  1. Jeffrey Gelens on 06/30/2009 3:36 p.m.

    Check PIP: http://pip.openplans.org/ I think this suits your needs better ;-)

  2. Kevin Teague on 06/30/2009 7:49 p.m.

    Try pip's freeze command:

    $ pip freeze > stable-req.txt

    Also note that Buildout doesn't want you to have to manually specify a list of packages. Buildout wants to do what you want it to do :P

    That is to say, you can have it:

    1) Resolve the specific versions, picking the newest available at install time (unless a version range overrides this in a setup.py file)

    2) Manually pin down just a package or two as necessary, allowing the other packages to be auto-picked.

    3) Supply a complete list of versions; typically these represent the packages that make up the release of a complete web framework or application.

    If you've chosen the easy path and just installed a top-level package and allowed Buildout to resolve the versions of the dependent packages at install time, you can use a recipe like 'buildout.dumppickedversions' to automatically extract the versions a given run of Buildout has picked.

    TurboGears is using another approach, where each set of specific versions of packages gets its own PyPI index created that you can point easy_install at in a repeatable manner (http://www.turbogears.org/2.0/downloads/2.0.1/index).

    I think Pip is the best tool for the "lazy cuss" use case, as Buildout requires more investment in time to learn how to use. But Buildout lets you specify repeatable installation of non-Python parts, so config files for web frameworks, database set-up, and other bits that might compose the full working stack of a sandbox needed to run your application can be managed.

  3. Mike C. Fletcher on 07/02/2009 9:54 a.m.

    pip -- yup, basically it *looks* like it should do the job. But for some reason it's picking up all installed packages on my system... that's scores of packages including wx, pygame, pyopengl, openglcontext, experimental window managers, etcetera... way more work to cut down than to hand-write the list. The python archive command might work more straightforwardly, but after half an hour debugging what turned out to be a download failure for a package I just left that on the "some day" pile. Quite likely the error with the freeze stuff is between the keyboard and the chair, but I don't want to spend forever on this... (see lazy cuss qualification above).

    Buildout looks good, but frankly, again, more work than I want to do for this trivial task. The whole recordeggs script is 130 lines, with 30 of those being documentation. The code that runs on the final machines is dead-simple. It immediately creates a local directory with the packages to be used for the installs.

    The TurboGears one-index-per-release stuff is fine, but it doesn't provide local copies that can be checked into VCS/backup so that you can automatically recreate your environment without needing external downloads.
