Switched back to the web-"spidering" application yesterday, and put a few more hours of work into it today. It's really just a web-checking engine that goes out, sees whether any of a few tens of thousands of pages have changed in the last day or so, and reports the results.
Nothing really complex there, but it's the first application I've created by re-using components from the application framework in Cinemon. The framework allows for using PyTable with Twisted by running database queries in a background thread. That's where just about all of the headaches have come from so far.
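For anyone who hasn't seen the pattern, here's a rough sketch of "run the queries in a background thread". This is not the Cinemon framework's code: QueryWorker and its methods are made-up names, sqlite3 stands in for PyTable/PostgreSQL, and the plain callback stands in for whatever hands the result back to Twisted.

```python
import queue
import sqlite3
import threading


class QueryWorker(threading.Thread):
    """Hypothetical worker: runs SQL on one background thread."""

    def __init__(self):
        super().__init__(daemon=True)
        self.requests = queue.Queue()

    def run(self):
        # The connection is created here so it only lives in the worker thread.
        connection = sqlite3.connect(":memory:")
        while True:
            item = self.requests.get()
            if item is None:  # sentinel: shut down
                break
            sql, params, callback = item
            try:
                rows = connection.execute(sql, params).fetchall()
                connection.commit()
            except Exception as err:
                connection.rollback()
                callback(None, err)
            else:
                callback(rows, None)
        connection.close()

    def query(self, sql, params=(), callback=lambda rows, err: None):
        """Queue a statement; callback(rows, err) fires in the worker thread."""
        self.requests.put((sql, params, callback))

    def stop(self):
        self.requests.put(None)
        self.join()


results = []
worker = QueryWorker()
worker.start()
worker.query("CREATE TABLE page (url TEXT, checksum TEXT)")
worker.query("INSERT INTO page VALUES (?, ?)", ("http://example.com/", "abc"))
worker.query("SELECT url, checksum FROM page",
             callback=lambda rows, err: results.append(rows))
worker.stop()
print(results)
```

In a real Twisted application the callback would not touch application state directly from the worker thread; it would hand the result back to the reactor thread, for example with reactor.callFromThread.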
First problem was that a Queue.Queue instance was somehow being created twice. Tracking that down took something like 10 hours, mostly because I'm not accustomed to working with other developers, so my normal strategy of debugging "in-head" wasn't picking up the change Tim had made that karked it up.
Basically, Tim had used a relative import to import the application module, which wound up creating two copies of the module, one under each name. Duplicate loading of modules is always bad (e.g. isinstance( instance, cls ) will fail when the instance was created from the other copy's class), but it's particularly bad when the module being duplicated is one providing a singleton object.
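A small sketch of the failure, for the curious. It's written for current Python (so queue rather than Queue), it loads one file under two module names by hand instead of relying on the old implicit relative import, and the Application class and work_queue name are invented for the illustration.

```python
import importlib.util
import os
import sys
import tempfile

# Hypothetical stand-in for the application module.
SOURCE = """
import queue

class Application(object):
    pass

work_queue = queue.Queue()  # module-level "singleton"
"""


def load_as(name, path):
    """Execute the file at `path` as a module registered under `name`."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scannerapplication.py")
    with open(path, "w") as handle:
        handle.write(SOURCE)
    # Same file, two module names: the situation the relative import created.
    absolute = load_as("somepackage.scannerapplication", path)
    relative = load_as("scannerapplication", path)

instance = absolute.Application()
same_module = absolute is relative
same_queue = absolute.work_queue is relative.work_queue
passes_isinstance = isinstance(instance, relative.Application)
print(same_module, same_queue, passes_isinstance)
```

Each copy of the module has its own class object and its own queue, so code holding one copy never sees what was put on the other.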
There's a PEP (Python Enhancement Proposal) that will make these problems go away. With that PEP, instead of writing:
import scannerapplication
to get the module inside the package (somepackage.scannerapplication) from another module in the package (somepackage.somemodule), you would use:
from . import scannerapplication
or, of course, use the (currently valid, and more explicit):
from somepackage import scannerapplication
which is what all of my code uses.
The second problem came from a rather, ahem, stupid "feature" of PostgreSQL: it was keeping locks open after a SELECT whose transaction never committed, and (after a few hundred queries) those locks were blocking another thread. Sure, I should have committed or rolled back the transaction, but a SELECT should not have been blocking an INSERT. Oh well, blah.
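The fix on my side amounts to always ending the transaction, even for read-only queries. A minimal sketch of that habit, assuming a DB-API connection: read_only is a made-up helper name, and sqlite3 is used only so the example runs anywhere (it does not reproduce PostgreSQL's locking behaviour).

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def read_only(connection):
    """Yield a cursor, then always end the transaction (hypothetical helper)."""
    cursor = connection.cursor()
    try:
        yield cursor
    finally:
        cursor.close()
        # Nothing to keep from a SELECT, so roll back to release anything held.
        connection.rollback()


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE page (url TEXT)")
connection.execute("INSERT INTO page VALUES (?)", ("http://example.com/",))
connection.commit()

with read_only(connection) as cursor:
    cursor.execute("SELECT url FROM page")
    rows = cursor.fetchall()

print(rows, connection.in_transaction)
```

The rollback in the finally clause runs whether or not the query raised, so no read is ever left sitting in an open transaction.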