Another day playing with PyPy. First up was a pleasant surprise in that the 2x slowdown reduced with the current nightly build, bringing performance up from 40,000cps to ~60,000cps.
After that, I applied a micro-optimization that got me about a 6% speedup: I eliminated the use of a state "struct object" (an object that just used regular attribute access and no methods) in favour of passing all of the state as explicit arguments and returning the state modification(s) to the caller. Not a huge win, but it did make it possible for cfbolz to point out that, with the modifications, there was an unneeded re-raising of an exception in one of the most heavily used methods.
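A minimal sketch of the shape of that change, with made-up names (`State`, `match_literal_struct` and `match_literal_args` are hypothetical, not the actual parser code). The first version mutates a shared state object; the second takes the state as plain arguments and hands the new position back to the caller:

```python
class State(object):
    """Hypothetical "struct object": attributes only, no methods."""
    def __init__(self, text):
        self.text = text
        self.pos = 0


def match_literal_struct(state, literal):
    # Before: read and modify the parse position through attribute access.
    if state.text.startswith(literal, state.pos):
        state.pos += len(literal)
        return True
    return False


def match_literal_args(text, pos, literal):
    # After: state arrives as explicit arguments, and the modification
    # (the new position) is returned to the caller instead of stored.
    if text.startswith(literal, pos):
        return True, pos + len(literal)
    return False, pos
```

The caller is then responsible for threading `pos` through each call, e.g. `ok, pos = match_literal_args(text, pos, "abc")`.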
Eliminating just that trivial operation caused performance to triple (from ~60,000cps to a somewhat respectable 180,000cps). Baseline for the naive rewrite in cPython was 82,000cps, so we're suddenly seeing a real performance improvement.
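For illustration only, the pattern looks roughly like this. The names here (`ParseError`, `expect_digit`, and the two wrappers) are invented for the sketch and the real method is more involved; the point is just that a handler which does nothing but re-raise can be dropped without changing behaviour:

```python
class ParseError(Exception):
    """Hypothetical error raised when a match fails."""


def expect_digit(text, pos):
    # Stand-in for the real matching work: consume one digit or fail.
    if pos < len(text) and text[pos].isdigit():
        return pos + 1
    raise ParseError("expected a digit at offset %d" % pos)


def parse_digit_reraising(text, pos):
    # Before: the exception is caught only to be raised again unchanged.
    try:
        return expect_digit(text, pos)
    except ParseError:
        raise


def parse_digit_direct(text, pos):
    # After: same observable behaviour, the exception simply propagates.
    return expect_digit(text, pos)
```

Both versions return the same results and raise the same exception; whether removing the handler matters for speed depends on the interpreter, so this sketch makes no claim about reproducing the numbers above.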
The second fix cfbolz proposed was a bit more... evil... basically replacing a for-loop with a recursive call, which caused performance to jump to 265,000cps on pypy, but caused it to drop noticeably on cpython. A little bit of code to test for pypy before using the pypy hack eliminates that issue.
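A rough sketch of how that kind of switch can be wired up, assuming a made-up "run a sequence of matchers" helper (`literal`, `match_all_loop` and `match_all_recursive` are hypothetical names, not the real code). `platform.python_implementation()` is the standard-library call that reports `"PyPy"` or `"CPython"`:

```python
import platform

# True when running under PyPy, False on CPython and others.
IS_PYPY = platform.python_implementation() == "PyPy"


def literal(expected):
    """Hypothetical matcher factory: returns the new position or None."""
    def matcher(text, pos):
        if text.startswith(expected, pos):
            return pos + len(expected)
        return None
    return matcher


def match_all_loop(matchers, text, pos):
    # Plain for-loop version, used on cpython.
    for matcher in matchers:
        pos = matcher(text, pos)
        if pos is None:
            return None
    return pos


def match_all_recursive(matchers, text, pos, index=0):
    # Recursive version of the same loop, used only on pypy.
    if index == len(matchers):
        return pos
    new_pos = matchers[index](text, pos)
    if new_pos is None:
        return None
    return match_all_recursive(matchers, text, new_pos, index + 1)


# Pick the implementation once, at import time.
match_all = match_all_recursive if IS_PYPY else match_all_loop
```

Choosing once at import time keeps the check out of the hot path, so neither interpreter pays for the other's variant.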
End of the day, the code-base is markedly faster on pypy, and has been optimized (slightly) for cpython as well. cpython parses at 110,000cps, pypy at 265,000cps on the test file. That puts us at ~1/12th the speed of the optimized C for pypy and ~1/30th for cpython.
There are still lots of things I want to explore, but I don't have the time to work on them today, so I suppose that will be next week...
(At the time I'd thought the slowdown had been eliminated entirely, but that was because I ran the cPython test in the wrong window. Duh!)