Forgot how big SimpleParse is...


At this point I have almost all of SimpleParse converted to pure-Python operation. It is still using the Python call stack for its state storage, just for convenience in coding. That makes it approximately equivalent to the version using "stock" mxTextTools with respect to blowing up on recursive grammars (the TextTools fork SimpleParse bundles doesn't blow up on such grammars, but I can easily convert to non-stack operation in the future). That said, the Python rewrite is much simpler code. There are about 3,200 lines of code in the pure-Python version (excluding tests and examples), versus about 13,000 lines of code in the trunk (~9,000 of that in C).
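To make the call-stack point concrete, here is a minimal sketch (not SimpleParse code; the toy grammar and both function names are made up for illustration) of the same recursive rule matched two ways: once leaning on the Python call stack, once with an explicit list as the stack.

```python
# Toy grammar (hypothetical, for illustration only):
#     nested := '(' nested ')' / ''
# Both functions return the position just past the longest match at `pos`.

def match_nested_recursive(text, pos=0):
    """Match `nested` using the Python call stack for state."""
    if pos < len(text) and text[pos] == '(':
        inner_end = match_nested_recursive(text, pos + 1)
        if inner_end < len(text) and text[inner_end] == ')':
            return inner_end + 1
    # Either no '(' here or the ')' was missing: take the empty alternative.
    return pos


def match_nested_iterative(text, pos=0):
    """Match `nested` keeping the pending frames in an explicit list."""
    stack = []  # start position of each '(' still waiting for its ')'
    cur = pos
    while cur < len(text) and text[cur] == '(':
        stack.append(cur)
        cur += 1
    result = cur  # the innermost rule matched the empty alternative here
    while stack:
        start = stack.pop()
        if result < len(text) and text[result] == ')':
            result += 1      # this frame's ')' is present
        else:
            result = start   # this frame falls back to the empty alternative
    return result
```

On shallow input the two agree. On input nested a few thousand levels deep, the recursive version runs into the interpreter's recursion limit, while the explicit-stack version just grows a list.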

Of course, it's also ridiculously slow. The mxTextTools version (trunk) parses a (VRML) test file at 3.2 million bytes/second; the pure-Python version on CPython 2.6 currently manages 82 thousand bytes/second on the same test file, or ~40x slower. The C version uses Boyer-Moore searching, while the Python version is currently just doing brute-force searches... I likely should fix that before I start trying to pit them against each other. Even so, CPython is not well suited to these kinds of low-level algorithms. Choosing SimpleParse as the test case lets me see how well the type annotation/JIT stuff can optimize (heavily OO) algorithmic code, with an eye to writing heavy-lifting libraries in pure Python.
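For a rough idea of the difference between the two search strategies, here is a sketch in plain Python. The second function is the Horspool simplification of Boyer-Moore (bad-character shifts only), not the full algorithm and not what mxTextTools actually implements; both function names are made up for this example.

```python
def brute_force_find(text, pattern):
    """Try every start position in turn; return first match index or -1."""
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):
        if text[start:start + m] == pattern:
            return start
    return -1


def horspool_find(text, pattern):
    """Boyer-Moore-Horspool: skip ahead based on the character under the
    last position of the current window. Returns first match index or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    # For each character of the pattern except the last, the distance from
    # its rightmost occurrence to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Characters that do not occur in the pattern allow a full-length skip.
        pos += shift.get(text[pos + m - 1], m)
    return -1
```

The brute-force version advances one position per failed attempt; the Horspool version can jump up to `len(pattern)` positions at a time, which is where the savings on long literal terminators come from. Whether that helps much under CPython, where the slicing and dict lookups carry their own overhead, would need measuring.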

Anyway, seems my experimenting day has sped away, so I suppose I shall have to wait until next week to start experimenting with PyPy (which was the point of this whole exercise).
