Code that ought to run fast, but can't due to Python limitations.
Nick Craig-Wood
nick at craig-wood.com
Sun Jul 5 03:30:04 EDT 2009
John Nagle <nagle at animats.com> wrote:
> As an example of code that really needs to run fast, but is
> speed-limited by Python's limitations, see "tokenizer.py" in
>
> http://code.google.com/p/html5lib/
>
> This is a parser for HTML 5, a piece of code that will be needed
> in many places and will process large amounts of data. It's written
> entirely in Python. Take a look at how much work has to be performed
> per character.
>
> This is a good test for Python implementation bottlenecks. Run
> that tokenizer on HTML, and see where the time goes.
>
> ("It should be written in C" is not an acceptable answer.)
You could compile it with Cython, though. lxml took this route: it is
written in Cython on top of the libxml2 C library, not in hand-written C.
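
Before compiling anything, it may help to do what the quoted message
suggests and see where the time goes. Here is a minimal sketch using
cProfile from the standard library. The two-state tokenizer is a toy I
made up for illustration (next_state, tokenize and profile_tokenizer are
hypothetical names, not html5lib's code); it only mimics the "one
function call per character" pattern.

```python
import cProfile
import io
import pstats


def next_state(state, ch):
    """Toy per-character step: return (new_state, token_boundary)."""
    if state == "data":
        return ("tag", True) if ch == "<" else ("data", False)
    # Otherwise we are inside a tag; '>' closes it.
    return ("data", True) if ch == ">" else ("tag", False)


def tokenize(text):
    """Split text into ('data', s) and ('tag', s) tokens, one call per char."""
    tokens = []
    state = "data"
    buf = []
    for ch in text:
        new_state, boundary = next_state(state, ch)
        if boundary:
            if state == "tag":
                buf.append(ch)  # keep the closing '>' with its tag
            if buf:
                tokens.append((state, "".join(buf)))
            # An opening '<' starts the next tag token.
            buf = [ch] if new_state == "tag" else []
        else:
            buf.append(ch)
        state = new_state
    if buf:
        tokens.append((state, "".join(buf)))
    return tokens


def profile_tokenizer(text):
    """Run tokenize() under cProfile; return (tokens, report text)."""
    profiler = cProfile.Profile()
    tokens = profiler.runcall(tokenize, text)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(10)
    return tokens, out.getvalue()


if __name__ == "__main__":
    _, report = profile_tokenizer("<p>hello</p>" * 10000)
    print(report)
```

In a toy like this the report usually shows most of the time spread over
the per-character calls (next_state, list.append), and that kind of
overhead is what compiling with Cython is meant to reduce. The real
tokenizer may well look different, so profile it on real HTML first.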
--
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick