Code that ought to run fast, but can't due to Python limitations.

Nick Craig-Wood nick at craig-wood.com
Sun Jul 5 03:30:04 EDT 2009


John Nagle <nagle at animats.com> wrote:
>      As an example of code that really needs to run fast, but is
>  speed-limited by Python's limitations, see "tokenizer.py" in
> 
>  	http://code.google.com/p/html5lib/
> 
>  This is a parser for HTML 5, a piece of code that will be needed
>  in many places and will process large amounts of data. It's written
>  entirely in Python.  Take a look at how much work has to be performed
>  per character.
> 
>  This is a good test for Python implementation bottlenecks.  Run
>  that tokenizer on HTML, and see where the time goes.
> 
>  ("It should be written in C" is not an acceptable answer.)

You could compile it with Cython, though: the source stays Python, but it
is built as a C extension module.  lxml took this route...
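
Before compiling anything, it is probably worth following the "see where
the time goes" suggestion with cProfile. What follows is only a rough
sketch: toy_tokenize and profile_tokenizer are made-up names, and the state
machine is a deliberately tiny stand-in for a per-character tokenizer, not
html5lib's tokenizer.py.

```python
import cProfile
import io
import pstats


def toy_tokenize(text):
    """Hypothetical per-character state machine (NOT html5lib's tokenizer)."""
    tokens = []
    state = "data"
    buf = []
    for ch in text:
        if state == "data":
            if ch == "<":
                # Flush any pending text before entering a tag.
                if buf:
                    tokens.append(("text", "".join(buf)))
                    buf = []
                state = "tag"
            else:
                buf.append(ch)
        else:  # state == "tag"
            if ch == ">":
                tokens.append(("tag", "".join(buf)))
                buf = []
                state = "data"
            else:
                buf.append(ch)
    # Trailing text after the last tag; an unterminated tag is dropped.
    if buf and state == "data":
        tokens.append(("text", "".join(buf)))
    return tokens


def profile_tokenizer(text, top=5):
    """Run toy_tokenize under cProfile; return (tokens, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    tokens = toy_tokenize(text)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return tokens, out.getvalue()


if __name__ == "__main__":
    _, report = profile_tokenizer("<p>hi</p>" * 100000)
    print(report)
```

In a loop shaped like this, the report tends to be dominated by the
tokenizer function itself plus a large number of small list.append and
str.join calls, which is the sort of per-character overhead a compiled
build is meant to reduce. How much Cython actually helps would need to be
measured on the real tokenizer.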

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick


