Code that ought to run fast, but can't due to Python limitations.
Benjamin Kaplan
benjamin.kaplan at case.edu
Sat Jul 4 14:03:52 EDT 2009
On Sat, Jul 4, 2009 at 1:33 PM, John Nagle<nagle at animats.com> wrote:
> As an example of code that really needs to run fast, but is
> speed-limited by Python's limitations, see "tokenizer.py" in
>
> http://code.google.com/p/html5lib/
>
> This is a parser for HTML 5, a piece of code that will be needed
> in many places and will process large amounts of data. It's written
> entirely in Python. Take a look at how much work has to be performed
> per character.
>
> This is a good test for Python implementation bottlenecks. Run
> that tokenizer on HTML, and see where the time goes.
>
> ("It should be written in C" is not an acceptable answer.)
>
> Python doesn't have a "switch" or "case" statement, and when
> you need a state machine with many states, that makes for painful,
> slow code. There's a comment in the code that it would be useful
> to run a few billion lines of HTML through an instrumented version
> of the parser to decide in which order the IF statements should be
> executed. You shouldn't have to do that.
>
If you're cases are hashable, just use a dict instead of the if chain.
Then you get the constant time access to it.
def func_a() :
...
def func_b():
...
def func_c():
....
case = {"a":func_a, "b":func_b, "c":func_c}
case[value]()
> Yes, I've read PEP 3103. The big problem is the difficulty of figuring
> out what's a constant and what might change. If all the cases are
> constants,
> case statements are easy. But in Python, the compiler can't tell.
>
> Parsers have many named compile-time constants. Python doesn't support
> named compile-time constants, and this is one of the places where we
> have to pay the bill for that limitation.
>
> Something to think about when you need three more racks of servers
> because the HTML parser is slow.
>
> John Nagle
> --
> http://mail.python.org/mailman/listinfo/python-list
>
More information about the Python-list
mailing list