Code that ought to run fast, but can't due to Python limitations.

Sun Jul 5 04:58:27 EDT 2009

John Nagle wrote:
>    Here's some actual code, from "tokenizer.py".  This is called once
> for each character in an HTML document, when in "data" state (outside
> a tag).  It's straightforward code, but look at all those
> dictionary lookups.
> 
>     def dataState(self):
>         data = self.stream.char()
> 
>         # Keep a charbuffer to handle the escapeFlag
>         if self.contentModelFlag in\
>           (contentModelFlags["CDATA"], contentModelFlags["RCDATA"]):

Is the tuple

	(contentModelFlags["CDATA"], contentModelFlags["RCDATA"])

constant? If that is the case, I'd cut it out into a class member (or
module-local variable) first thing in the morning. And I'd definitely keep
the result of the "in" test in a local variable for reuse, seeing how many
times it's used in the rest of the code.

Writing inefficient code is not something to blame the language for.

Stefan