Code that ought to run fast, but can't due to Python limitations.
Stefan Behnel
stefan_ml at behnel.de
Sun Jul 5 04:58:27 EDT 2009
John Nagle wrote:
> Here's some actual code, from "tokenizer.py". This is called once
> for each character in an HTML document, when in "data" state (outside
> a tag). It's straightforward code, but look at all those
> dictionary lookups.
>
> def dataState(self):
>     data = self.stream.char()
>
>     # Keep a charbuffer to handle the escapeFlag
>     if self.contentModelFlag in\
>       (contentModelFlags["CDATA"], contentModelFlags["RCDATA"]):
Is the tuple
(contentModelFlags["CDATA"], contentModelFlags["RCDATA"])
constant? If that is the case, I'd cut it out into a class member (or
module-local variable) first thing in the morning. And I'd definitely keep
the result of the "in" test in a local variable for reuse, seeing how many
times it's used in the rest of the code.
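A minimal sketch of that suggestion, assuming the two flags really are constant. Everything outside the quoted snippet is made up for illustration: the values in contentModelFlags, the cut-down Tokenizer class, its counters, and the name CDATA_OR_RCDATA are hypothetical stand-ins, not the real tokenizer.py.

```python
# Hypothetical stand-in for the module-level table in tokenizer.py;
# the actual values are an assumption made for this sketch.
contentModelFlags = {"PCDATA": 0, "RCDATA": 1, "CDATA": 2, "PLAINTEXT": 3}

# Built once at import time instead of once per character.
CDATA_OR_RCDATA = (contentModelFlags["CDATA"], contentModelFlags["RCDATA"])


class Tokenizer:
    """Cut-down, hypothetical tokenizer holding only what the sketch needs."""

    def __init__(self, text, contentModelFlag):
        self.chars = iter(text)
        self.contentModelFlag = contentModelFlag
        self.raw_chars = 0        # characters seen while in CDATA/RCDATA
        self.raw_ampersands = 0   # "&" characters seen while in CDATA/RCDATA

    def dataState(self):
        data = next(self.chars, None)
        if data is None:
            return False  # input exhausted

        # Evaluate the membership test once and keep the result in a local.
        in_cdata_or_rcdata = self.contentModelFlag in CDATA_OR_RCDATA

        # Every later check reuses the local instead of rebuilding the tuple
        # and repeating the two dictionary lookups.
        if in_cdata_or_rcdata:
            self.raw_chars += 1
        if in_cdata_or_rcdata and data == "&":
            self.raw_ampersands += 1
        return True


tok = Tokenizer("a&b", contentModelFlags["RCDATA"])
while tok.dataState():
    pass
```

With the tuple hoisted out, each call does one attribute lookup, one global lookup and one membership test, rather than two dictionary subscripts and a tuple construction per character.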
Writing inefficient code is not something to blame the language for.
Stefan