Code that ought to run fast, but can't due to Python limitations.

Stefan Behnel stefan_ml at behnel.de
Sun Jul 5 04:58:27 EDT 2009


John Nagle wrote:
>    Here's some actual code, from "tokenizer.py".  This is called once
> for each character in an HTML document, when in "data" state (outside
> a tag).  It's straightforward code, but look at all those
> dictionary lookups.
> 
>     def dataState(self):
>         data = self.stream.char()
> 
>         # Keep a charbuffer to handle the escapeFlag
>         if self.contentModelFlag in\
>           (contentModelFlags["CDATA"], contentModelFlags["RCDATA"]):

Is the tuple

	(contentModelFlags["CDATA"], contentModelFlags["RCDATA"])

constant? If so, I'd hoist it into a class attribute (or a module-level
variable) first thing in the morning. And I'd definitely keep the result of
the "in" test in a local variable for reuse, seeing how many times it's used
in the rest of the code.
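
A minimal sketch of what I mean, assuming the tuple really is constant. The
flag values, the Tokenizer class, its constructor and the charbuffer handling
are hypothetical stand-ins made up for illustration, not the real
tokenizer.py; only the names contentModelFlags, contentModelFlag and
dataState come from the quoted code.

```python
# Hypothetical flag values; the real module defines its own.
contentModelFlags = {"PCDATA": 0, "RCDATA": 1, "CDATA": 2, "PLAINTEXT": 3}

# Built once at import time instead of once per character.
CDATA_OR_RCDATA = (contentModelFlags["CDATA"], contentModelFlags["RCDATA"])


class Tokenizer:
    """Stand-in tokenizer, just enough to show the two changes."""

    def __init__(self, text, contentModelFlag):
        self.chars = iter(text)  # stands in for self.stream.char()
        self.contentModelFlag = contentModelFlag
        self.charbuffer = []

    def dataState(self):
        data = next(self.chars, None)

        # One membership test against the prebuilt tuple; the result is
        # kept in a local so later checks in this method can reuse it.
        in_cdata_or_rcdata = self.contentModelFlag in CDATA_OR_RCDATA

        if in_cdata_or_rcdata and data is not None:
            self.charbuffer.append(data)
        return in_cdata_or_rcdata
```

This drops two dictionary lookups and a tuple construction from every call,
and each reuse of the local avoids repeating the attribute lookup and the
"in" test.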

Writing inefficient code is not something to blame the language for.

Stefan


