Code that ought to run fast, but can't due to Python limitations.

Sat Jul 4 22:09:12 EDT 2009

John Nagle <nagle at animats.com> writes:

>    A dictionary lookup (actually, several of them) for every input
> character is rather expensive. Tokenizers usually index into a table
> of character classes, then use the character class index in a switch
> statement.
> 
>    This is an issue that comes up whenever you have to parse some
> formal structure, from XML/HTML to Pickle to JPEG images to program
> source.
> […] 
>    The temptation is to write tokenizers in C, but that's an admission
> of language design failure.

This sounds like a job for <URL:http://pyparsing.wikispaces.com/>
Pyparsing.

-- 
 \          “Better not take a dog on the space shuttle, because if he |
  `\   sticks his head out when you're coming home his face might burn |
_o__)                                                up.” —Jack Handey |
Ben Finney