Code that ought to run fast, but can't due to Python limitations.
Ben Finney
ben+python at benfinney.id.au
Sat Jul 4 22:09:12 EDT 2009
John Nagle <nagle at animats.com> writes:
> A dictionary lookup (actually, several of them) for every input
> character is rather expensive. Tokenizers usually index into a table
> of character classes, then use the character class index in a switch
> statement.
>
> This is an issue that comes up whenever you have to parse some
> formal structure, from XML/HTML to Pickle to JPEG images to program
> source.
> […]
> The temptation is to write tokenizers in C, but that's an admission
> of language design failure.
This sounds like a job for <URL:http://pyparsing.wikispaces.com/>
Pyparsing.
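
For comparison, the table-driven approach described in the quoted text
can be sketched in pure Python roughly as follows. This is only an
illustration under some assumptions: it targets Python 3 and ASCII
input, the class names (OTHER, SPACE, ALPHA, DIGIT), the table
CLASS_TABLE and the function tokenize() are all made up for the example,
and an if/elif chain stands in for the switch statement. It is not code
from either poster and makes no claim about speed.

```python
# Character classes (hypothetical names, chosen for this sketch only).
OTHER, SPACE, ALPHA, DIGIT = range(4)

# Build the lookup table once: byte value -> character class.
CLASS_TABLE = bytearray([OTHER] * 256)
for c in b" \t\r\n":
    CLASS_TABLE[c] = SPACE
for c in b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_":
    CLASS_TABLE[c] = ALPHA
for c in b"0123456789":
    CLASS_TABLE[c] = DIGIT


def tokenize(data):
    """Split a bytes object into a list of (kind, text) tuples."""
    tokens = []
    i, n = 0, len(data)
    while i < n:
        # One table index per character instead of several dict lookups.
        cls = CLASS_TABLE[data[i]]
        if cls == SPACE:
            i += 1  # whitespace is skipped, not emitted
        elif cls == ALPHA:
            start = i
            while i < n and CLASS_TABLE[data[i]] in (ALPHA, DIGIT):
                i += 1
            tokens.append(("NAME", data[start:i].decode("ascii")))
        elif cls == DIGIT:
            start = i
            while i < n and CLASS_TABLE[data[i]] == DIGIT:
                i += 1
            tokens.append(("NUMBER", data[start:i].decode("ascii")))
        else:
            # Anything unclassified becomes a one-character token.
            tokens.append(("OP", chr(data[i])))
            i += 1
    return tokens


print(tokenize(b"x1 = 42+y"))
```

Indexing a bytearray with an integer byte value avoids hashing, but each
character still costs a pass through the interpreter loop, which is the
overhead the quoted text is concerned with.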
--
\ “Better not take a dog on the space shuttle, because if he |
`\ sticks his head out when you're coming home his face might burn |
_o__) up.” —Jack Handey |
Ben Finney