Regex for String Literals

Tim Peters tim.one at comcast.net
Mon Sep 2 17:46:56 EDT 2002


[Stefan Franke]
> Does someone know a regular expression that matches all
> kinds of Python string literals (along with their finer points
> WRT line breaks, unicode..)?

tokenize.py (in the std library) strives to match the Python compiler's
tokenization exactly.  You'll find a suitable collection of hairy regexps
there, but, if you can, find a way to *use* tokenize.py directly.  With the
generator interface, this is less mind-bending than it used to be (you can
iterate over a token stream instead of fighting with stateful callback
functions).
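
For illustration, here is a minimal sketch of the "use tokenize.py directly"
route. It assumes a modern Python 3, where tokenize.generate_tokens takes a
readline callable that returns str and yields named tuples. The helper name
string_literals and the sample source text are made up for this example. One
caveat: on Python 3.12 and later, f-strings arrive as separate FSTRING_START /
FSTRING_MIDDLE / FSTRING_END tokens rather than a single STRING token, so this
sketch would not report them.

```python
import io
import tokenize


def string_literals(source):
    """Yield (literal_text, (row, col)) for each STRING token in source.

    string_literals is a hypothetical helper, not part of tokenize.
    """
    # generate_tokens wants a readline-style callable that returns str.
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.STRING:
            # tok.string is the literal exactly as written, prefix and
            # quotes included; tok.start is its (row, col) position.
            yield tok.string, tok.start


# Sample input: a raw string, a u-prefixed string, a triple-quoted string
# spanning two lines, and a comment that merely looks like it holds a string.
sample = 'x = r"raw\\d" + u"uni"\ny = """multi\nline"""  # "not a string"\n'

for text, start in string_literals(sample):
    print(start, repr(text))
```

The quoted text inside the comment is not reported, since the tokenizer
classifies the whole comment as a COMMENT token. That is the sort of finer
point a standalone regexp would have to reproduce by hand.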
