pattern matching
Roy Smith
roy at panix.com
Wed Feb 23 23:07:30 EST 2011
In article <mailman.364.1298517901.1189.python-list at python.org>,
Chris Rebert <clp2 at rebertia.com> wrote:
> regex = compile("(\d\d)/(\d\d)/(\d{4})")
I would probably write that as either
r"(\d{2})/(\d{2})/(\d{4})"
or (somewhat less likely)
r"(\d\d)/(\d\d)/(\d\d\d\d)"
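As a quick sanity check, all three spellings capture the same three groups, so choosing between them is purely a matter of readability. This is a sketch that assumes Python 3, and the sample string is made up.

```python
import re

# Three spellings of the same date pattern: the quoted original
# (written here as a raw string) and the two alternatives above.
patterns = [
    r"(\d\d)/(\d\d)/(\d{4})",
    r"(\d{2})/(\d{2})/(\d{4})",
    r"(\d\d)/(\d\d)/(\d\d\d\d)",
]

sample = "Posted on 02/23/2011 at 23:07"  # hypothetical input text

for p in patterns:
    m = re.search(p, sample)
    print(p, "->", m.groups())  # each line ends with ('02', '23', '2011')
```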
Keeping to one consistent style makes it a little easier to read. Also,
don't forget the leading `r` to get raw strings. I've long since given
up trying to remember the exact rules of what needs to get escaped and
what doesn't. If it's a regex, I just automatically make it a raw
string.
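Here is a small illustration of why the `r` matters. It assumes Python 3, and the sample text is made up. `\d` happens to survive in an ordinary string because Python passes unknown escapes through unchanged, although recent Python 3 releases warn about it. `\b` is the classic trap: in an ordinary string it becomes a backspace character, not the regex word-boundary assertion.

```python
import re

text = "the cat sat"

# In an ordinary string literal, "\b" is the backspace character (\x08),
# so the regex engine looks for literal backspaces and finds nothing.
plain = "\bcat\b"
# In a raw string the backslash survives, so re sees \b (word boundary).
raw = r"\bcat\b"

print(repr(plain))                   # '\x08cat\x08'
print(re.search(plain, text))        # None
print(re.search(raw, text).group())  # cat
```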
Also, don't overlook the re.VERBOSE flag. It lets you write positively
outrageous expressions which are still quite readable. With it, you
could write this regex as:
r" (\d{2}) / (\d{2}) / (\d{4}) "
which takes up only slightly more space, but makes it a whole lot easier
to scan by eye.
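Below is a sketch of the same date regex in re.VERBOSE form. It assumes Python 3 and US-style month/day/year order. The month/day/year comments are illustrative labels and are not something the original pattern specifies. Under the flag, unescaped whitespace in the pattern is ignored and `#` starts a comment, so the spaces in the one-line version are layout only.

```python
import re

# Multi-line layout with comments; whitespace and "# ..." are ignored.
DATE = re.compile(r"""
    (\d{2})   # month (assumed US ordering)
    /
    (\d{2})   # day
    /
    (\d{4})   # year
""", re.VERBOSE)

# The one-line form from the text, also compiled with re.VERBOSE.
DATE_ONE_LINE = re.compile(r" (\d{2}) / (\d{2}) / (\d{4}) ", re.VERBOSE)

print(DATE.search("Posted 02/23/2011").groups())           # ('02', '23', '2011')
print(DATE_ONE_LINE.search("Posted 02/23/2011").groups())  # ('02', '23', '2011')
# The spaces in the pattern are not matched literally:
print(DATE_ONE_LINE.search("02 / 23 / 2011"))              # None
```

One consequence of that design is that, with re.VERBOSE on, a literal space or `#` has to be escaped (`\ `, `\#`) or placed in a character class (`[ ]`).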
I'm still going to stand by my previous statement, however. If you're
trying to parse HTML, use an HTML parser. Using a regex like this is
perfectly fine for parsing the text inside the HTML <td> element,
but pattern matching the HTML markup itself is madness.
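The following is a minimal sketch of that division of labour. It assumes Python 3's standard-library `html.parser` module, and both `TdTextCollector` and the sample table are hypothetical. The parser deals with the markup, and the regex only ever sees the text inside each `<td>`. The sketch is not meant to handle nested tables.

```python
import re
from html.parser import HTMLParser

DATE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")


class TdTextCollector(HTMLParser):
    """Hypothetical helper: collect the text content of every <td>."""

    def __init__(self):
        super().__init__()
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True
            self.cells.append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False

    def handle_data(self, data):
        # Text may arrive in pieces, so append to the current cell.
        if self.in_td:
            self.cells[-1] += data


page = """
<table>
  <tr><td class="date">02/23/2011</td><td>pattern matching</td></tr>
  <tr><td>
      03/01/2011
  </td><td>re.VERBOSE</td></tr>
</table>
"""

parser = TdTextCollector()
parser.feed(page)
parser.close()

# Only plain cell text reaches the regex.
dates = []
for cell in parser.cells:
    m = DATE.search(cell)
    if m:
        dates.append(m.groups())
print(dates)  # [('02', '23', '2011'), ('03', '01', '2011')]
```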