Newbie needs regex help
Dan M
dan at catfolks.net
Mon Dec 6 10:03:48 EST 2010
I'm getting bogged down with backslash escaping.
I have some text files containing characters with the 8th bit set. These
characters are encoded one of two ways: either "=hh" or "\xhh", where "h"
represents a hex digit, and "\x" is a literal backslash followed by a
lower-case x.
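In the files, \xhh is ordinary text: four characters, starting with a real backslash. The Python 3 sketch below uses a made-up sample line. It contrasts that text with an ordinary string literal, where Python itself converts \xe9 into a single character before the string ever reaches a regex. Python 2 byte strings behave the same way.

```python
# Hypothetical sample line, written as a raw literal so that, as in the
# files, the backslash is kept as a real character.
from_file = r"caf\xe9"
# The same characters in an ordinary literal: Python converts \xe9 into
# the single character U+00E9 while parsing the source code.
escaped_in_source = "caf\xe9"

print(len(from_file), len(escaped_in_source))        # 7 4
print("\\" in from_file, "\\" in escaped_in_source)  # True False
```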
Catching the first case with a regex is simple. But when I try to write a
regex to catch the second case, I mess up the escaping.
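For reference, here is a minimal Python 3 sketch of patterns for both encodings. It also includes a hypothetical decode_high_bytes helper that replaces each escape with a character. The helper assumes each hh is a Latin-1 (equivalently, Unicode) code point; the files' actual character set isn't stated.

```python
import re

# First case, "=hh": "=" is not special in a regex, so nothing to escape.
EQUALS_HEX = re.compile(r'=([0-9a-fA-F]{2})')
# Second case, "\xhh": the regex \\ matches one literal backslash, and the
# raw string hands both backslashes to re unchanged.
BACKSLASH_HEX = re.compile(r'\\x([0-9a-fA-F]{2})')
# Both cases in one pattern: group 1 or group 2 holds the hex digits.
EITHER = re.compile(r'=([0-9a-fA-F]{2})|\\x([0-9a-fA-F]{2})')


def decode_high_bytes(text):
    """Hypothetical helper: replace =hh and \\xhh with chr(0xhh).

    Assumes hh is a Latin-1 / Unicode code point.
    """
    def to_char(match):
        return chr(int(match.group(1) or match.group(2), 16))
    return EITHER.sub(to_char, text)


sample = r"na=EFve caf\xe9"  # raw literal keeps the backslash, as in the files
print(EQUALS_HEX.findall(sample))     # ['EF']
print(BACKSLASH_HEX.findall(sample))  # ['e9']
print(decode_high_bytes(sample))      # naïve café
```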
I took a look at http://docs.python.org/howto/regex.html, especially the
section titled "The Backslash Plague". I started out trying:
dan at dan:~/personal/usenet$ python
Python 2.7 (r27:82500, Nov 15 2010, 12:10:23)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> r = re.compile('\\\\x([0-9a-fA-F]{2})')
>>> a = "This \xef file \xef has \x20 a bunch \xa0 of \xb0 crap \xc0 characters \xefn \xeft."
>>> m = r.search(a)
>>> m
No match.
I then followed the advice of the above-mentioned document, and expressed
the regex as a raw string:
>>> r = re.compile(r'\\x([0-9a-fA-F]{2})')
>>> r.search(a)
Still no match.
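The two compile calls in the session spell the same pattern in two ways. A quick Python 3 check, which is a sketch and not part of the original session, confirms that both literals hold exactly the same characters. Because the pattern is identical either way, whether search matches depends only on what the subject string contains.

```python
import re

plain = '\\\\x([0-9a-fA-F]{2})'  # ordinary literal: each \\ becomes one \
raw = r'\\x([0-9a-fA-F]{2})'     # raw literal: backslashes kept as typed

print(plain == raw)  # True
print(plain[:3])     # \\x  (two backslash characters, then x)

# The regex engine reads \\ as "one literal backslash", so the pattern
# finds an escape only if the subject really contains a backslash.
pattern = re.compile(raw)
print(pattern.search(r"crap \xc0").group(1))  # c0
print(pattern.search("crap \xc0"))            # None
```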
I'm obviously missing something. I spent a fair bit of time playing with
this over the weekend, and I got nowhere. Now it's time to ask for help.
What am I doing wrong here?
More information about the Python-list mailing list