sjmachin at lexicon.net
Sun Jan 25 01:04:23 CET 2009
On Jan 25, 5:59 am, Scott David Daniels <Scott.Dani... at Acm.Org> wrote:
> Sean Brown wrote:
> > I have the following string ...: "td[ct] = [[ ... ]];\r\n"
> > The ... (representing text in the string) is what I'm extracting ....
> > So I think the regex \[\[(.*)\]\]; should do it.
> > The problem is it appears that python is escaping the \ in the regex
> > because I see this:
> >>>> reg = '\[\[(.*)\]\];'
> >>>> reg
> > '\\[\\[(.*)\\]\\];'
> > Now to me looks like it would match the string - \[\[ ... \]\];
> > ...
> OK, you already have a good answer as to what is happening.
> I'll mention that raw strings were put in the language exactly for
> regex work. They are useful for any time you need to use the backslash
> character (\) within a string (but not as the final character).
> For example:
> len(r'\a\b\c\d\e\f\g\h') == 16 and len('\a\b\c\d\e\f\g\h') == 13
> If you get in the habit of typing regex strings as r'...' or r"...",
> and examining the patterns with print(somestring), you'll ease your life.
All excellent suggestions, but I'm surprised that nobody has mentioned
the re.VERBOSE format.
This flag allows you to write regular expressions that look nicer.
Whitespace within the pattern is ignored, except when in a character
class or preceded by an unescaped backslash, and, when a line contains
a '#' neither in a character class nor preceded by an unescaped
backslash, all characters from the leftmost such '#' through the end
of the line are ignored.
That means that the two following regular expression objects that
match a decimal number are functionally equal:
a = re.compile(r"""\d + # the integral part
\. # the decimal point
\d * # some fractional digits""", re.X)
b = re.compile(r"\d+\.\d*")
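A quick way to convince yourself of that equivalence is to run both patterns over a few strings and compare what they find. This is only a sketch: the sample strings and the span() helper are made up for illustration and are not from the manual.

```python
import re

# The two patterns from the manual excerpt above.
a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)
b = re.compile(r"\d+\.\d*")

def span(rx, s):
    """Return the (start, end) of the first match, or None if there is none."""
    m = rx.search(s)
    return m.span() if m else None

# Hypothetical sample inputs, chosen only to exercise both outcomes.
samples = ["3.14", "42.", "x 0.5 y", "no digits", "7"]

# True when both patterns agree on every sample.
same_spans = all(span(a, s) == span(b, s) for s in samples)
print(same_spans)
```

The whitespace and comments inside the verbose pattern are dropped when it is compiled, so both objects should behave alike on any input, not just these samples.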
(1) "looks nicer" is not the point; it's understandability
(2) if you need a space, use a character class ->[ ]<- not an
unescaped backslash ->\ <-
(3) the indentation in the manual doesn't fit my idea of "looks
nicer"; I'd do
a = re.compile(r"""
    \d +   # the integral part
    \.     # the decimal point
    \d *   # some fractional digits
    """, re.VERBOSE)
(4) you can aid understandability with more indentation, especially when
you have multiple capturing expressions and (?......) gizmoids e.g.
..... # prefix
(?......) # look-back assertion
(?....) # etc etc
Worth a try if you find yourself going nuts getting the parentheses
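To make points (2) and (4) concrete, here is a small sketch. The "price" pattern, its group names and the test string are all invented for illustration; only re.compile, re.VERBOSE, search and fullmatch are real library features.

```python
import re

# In verbose mode a bare space in the pattern is ignored ...
ignored = re.compile(r"a b", re.VERBOSE)
# ... so a literal space is written as a one-character class.
literal = re.compile(r"a[ ]b", re.VERBOSE)

# Hypothetical pattern for text like "USD 12.50", indented so that
# each nested group sits one level deeper than its parent.
price = re.compile(r"""
    (?P<currency> [A-Z]{3} )    # three-letter currency code
    [ ]                         # a literal space (character class)
    (?P<amount>
        \d +                    # the integral part
        (?: \. \d{2} )?         # optional decimal point and two digits
    )
    """, re.VERBOSE)

m = price.search("total: USD 12.50")
if m:
    print(m.group("currency"), m.group("amount"))
```

With the groups laid out like this, an unbalanced parenthesis tends to show up as a line whose indentation has no partner.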