[Tutor] re syntax

Kent Johnson kent37 at tds.net
Mon Aug 28 16:12:34 CEST 2006


Tiago Saboga wrote:
> The problem is: why the hell do I need 
> to escape special characters when in verbose mode? And why isn't it made clear 
> on the re module doc page (http://docs.python.org/lib/module-re.html)?
>
> I'll just paste my IPython session below (or above? I never know how to say this 
> in English); you'll see what I mean.
>   
Below is correct.
> But before that, an extra question for the tutors: do you have any advice for 
> people like me who like to program but can't spend enough time doing it?
>
> ...it's a joke...
>   
Hire a personal trainer :-)
> In [18]: re.compile("ab").match("abc").group()
> Out[18]: 'ab'
>
> In [19]: re.compile("ab", re.X).match("abc").group()
> Out[19]: 'ab'
>
> In [20]: re.compile("a\tb").match("a\tbc").group()
> Out[20]: 'a\tb'
>
> In [21]: re.compile("a\tb", re.X).match("a\tbc").group()
> ---------------------------------------------------------------------------
> exceptions.AttributeError                            Traceback (most recent call last)
>
>   
Ahem. Which part of "Whitespace within the pattern is ignored" do you 
not understand? :-)
Your pattern is <character a><character TAB><character b>. Last I 
checked, TAB counts as whitespace ;)

OK, I know it's a bit more subtle than that, but when you swear at the 
docs I get a little defensive...
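The documented rule can be checked in isolation. This is a minimal sketch assuming Python 3's `re` module (the quoted session is Python 2, but the verbose-mode whitespace rule is the same); the variable names are made up for illustration. It also shows the two usual ways to keep literal whitespace in a verbose pattern: escape it, or put it in a character class.

```python
import re

# With re.X (re.VERBOSE), unescaped whitespace in the pattern is ignored,
# so both of these compile to the same regex as "ab".
spaced = re.compile("a b", re.X)
tabbed = re.compile("a\tb", re.X)  # "\t" is already a real TAB when re sees it

print(spaced.match("abc").group())  # prints: ab
print(tabbed.match("abc").group())  # prints: ab

# To keep literal whitespace in verbose mode, escape it
# or put it inside a character class.
escaped = re.compile(r"a\ b", re.X)
classed = re.compile("a[ ]b", re.X)
print(escaped.match("a b").group())  # prints: a b
print(classed.match("a b").group())  # prints: a b
```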

In a non-raw string literal, backslash escapes are interpreted by the Python 
compiler (or parser...) when it reads your source, so the actual string 
object seen by your program does not contain a backslash. So when you pass a regex of "a\tb", 
re.compile() sees a TAB just as if you typed it in directly. If you 
don't specify re.X, the tab is considered part of the regex and matched. 
If you do specify re.X, the tab is whitespace and ignored as requested.
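Here is a small sketch of that difference, again assuming Python 3; `cooked`, `plain` and `verbose` are illustrative names only. It shows that the ordinary literal holds a TAB and no backslash, and reproduces the failing `In [21]` case from the quoted session.

```python
import re

cooked = "a\tb"  # ordinary literal: the compiler turns \t into one TAB character
print(len(cooked))     # prints: 3
print("\\" in cooked)  # prints: False (no backslash survives)

# Without re.X, the TAB is an ordinary pattern character and is matched.
plain = re.compile(cooked).match("a\tbc")
print(repr(plain.group()))  # prints: 'a\tb'

# With re.X, the TAB is whitespace and dropped, leaving the pattern "ab",
# which cannot match "a<TAB>bc" (the In [21] failure).
verbose = re.compile(cooked, re.X).match("a\tbc")
print(verbose)  # prints: None
```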

My recommendation is to *always* use raw strings to specify regexes. In 
a raw string, backslash escapes are not interpreted by the compiler; 
they become part of the actual string. If you use r"a\tb", there is no 
TAB in the string; it is a literal backslash followed by the character 
't'. The regex engine knows how to interpret the standard backslash 
escapes, so they will work as you expect, even with the verbose flag.
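A sketch of the raw-string version, assuming Python 3 and using made-up names. It mirrors `In [23]` and `In [24]` from the quoted session; the commented verbose pattern at the end is an illustration added here, not something from the thread, showing the kind of layout verbose mode is meant for once the escapes are safe.

```python
import re

raw = r"a\tb"  # raw literal: four characters, 'a', '\\', 't', 'b'
print(len(raw))        # prints: 4
print(raw == "a\\tb")  # prints: True (the same string as In [24])

# The backslash reaches the regex engine, which reads \t as "match a TAB".
# An escape is not whitespace, so re.X leaves it alone (cf. In [23]).
m = re.compile(raw, re.X).match("a\tbc")
print(repr(m.group()))  # prints: 'a\tb'

# With raw strings, verbose mode can be used to lay out and comment a pattern.
commented = re.compile(r"""
    a    # the letter a
    \t   # a TAB, written as a regex escape
    b    # the letter b
""", re.X)
print(commented.match("a\tbc") is not None)  # prints: True
```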

Kent
> /home/tiago/<ipython console>
>
> AttributeError: 'NoneType' object has no attribute 'group'
>
> In [22]: re.compile("a\tb", re.X).match(r"a\tbc").group()
> ---------------------------------------------------------------------------
> exceptions.AttributeError                            Traceback (most recent call last)
>
> /home/tiago/<ipython console>
>
> AttributeError: 'NoneType' object has no attribute 'group'
>
> In [23]: re.compile(r"a\tb", re.X).match("a\tbc").group()
> Out[23]: 'a\tb'
>
> In [24]: re.compile("a\\tb", re.X).match("a\tbc").group()
> Out[24]: 'a\tb'
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor