[ python-Bugs-1518406 ] re '\' char interpretation problem

SourceForge.net noreply at sourceforge.net
Fri Jul 7 00:55:02 CEST 2006


Bugs item #1518406, was opened at 2006-07-06 21:26
Message generated for change (Comment added) made by niemeyer
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1518406&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Regular Expressions
Group: Python 2.4
Status: Closed
Resolution: Invalid
Priority: 5
Submitted By: ollie oldham (ooldham)
Assigned to: Gustavo Niemeyer (niemeyer)
Summary: re '\' char interpretation problem

Initial Comment:
I've run across 2 problems with how the re module 
handles the '\' character.

In problem 1, the pattern fails to match when it should.
In problem 2, the pattern matches when it should not.

There is a short snippet of code attached that shows 
the problems I'm having, and the output as it occurs 
on my machine.

I'm running on Windows 2000
Python versions: 2.4b1 and 2.4.3c1 both act the same 
way.

Problem 1: why does * work and not + ?
import re
rex = re.compile(r'[a-z]:\.*', re.IGNORECASE)
rey = re.compile(r'[a-z]:\.+', re.IGNORECASE)
path1 = r'D:\Logs'
print rex.match(path1) # Matches - as it should have.
print rey.match(path1) # FAILS to match - should have.

Problem 2: match occurs on nonUncPath when it should 
not
import re
uncPath = r'\\someUNC\path'
nonUncPath = r'\nonUnc\path'
rew = re.compile('\\\\.+', re.IGNORECASE)
print rew.match(uncPath) # works as it should.
print rew.match(nonUncPath) # matches and it should NOT.


----------------------------------------------------------------------

>Comment By: Gustavo Niemeyer (niemeyer)
Date: 2006-07-06 22:55

Message:
Logged In: YES 
user_id=7887

Please use a single channel to report issues. Do not send a
message *and* add a comment to the bug.

I think you're missing the behavior of r'' in Python. It
changes the way the Python interpreter parses the string,
not the way the regular expression compiler/interpreter
works. r'\.' is precisely the same as '\\.', and both of
them really describe the string |\.|.

  >>> r'\.' == '\\.'
  True

  >>> print r'\.'
  \.

An escaped dot matches a literal dot. Please have a look at the
re module documentation and perhaps some general regular
expression info for more details.
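
The two layers of escaping described above can be checked directly. This is
only a minimal sketch, not part of the original report: the variable names are
made up for illustration, and print() is called with a single argument so the
snippet behaves the same on Python 2 and Python 3. It also uses re.escape as
one possible way to build a pattern for a literal backslash without counting
backslashes by hand.

```python
import re

# Layer 1: the Python parser. Both literals produce the same
# two-character string: a backslash followed by a dot.
raw = r'\.'
plain = '\\.'
print(raw == plain)   # True
print(len(raw))       # 2

# Layer 2: the regex compiler. It receives backslash-dot and reads it
# as "a literal dot", not as "a backslash followed by any character".
dot_pattern = re.compile(raw)
print(dot_pattern.match('.') is not None)   # True
print(dot_pattern.match('\\') is None)      # True

# Assumption for illustration: let re.escape produce the regex text for
# a literal backslash, then append ".+" (one or more of any character).
backslash_pattern = re.compile(re.escape('\\') + '.+')
print(backslash_pattern.match('\\Logs') is not None)   # True
```

The r'' prefix only affects the first layer. The regex compiler never sees
whether the pattern was written as a raw string or not.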


----------------------------------------------------------------------

Comment By: ollie oldham (ooldham)
Date: 2006-07-06 22:46

Message:
Logged In: YES 
user_id=649833

I beg to differ on problem 1)

Since 'r' was used in the definition of both the re and the 
path, the '.' char is not being escaped (it is not supposed to 
be, anyway).
And even if it is, then rex = re.compile('[a-z]:\\.+', 
re.IGNORECASE) should get me what I want (in textual form: 
char a-z, colon, backslash, with 1 or more trailing chars).
But that does not work either.

I beg to differ on item 2) as well:
Yes - '\\\\.+' is the equivalent of r'\\.+'
BUT I then read this as: 2 backslashes with 1 or more 
chars, NOT a backslash with an escaped '.'


----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2006-07-06 21:36

Message:
Logged In: YES 
user_id=7887

1) r'[a-z]:\.+' should not match r'D:\Logs'. r'\.+' matches
one or more dots. There's no dot in this string.

2) '\\\\.+' is the equivalent of r'\\.+', and should match
anything that starts with a '\' and has at least one char
following it, which includes r'\nonUnc\path'.
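
Both points can be reproduced with the strings from the initial comment. The
sketch below is an illustration rather than anything taken from the tracker:
the names drive_path and unc_only, and the two patterns assigned to them, are
assumptions about what the submitter appears to have wanted (a drive letter,
colon, backslash, then more characters; and a path starting with two
backslashes). print() takes a single argument so the code runs on Python 2
and Python 3.

```python
import re

path1 = r'D:\Logs'
unc_path = r'\\someUNC\path'
non_unc_path = r'\nonUnc\path'

# 1) As reported: \. is a literal dot, so + requires at least one dot
#    after the colon, while * is satisfied with zero dots.
dots_plus = re.compile(r'[a-z]:\.+', re.IGNORECASE)
dots_star = re.compile(r'[a-z]:\.*', re.IGNORECASE)
print(dots_plus.match(path1))           # None
print(dots_star.match(path1).group())   # D:

# 2) As reported: the regex \\ is ONE literal backslash, and .+ is one
#    or more of any character, so a single leading backslash is enough.
one_backslash = re.compile(r'\\.+')
print(one_backslash.match(non_unc_path) is not None)   # True

# Hypothetical corrections: double each backslash that should be
# matched literally.
drive_path = re.compile(r'[a-z]:\\.+', re.IGNORECASE)
unc_only = re.compile(r'\\\\.+')
print(drive_path.match(path1).group() == path1)   # True
print(unc_only.match(unc_path) is not None)       # True
print(unc_only.match(non_unc_path))               # None
```

Written without the r prefix, the same two corrected patterns would be
'[a-z]:\\\\.+' and '\\\\\\\\.+', which is why raw strings are usually
preferred for regular expressions.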


----------------------------------------------------------------------
