A bug in Python's regular expression engine?
Just Another Victim of the Ambient Morality
ihatespam at hotmail.com
Tue Nov 27 11:19:37 EST 2007
"Paul Hankin" <paul.hankin at gmail.com> wrote in message
news:31047857-42ca-415e-83be-d1d360341ab0 at j20g2000hsi.googlegroups.com...
> On Nov 27, 3:48 pm, "Just Another Victim of the Ambient Morality"
> <ihates... at hotmail.com> wrote:
>> This won't compile for me:
>>
>> regex = re.compile('(.*\\).*')
>>
>> I get the error:
>>
>> sre_constants.error: unbalanced parenthesis
>>
>> I'm running Python 2.5 on WinXP. I've tried this expression with
>> another RE engine in another language and it works just fine which leads
>> me to believe the problem is Python. Can anyone confirm or deny this bug?
>
> Your code is equivalent to:
> regex = re.compile(r'(.*\).*')
>
> Written like this, it's easier to see that you've started a regular
> expression group with '(', but it's never closed since your closed
> parenthesis is escaped (which causes it to match a literal ')' when
> used). Hence the reported error (which isn't a bug).
>
> Perhaps you meant this?
> regex = re.compile(r'(.*\\).*')
>
> This matches any number of characters followed by a backslash (group
> 1), and then any number of characters. If you're using this for path
> splitting filenames under Windows, you should look at os.path.split
> instead of writing your own.
Indeed, I did end up using os.path functions, instead.
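For anyone reading this later, here is a minimal sketch of the os.path approach. It imports ntpath, which is the Windows implementation of os.path, so the result is the same on any platform. The sample path is made up for illustration, and the syntax is current Python 3 rather than 2.5.

```python
import ntpath  # what os.path resolves to on Windows

# Hypothetical path, made up for illustration.
sample = 'C:\\temp\\file.txt'

# split() returns (directory, last component).
head, tail = ntpath.split(sample)
print(head)
print(tail)
```

On an actual Windows machine, os.path.split should behave the same way, so no regular expression is needed for this.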
I think I see what's going on. Backslash has special meaning both in
regular expressions and in Python string literals, so each literal
backslash has to be escaped twice. My version should have been something
like this:
regex = re.compile('(.*\\\\).*')
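To check my understanding, here is a small sketch of the two layers of escaping. It prints each string so you can see what the regex engine actually receives. The sample path is hypothetical, and the syntax is current Python 3 rather than the 2.5 I was running.

```python
import re

# My original pattern: Python turns '\\' into ONE backslash, so the
# regex engine receives 7 characters and sees an escaped ')'.
broken = '(.*\\).*'
print(broken, len(broken))

try:
    re.compile(broken)
    compiled_ok = True
except re.error as exc:
    # The group opened by '(' is never closed.
    compiled_ok = False
    print("error:", exc)

# Doubled again: Python turns '\\\\' into TWO backslashes, which the
# regex engine reads as one literal backslash.
fixed = '(.*\\\\).*'
print(fixed, len(fixed))

# Hypothetical sample path, made up for illustration.
sample = 'C:\\temp\\file.txt'
match = re.compile(fixed).match(sample)
print(match.group(1))
```

The first pattern fails to compile and the second one captures everything up to and including the last backslash.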
That is funny. Thank you for your help...
Just for clarification, what does the "r" in your code do?