A bug in Python's regular expression engine?
Paul Hankin
paul.hankin at gmail.com
Tue Nov 27 11:07:17 EST 2007
On Nov 27, 3:48 pm, "Just Another Victim of the Ambient Morality"
<ihates... at hotmail.com> wrote:
> This won't compile for me:
>
> regex = re.compile('(.*\\).*')
>
> I get the error:
>
> sre_constants.error: unbalanced parenthesis
>
> I'm running Python 2.5 on WinXP. I've tried this expression with
> another RE engine in another language and it works just fine which leads me
> to believe the problem is Python. Can anyone confirm or deny this bug?
Your code is equivalent to:
regex = re.compile(r'(.*\).*')
Written like this, it's easier to see that you've started a regular
expression group with '(', but it's never closed, since your closing
parenthesis is escaped (which causes it to match a literal ')' when
used). Hence the reported error (which isn't a bug).
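
A quick way to check this yourself (a minimal sketch; the variable names
are only for illustration, and the exact wording of the error message
differs between Python versions):

```python
import re

# The ordinary literal and the raw literal spell the same four-part
# pattern: '(', '.*', an escaped ')', and '.*'.
pattern = '(.*\\).*'
assert pattern == r'(.*\).*'

try:
    re.compile(pattern)
except re.error as exc:
    # The group opened by '(' is never closed, so compilation fails.
    message = str(exc)
    print(message)
else:
    message = None
```
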
Perhaps you meant this?
regex = re.compile(r'(.*\\).*')
This matches any number of characters followed by a backslash (group
1), and then any number of characters. If you're using this to split
file paths under Windows, you should look at os.path.split instead of
writing your own.
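
For comparison, here is a small sketch of both approaches. The path is a
made-up example, and I'm using ntpath (the Windows flavour of os.path)
directly so the snippet behaves the same on any platform; on Windows
itself, os.path.split gives the same result.

```python
import ntpath
import re

regex = re.compile(r'(.*\\).*')
path = r'C:\Users\demo\report.txt'  # hypothetical example path

# Regex approach: group 1 is everything up to and including the last
# backslash, because the first '.*' is greedy.
match = regex.match(path)
prefix = match.group(1)

# Library approach: split into (directory, filename).
head, tail = ntpath.split(path)

print(prefix)  # C:\Users\demo\
print(head)    # C:\Users\demo
print(tail)    # report.txt
```

Note that the regex keeps the trailing backslash in group 1, while the
split function drops it from the directory part.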
HTH
--
Paul Hankin