[New-bugs-announce] [issue16688] Backreferences make case-insensitive regex fail on non-ASCII strings.
pyos
report at bugs.python.org
Fri Dec 14 23:19:34 CET 2012
New submission from pyos:
The title says it all: if a regular expression that uses backreferences is compiled with the `re.I` flag, it always fails to match a string that contains any character outside the U+0000-U+00FF range, even when that character is not part of the expected match. I've been unable to narrow the bug down further.
A simple example:
>>> import re
>>> r = re.compile(r'(a)\1', re.I) # should match "aa", "aA", "Aa", or "AA"
>>> r.findall('aa') # works as expected
['a']
>>> r.findall('aa bcd') # still works
['a']
>>> r.findall('aa Ā') # ord('Ā') == 0x0100
[]
The same code works as expected in Python 3.2:
>>> r.findall('aa Ā')
['a']
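For anyone trying to narrow this down, the session above can be turned into a small script that probes several code point ranges at once. This is only a sketch: the probe strings other than 'aa Ā' and the names `PROBES` and `check_backreference` are made up for illustration and are not part of the original report. On an interpreter without the bug, every probe should give ['a'].

```python
import re

# Pattern from the report: a captured "a" followed by a backreference to it.
PATTERN = re.compile(r'(a)\1', re.I)

# Hypothetical probe strings (assumption, not from the report except U+0100):
# the same "aa" prefix followed by a character from a wider and wider range.
PROBES = [
    ('ASCII only', 'aa b'),
    ('Latin-1, U+00FF', 'aa \u00ff'),
    ('beyond Latin-1, U+0100', 'aa \u0100'),
    ('astral, U+10000', 'aa \U00010000'),
]


def check_backreference():
    """Return a list of (label, findall result) pairs, one per probe."""
    return [(label, PATTERN.findall(text)) for label, text in PROBES]


if __name__ == '__main__':
    for label, found in check_backreference():
        # A correct match captures the first "a" of the "aa" prefix.
        status = 'ok' if found == ['a'] else 'FAIL'
        print('{:4} {}: {!r}'.format(status, label, found))
```

Running it under both 3.2 and 3.3 and comparing which lines report FAIL should show whether the failure starts exactly at U+0100 or depends on something else.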
----------
components: Regular Expressions
messages: 177518
nosy: ezio.melotti, mrabarnett, pitrou, pyos
priority: normal
severity: normal
status: open
title: Backreferences make case-insensitive regex fail on non-ASCII strings.
type: behavior
versions: Python 3.3
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue16688>
_______________________________________