[pypy-issue] [issue786] Module re, \d with re.U option does not work the same way as with CPython
tracker at bugs.pypy.org
Fri Jul 8 05:38:10 CEST 2011
New submission from Simon <simon.corston at nuance.com>:
In PyPy, \d is always interpreted as [0-9], but with the re.U option CPython
interprets it as any character that has the Unicode digit property.
This means that in CPython, \d matches the superscript 3 (U+00B3) when used with re.U.
In PyPy it does not, which leads to different output for the same regex.
This behavior is analogous to the interpretation of \w under the re.U
switch, which _does_ work correctly in PyPy.
Repro code attached.
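
For anyone comparing interpreters, a small probe along the following lines may help show which Unicode property is involved. It is only a sketch and is separate from the attached repro: the names digit_report, SUPERSCRIPT_THREE and ARABIC_INDIC_THREE are made up for illustration. The superscript 3 has general category No and counts as a "digit" but not as a "decimal", so whether \d under re.U matches it depends on which of those properties the regex engine consults. The Arabic-Indic digit three (category Nd) is included as a character that is both.

```python
from __future__ import print_function, unicode_literals

import re
import unicodedata

# Illustrative sample characters (names are hypothetical, not from the report).
SUPERSCRIPT_THREE = "\u00b3"   # general category No
ARABIC_INDIC_THREE = "\u0663"  # general category Nd


def digit_report(ch):
    """Return how the running interpreter classifies a single character."""
    return {
        "category": unicodedata.category(ch),
        "isdigit": ch.isdigit(),
        "isdecimal": ch.isdecimal(),
        # \d with the re.U flag: the case this report is about.
        "re_unicode": re.match(r"\d", ch, re.U) is not None,
        # Explicit ASCII class, for comparison.
        "ascii_class": re.match(r"[0-9]", ch) is not None,
    }


if __name__ == "__main__":
    for ch in ("3", SUPERSCRIPT_THREE, ARABIC_INDIC_THREE):
        print(repr(ch), sorted(digit_report(ch).items()))
```

The re_unicode value for U+00B3 is the one this report says differs between CPython and PyPy, so running the probe under both should show whether the two interpreters agree.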
nosy: linguist, pypy-issue
title: Module re, \d with re.U option does not work the same way as with CPython
PyPy bug tracker <tracker at bugs.pypy.org>
-------------- next part --------------
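# NOTE: the setup lines of the attachment are missing here; the three definitions
# below are assumed from the description above (superscript 3, \d with and without re.U).
import re
foo = u'\xb3'
unicodeDigitMatcher = re.compile(ur'\d', re.U)
arabicDigitMatcher = re.compile(ur'\d')
# With the re.U switch, \d matches more than just 0-9
# Prints True in CPython, False in PyPy
print unicodeDigitMatcher.match(foo) is not None
# Without the re.U switch, \d matches only 0-9
# Prints False in both CPython and PyPy
print arabicDigitMatcher.match(foo) is not None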