[Python-checkins] r74237 - in python/branches/py3k: Doc/library/re.rst Lib/test/test_re.py Misc/NEWS Modules/_sre.c
mark.dickinson
python-checkins at python.org
Tue Jul 28 19:22:37 CEST 2009
Author: mark.dickinson
Date: Tue Jul 28 19:22:36 2009
New Revision: 74237
Log:
Issue #6561: '\d' in a regular expression should match only Unicode
character category [Nd], not [No].
Modified:
python/branches/py3k/Doc/library/re.rst
python/branches/py3k/Lib/test/test_re.py
python/branches/py3k/Misc/NEWS
python/branches/py3k/Modules/_sre.c
Modified: python/branches/py3k/Doc/library/re.rst
==============================================================================
--- python/branches/py3k/Doc/library/re.rst (original)
+++ python/branches/py3k/Doc/library/re.rst Tue Jul 28 19:22:36 2009
@@ -338,11 +338,12 @@
``\d``
For Unicode (str) patterns:
- Matches any Unicode digit (which includes ``[0-9]``, and also many
- other digit characters). If the :const:`ASCII` flag is used only
- ``[0-9]`` is matched (but the flag affects the entire regular
- expression, so in such cases using an explicit ``[0-9]`` may be a
- better choice).
+ Matches any Unicode decimal digit (that is, any character in
+ Unicode character category [Nd]). This includes ``[0-9]``, and
+ also many other digit characters. If the :const:`ASCII` flag is
+ used only ``[0-9]`` is matched (but the flag affects the entire
+ regular expression, so in such cases using an explicit ``[0-9]``
+ may be a better choice).
For 8-bit (bytes) patterns:
Matches any decimal digit; this is equivalent to ``[0-9]``.
Modified: python/branches/py3k/Lib/test/test_re.py
==============================================================================
--- python/branches/py3k/Lib/test/test_re.py (original)
+++ python/branches/py3k/Lib/test/test_re.py Tue Jul 28 19:22:36 2009
@@ -605,6 +605,27 @@
self.assertEqual(next(iter).span(), (4, 4))
self.assertRaises(StopIteration, next, iter)
+ def test_bug_6561(self):
+ # '\d' should match characters in Unicode category 'Nd'
+ # (Number, Decimal Digit), but not those in 'Nl' (Number,
+ # Letter) or 'No' (Number, Other).
+ decimal_digits = [
+ '\u0037', # '\N{DIGIT SEVEN}', category 'Nd'
+ '\u0e58', # '\N{THAI DIGIT SIX}', category 'Nd'
+ '\uff10', # '\N{FULLWIDTH DIGIT ZERO}', category 'Nd'
+ ]
+ for x in decimal_digits:
+ self.assertEqual(re.match('^\d$', x).group(0), x)
+
+ not_decimal_digits = [
+ '\u2165', # '\N{ROMAN NUMERAL SIX}', category 'Nl'
+ '\u3039', # '\N{HANGZHOU NUMERAL TWENTY}', category 'Nl'
+ '\u2082', # '\N{SUBSCRIPT TWO}', category 'No'
+ '\u32b4', # '\N{CIRCLED NUMBER THIRTY NINE}', category 'No'
+ ]
+ for x in not_decimal_digits:
+ self.assertIsNone(re.match('^\d$', x))
+
def test_empty_array(self):
# SF buf 1647541
import array
Modified: python/branches/py3k/Misc/NEWS
==============================================================================
--- python/branches/py3k/Misc/NEWS (original)
+++ python/branches/py3k/Misc/NEWS Tue Jul 28 19:22:36 2009
@@ -108,6 +108,10 @@
Extension Modules
-----------------
+- Issue #6561: '\d' in a regex now matches only characters with
+ Unicode category 'Nd' (Number, Decimal Digit). Previously it also
+ matched characters with category 'No'.
+
- Issue #4509: Array objects are no longer modified after an operation
failing due to the resize restriction in-place when the object has exported
buffers.
Modified: python/branches/py3k/Modules/_sre.c
==============================================================================
--- python/branches/py3k/Modules/_sre.c (original)
+++ python/branches/py3k/Modules/_sre.c Tue Jul 28 19:22:36 2009
@@ -168,7 +168,7 @@
#if defined(HAVE_UNICODE)
-#define SRE_UNI_IS_DIGIT(ch) Py_UNICODE_ISDIGIT((Py_UNICODE)(ch))
+#define SRE_UNI_IS_DIGIT(ch) Py_UNICODE_ISDECIMAL((Py_UNICODE)(ch))
#define SRE_UNI_IS_SPACE(ch) Py_UNICODE_ISSPACE((Py_UNICODE)(ch))
#define SRE_UNI_IS_LINEBREAK(ch) Py_UNICODE_ISLINEBREAK((Py_UNICODE)(ch))
#define SRE_UNI_IS_ALNUM(ch) Py_UNICODE_ISALNUM((Py_UNICODE)(ch))
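
For readers who want to check the behaviour described in the log message, here is a minimal sketch in Python 3. It is not part of the commit. The helper name `matches_backslash_d` and the sample list are illustrative assumptions. The comparison with `str.isdecimal()` and `str.isdigit()` assumes these methods mirror the `Py_UNICODE_ISDECIMAL` and `Py_UNICODE_ISDIGIT` macros swapped in `_sre.c`.

```python
import re
import unicodedata


def matches_backslash_d(ch, flags=0):
    """Return True if the single character ch is matched by r'\\d'."""
    return re.fullmatch(r"\d", ch, flags) is not None


# Sample characters, written by name so the code points cannot drift
# from the comments. Categories are those stated in the commit's test.
samples = [
    "\N{DIGIT SEVEN}",           # category Nd
    "\N{THAI DIGIT SIX}",        # category Nd
    "\N{FULLWIDTH DIGIT ZERO}",  # category Nd
    "\N{ROMAN NUMERAL SIX}",     # category Nl
    "\N{SUBSCRIPT TWO}",         # category No
]

for ch in samples:
    print(
        unicodedata.name(ch),
        unicodedata.category(ch),
        "\\d:", matches_backslash_d(ch),
        "isdecimal:", ch.isdecimal(),
        "isdigit:", ch.isdigit(),
    )
```

On an interpreter that includes this change, the `\d` column should agree with `isdecimal` rather than `isdigit`. `SUBSCRIPT TWO` is a character where the two differ. Passing `re.ASCII` as `flags` restricts the match to `[0-9]`, as the revised documentation describes.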