[Python-checkins] bpo-44987: Speed up unicode normalization of ASCII strings (GH-28283)

serhiy-storchaka webhook-mailer at python.org
Sat Sep 11 11:04:43 EDT 2021


https://github.com/python/cpython/commit/9abd07e5963f966c4d6df8f4e4bf390ed8191066
commit: 9abd07e5963f966c4d6df8f4e4bf390ed8191066
branch: main
author: Dong-hee Na <donghee.na at python.org>
committer: serhiy-storchaka <storchaka at gmail.com>
date: 2021-09-11T18:04:38+03:00
summary:

bpo-44987: Speed up unicode normalization of ASCII strings (GH-28283)

files:
A Misc/NEWS.d/next/Library/2021-09-11-14-41-02.bpo-44987.Mt8DiX.rst
M Doc/whatsnew/3.11.rst
M Modules/unicodedata.c

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index 9befe8f2732e7..254d7224a7a50 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -287,6 +287,9 @@ Optimizations
 
 * :file:`.pdbrc` is now read with ``'utf-8'`` encoding.
 
+* Pure ASCII strings are now normalized in constant time by :func:`unicodedata.normalize`.
+  (Contributed by Dong-hee Na in :issue:`44987`.)
+
 
 CPython bytecode changes
 ========================
diff --git a/Misc/NEWS.d/next/Library/2021-09-11-14-41-02.bpo-44987.Mt8DiX.rst b/Misc/NEWS.d/next/Library/2021-09-11-14-41-02.bpo-44987.Mt8DiX.rst
new file mode 100644
index 0000000000000..dec50d87c916c
--- /dev/null
+++ b/Misc/NEWS.d/next/Library/2021-09-11-14-41-02.bpo-44987.Mt8DiX.rst
@@ -0,0 +1,2 @@
+Pure ASCII strings are now normalized in constant time by :func:`unicodedata.normalize`.
+Patch by Dong-hee Na.
diff --git a/Modules/unicodedata.c b/Modules/unicodedata.c
index b4563f331d5a8..97585725c0b6e 100644
--- a/Modules/unicodedata.c
+++ b/Modules/unicodedata.c
@@ -807,6 +807,10 @@ is_normalized_quickcheck(PyObject *self, PyObject *input, bool nfc, bool k,
         return NO;
     }
 
+    if (PyUnicode_IS_ASCII(input)) {
+        return YES;
+    }
+
     Py_ssize_t i, len;
     int kind;
     const void *data;
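
The early return above relies on the fact that every ASCII code point has no decomposition mapping and a canonical combining class of 0, so a pure ASCII string is already in NFC, NFD, NFKC and NFKD. The sketch below is not part of the commit; it is a minimal Python-level illustration of that property, and the helper name ascii_is_fixed_point is hypothetical.

```python
import unicodedata

# The four normalization forms accepted by unicodedata.normalize().
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def ascii_is_fixed_point(text):
    """Return True if `text` is pure ASCII and unchanged by every form.

    Hypothetical helper for illustration only; it is not part of CPython.
    """
    return text.isascii() and all(
        unicodedata.normalize(form, text) == text for form in FORMS
    )


# A pure ASCII string is left unchanged by all four forms.
print(ascii_is_fixed_point("hello world" * 1000))

# A non-ASCII string still goes through the regular normalization path:
# 'e' followed by a combining acute accent composes to U+00E9 under NFC.
print(unicodedata.normalize("NFC", "e\u0301") == "\u00e9")
```

Both calls should print True. The speedup itself is only observable by timing unicodedata.normalize() on long ASCII inputs before and after this change, and the numbers will vary by build and machine.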


