[Python-checkins] bpo-19865: ctypes.create_unicode_buffer() supports non-BMP strings on Windows (GH-14081)

Miss Islington (bot) webhook-mailer at python.org
Fri Jun 14 12:43:26 EDT 2019


https://github.com/python/cpython/commit/b0f6fa8d7d4c6d8263094124df9ef9cf816bbed6
commit: b0f6fa8d7d4c6d8263094124df9ef9cf816bbed6
branch: 3.8
author: Miss Islington (bot) <31488909+miss-islington at users.noreply.github.com>
committer: GitHub <noreply at github.com>
date: 2019-06-14T09:43:22-07:00
summary:

bpo-19865: ctypes.create_unicode_buffer() supports non-BMP strings on Windows (GH-14081)

(cherry picked from commit 9765efcb39fc03d5b1abec3924388974470a8bd5)

Co-authored-by: Zackery Spytz <zspytz at gmail.com>

files:
A Misc/NEWS.d/next/Library/2019-06-14-08-30-16.bpo-19865.FRGH4I.rst
M Lib/ctypes/__init__.py
M Lib/ctypes/test/test_buffers.py

diff --git a/Lib/ctypes/__init__.py b/Lib/ctypes/__init__.py
index 4107db3e3972..128155dbf4f2 100644
--- a/Lib/ctypes/__init__.py
+++ b/Lib/ctypes/__init__.py
@@ -274,7 +274,15 @@ def create_unicode_buffer(init, size=None):
     """
     if isinstance(init, str):
         if size is None:
-            size = len(init)+1
+            if sizeof(c_wchar) == 2:
+                # UTF-16 requires a surrogate pair (2 wchar_t) for non-BMP
+                # characters (outside [U+0000; U+FFFF] range). +1 for trailing
+                # NUL character.
+                size = sum(2 if ord(c) > 0xFFFF else 1 for c in init) + 1
+            else:
+                # 32-bit wchar_t (1 wchar_t per Unicode character). +1 for
+                # trailing NUL character.
+                size = len(init) + 1
         buftype = c_wchar * size
         buf = buftype()
         buf.value = init
diff --git a/Lib/ctypes/test/test_buffers.py b/Lib/ctypes/test/test_buffers.py
index 166faaf4e4b8..15782be757c8 100644
--- a/Lib/ctypes/test/test_buffers.py
+++ b/Lib/ctypes/test/test_buffers.py
@@ -60,5 +60,14 @@ def test_unicode_conversion(self):
         self.assertEqual(b[::2], "ac")
         self.assertEqual(b[::5], "a")
 
+    @need_symbol('c_wchar')
+    def test_create_unicode_buffer_non_bmp(self):
+        expected = 5 if sizeof(c_wchar) == 2 else 3
+        for s in '\U00010000\U00100000', '\U00010000\U0010ffff':
+            b = create_unicode_buffer(s)
+            self.assertEqual(len(b), expected)
+            self.assertEqual(b[-1], '\0')
+
+
 if __name__ == "__main__":
     unittest.main()
diff --git a/Misc/NEWS.d/next/Library/2019-06-14-08-30-16.bpo-19865.FRGH4I.rst b/Misc/NEWS.d/next/Library/2019-06-14-08-30-16.bpo-19865.FRGH4I.rst
new file mode 100644
index 000000000000..efd1f55c0135
--- /dev/null
+++ b/Misc/NEWS.d/next/Library/2019-06-14-08-30-16.bpo-19865.FRGH4I.rst
@@ -0,0 +1,2 @@
+:func:`ctypes.create_unicode_buffer()` now also supports non-BMP characters
+on platforms with 16-bit :c:type:`wchar_t` (for example, Windows and AIX).
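
For readers skimming the patch, the sizing rule it introduces can be sketched outside ctypes roughly as follows. This is only an illustration, not the code that was committed: the helper names `wchar_units` and `utf16_units` are hypothetical, and the `wchar_size` parameter stands in for `sizeof(c_wchar)`. The second helper cross-checks the 16-bit case against Python's own UTF-16 encoder, assuming the input contains no lone surrogates.

```python
def wchar_units(text, wchar_size):
    """Hypothetical helper: wchar_t elements needed for text plus a trailing NUL.

    wchar_size stands in for sizeof(c_wchar): 2 on platforms with a 16-bit
    wchar_t (for example, Windows), 4 where wchar_t is 32 bits wide.
    """
    if wchar_size == 2:
        # Non-BMP characters (above U+FFFF) take a surrogate pair in UTF-16.
        return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text) + 1
    # 32-bit wchar_t: one element per Unicode character.
    return len(text) + 1


def utf16_units(text):
    """Hypothetical cross-check: count UTF-16 code units via the codec, plus NUL."""
    return len(text.encode("utf-16-le")) // 2 + 1


sample = "\U00010000\U00100000"        # two non-BMP characters, as in the new test
print(wchar_units(sample, 2))          # 5
print(wchar_units(sample, 4))          # 3
print(utf16_units(sample))             # 5
```

The values 5 and 3 match the `expected` lengths in the new test case; for BMP-only strings both branches give `len(text) + 1`, which is what the old code always computed.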


