[Python-checkins] r88890 - peps/trunk/pep-0393.txt
martin.v.loewis
python-checkins at python.org
Tue Aug 23 16:13:19 CEST 2011
Author: martin.v.loewis
Date: Tue Aug 23 16:13:19 2011
New Revision: 88890
Log:
Add open issues section, add macros, adjust macro
names to implementation.
Modified:
peps/trunk/pep-0393.txt
Modified: peps/trunk/pep-0393.txt
==============================================================================
--- peps/trunk/pep-0393.txt (original)
+++ peps/trunk/pep-0393.txt Tue Aug 23 16:13:19 2011
@@ -145,10 +145,17 @@
The canonical representation can be accessed using two macros
PyUnicode_Kind and PyUnicode_Data. PyUnicode_Kind gives one of the
-value PyUnicode_1BYTE (1), PyUnicode_2BYTE (2), or PyUnicode_4BYTE
-(3). PyUnicode_Data gives the void pointer to the data, masking out
-the pointer kind. All these functions call PyUnicode_Ready
-in case the canonical representation hasn't been computed yet.
+values PyUnicode_WCHAR_KIND (0), PyUnicode_1BYTE_KIND (1),
+PyUnicode_2BYTE_KIND (2), or PyUnicode_4BYTE_KIND (3). PyUnicode_Data
+gives the void pointer to the data, masking out the pointer kind. All
+these functions call PyUnicode_Ready in case the canonical
+representation hasn't been computed yet. Access to individual
+characters should use PyUnicode_{READ|WRITE}[_CHAR]:
+
+ - PyUnicode_READ(kind, data, index)
+ - PyUnicode_WRITE(kind, data, index, value)
+ - PyUnicode_READ_CHAR(unicode, index)
+ - PyUnicode_WRITE_CHAR(unicode, index, value)
A new function PyUnicode_AsUTF8 is provided to access the UTF-8
representation. It is thus identical to the existing
@@ -163,13 +170,6 @@
PyUnicode_AsUnicode is deprecated; it computes the wstr representation
on first use.
-String Operations
------------------
-
-Various convenience functions will be provided to deal with the
-canonical representation, in particular with respect to concatenation
-and slicing.
-
Stable ABI
----------
@@ -181,6 +181,30 @@
about the internals of CPython's data types, including PyUnicodeObject
instances. It will need to be slightly updated to track the change.
+Open Issues
+===========
+
+- When an application uses the legacy API, it may hold onto
+ the Py_UNICODE* representation, and yet start calling Unicode
+ APIs, which would call PyUnicode_Ready, invalidating the
+ Py_UNICODE* representation; this would be an incompatible change.
+ The following solutions can be considered:
+
+ * accept it as an incompatible change. Applications using the
+ legacy API will have to fill out the Py_UNICODE buffer completely
+ before calling any API on the string under construction.
+ * require explicit PyUnicode_Ready calls in such applications;
+ fail with a fatal error if a non-ready string is ever read.
+ This would also be an incompatible change, but one that is
+ more easily detected during testing.
+ * as a compromise between these approaches, implicit PyUnicode_Ready
+ calls (i.e. those not deliberately following the construction of
+ a PyUnicode object) could produce a warning if they convert an
+ object.
+
+- Which of the APIs created during the development of the PEP should
+ be public?
+
Discussion
==========
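
The kind-based character access added in the first hunk can be illustrated
with a small standalone model. This is only a sketch under stated
assumptions: it does not include Python.h, the names sketch_kind,
sketch_read and sketch_write are hypothetical, and only the numeric kind
values (1, 2, 3) are taken from the PEP text. It shows the general idea of
selecting the element width from the kind before indexing the void pointer,
not how CPython implements PyUnicode_READ or PyUnicode_WRITE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kinds named in the PEP text. The numeric
   values follow the ones quoted there; these are not CPython definitions. */
enum sketch_kind {
    SKETCH_1BYTE_KIND = 1,
    SKETCH_2BYTE_KIND = 2,
    SKETCH_4BYTE_KIND = 3
};

/* Read the character at `index`; `kind` selects the element width. */
static uint32_t sketch_read(enum sketch_kind kind, const void *data,
                            size_t index)
{
    switch (kind) {
    case SKETCH_1BYTE_KIND:
        return ((const uint8_t *)data)[index];
    case SKETCH_2BYTE_KIND:
        return ((const uint16_t *)data)[index];
    case SKETCH_4BYTE_KIND:
        return ((const uint32_t *)data)[index];
    }
    assert(0 && "unknown kind");
    return 0;
}

/* Store `value` at `index`. The caller must have chosen a kind wide enough
   for the value; the assertions check that assumption. */
static void sketch_write(enum sketch_kind kind, void *data, size_t index,
                         uint32_t value)
{
    switch (kind) {
    case SKETCH_1BYTE_KIND:
        assert(value <= 0xFFu);
        ((uint8_t *)data)[index] = (uint8_t)value;
        return;
    case SKETCH_2BYTE_KIND:
        assert(value <= 0xFFFFu);
        ((uint16_t *)data)[index] = (uint16_t)value;
        return;
    case SKETCH_4BYTE_KIND:
        ((uint32_t *)data)[index] = value;
        return;
    }
    assert(0 && "unknown kind");
}
```

A caller would fetch the kind and the data pointer once and then loop over
indices with sketch_read, which mirrors the PyUnicode_READ(kind, data, index)
form listed in the diff. The (unicode, index) forms would look up kind and
data on every call, which is simpler for single accesses.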