[Python-checkins] r88217 - peps/trunk/pep-0393.txt

Thu Jan 27 22:37:25 CET 2011

Author: martin.v.loewis
Date: Thu Jan 27 22:37:25 2011
New Revision: 88217

Log:
Changes from Nick Coghlan:
Clarify PyUnicode_AsUTF8 usage.
Rename PyUnicode_Finalize.
Store representation form in state.


Modified:
   peps/trunk/pep-0393.txt

Modified: peps/trunk/pep-0393.txt
==============================================================================

--- peps/trunk/pep-0393.txt	(original)
+++ peps/trunk/pep-0393.txt	Thu Jan 27 22:37:25 2011
@@ -69,13 +69,21 @@
 These fields have the following interpretations:
 
 - length: number of code points in the string (result of sq_length)
-- str: shortest-form representation of the unicode string; the lower
-  two bits of the pointer indicate the specific form:
-  01 => 1 byte (Latin-1); 10 => 2 byte (UCS-2); 11 => 4 byte (UCS-4);
-  00 => null pointer
-
+- str: shortest-form representation of the unicode string
   The string is null-terminated (in its respective representation).
-- hash, state: same as in Python 3.2
+- hash: same as in Python 3.2
+- state:
+
+  * lowest 2 bits (mask 0x03) - interned-state (SSTATE_*) as in 3.2
+  * next 2 bits (mask 0x0C) - form of str:
+
+    + 00 => reserved
+    + 01 => 1 byte (Latin-1)
+    + 10 => 2 byte (UCS-2)
+    + 11 => 4 byte (UCS-4)
+
+  * next bit (mask 0x10): 1 if str memory follows PyUnicodeObject
+
 - utf8_length, utf8: UTF-8 representation (null-terminated)
 - wstr_length, wstr: representation in platform's wchar_t
   (null-terminated). If wchar_t is 16-bit, this form may use surrogate
@@ -123,11 +131,11 @@
 PyUnicode_FromUnicode remains supported but is deprecated. If the
 Py_UNICODE pointer is non-null, the str representation is set. If the
 pointer is NULL, a properly-sized wstr representation is allocated,
-which can be modified until PyUnicode_Finalize() is called (explicitly
+which can be modified until PyUnicode_Ready() is called (explicitly
 or implicitly). Resizing a Unicode string remains possible until it
 is finalized.
 
-PyUnicode_Finalize() converts a string containing only a wstr
+PyUnicode_Ready() converts a string containing only a wstr
 representation into the canonical representation. Unless wstr and str
 can share the memory, the wstr representation is discarded after the
 conversion.
@@ -139,7 +147,7 @@
 PyUnicode_Kind and PyUnicode_Data. PyUnicode_Kind gives one of the
 values PyUnicode_1BYTE (1), PyUnicode_2BYTE (2), or PyUnicode_4BYTE
 (3). PyUnicode_Data gives the void pointer to the data, masking out
-the pointer kind. All these functions call PyUnicode_Finalize
+the pointer kind. All these functions call PyUnicode_Ready
 in case the canonical representation hasn't been computed yet.
 
 A new function PyUnicode_AsUTF8 is provided to access the UTF-8
@@ -150,7 +158,7 @@
 should use the existing PyUnicode_AsUTF8String where possible
 (which generates a new string object every time). API that implicitly
 converts a string to a char* (such as the ParseTuple functions) will
-use this function to compute a conversion.
+use PyUnicode_AsUTF8 to compute a conversion.
 
 PyUnicode_AsUnicode is deprecated; it computes the wstr representation
 on first use.