On 20/12/2011 09:54, Antoine Pitrou wrote:
> Hello,
>
> The include file (unicodeobject.h) seems to imply that some pure ASCII
> strings can be non-compact, but I don't understand how that can happen.
If you create a string from Py_UNICODE* or wchar_t* (using the legacy API), PyUnicode_READY() may create a non-compact but ASCII string. Such a string would be in the following state (extract of unicodeobject.h):

- legacy string, ready:

  * structure = PyUnicodeObject structure
  * test: !PyUnicode_IS_COMPACT(op) && kind != PyUnicode_WCHAR_KIND
  * kind = PyUnicode_1BYTE_KIND, PyUnicode_2BYTE_KIND or
    PyUnicode_4BYTE_KIND
  * compact = 0
  * ready = 1
  * data.any is not NULL
  * utf8 is shared and utf8_length = length with data.any if ascii = 1
  * utf8_length = 0 if utf8 is NULL
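To make the state transition above concrete, here is a small stand-alone model in C. It is only a sketch: it does not include Python.h, the struct is not the real PyUnicodeObject layout, and every name in it (model_str, model_from_wchar, model_ready, model_is_legacy_ready) is hypothetical. It assumes each wchar_t holds one whole code point (no surrogate pairs). It only illustrates that readying a legacy string picks a kind and sets the ascii flag while compact stays 0.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Simplified stand-in for the state bits listed in unicodeobject.h.
   NOT the real CPython layout; all names here are hypothetical. */
enum model_kind {
    MODEL_WCHAR_KIND = 0,
    MODEL_1BYTE_KIND = 1,
    MODEL_2BYTE_KIND = 2,
    MODEL_4BYTE_KIND = 4
};

struct model_str {
    const wchar_t *wstr;    /* legacy buffer supplied by the caller */
    size_t length;          /* number of wchar_t units */
    enum model_kind kind;
    unsigned compact : 1;
    unsigned ascii : 1;
    unsigned ready : 1;
};

/* Model of a string built through the legacy API:
   not compact, not ready, kind still "wchar". */
struct model_str model_from_wchar(const wchar_t *w, size_t n)
{
    struct model_str s = { w, n, MODEL_WCHAR_KIND, 0, 0, 0 };
    return s;
}

/* Model of the decision made when the string is readied: choose the
   narrowest kind from the largest character and set the ascii flag.
   The compact flag is never touched, so it stays 0. */
void model_ready(struct model_str *s)
{
    unsigned long maxchar = 0;

    assert(s->kind == MODEL_WCHAR_KIND && !s->ready);
    for (size_t i = 0; i < s->length; i++) {
        unsigned long ch = (unsigned long)s->wstr[i];
        if (ch > maxchar)
            maxchar = ch;
    }
    if (maxchar < 256)
        s->kind = MODEL_1BYTE_KIND;
    else if (maxchar < 65536)
        s->kind = MODEL_2BYTE_KIND;
    else
        s->kind = MODEL_4BYTE_KIND;
    s->ascii = (maxchar < 128);
    s->ready = 1;
}

/* Mirrors the "legacy string, ready" test quoted above:
   !compact && kind != wchar kind. */
int model_is_legacy_ready(const struct model_str *s)
{
    return !s->compact && s->kind != MODEL_WCHAR_KIND;
}
```

With this model, calling model_from_wchar(L"abc", 3) and then model_ready() on the result gives ready = 1, ascii = 1, kind = MODEL_1BYTE_KIND and compact = 0, which is the "non-compact but ASCII" case.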
> Besides, the following comment also seems wrong:
>
>   - compact:
>
>     * structure = PyCompactUnicodeObject
>     * test: PyUnicode_IS_ASCII(op) && !PyUnicode_IS_COMPACT(op)
I added the "test" lines recently because I always forget how to get the structure type. The correct test should be:

- compact:

  * structure = PyCompactUnicodeObject
  * test: PyUnicode_IS_COMPACT(op) && !PyUnicode_IS_ASCII(op)

Victor