[Python-Dev] _PyUnicode_CheckConsistency() too strict?

Victor Stinner victor.stinner at gmail.com
Mon Feb 3 16:35:51 CET 2014


2014-02-03 Phil Thompson <phil at riverbankcomputing.com>:
> For example, a string created with a maxchar of 255 (ie. a Latin-1 string)
> must contain at least one character in the range 128-255 otherwise you get
> an assertion failure.

Yes, this is what PEP 393 specifies.

> As it stands, when converting Latin-1 strings in my C extension module I
> must first check each character and specify a maxchar of 127 if the string
> happens to only contain ASCII characters.

Use PyUnicode_FromKindAndData(PyUnicode_1BYTE_KIND, latin1_str,
length), which scans the data and picks the narrowest kind (ASCII or
Latin-1) for you.
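For illustration only, here is a minimal sketch of the check that PyUnicode_FromKindAndData spares you: scan a Latin-1 buffer and return the smallest maxchar PEP 393 accepts for it. The helper name latin1_maxchar is hypothetical and this is not CPython's actual implementation; it assumes the input is raw 1-byte Latin-1 data.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the CPython API): returns the smallest
   maxchar PEP 393 accepts for a buffer of Latin-1 bytes.
   127 means the string must be created as ASCII, 255 as Latin-1. */
static unsigned int latin1_maxchar(const unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (data[i] >= 128)
            return 255; /* a single non-ASCII byte requires the Latin-1 kind */
    }
    return 127; /* pure ASCII: declaring maxchar 255 would fail the assertion */
}
```

For "hello" this returns 127 and for "caf\xe9" it returns 255, which is exactly the distinction the consistency check enforces.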

> What is the reasoning behind the checks being so strict?

Several Python functions rely on the exact kind when comparing strings.
For example, if you search for a Latin-1 substring in an ASCII string,
the search returns immediately without scanning the string: a string
containing non-ASCII characters cannot be found in a pure ASCII string.
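A simplified sketch of that shortcut, assuming a hypothetical str_view struct that holds only 1-byte data plus the maxchar recorded at creation. This is not CPython's code, only an illustration of why the recorded maxchar has to be exact.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a PEP 393 string: 1-byte data plus the
   maxchar recorded when the string was created (127 = ASCII, 255 = Latin-1). */
struct str_view {
    const unsigned char *data;
    size_t len;
    unsigned int maxchar;
};

/* Returns the index of needle in haystack, or -1 if it is not found. */
static long find_sub(const struct str_view *haystack,
                     const struct str_view *needle)
{
    /* Shortcut: trusts maxchar. A needle declared with a wider maxchar than
       the haystack is assumed to contain a character the haystack lacks. */
    if (needle->maxchar > haystack->maxchar)
        return -1;
    if (needle->len > haystack->len)
        return -1;
    /* Naive scan for the remaining cases. */
    for (size_t i = 0; i + needle->len <= haystack->len; i++) {
        if (memcmp(haystack->data + i, needle->data, needle->len) == 0)
            return (long)i;
    }
    return -1;
}
```

If a pure ASCII needle such as "world" were created with maxchar 255, this shortcut would wrongly report it as absent from the ASCII string "hello world", which is why such strings trigger the consistency assertion.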

The main reason, however, is in PEP 393 itself: a string must use the
most compact representation so as not to waste memory.

Victor

