[Python-Dev] _PyUnicode_CheckConsistency() too strict?

Phil Thompson phil at riverbankcomputing.com
Mon Feb 3 16:44:27 CET 2014


On 03-02-2014 3:35 pm, Victor Stinner wrote:
> 2014-02-03 Phil Thompson <phil at riverbankcomputing.com>:
>> For example, a string created with a maxchar of 255 (i.e. a Latin-1
>> string) must contain at least one character in the range 128-255,
>> otherwise you get an assertion failure.
>
> Yes, that is what PEP 393 specifies.
>
>> As it stands, when converting Latin-1 strings in my C extension module
>> I must first check each character and specify a maxchar of 127 if the
>> string happens to contain only ASCII characters.
>
> Use PyUnicode_FromKindAndData(PyUnicode_1BYTE_KIND, latin1_str,
> length) which computes the kind for you.
>
>> What is the reasoning behind the checks being so strict?
>
> Different Python functions rely on the exact kind to compare strings.
> For example, if you search for a Latin-1 substring in an ASCII string,
> the search returns immediately instead of scanning the string: a
> Latin-1 string cannot be found in an ASCII string.
>
> The main reason is in PEP 393 itself: a string must be compact so as
> not to waste memory.
>
> Victor

Are you saying that code will fail if a particular Latin-1 string just
happens not to contain any character greater than 127?

I would be very surprised if that were the case. If it isn't, then I
think that particular check shouldn't be made.

Phil
