[Python-Dev] _PyUnicode_CheckConsistency() too strict?

Mon Feb 3 18:16:03 CET 2014

On 03-02-2014 4:38 pm, Paul Moore wrote:
> On 3 February 2014 16:10, Phil Thompson <phil at riverbankcomputing.com> wrote:
>> That doesn't answer my original question; that just works around the use
>> case I presented.
>>
>> To restate...
>>
>> Why is a Latin-1 string considered inconsistent just because it doesn't
>> happen to contain any characters in the range 128-255?
>
> Butting in here (sorry), but I thought what Victor was trying to say is
> that being able to say that a string marked as Latin-1 "kind"
> definitely has characters >127 allows the code to optimise some tests
> (for example, two strings cannot be equal if their kinds differ).
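A minimal pure-Python sketch of the optimisation described above, assuming a simplified model: `narrowest_kind`, `ModelStr` and `model_equal` are hypothetical names, not CPython API, and the model treats ASCII as its own kind (in CPython, ASCII and Latin-1 strings share the 1-byte kind and are told apart by a separate ascii flag). The point it shows is that once every string is stored in its narrowest kind, comparing the stored kinds is a constant-time test that can prove two strings unequal.

```python
from dataclasses import dataclass


def narrowest_kind(text: str) -> str:
    """Return the narrowest storage class that can hold every character."""
    maxchar = max(map(ord, text), default=0)
    if maxchar < 0x80:
        return "ascii"
    if maxchar < 0x100:
        return "latin1"
    if maxchar < 0x10000:
        return "ucs2"
    return "ucs4"


@dataclass(frozen=True)
class ModelStr:
    """Hypothetical stand-in for a string object that records its kind."""
    kind: str
    text: str

    @classmethod
    def from_text(cls, text: str) -> "ModelStr":
        # Consistent construction: the kind is derived from the data.
        return cls(narrowest_kind(text), text)


def model_equal(a: ModelStr, b: ModelStr) -> bool:
    # Shortcut that is only valid if both strings are consistent:
    # equal text implies equal narrowest kind, so different kinds mean "not equal".
    if a.kind != b.kind:
        return False
    return a.text == b.text
```

With consistent strings the shortcut is safe: model_equal(ModelStr.from_text("abc"), ModelStr.from_text("abé")) returns False after only the kind check. A hand-built ModelStr("latin1", "abc") breaks it: model_equal reports it unequal to ModelStr.from_text("abc") even though the text is identical, which is the sort of wrong answer a consistency check exists to rule out.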

So there *is* code that will fail if a particular Latin-1 string just
happens not to contain any character greater than 127?

> Obviously, requiring this kind of constraint makes it somewhat harder
> for user code to construct string objects that conform to the spec.
> That's why the PyUnicode_FromKindAndData function has the convenience
> feature of doing the check and setting the kind correctly for you -
> you should use that rather than trying to get the details right
> yourself.
>
> Paul.
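The narrowing behaviour attributed to PyUnicode_FromKindAndData can be modelled the same way. This is a hypothetical pure-Python sketch, not the C implementation: `from_kind_and_data` and `narrowest_kind` are illustrative names, and the input buffer is a list of code units rather than a raw C array. In this model the declared kind only says how wide each unit of the input is; the result's kind is recomputed from the data, so ASCII-only Latin-1 input comes back as ASCII.

```python
def narrowest_kind(text: str) -> str:
    """Return the narrowest storage class that can hold every character."""
    maxchar = max(map(ord, text), default=0)
    if maxchar < 0x80:
        return "ascii"
    if maxchar < 0x100:
        return "latin1"
    if maxchar < 0x10000:
        return "ucs2"
    return "ucs4"


# Exclusive upper bound on a code unit for each declared unit width (bytes).
UNIT_LIMIT = {1: 0x100, 2: 0x10000, 4: 0x110000}


def from_kind_and_data(kind: int, buffer: list[int]) -> tuple[str, str]:
    """Hypothetical model: read units of the declared width, then narrow."""
    if kind not in UNIT_LIMIT:
        raise ValueError(f"unsupported kind: {kind}")
    if any(unit < 0 or unit >= UNIT_LIMIT[kind] for unit in buffer):
        raise ValueError("code unit does not fit the declared kind")
    text = "".join(map(chr, buffer))
    # The declared kind is not copied to the result; it is recomputed.
    return narrowest_kind(text), text
```

For example, from_kind_and_data(1, [0x61, 0x62, 0x63]) returns ("ascii", "abc"), while from_kind_and_data(1, [0x63, 0x61, 0x66, 0xE9]) returns ("latin1", "café").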

I see now...

The docs for PyUnicode_FromKindAndData() say...

"Create a new Unicode object *with* the given kind"

...and so I didn't think it was useful to me. If they said...

"Create a new Unicode object *from* the given kind"

...then I might have got it.

Thanks - I'm happy now.

Phil