[Python-Dev] PEP 393 Summer of Code Project

Wed Aug 24 04:56:57 CEST 2011

On Tue, Aug 23, 2011 at 18:56, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
>> kind=0 is used and public, it's PyUnicode_WCHAR_KIND. Is it still
>> necessary? It looks to be only used in PyUnicode_DecodeUnicodeEscape().
>
> If it can be removed, it would be nice to have kind in [0; 2] instead of kind
> in [1; 2], to be able to have a list (of 3 items) => callback or label.

It is also used in PyUnicode_DecodeUTF8Stateful() and there might be
some cases which I missed converting checks for 0 when I introduced
the macro.  The question was more if this should be written as 0 or as
a named constant.  I preferred the named constant for readability.

An alternative would be to have kind values be the same as the number
of bytes for the string representation so it would be 0 (wstr), 1
(1-byte), 2 (2-byte), or 4 (4-byte).

I think the value for wstr/uninitialized/reserved should not be
removed.  The wstr representation is still used in the error case in
the utf8 decoder because these strings can be resized. Also having one
designated value for "uninitialized" limits comparisons in the
affected functions to the kind value, otherwise they would need to
check the str field for NULL to determine in which buffer to write a
character.

> I suppose that compilers prefer a switch with all cases defined, 0 a first item
> and contiguous values. We may need an enum.

During the Summer of Code, Martin and I did a experiment with GCC and
it did not seem to produce a jump table as an optimization for three
cases but generated comparison instructions anyway.  I am not sure how
much we should optimize for potential compiler optimizations here.

Regards,
Torsten