[Python-Dev] Unicode support in getargs.c

Martin v. Loewis martin@v.loewis.de
Wed, 2 Jan 2002 00:58:04 +0100


>   String objects have no such accompanying unicode object, and I
>   think they should have.

No. That would either give you cyclic structures, or an ever growing
chain of unicode->string->unicode->string objects that could easily
result in unacceptable memory consumption.
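
To make the two outcomes concrete, here is a rough sketch in
present-day Python rather than in the C structs of the interpreter.
The class Text and the helpers link_cyclic and link_chain are
hypothetical names invented for illustration; they only model "each
object caches its converted counterpart" and are not how the
interpreter is implemented.

```python
class Text:
    """Hypothetical stand-in for a string or unicode object."""

    def __init__(self, kind, value):
        self.kind = kind          # "unicode" or "string"
        self.value = value
        self.counterpart = None   # cached converted form, if any


def link_cyclic(u):
    """Option 1: the two objects point at each other (a reference cycle)."""
    s = Text("string", u.value)
    u.counterpart = s
    s.counterpart = u
    return s


def link_chain(u, depth):
    """Option 2: every object caches a fresh counterpart.

    Returns the number of objects alive after `depth` conversions.
    """
    current, count = u, 1
    for _ in range(depth):
        other = "string" if current.kind == "unicode" else "unicode"
        current.counterpart = Text(other, current.value)
        current = current.counterpart
        count += 1
    return count


cyclic_u = Text("unicode", "x")
cyclic_s = link_cyclic(cyclic_u)
chain_size = link_chain(Text("unicode", "x"), 3)
```

In the first case cyclic_u and cyclic_s keep each other alive; in the
second, three conversions of a single value leave four objects behind.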

Furthermore, I consider the existence of the embedded string object in
a Unicode object a flaw in itself, as it relies on the default
encoding. IMO, the default encoding should be avoided wherever
possible, as it only serves the transition towards Unicode, and only
in limited ways.

> - There is no unicode equivalent of "c", the single character.

Why do you need that?

> - "u#" does something useful, but something completely different from
>   what "s#" does. More to the point, it probably does something
>   dangerous, if I understand correctly. If I write a C routine with an
>   "u#" format and the Python code passes a string object the string object
>   will be used as a buffer object and its binary contents will be interpreted
>   as unicode.

That sounds like a bug to me. Passing a string to u# most certainly
does not do the right thing; it is bad that it does so silently.
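
A minimal sketch of the failure mode, written for a present-day
Python 3 interpreter rather than against getargs.c: the bytes of an
8-bit string are read as 16-bit code units without any conversion.
The function name reinterpret_as_utf16 is made up for this example,
and little-endian 2-byte code units are an assumption; the actual
layout depends on how the interpreter was built.

```python
def reinterpret_as_utf16(raw: bytes) -> str:
    """Read raw bytes as little-endian 16-bit code units, unconverted."""
    return raw.decode("utf-16-le")


text = b"abcd"
garbled = reinterpret_as_utf16(text)

# Four bytes collapse into two code units, neither of which is
# 'a', 'b', 'c' or 'd'.
print(len(garbled), [hex(ord(ch)) for ch in garbled])
# prints: 2 ['0x6261', '0x6463']
```

No error is raised at any point, which is exactly the problem.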

OTOH, why do you need u#? Normally, you use s# if a string can have
embedded null bytes; you do that if the string is "binary". For
Unicode, that is useless: A Unicode string typically won't have any
embedded null bytes, and it definitely isn't "binary".
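
For the byte-string case, the difference an explicit length makes can
be sketched with ctypes in present-day Python. This is only an
illustration of NUL-terminated versus length-carrying access to the
same buffer, not a reproduction of what getargs.c does; the variable
names are invented for the example.

```python
import ctypes

payload = b"ab\x00cd"  # "binary" data with an embedded null byte

# create_string_buffer copies the bytes and appends a terminating NUL.
buf = ctypes.create_string_buffer(payload)

# Treating the buffer as a NUL-terminated C string stops at the first NUL.
as_c_string = buf.value

# Carrying the length alongside the pointer preserves every byte.
with_length = buf.raw[:len(payload)]

print(as_c_string, with_length)
# prints: b'ab' b'ab\x00cd'
```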

Regards,
Martin