[Python-Dev] Unicode support in getargs.c

Martin v. Loewis martin@v.loewis.de
Fri, 4 Jan 2002 00:23:35 +0100


> For the record: my view of Unicode is really "ascii done right", i.e. a 
> datatype that allows you to get richer characters than what 1960s ascii 
> gives you. 

Exactly, with the stress on *ASCII*. Almost everybody could agree on
ASCII; it is the 8-bit character sets where the troubles start.

> For this it should be as backward-compatible as possible, i.e.  if
> some API expects a unicode filename and I pass "a.out" it should
> interpret it as u"a.out".

That works fine with the current API.
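
As a rough illustration (a sketch in present-day Python 3, not the
getargs.c code itself): an 8-bit string passed where Unicode is
expected is converted with the default encoding, which is ASCII unless
the site changes it. The helper name as_text below is hypothetical and
only mimics that coercion.

```python
def as_text(name):
    """Return name as text, accepting text or ASCII-only bytes.

    Hypothetical helper: it imitates an implicit ASCII coercion and is
    not part of any real Python or C API.
    """
    if isinstance(name, bytes):
        # Pure ASCII bytes map one-to-one onto Unicode code points.
        return name.decode("ascii")
    return name


print(as_text(b"a.out") == as_text("a.out"))

try:
    as_text(b"caf\xe9")  # a byte >= 0x80 is outside ASCII
except UnicodeDecodeError:
    print("non-ASCII bytes need an explicit encoding")
```

Plain ASCII names such as "a.out" pass through unchanged; anything
beyond ASCII is rejected rather than guessed at.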

> All the converting to different charsets is icing on the cake, the
> number one priority should be that unicode is as compatible as
> possible with the 8-bit convention used on the platform (whatever it
> may be).

The problem is that there are multiple conventions on many systems,
and only the application can know which of these to apply.
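
To make that concrete, here is a minimal sketch (present-day Python 3;
the three codecs are picked only as examples, and decode_all is a
hypothetical helper) showing one and the same byte turning into
different characters depending on which 8-bit convention is assumed:

```python
def decode_all(raw, codecs=("latin-1", "cp437", "koi8-r")):
    """Decode the same bytes under several 8-bit conventions."""
    return {codec: raw.decode(codec) for codec in codecs}


# A single non-ASCII byte: its meaning depends entirely on the codec.
for codec, text in decode_all(b"\xe4").items():
    print(codec, ascii(text))

# Pure ASCII bytes, by contrast, decode identically under all three.
print(len(set(decode_all(b"a.out").values())) == 1)
```

Nothing in the bytes themselves says which reading is the intended
one; that knowledge has to come from the application.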

> Using Python StringObjects as binary buffers is also far less common
> than using StringObjects to store plain old strings, so if either of
> these uses bites the other it's the binary buffer that needs to
> suffer.

This is a conclusion I cannot agree with. Most strings are really
binary, if you look at them closely enough :-)

> UnicodeObjects and StringObjects should behave pretty orthogonal to
> how FloatObjects and IntObjects behave.

For the Python programmer: yes. For the C programmer, memory
management makes that inherently difficult; that is a problem you
don't have with int vs. float.

Regards,
Martin