[Python-Dev] Unicode support in getargs.c
Martin v. Loewis
martin@v.loewis.de
Fri, 4 Jan 2002 00:23:35 +0100
> For the record: my view of Unicode is really "ascii done right", i.e. a
> datatype that allows you to get richer characters than what 1960s ascii
> gives you.
Exactly, with the stress on *ASCII*. Almost everybody could agree on
ASCII; it is the 8-bit character sets where the troubles start.
> For this it should be as backward-compatible as possible, i.e. if
> some API expects a unicode filename and I pass "a.out" it should
> interpret it as u"a.out".
That works fine with the current API.
> All the converting to different charsets is icing on the cake, the
> number one priority should be that unicode is as compatible as
> possible with the 8-bit convention used on the platform (whatever it
> may be).
The problem is that there are multiple conventions on many systems,
and only the application can know which of these to apply.
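
To make the point concrete, here is a minimal sketch. It is written for a present-day Python 3 interpreter (bytes vs. str) rather than the StringObject/UnicodeObject API under discussion, and the codec list and the helper name decode_all are only illustrative assumptions. It decodes the same bytes under several common 8-bit conventions: the pure-ASCII name comes out the same under all of them, while a single non-ASCII byte yields a different character under each one.

```python
# Illustrative sample of 8-bit conventions; a real system may use others.
CODECS = ["latin-1", "cp437", "koi8_r"]

def decode_all(raw):
    """Decode the same byte string under each candidate convention."""
    return {name: raw.decode(name) for name in CODECS}

# Pure ASCII: every convention agrees on the result.
ascii_name = decode_all(b"a.out")

# One byte above 0x7F: each convention produces a different character.
eight_bit_name = decode_all(b"\xe4.out")

for name in CODECS:
    print(name, repr(ascii_name[name]), repr(eight_bit_name[name]))
```

Nothing in the bytes themselves says which of the results is the intended one; that knowledge has to come from the application.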
> Using Python StringObjects as binary buffers is also far less common
> than using StringObjects to store plain old strings, so if either of
> these uses bites the other it's the binary buffer that needs to
> suffer.
This is a conclusion I cannot agree with. Most strings are really
binary, if you look at them closely enough :-)
> UnicodeObjects and StringObjects should behave pretty orthogonal to
> how FloatObjects and IntObjects behave.
For the Python programmer: yes. For the C programmer, memory
management makes that inherently difficult, a problem that does not
arise for int vs. float.
Regards,
Martin