Re: [Python-Dev] Re: Unicode debate

28 Apr 2000


[GvR, on string.encoding]
...
> Marc-Andre took this idea a bit further, but I think it's not
> practical given the current implementation: there are too many places
> where the C code would have to be changed in order to propagate the
> string encoding information,
I may be missing something, but the encoding attr just travels with the string
object, no? Like I said in my reply to MAL, I think it's undesirable to do
*anything* with the encoding attr if not in combination with a unicode
string.
...
> and there are too many sources of strings
> with unknown encodings to make it very useful.
That's why the default encoding must be settable as well, as Fredrik suggested.
...
> Plus, it would slow down 8-bit string ops.
Not if you ignore it most of the time, and just pass it along when
concatenating.
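
A minimal sketch of the idea in the replies above, written in present-day
Python for illustration only. The names TaggedBytes, DEFAULT_ENCODING and
set_default_encoding are hypothetical and are not part of any proposal in
this thread; keeping the tag only when both operands agree is also an
assumption of the sketch.

```python
# Hypothetical sketch: none of these names exist in Python itself.
DEFAULT_ENCODING = "latin-1"  # assumed settable default for untagged strings


def set_default_encoding(name):
    """Change the encoding assumed for strings whose encoding is unknown."""
    global DEFAULT_ENCODING
    DEFAULT_ENCODING = name


class TaggedBytes:
    """An 8-bit string that carries an optional encoding attribute."""

    def __init__(self, data, encoding=None):
        self.data = bytes(data)
        self.encoding = encoding  # None means "unknown"

    def to_unicode(self):
        # Fall back to the settable default when no tag is present.
        return self.data.decode(self.encoding or DEFAULT_ENCODING)

    def __add__(self, other):
        if isinstance(other, str):
            # Combined with a Unicode string: the only place the tag is used.
            return self.to_unicode() + other
        if isinstance(other, TaggedBytes):
            # 8-bit + 8-bit: no decoding, the tag is just passed along.
            # Assumption: drop the tag if the two operands disagree.
            tag = self.encoding if self.encoding == other.encoding else None
            return TaggedBytes(self.data + other.data, tag)
        return NotImplemented


a = TaggedBytes(b"caf\xe9", "latin-1")
b = TaggedBytes(b" au lait", "latin-1")
joined = a + b          # still 8-bit, tag carried along
mixed = a + "!"         # tag consulted, result is a Unicode string
```

Here the cost for pure 8-bit operations is one attribute comparison per
concatenation; decoding only happens when a Unicode string is involved.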
...
> I have a better idea: rather than carrying around 8-bit strings with
> an encoding, use Unicode literals in your source code.
Explain that to newbies... My guess is that they will want simple 8-bit
strings in their native encoding. Dunno.
...
> If the source
> encoding is known, these will be converted using the appropriate
> codec.
> If you object to having to write u"..." all the time, we could say
> that "..." is a Unicode literal if it contains any characters with the
> top bit on (of course the source file encoding would be used just like
> for u"...").
Only if "\377" would still yield an 8-bit string, for binary goop...
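
To make the literal rule and the "\377" caveat concrete, here is a rough
sketch in present-day Python. The function name evaluate_plain_literal is
hypothetical; it takes the raw source bytes between the quotes plus the
source file encoding, and it is an illustration of the rule as discussed
here, not how any Python tokenizer actually works. Escape processing in the
Unicode branch is left out to keep the sketch short.

```python
def evaluate_plain_literal(raw_body: bytes, source_encoding: str):
    """Decide what a plain "..." literal would evaluate to under the rule.

    raw_body is the text between the quotes exactly as it appears in the
    source file, before any backslash escapes are interpreted.
    """
    if any(byte >= 0x80 for byte in raw_body):
        # A character with the top bit on appears literally in the source:
        # treat the literal as Unicode, decoded with the source encoding.
        return raw_body.decode(source_encoding)
    # Pure ASCII source text: stays an 8-bit string. Escapes such as \377
    # are interpreted afterwards, so they can still produce binary bytes.
    # (Assumes no \u escapes, which would not fit in a single byte.)
    return raw_body.decode("unicode_escape").encode("latin-1")


binary = evaluate_plain_literal(rb"\377", "latin-1")      # escape in source
text = evaluate_plain_literal(b"caf\xe9", "latin-1")      # raw top-bit byte
plain = evaluate_plain_literal(b"abc", "utf-8")           # plain ASCII
```

The point of the caveat is visible in the first call: the source text of
"\377" is pure ASCII, so under this reading it stays an 8-bit string.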

Just

Just van Rossum