On Thu, 4 May 2000 22:22:38 +0100, Just van Rossum <just@letterror.com> wrote:
(Boy, is it quiet here all of a sudden ;-)
Sorry for the duplication of stuff, but I'd like to reiterate my points, to separate them from my implementation proposal, as that's just what it is: an implementation detail.
These things are important to me:
- get rid of the Unicode-ness of wide strings, in order to
- make narrow and wide strings as similar as possible
- implicit conversion between narrow and wide strings should happen purely on the basis of the character codes; no assumption at all should be made about the encoding, ie. what the character code _means_.
- downcasting from wide to narrow may raise OverflowError if there are characters in the wide string that are > 255
- str(s) should always return s if s is a string, whether narrow or wide
- file objects need to be responsible for handling wide strings
- the above two points should make it possible for
- if no encoding is known, Unicode is the default, whether narrow or wide
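A rough sketch of the conversion rules in the list above, written in present-day Python 3 and resting on a few assumptions: bytes stands in for a narrow string, str stands in for a wide string, and upcast() and downcast() are hypothetical names that exist in no proposal or library. It only illustrates "conversion purely by character code".

```python
def upcast(narrow: bytes) -> str:
    """Narrow -> wide: every code 0..255 maps to the character with the same code."""
    # Iterating over bytes yields ints, so no encoding is consulted here.
    return "".join(chr(code) for code in narrow)


def downcast(wide: str) -> bytes:
    """Wide -> narrow: allowed only if every character code fits in one byte."""
    for ch in wide:
        if ord(ch) > 255:
            raise OverflowError(
                "character code %d does not fit in a narrow string" % ord(ch)
            )
    return bytes(ord(ch) for ch in wide)
```

With these definitions, downcast(upcast(s)) gives back s for any narrow s, and downcast() fails only when a code above 255 is present.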
The above points seem to have the following consequences:
- the 'u' in \uXXXX notation no longer makes much sense, since it is not necessary for the character to be a Unicode code point: it's just a 2-byte int. \wXXXX might be an option.
- the u"" notation is no longer necessary: if a string literal contains a character > 255 the string should automatically become a wide string.
- narrow strings should also have an encode() method.
- the builtin unicode() function might be redundant if:
  - it is possible to specify a source encoding. I'm not sure whether this is best done through an extra argument to encode() or through a new method, e.g. transcode().
  - s.encode() or s.transcode() are allowed to output a wide string, as in aNarrowString.encode("UCS-2") and s.transcode("Mac-Roman", "UCS-2").
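To make the transcode() idea concrete, here is a minimal sketch in present-day Python 3. It is an assumption about how such a method could behave, not a specification: transcode() is written as a free function, bytes models the string data, and "utf-16-be" stands in for "UCS-2" because, as far as I know, the standard codecs offer nothing under that name (for characters in the Basic Multilingual Plane the byte sequences are the same).

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Reinterpret `data` from the `source` encoding into the `target` encoding."""
    # Decode using the stated source encoding, then encode into the target.
    # Both steps raise UnicodeError if a character cannot be represented.
    return data.decode(source).encode(target)


# 0x8A is "a with diaeresis" in Mac Roman; its Unicode code point is 0xE4.
mac_text = b"\x8a"
as_latin1 = transcode(mac_text, "mac_roman", "latin-1")
as_two_byte = transcode(mac_text, "mac_roman", "utf-16-be")
```

The point of the explicit source argument is that the character codes alone say nothing about their meaning; the caller has to supply it.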
One other pleasant consequence:
- String comparisons work character by character, even if the representations of those characters have different widths.
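A small sketch of what width-independent comparison could look like, again in present-day Python 3 with bytes modelling narrow strings and str modelling wide ones. codes(), same_string() and compare() are hypothetical helpers for illustration only; Python 3 itself does not compare bytes and str this way (b"spam" == "spam" is False there).

```python
def codes(s):
    """Return the character codes of a narrow (bytes) or wide (str) string."""
    if isinstance(s, bytes):
        return list(s)  # iterating over bytes already yields ints
    return [ord(ch) for ch in s]


def same_string(a, b) -> bool:
    """Equality on character codes only, ignoring storage width."""
    return codes(a) == codes(b)


def compare(a, b) -> int:
    """Three-way comparison on character codes: -1, 0 or 1."""
    ca, cb = codes(a), codes(b)
    return (ca > cb) - (ca < cb)
```

Here same_string(b"spam", "spam") is true because both sides reduce to the same list of codes, whatever their width.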
My proposal to extend the "old" string type to be able to contain wide strings is of course largely unrelated to all this. Yet it may provide some additional C compatibility (especially now that silent conversion to utf-8 is out) as well as a workaround for the str()-having-to-return-a-narrow-string bottleneck.
Toby Dickenson tdickenson@geminidataloggers.com