[Python-Dev] Python 1.5.2 modules need porting to 2.0 because of unicode - comments please
Guido van Rossum
guido@beopen.com
Tue, 19 Sep 2000 17:00:34 -0500
> > I doubt that we can fix all Unicode related bugs in the 2.0
> > stdlib before the final release... let's make this a project
> > for 2.1.
>
> Exactly my feelings. Since we cannot possibly fix all problems, we may
> need to change the behaviour later.
>
> If we now silently do the wrong thing, silently changing it to the
> then-right thing in 2.1 may break people's code. So I'm asking that
> cases where it does not clearly do the right thing produce an
> exception now; we can later fix it to accept more cases, should the
> need arise.
>
> In the specific case, dropping support for Unicode output in binary
> files is the right thing. We don't know what the user expects, so it
> is better to produce an exception than to silently put incorrect bytes
> into the stream - that is a bug that we still can fix.
>
> The easiest way with the clearest impact is to drop the buffer
> interface in unicode objects. Alternatively, not supporting them
> for s# also appears reasonable. Users experiencing the problem in
> testing will then need to make an explicit decision about how they
> want to encode the Unicode objects.
>
> If it would help to expedite the issue, I can submit a bug report
> and propose a patch.
Sounds reasonable to me (but I haven't thought of all the issues).
To write Unicode strings to a binary file, one can use:
    f.write(u.encode("utf-16"))     # Adds byte order mark
    f.write(u.encode("utf-16-be"))  # Big-endian
    f.write(u.encode("utf-16-le"))  # Little-endian
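
As a minimal sketch of how these three codecs differ, the snippet below encodes one sample string each way and inspects the resulting bytes. The sample text and the variable names are assumptions made for illustration, and the snippet is written to run on a current Python interpreter rather than on 2.0 specifically.

```python
import codecs

u = u"caf\u00e9"  # hypothetical sample text: four characters, one non-ASCII

with_bom = u.encode("utf-16")     # byte order mark, then native byte order
big = u.encode("utf-16-be")       # big-endian, no byte order mark
little = u.encode("utf-16-le")    # little-endian, no byte order mark

# The plain "utf-16" codec prefixes one of the two possible byte order marks.
assert with_bom[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)

# The explicit-endian codecs write two bytes per character and nothing else.
assert len(big) == len(little) == 2 * len(u)
assert len(with_bom) == 2 * len(u) + 2

# Each form round-trips when decoded with the codec that produced it.
assert with_bom.decode("utf-16") == u
assert big.decode("utf-16-be") == u
assert little.decode("utf-16-le") == u
```

The point of the explicit call is that the writer, not the file object, picks the byte layout, so a reader only needs to be told which codec was used.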
--Guido van Rossum (home page: http://www.pythonlabs.com/~guido/)