[Python-3000] [Python-Dev] Proposed Python 3.0 schedule
James Y Knight
foom at fuhm.net
Tue Oct 7 17:51:19 CEST 2008
On Oct 7, 2008, at 3:47 AM, Martin v. Löwis wrote:
>> - Having os.getcwdb isn't much use when you can't even run python in
>> the first place when the current directory has "bad" bytes in it.
>
> That's not true: it *is* of much use. Python will live in /usr/bin,
> which has a nicely-decodable path.
>
>> Currently Python outputs:
>> Could not find platform independent libraries <prefix>
>> Could not find platform dependent libraries <exec_prefix>
>> Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
>> Fatal Python error: Py_Initialize: can't initialize sys standard
>> streams
>> ImportError: No module named encodings.utf_8
>> Aborted
>
> I can't reproduce that. This happens (for me) when Python lives in
> a directory that has an undecodable path - not when the current
> directory is undecodable.
Sorry about that; this test was indeed in error. I ran "../python"
from an undecodable current directory, rather than
"/full/path/to/python", or putting python on the PATH and running it
as "python". The first does not work, but the other, more common ways
to start it do.
>>
>> I'm sure there's even more APIs dealing with pathnames, command line
>> arguments, or environment variables that ought to be able to handle
>> both
>> bytes and strings, that currently don't.
>
> Please, no.
I completely and totally agree with your distaste; it's rather gross to
allow bytes-or-str for every API that touches anything like
filenames/argv/environ. That's why I was pushing for the reversible
conversion to str. But if bytes-or-str is the solution that's been
chosen for this issue, it ought to either be fully committed to and
implemented, or at least fully recognized and documented as a
half-baked solution.
Of course, if a reversible encoding into str is used instead, none of
these things would need special treatment: they would all work already.
FWIW: Qt works fine with undecodable filenames, and it too uses
unicode strings everywhere in its API. I looked into what it does, and
found that it uses your (Martin's) original idea for solving this: it
stores undecodable bytes as characters from 0x10fe00 to 0x10feff
(which is valid private-use codespace). While that might not be
ideally correct, since you lose those 256 PUA characters, even that is
IMO better than pushing out bytes to every API, or worse, giving up
and just having python unable to access files, as it is now.
See lines 3074 (QString::toUtf8()) and 3408 (QString::fromUtf8()) of
http://www.google.com/codesearch?q=+show:o7fNK6SzOYs:NO-Bv-AR2rI:toIOngLf1V8&cs_p=http://ie.archive.ubuntu.com/trolltech/pub/qt/snapshots/qt-x11-opensource-src-4.4.0-snapshot-20070402.tar.bz2&cs_f=qt-x11-opensource-src-4.4.0-snapshot-20070402/src/corelib/tools/qstring.cpp
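
For illustration, the mapping described above can be sketched in a few
lines of Python. This is only a rough sketch, not Qt's actual code: the
names PUA_BASE, decode_reversible and encode_reversible are invented
for this example, and it assumes the filesystem encoding is UTF-8. As
noted above, real characters in the range 0x10fe00 to 0x10feff are
"lost": on encoding they are indistinguishable from escaped bytes.

```python
# Hypothetical helper names; a sketch of the private-use mapping idea.
PUA_BASE = 0x10FE00  # byte b is stored as the character PUA_BASE + b


def decode_reversible(raw: bytes) -> str:
    """Decode UTF-8, mapping each undecodable byte into the PUA range."""
    out = []
    pos = 0
    while pos < len(raw):
        try:
            out.append(raw[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as exc:
            # exc.start / exc.end are offsets into raw[pos:].
            out.append(raw[pos:pos + exc.start].decode("utf-8"))
            bad = raw[pos + exc.start:pos + exc.end]
            out.extend(chr(PUA_BASE + b) for b in bad)
            pos += exc.end
    return "".join(out)


def encode_reversible(text: str) -> bytes:
    """Encode to UTF-8, turning PUA escape characters back into raw bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if PUA_BASE <= cp <= PUA_BASE + 0xFF:
            out.append(cp - PUA_BASE)  # escaped byte: emit it unchanged
        else:
            out.extend(ch.encode("utf-8"))
    return bytes(out)


if __name__ == "__main__":
    name = b"caf\xe9/\xff\xfe.txt"  # not valid UTF-8
    as_str = decode_reversible(name)
    print(encode_reversible(as_str) == name)
```

With something like this, a str-only API could hand the decoded name
around as an ordinary string and still recover the exact original bytes
when it finally talks to the operating system.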
James