[Python-Dev] [Python-3000] Proposed Python 3.0 schedule
James Y Knight
foom at fuhm.net
Tue Oct 7 05:22:09 CEST 2008
On Oct 6, 2008, at 8:52 PM, Benjamin Peterson wrote:
> I'm not sure we do. Correct me if I'm wrong, but the "big ticket"
> issue, bytes/unicode filepaths, has been resolved. And looking at the
> tracker, I only see 18 release blockers.
Well, if you mean that the resolution decided upon is to "simply"
allow access to all system APIs using either byte or unicode strings,
then it seems to me that there's a rather large amount of work left to
do...
Here are some problems I found in a few minutes of futzing around with
r66821 of py3k on Linux.
- Having os.getcwdb isn't much use when you can't even run python in
the first place when the current directory has "bad" bytes in it.
Currently Python outputs:
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: Py_Initialize: can't initialize sys standard streams
ImportError: No module named encodings.utf_8
Aborted
- I'd think "find . -type f -print0 | xargs -0 python -c 'pass'"
ought to work (with files with "bad" bytes being returned by find),
which means that Python shouldn't blow up and refuse to start when
argv contains an improperly encoded argument (printing "Could not
convert argument 1 to string" and exiting isn't appropriate behavior).
- Of course, just being able to start the interpreter isn't quite
enough: you'll want to be able to access that argument list too,
somehow (add sys.argvb?).
- And then, the getopt and optparse modules should work on bytestring
vectors, so that you can use sys.argvb without writing your own
argument parser. They don't currently.
- There's no os.environb for bytewise access to the environment.
Seems important.
- Isn't it a potential security issue that " 'WHATEVER' in
os.environ" can return False if WHATEVER had some "bad" bytes in it,
but spawning a subprocess actually will include WHATEVER in the
subprocess's environment? Actually, even better: the behavior depends
on whether you use subprocess.call('foo') or subprocess.call('foo',
os.environ). The first passes through the "bad" environment variables,
while the second does not. A bit surprising, perhaps.
- Shouldn't this work?
subprocess.call(b'/bin/echo')
Currently raises an exception:
AttributeError: 'int' object has no attribute 'rfind'
- I suppose sys.path should handle bytestrings on the path, and
should be populated using the bytes-version of os.environ so that
PYTHONPATH gets read in properly. Which of course implies that all the
importers need to handle byte filenames.
- zipfile.ZipFile(b'whatever.zip') doesn't work.
- zipfile decodes/encodes the filenames inside the zip file to
unicode, so it can only handle correctly encoded filenames.
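
On the subprocess.call(b'/bin/echo') item above: one plausible reading of
the AttributeError (an assumption, not verified against the r66821 source)
is that a bytes object is being treated as a sequence of arguments rather
than as a single program name, and iterating over bytes yields ints. A
minimal sketch of the distinction, where normalize_args is a hypothetical
helper and not part of subprocess:

```python
def normalize_args(args):
    """Return args as a list, treating a lone str or bytes as one program name.

    Hypothetical helper for illustration only; not part of subprocess.
    """
    if isinstance(args, (str, bytes)):
        return [args]
    return list(args)


# Iterating a bytes object yields ints, which have no string methods
# such as rfind().
naive = list(b"/bin/echo")

# Checking for bytes as well as str keeps the program name in one piece.
fixed = normalize_args(b"/bin/echo")
```

Here naive is a list of integer byte values, while fixed is a one-element
list holding the original bytes object.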
I'm sure there are even more APIs dealing with pathnames, command line
arguments, or environment variables that ought to be able to handle
both bytes and strings, but currently don't.
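
For command-line arguments, the kind of bytes-aware option parsing
mentioned above might look roughly like the sketch below. It assumes the
argument vector is already available as a list of bytes objects (as a
sys.argvb would provide, if it existed). getopt_bytes is a hypothetical
name, handles short options only, and is not an existing getopt function.

```python
def getopt_bytes(args, shortopts):
    """Parse short options from a list of bytes arguments.

    Hypothetical helper: the result has the same shape as getopt.getopt()
    returns, but every item is a bytes object, so undecodable bytes in
    option values and filenames pass through untouched.
    """
    opts = []
    args = list(args)
    while args and args[0].startswith(b"-") and args[0] != b"-":
        arg = args.pop(0)
        if arg == b"--":
            # Explicit end of options; everything left is positional.
            break
        cluster = arg[1:]
        while cluster:
            flag, cluster = cluster[:1], cluster[1:]
            pos = shortopts.find(flag)
            if flag == b":" or pos < 0:
                name = flag.decode("ascii", "replace")
                raise ValueError("option -%s not recognized" % name)
            if shortopts[pos + 1:pos + 2] == b":":
                # Option takes a value: rest of the cluster, or next argument.
                if cluster:
                    value, cluster = cluster, b""
                elif args:
                    value = args.pop(0)
                else:
                    name = flag.decode("ascii", "replace")
                    raise ValueError("option -%s requires argument" % name)
                opts.append((b"-" + flag, value))
            else:
                opts.append((b"-" + flag, b""))
    return opts, args


# Example with "bad" (non-UTF-8) bytes in a value and in a filename.
opts, rest = getopt_bytes([b"-v", b"-o", b"out\xff", b"file\xfe"], b"vo:")
```

Because nothing is ever decoded, the parser never has to decide what
encoding the arguments are in.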
James