[Python-3000] Proposed Python 3.0 schedule
Victor Stinner
victor.stinner at haypocalc.com
Tue Oct 7 11:30:35 CEST 2008
Hi,
First of all, please read my document:
http://wiki.python.org/moin/Python3UnicodeDecodeError
I moved the document to a public wiki to allow anyone to edit it!
On Tuesday 07 October 2008 05:22:09, James Y Knight wrote:
> On Oct 6, 2008, at 8:52 PM, Benjamin Peterson wrote:
> > I'm not sure we do. Correct me if I'm wrong, but the "big ticket",
> > issue bytes/unicode filepaths, has been resolved.
Python3 now accepts bytes for os.listdir(), open() (io.open()), os.unlink(),
os.path.*(), etc. But it's not enough to say that Python3 can use bytes
everywhere. It would take months or *years* to fix all issues related to
bytes and unicode. Remember, this task started in 2000 with Python *2.0*
(creation of the unicode type).
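
A minimal sketch of what "accepts bytes" means for the functions listed above. The helper name bytes_roundtrip and the file name demo.txt are only illustrations, and encoding the temporary directory with sys.getfilesystemencoding() is an assumption that holds for ordinary temporary paths:

```python
import os
import sys
import tempfile

def bytes_roundtrip():
    """Create, list and delete a file using only bytes paths."""
    # Fresh temporary directory, addressed as bytes from here on.
    tmpdir = tempfile.mkdtemp().encode(sys.getfilesystemencoding())
    path = os.path.join(tmpdir, b"demo.txt")   # os.path.* accepts bytes

    with open(path, "wb") as f:                # open() accepts a bytes path
        f.write(b"hello")

    names = os.listdir(tmpdir)                 # bytes in, bytes out
    os.unlink(path)                            # os.unlink() accepts bytes
    os.rmdir(tmpdir)
    return names

print(bytes_roundtrip())
```

Passing bytes in gives bytes back, so no decoding of the filename ever happens on this path.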
> Well, if you mean that the resolution decided upon is to "simply"
> allow access to all system APIs using either byte or unicode strings,
> then it seems to me that there's a rather large amount of work left to
> do...
If you know of a problem, open a ticket and propose a solution. It's not
possible to list all the new problems since we don't know them yet :-)
> - Having os.getcwdb isn't much use when you can't even run python in
> the first place when the current directory has "bad" bytes in it.
My python3.0 works correctly in a directory with an invalid name. What is your
OS / locale / Python version? Please create a ticket if needed.
> - I'd think "find . -type f -print0 | xargs -0 python -c 'pass'"
> ought to work (with files with "bad" bytes being returned by find),
First, fix your home directory :-) There are good tools (convmv?) to fix
invalid filenames.
> which means that Python shouldn't blow up and refuse to start when
> there's a non-properly-encoding argv ("Could not convert argument 1 to
> string" and exiting isn't appropriate behavior)
Why not? It's a good idea to break compatibility and refuse invalid byte
sequences. You can still use the command line, an input file or a GUI to
read raw byte sequences.
> - Of course, just being able to start the interpreter isn't quite
> enough: you'll want to be able to access that argument list too,
> somehow (add sys.argvb?).
If we create sys.argvb, what should be done if the creation of sys.argv fails?
Would sys.argv be empty or unset? Or would some values be removed (so that
argv[2] becomes argv[1])? I think that many programs assume that sys.argv
exists and "is valid". If you introduce a special case (sometimes sys.argv
doesn't exist or is truncated!?), it will introduce new issues.
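
To illustrate the index shift, here is a small sketch of the "remove the bad values" option. It is not how Python builds sys.argv; the function decode_argv_dropping_bad, the UTF-8 encoding and the sample arguments are all hypothetical:

```python
def decode_argv_dropping_bad(raw_args, encoding="utf-8"):
    """Decode raw arguments, silently dropping undecodable ones."""
    decoded = []
    for raw in raw_args:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # The argument disappears, so every later index shifts by one.
            continue
    return decoded

raw = [b"prog", b"bad\xff", b"output.txt"]   # hypothetical command line
args = decode_argv_dropping_bad(raw)
print(args)   # the program now sees "output.txt" as args[1], not args[2]
```

A program that reads its output file from argv[2] would fail or, worse, use the wrong argument.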
> - There's no os.environb for bytewise access to the environment.
> Seems important.
It would be strange if you could put a bytes variable in os.environb while
os.environ would not get the key. I know two major uses of the environment:
(1) read a variable in Python
(2) put a variable for a child process
(1) can be done with os.getenv(), which returns None if the variable (key or
value) is an invalid byte sequence.
(2) can be done with subprocess.Popen(). subprocess doesn't support bytes yet
but I wrote patches: #4035 and #4036.
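
A rough sketch of the two usages with str values only. The helper names and the variable names are hypothetical, and the example only shows the unset-variable case of os.getenv(), not the invalid-bytes case:

```python
import os
import subprocess
import sys

def read_var(name):
    """Usage (1): read a variable; os.getenv() returns None if it is unset."""
    return os.getenv(name)

def run_child_with(name, value):
    """Usage (2): pass a variable to a child process through Popen(env=...)."""
    env = dict(os.environ)      # copy, so the parent environment is untouched
    env[name] = value
    code = "import os; print(os.environ[%r])" % name
    child = subprocess.Popen([sys.executable, "-c", code],
                             env=env, stdout=subprocess.PIPE)
    out = child.communicate()[0]
    return out.decode("ascii").strip()

print(read_var("HYPOTHETICAL_UNSET_VARIABLE_4035"))
print(run_child_with("DEMO_VAR", "hello"))
```

Neither usage requires the parent to modify its own os.environ.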
> - Isn't it a potential security issue that " 'WHATEVER' in
> os.environ" can return False if WHATEVER had some "bad" bytes in it,
> but spawning a subprocess actually will include WHATEVER in the
> subprocess's environment?
Yes. Python should remove the key while creating os.environ.
> - Shouldn't this work? subprocess.call(b'/bin/echo')
Yes. Most programs (at least on Linux and Mac) support bytes, so you should
be able to use bytes arguments in their command lines; see issues #4035
and #4036.
> - I suppose sys.path should handle bytestrings on the path, and
> should be populated using the bytes-version of os.environ so that
> PYTHONPATH gets read in properly. Which of course implies that all the
> importers need to handle byte filenames.
If your file system is broken, rename your directory but don't introduce a
special case for sys.path.
> - zipfile.ZipFile(b'whatever.zip') doesn't work.
Since zipfile uses bytes in its file structure, zipfile should accept bytes.
But the right question is: should this issue block Python3 or can we wait for
Python 3.1 (maybe 3.0.1)?
--
People want to try the new Python version! Python3 introduces amazing new
features like "keyword-only arguments". The bytes/unicode problem is old and
only affects broken systems.
Windows (90% of the computers in the world?) only uses characters for
filenames, the environment and the command line. Mac and Linux use UTF-8 most
of the time, and slowly everything speaks UTF-8! Python3 should not be delayed
because of this problem.
About Barry's initial question: why is Python3 delayed until December?
Are there too many open issues?
--
Victor Stinner aka haypo
http://www.haypocalc.com/blog/