[Python-3000] Proposed Python 3.0 schedule

Victor Stinner victor.stinner at haypocalc.com
Tue Oct 7 11:30:35 CEST 2008


Hi,

First of all, please read my document:
http://wiki.python.org/moin/Python3UnicodeDecodeError

I moved the document to a public wiki to allow anyone to edit it!

On Tuesday 07 October 2008 05:22:09, James Y Knight wrote:
> On Oct 6, 2008, at 8:52 PM, Benjamin Peterson wrote:
> > I'm not sure we do. Correct me if I'm wrong, but the "big ticket",
> > issue bytes/unicode filepaths, has been resolved.

Python3 now accepts bytes for os.listdir(), open() (io.open()), os.unlink(), 
os.path.*(), etc. But it's not enough to say that Python3 can use bytes 
everywhere. It would take months or *years* to fix all issues related to 
bytes and unicode. Remember, this task started in 2000 with Python *2.0* 
(creation of the unicode type).
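
A minimal sketch of what "accepts bytes" means for the functions named above:
a bytes path goes in and bytes names come out. The helper name
roundtrip_bytes_path and the file name demo.txt are made up for illustration,
and os.fsencode() is an assumption here (it was added in a later Python 3
release than the one discussed in this mail).

```python
import os
import tempfile


def roundtrip_bytes_path(dirname, name=b"demo.txt"):
    """Create, list and delete a file using only bytes paths."""
    path = os.path.join(dirname, name)   # bytes in, bytes out
    with open(path, "wb") as f:          # open() accepts a bytes filename
        f.write(b"data")
    listed = os.listdir(dirname)         # bytes argument -> list of bytes
    os.unlink(path)                      # os.unlink() accepts bytes too
    return listed


with tempfile.TemporaryDirectory() as tmp:
    names = roundtrip_bytes_path(os.fsencode(tmp))
    print(names)
```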

> Well, if you mean that the resolution decided upon is to "simply"
> allow access to all system APIs using either byte or unicode strings,
> then it seems to me that there's a rather large amount of work left to
> do...

If you know of a problem, open a ticket and propose a solution. It's not
possible to list all the new problems since we don't know them yet :-)

>   - Having os.getcwdb isn't much use when you can't even run python in
> the first place when the current directory has "bad" bytes in it.

My python3.0 works correctly in a directory with an invalid name. What is your 
OS / locale / Python version? Please create a ticket if needed.

>   - I'd think "find . -type f -print0 | xargs -0 python -c 'pass'"
> ought to work (with files with "bad" bytes being returned by find),

First, fix your home directory :-) There are good tools (convmv?) to fix 
invalid filenames.

> which means that Python shouldn't blow up and refuse to start when
> there's a non-properly-encoding argv ("Could not convert argument 1 to
> string" and exiting isn't appropriate behavior)

Why not? It's a good idea to break compatibility and refuse invalid byte
sequences. You can still use the command line, an input file or a GUI to
read raw byte sequences.

>   - Of course, just being able to start the interpreter isn't quite
> enough: you'll want to be able to access that argument list too,
> somehow (add sys.argvb?).

If we create sys.argvb, what should be done if the creation of sys.argv
fails? Would sys.argv be empty or unset? Or would some values be removed (so
that argv[2] becomes argv[1])? I think that many programs assume that
sys.argv exists and "is valid". If you introduce a special case (sometimes
sys.argv doesn't exist or is truncated!?), it will introduce new issues.
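
To illustrate the index shift, here is a sketch of one hypothetical policy
(dropping undecodable arguments). decode_args() is not a real Python function
and UTF-8 is an assumed locale encoding; the point is only to show why
silently removing values is dangerous.

```python
def decode_args(raw_args, encoding="utf-8"):
    """Hypothetical policy: decode each argument, drop the ones that fail."""
    decoded = []
    for raw in raw_args:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            continue  # the argument silently disappears
    return decoded


raw = [b"prog", b"bad\xff", b"output.txt"]
args = decode_args(raw)

# The program expects its second argument at index 2, but after the drop
# the value that was raw[2] now sits at index 1.
print(args)
```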

>   - There's no os.environb for bytewise access to the environment.
> Seems important.

It would be strange if you could put a bytes variable into os.environb while
os.environ would not get the key. I know two major usages of the environment:
 (1) read a variable in Python
 (2) set a variable for a child process

(1) can be done with os.getenv(), which returns None if the variable (key or
value) is an invalid byte sequence.

(2) can be done with subprocess.Popen(). subprocess doesn't support bytes yet 
but I wrote patches: #4035 and #4036.
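
The two usages can be sketched as follows, with unicode strings only (the
bytes variant depends on the patches above). The helper names and the variable
names DEMO_VAR and SURELY_UNSET_VAR_12345 are invented for the example.

```python
import os
import subprocess
import sys


def read_setting(name, default=None):
    """Usage (1): read a variable in the current process."""
    return os.getenv(name, default)


def run_child_with(name, value):
    """Usage (2): set a variable for a child process only."""
    child_env = dict(os.environ)   # copy, so the parent stays unchanged
    child_env[name] = value
    code = "import os; print(os.getenv(%r))" % name
    out = subprocess.check_output([sys.executable, "-c", code], env=child_env)
    return out.decode().strip()


missing = read_setting("SURELY_UNSET_VAR_12345")
seen_by_child = run_child_with("DEMO_VAR", "hello")
```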

>   - Isn't it a potential security issue that " 'WHATEVER' in
> os.environ" can return False if WHATEVER had some "bad" bytes in it,
> but spawning a subprocess actually will include WHATEVER in the
> subprocess's environment?

Yes. Python should remove the key while creating os.environ.

> - Shouldn't this work? subprocess.call(b'/bin/echo')

Yes. Most programs (at least on Linux and Mac) support bytes, so you should
be able to use bytes arguments in their command lines; see issues #4035
and #4036.
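
A sketch of the call style being asked for, assuming a POSIX system and a
Python version whose subprocess module accepts bytes arguments (not the case
at the time of writing, hence the patches). os.fsencode() is also an
assumption: it comes from a later Python 3 release.

```python
import os
import subprocess
import sys

# Program name and arguments are all bytes, no unicode strings involved.
exe = os.fsencode(sys.executable)
out = subprocess.check_output([exe, b"-c", b"print('ok')"])
```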

>   - I suppose sys.path should handle bytestrings on the path, and
> should be populated using the bytes-version of os.environ so that
> PYTHONPATH gets read in properly. Which of course implies that all the
> importers need to handle byte filenames.

If your file system is broken, rename your directory but don't introduce a 
special case for sys.path. 

>   - zipfile.ZipFile(b'whatever.zip') doesn't work.

Since zipfile uses bytes in its file structure, zipfile should accept bytes. 
But the right question is: should this issue block Python3 or can we wait for 
Python 3.1 (maybe 3.0.1)?
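
Until then, one possible workaround (a sketch, not a fix for zipfile itself)
is to open the archive with a bytes filename and pass the file object to
ZipFile, which accepts file objects. list_zip_members() and hello.txt are
made-up names, and os.fsencode() is assumed to be available.

```python
import os
import tempfile
import zipfile


def list_zip_members(path_b):
    """Open a zip archive through a bytes path via a file object."""
    with open(path_b, "rb") as fileobj:             # open() accepts bytes
        with zipfile.ZipFile(fileobj) as archive:   # ZipFile takes the object
            return archive.namelist()


with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "whatever.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("hello.txt", "hi")
    members = list_zip_members(os.fsencode(zip_path))
```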

--

People want to try the new Python version! Python3 introduces amazing new
features like "keyword-only arguments". The bytes/unicode problem is old and
only affects broken systems.

Windows (90% of the computers in the world?) only uses characters for the 
filenames, environment and command line. Mac and Linux use UTF-8 most of the 
time, and slowly everything speaks UTF-8! Python3 should not be delayed 
because of this problem.

About Barry's initial question: why is Python3 delayed until December?
Are there too many open issues?

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

