
Hi,

First of all, please read my document:
http://wiki.python.org/moin/Python3UnicodeDecodeError

I moved the document to a public wiki so that anyone can edit it!

On Tuesday 07 October 2008 05:22:09, James Y Knight wrote:
On Oct 6, 2008, at 8:52 PM, Benjamin Peterson wrote:
I'm not sure we do. Correct me if I'm wrong, but the "big ticket" issue, bytes/unicode filepaths, has been resolved.
Python3 now accepts bytes for os.listdir(), open() (io.open()), os.unlink(), os.path.*(), etc. But it's not enough to say that Python3 can use bytes everywhere. It would take months or *years* to fix all issues related to bytes and unicode. Remember, this task started in 2000 with Python *2.0* (creation of the unicode type).
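As a rough illustration of the bytes-accepting calls listed above, here is a minimal sketch. It assumes a current Python 3 (os.fsencode() and tempfile.TemporaryDirectory appeared after 3.0), and the function name and the file name b"data.bin" are made up for the example. An ASCII name is used on purpose, since some file systems reject names that are not valid UTF-8.

```python
import os
import tempfile


def roundtrip_bytes_filename():
    """Create, list and delete a file using bytes paths only."""
    with tempfile.TemporaryDirectory() as tmp:
        # Convert the directory name to bytes with the file system encoding.
        tmp_b = os.fsencode(tmp)
        # os.path.join() accepts bytes as long as every component is bytes.
        path = os.path.join(tmp_b, b"data.bin")
        # open() accepts a bytes path.
        with open(path, "wb") as f:
            f.write(b"hello")
        # os.listdir() returns bytes names when given a bytes directory.
        before = os.listdir(tmp_b)
        # os.unlink() accepts a bytes path too.
        os.unlink(path)
        after = os.listdir(tmp_b)
    return before, after
```

Calling roundtrip_bytes_filename() should return the bytes name of the file while it exists, then an empty list once it has been removed.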
Well, if you mean that the resolution decided upon is to "simply" allow access to all system APIs using either byte or unicode strings, then it seems to me that there's a rather large amount of work left to do...
If you know of a problem, open a ticket and propose a solution. It's not possible to list all the new problems since we don't know them yet :-)
- Having os.getcwdb isn't much use when you can't even run python in the first place when the current directory has "bad" bytes in it.
My python3.0 works correctly in a directory with an invalid name. What is your OS / locale / Python version? Please create a ticket if needed.
- I'd think "find . -type f -print0 | xargs -0 python -c 'pass'" ought to work (with files with "bad" bytes being returned by find),
First, fix your home directory :-) There are good tools (convmv?) to fix invalid filenames.
which means that Python shouldn't blow up and refuse to start when there's a non-properly-encoding argv ("Could not convert argument 1 to string" and exiting isn't appropriate behavior)
Why not? Breaking compatibility in order to refuse invalid byte sequences is a good idea. You can still use the command line, an input file or a GUI to read raw byte sequences.
- Of course, just being able to start the interpreter isn't quite enough: you'll want to be able to access that argument list too, somehow (add sys.argvb?).
If we create sys.argvb, what should be done if the creation of sys.argv fails? Would sys.argv be empty or unset? Or would some values be removed (so that argv[2] becomes argv[1])? I think that a lot of programs assume that sys.argv exists and "is valid". If you introduce a special case (sometimes sys.argv doesn't exist or is truncated!?), it will introduce new issues.
- There's no os.environb for bytewise access to the environment. Seems important.
It would be strange if you could put a bytes variable into os.environb while os.environ did not get the key.

I know of two major uses of the environment:
(1) reading a variable in Python
(2) setting a variable for a child process

(1) can be done with os.getenv(), which returns None if the variable (key or value) is an invalid byte sequence. (2) can be done with subprocess.Popen(). subprocess doesn't support bytes yet, but I wrote patches: #4035 and #4036.
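To make use (2) concrete, here is a hedged sketch of passing a raw bytes value to a child process. It relies on features of later Python releases on POSIX systems, not on the 3.0 behaviour described above: a bytes env mapping in subprocess, and os.environb in the child. The variable name DEMO_RAW and the function name are invented for the example.

```python
import os
import subprocess
import sys


def child_sees_raw_bytes(value=b"caf\xe9"):
    """Pass a bytes environment value to a child and read it back raw."""
    # Copy the current environment as bytes so the child can start normally.
    env = {os.fsencode(k): os.fsencode(v) for k, v in os.environ.items()}
    # DEMO_RAW is a hypothetical variable; its value need not be valid UTF-8.
    env[b"DEMO_RAW"] = value
    # The child writes the raw value to stdout without decoding it.
    child_code = (
        "import os, sys; "
        "sys.stdout.buffer.write(os.environb[b'DEMO_RAW'])"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env,
        capture_output=True,
    )
    return result.stdout
```

The bytes come back unchanged because nothing on the path from parent to child decodes them.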
- Isn't it a potential security issue that " 'WHATEVER' in os.environ" can return False if WHATEVER had some "bad" bytes in it, but spawning a subprocess actually will include WHATEVER in the subprocess's environment?
Yes. Python should remove the key while creating os.environ.
- Shouldn't this work? subprocess.call(b'/bin/echo')
Yes. Most programs (at least on Linux and Mac) support bytes, so you should be able to use bytes arguments in their command lines; see issues #4035 and #4036.
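For reference, a small sketch of what a bytes command line looks like once subprocess accepts it. This assumes a recent Python 3 on a POSIX system; the function name is hypothetical, and the interpreter itself is used as the child program so that the example does not depend on /bin/echo being present.

```python
import os
import subprocess
import sys


def run_with_bytes_args():
    """Run a child process whose program path and arguments are all bytes."""
    # Every element of the command line is a bytes object.
    cmd = [os.fsencode(sys.executable), b"-c", b"print('ok')"]
    result = subprocess.run(cmd, capture_output=True)
    return result.returncode, result.stdout
```

The return code should be 0 and the captured output should contain the bytes b"ok" followed by a newline.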
- I suppose sys.path should handle bytestrings on the path, and should be populated using the bytes-version of os.environ so that PYTHONPATH gets read in properly. Which of course implies that all the importers need to handle byte filenames.
If your file system is broken, rename your directory but don't introduce a special case for sys.path.
- zipfile.ZipFile(b'whatever.zip') doesn't work.
Since zipfile uses bytes in its file structure, zipfile should accept bytes. But the right question is: should this issue block Python3, or can we wait for Python 3.1 (maybe 3.0.1)?

--

People want to try the new Python version! Python3 introduces amazing new features like "keyword only arguments". The bytes/unicode problem is old and only affects broken systems. Windows (90% of the computers in the world?) only uses characters for filenames, the environment and the command line. Mac and Linux use UTF-8 most of the time, and slowly everything speaks UTF-8! Python3 should not be delayed because of this problem.

About barry's initial question: why is Python3 delayed until December? Are there too many open issues?

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/