[Python-Dev] Bytes for the command line, process arguments and environment variables
victor.stinner at haypocalc.com
Sat Jan 3 04:29:11 CET 2009
Python 3.0 is released and supports unicode everywhere, great! But as pointed
out by several people, bytes are required on non-Windows OSes for backward
compatibility. This email is just a summary of the many issues/email threads.
Problems with Python 3.0:
(1) Invalid unicode strings on the command line
=> some people want to get the command line arguments as bytes,
so that Python can start even if undecodable strings are present
on the command line
(2) Undecodable environment variables are skipped in os.environ
=> Create os.environb (or anything else) to get these variables
as bytes (and to be able to set new variables as bytes)
=> Read the email thread "Python-3.0, unicode, and os.environ"
(December 2008) started by Toshio Kuratomi
(3) Support bytes for os.exec*() and subprocess.Popen(): process arguments
and the environment variables
=> http://bugs.python.org/issue4035: my patch for os.exec*()
=> http://bugs.python.org/issue4036: my patch for subprocess.Popen()
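Problems (1) and (2) can be reproduced in miniature by strictly decoding bytes. This is a sketch assuming a UTF-8 locale; decode_argv() and decode_environ() are hypothetical helpers that mimic the behaviour described above, not CPython's actual startup code.

```python
def decode_argv(raw_args, encoding="utf-8"):
    """Strictly decode byte arguments: one bad argument aborts everything."""
    return [arg.decode(encoding) for arg in raw_args]


def decode_environ(raw_env, encoding="utf-8"):
    """Decode a bytes environment, silently skipping undecodable entries."""
    env = {}
    for key, value in raw_env.items():
        try:
            env[key.decode(encoding)] = value.decode(encoding)
        except UnicodeDecodeError:
            continue  # problem (2): the variable simply disappears
    return env


raw_args = [b"prog", b"caf\xe9"]  # Latin-1 encoded 'cafe' with accent, invalid UTF-8
raw_env = {b"HOME": b"/home/haypo", b"PATH": b"\xff"}

try:
    decode_argv(raw_args)
except UnicodeDecodeError as exc:
    # problem (1): a str-only argv cannot represent this argument
    print("cannot decode argument:", exc.object)

print(decode_environ(raw_env))  # {'HOME': '/home/haypo'}
```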
About the command line (1): I like the current behaviour and I don't want to
change it. Feel free to propose a solution to this issue ;-)
I already proposed "os.environb", which would have an API similar to
"os.environ" but with bytes. Relations between os.environb and os.environ:
- for an undecodable variable value in os.environb, os.environ will raise
a KeyError. Example with the utf8 charset and os.environb[b'PATH'] = b'\xff':
path=os.environ['PATH'] will raise a KeyError to keep the current behaviour
- os.environ raises a UnicodeEncodeError if the key or value cannot be
encoded in the current charset. Example with the ASCII charset:
os.environ['PATH'] = '/home/hayp\xf4' raises UnicodeEncodeError
- apart from undecodable variable values in os.environb, os.environ and
os.environb will be consistent. Example: deleting a variable in
os.environb will also delete the key in os.environ.
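The proposed relations between os.environ and os.environb can be modelled as a str view over a shared bytes dictionary. This is a hypothetical sketch, not an implementation of the proposal: the class name StrEnvironView, the plain dict standing in for os.environb, and the fixed encoding parameter are all assumptions.

```python
from collections.abc import MutableMapping


class StrEnvironView(MutableMapping):
    """Hypothetical str view over a shared bytes dict that plays os.environb."""

    def __init__(self, store, encoding="utf-8"):
        self._store = store  # shared dict mapping bytes keys to bytes values
        self._encoding = encoding

    def __getitem__(self, key):
        value = self._store[key.encode(self._encoding)]
        try:
            return value.decode(self._encoding)
        except UnicodeDecodeError:
            # Undecodable value: behave as if the variable were not set.
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        # str.encode() raises UnicodeEncodeError for unencodable text.
        self._store[key.encode(self._encoding)] = value.encode(self._encoding)

    def __delitem__(self, key):
        # Deleting through the str view also removes the bytes entry.
        del self._store[key.encode(self._encoding)]

    def __iter__(self):
        # Iterate over a snapshot; skip entries whose key or value is undecodable.
        for key, value in list(self._store.items()):
            try:
                text_key = key.decode(self._encoding)
                value.decode(self._encoding)
            except UnicodeDecodeError:
                continue
            yield text_key

    def __len__(self):
        return sum(1 for _ in self)


environb = {b"HOME": b"/home/haypo", b"PATH": b"\xff"}  # stand-in for os.environb
environ = StrEnvironView(environb)

print("PATH" in environ)  # False: the value is not valid UTF-8
print(sorted(environ))  # ['HOME']
del environb[b"HOME"]  # delete through the bytes mapping...
print("HOME" in environ)  # ...and the str view no longer sees it: False

ascii_environ = StrEnvironView({}, encoding="ascii")
try:
    ascii_environ["PATH"] = "/home/hayp\xf4"
except UnicodeEncodeError:
    print("UnicodeEncodeError: value cannot be encoded in ASCII")
```

A real os.environ must also call putenv()/unsetenv() so that child processes see the changes; this sketch ignores that part.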
I think that most (or all) of these points are OK for everyone
(especially for Toshio Kuratomi and me :-)).
Now I have to try to write an implementation of this, but it's complex,
especially keeping os.environ and os.environb consistent!
I proposed patches to fix this on non-Windows OSes, but Antoine Pitrou also
wants bytes on Windows. Amaury wrote that it's possible using the ANSI version
of the Windows API. I don't know this API, so I cannot contribute to this point.
Using a private Unicode block to represent undecodable bytes causes
interoperability problems:
- the block may already be used by other programs/libraries
- 3rd party programs/libraries don't understand this block and may
have problems displaying/processing the data
(Is the idea really rejected? In any case, it has many problems.)
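The first interoperability problem can be seen with a toy version of the idea: decode each undecodable byte to a code point in a private block and map it back when encoding. PUA_BASE = 0xF700, the error handler name "pua-escape" and the helper functions are hypothetical choices for this sketch; the actual proposal may have used a different block. The round trip works for raw bytes, but a genuine private-use character that was valid UTF-8 in the input is silently turned into a single raw byte.

```python
import codecs

PUA_BASE = 0xF700  # hypothetical start of the private block (256 code points)


def pua_escape(exc):
    """Decode error handler: map each undecodable byte to PUA_BASE + byte."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(PUA_BASE + byte) for byte in bad), exc.end


codecs.register_error("pua-escape", pua_escape)


def pua_decode(data, encoding="utf-8"):
    return data.decode(encoding, "pua-escape")


def pua_encode(text, encoding="utf-8"):
    """Turn private-block code points back into raw bytes, encode the rest."""
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if PUA_BASE <= code < PUA_BASE + 256:
            out.append(code - PUA_BASE)
        else:
            out += ch.encode(encoding)
    return bytes(out)


raw = b"caf\xe9"  # not valid UTF-8
print(pua_decode(raw) == "caf\uf7e9", pua_encode(pua_decode(raw)) == raw)  # True True

genuine = "\uf7e9".encode("utf-8")  # valid UTF-8 for a real private-use character
print(pua_encode(pua_decode(genuine)) == genuine)  # False: comes back as b'\xe9'
```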
I don't have new solutions; this is just an email to restart the discussion about
bytes ;-) Martin also asked for a PEP to change the posix module API to
Victor Stinner aka haypo