[Python-Dev] Missing FAQ about Python3 and unicode
Victor Stinner
victor.stinner at haypocalc.com
Wed Dec 31 01:49:32 CET 2008
Hi,
We keep getting the same questions about Python 3 and Unicode. Maybe it's time
to start a FAQ? Here is a rough draft to get it started ;-)
(1) Exit on undecodable command line arguments
$ LANG=en_GB.UTF-8 python3.0 test.py $'\xff'
Could not convert argument 2 to string$
Is this the expected behaviour? Yes!
Example of the question: http://bugs.python.org/issue3023
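To see why the argument in the example cannot be converted, here is a minimal sketch of the decoding step in isolation. The helper name try_decode is hypothetical and is not part of Python; it only shows that the byte 0xff is rejected by the UTF-8 codec, which is the encoding selected by the locale in the example.

```python
def try_decode(raw, encoding="utf-8"):
    """Return the decoded text, or None when raw is not valid in encoding.

    Hypothetical helper, for illustration only.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


print(try_decode(b"hello"))  # plain ASCII is valid UTF-8
print(try_decode(b"\xff"))   # the byte 0xff never occurs in valid UTF-8
```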
(2) Undecodable filenames
os.listdir(str)->str raises an exception on undecodable filenames.
Solution: use os.listdir(bytes)->bytes. To display the filename to the user,
use a function like:
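    import sys

    def humanFilename(filename):
        # filename is bytes (from os.listdir(bytes)): decode it for display,
        # replacing undecodable bytes instead of raising an exception
        encoding = sys.getfilesystemencoding()
        return filename.decode(encoding, "replace")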
See also http://bugs.python.org/issue3187
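The bytes-based listing could be combined with a display helper roughly as follows. This is only a sketch: the names human_filename and list_display_names, and the optional encoding parameter, are assumptions made for the example. The decoded names are meant for display only; keep the original bytes to open or rename the files.

```python
import os
import sys


def human_filename(filename, encoding=None):
    """Decode a bytes filename for display, replacing undecodable bytes."""
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    return filename.decode(encoding, "replace")


def list_display_names(directory=b"."):
    """List a directory given as bytes and return displayable str names."""
    # A bytes argument makes os.listdir() return bytes filenames,
    # so no filename has to be decoded by the listing itself.
    return [human_filename(name) for name in os.listdir(directory)]


if __name__ == "__main__":
    for name in list_display_names(b"."):
        print(name)
```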
(3) Bytes environment variables
In Python 3.0, os.environ only contains decodable variables. Undecodable
variables are skipped when os.environ is created, but the original variables
still exist at the C level.
$ A=$(echo -e "\xff") B=c ./python
Python 3.1a0 (py3k:67973M, Dec 31 2008, 00:51:49)
>>> import os
>>> os.environ.get('A'), os.environ.get('B')
(None, 'c')
>>> retcode=os.system('echo -n $A|hexdump -C')
00000000 ff |.|
00000001
>>> retcode=os.system('echo -n $B|hexdump -C')
00000000 63 |c|
00000001
Discussion about supporting bytes environment variables:
http://mail.python.org/pipermail/python-dev/2008-December/083856.html
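Since child processes still see the original variables, one possible workaround is to ask a child process for the raw value, as the os.system() calls above do. The sketch below assumes a POSIX system with a printenv command and a recent Python 3 (for subprocess.run); the helper name raw_env_value is hypothetical.

```python
import subprocess


def raw_env_value(name):
    """Return the raw bytes value of an environment variable, or None.

    Hypothetical helper: it runs the external "printenv" command, which
    inherits the process environment, and reads its output as bytes.
    """
    result = subprocess.run(
        ["printenv", name],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode != 0:
        # printenv exits with a non-zero status when the variable is unset
        return None
    value = result.stdout
    # printenv appends exactly one newline after the value
    if value.endswith(b"\n"):
        value = value[:-1]
    return value
```

For the session above, raw_env_value('A') would be expected to give the single byte 0xff and raw_env_value('B') the byte string b'c'.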
--
Victor Stinner aka haypo
http://www.haypocalc.com/blog/