[Python-Dev] Missing FAQ about Python3 and unicode

Victor Stinner victor.stinner at haypocalc.com
Wed Dec 31 01:49:32 CET 2008


Hi,

We keep getting the same questions about Python3 and Unicode. Maybe it's time
to start a FAQ? Here is a rough draft to get it started ;-)


(1) Exit on undecodable command line arguments

   $ LANG=en_GB.UTF-8 python3.0 test.py $'\xff'
   Could not convert argument 2 to string$

Is this the expected behaviour? Yes!

Example of such a question: http://bugs.python.org/issue3023
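The failure happens while decoding: the argument bytes are decoded using the
locale encoding (UTF-8 with LANG=en_GB.UTF-8), and the byte 0xFF can never
appear in valid UTF-8. A minimal sketch of just that decoding step, assuming
UTF-8 as in the example. It does not reproduce the interpreter's start-up
code, which runs before any script:

```python
# The argument from the example above, as raw bytes: a lone 0xFF byte.
raw_arg = b"\xff"

try:
    # Strict decoding (the default error handler), assuming a UTF-8 locale.
    raw_arg.decode("utf-8")
except UnicodeDecodeError as exc:
    reason = exc.reason
else:
    reason = None

print(reason)  # invalid start byte
```

Because this happens before the script starts, test.py never gets a chance
to handle the error itself.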


(2) Undecodable filenames

os.listdir(str)->str raises an exception on undecodable filenames.

Solution: use os.listdir(bytes)->bytes. To display the filename to the user, 
use a function like:

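   import sys

   def humanFilename(filename):
       # filename is bytes (from os.listdir(bytes)): decode it with the
       # filesystem encoding, replacing undecodable bytes with U+FFFD
       encoding = sys.getfilesystemencoding()
       return filename.decode(encoding, "replace")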

See also http://bugs.python.org/issue3187
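A minimal sketch of the whole pattern: list a directory with a bytes path so
that no name is lost, then decode each name only for display. The helper name
list_for_display is hypothetical, not part of the os module:

```python
import os
import sys

def list_for_display(directory=b"."):
    """Hypothetical helper: return displayable names for a directory."""
    encoding = sys.getfilesystemencoding()
    # os.listdir(bytes) returns bytes names, so undecodable names are kept;
    # "replace" turns undecodable bytes into U+FFFD, for display only.
    return [name.decode(encoding, "replace") for name in os.listdir(directory)]
```

The decoded strings are lossy, so keep the original bytes names if you need
to open or rename the files afterwards.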


(3) Bytes environment variables

In Python 3.0, os.environ only contains variables that can be decoded.
Undecodable variables are skipped when os.environ is created, but the original
variables still exist at the C level, so child processes still see them:

   $ A=$(echo -e "\xff") B=c ./python
   Python 3.1a0 (py3k:67973M, Dec 31 2008, 00:51:49)
   >>> import os
   >>> os.environ.get('A'), os.environ.get('B')
   (None, 'c')
   >>> retcode=os.system('echo -n $A|hexdump -C')
   00000000  ff                                                |.|
   00000001
   >>> retcode=os.system('echo -n $B|hexdump -C')
   00000000  63                                                |c|
   00000001

Discussion about supporting bytes environment variables:
http://mail.python.org/pipermail/python-dev/2008-December/083856.html
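A simplified model of the skipping rule, using the same A=\xff and B=c
variables as the session above. The helper decodable_environ is hypothetical
and assumes UTF-8; it is not CPython's implementation (which is written in C),
only an illustration of the observable behaviour:

```python
def decodable_environ(raw_environ, encoding="utf-8"):
    """Hypothetical model: keep only entries whose name and value decode."""
    result = {}
    for key, value in raw_environ.items():
        try:
            result[key.decode(encoding)] = value.decode(encoding)
        except UnicodeDecodeError:
            # Skipped here; the raw bytes still exist in the C environment.
            continue
    return result

env = decodable_environ({b"A": b"\xff", b"B": b"c"})
print(env.get("A"), env.get("B"))  # None c
```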

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

