Missing FAQ about Python3 and unicode
Hi,

Slowly, we get recurrent questions about Python3 and unicode. It's maybe time to start a FAQ? Here is an ugly draft to start it ;-)

(1) Exit on undecodable command line arguments

    $ LANG=en_GB.UTF-8 python3.0 test.py $'\xff'
    Could not convert argument 2 to string$

Is it an expected behaviour? Yes! Example of the question:
http://bugs.python.org/issue3023

(2) Undecodable filenames

os.listdir(str)->str raises an exception on undecodable filenames. Solution: use os.listdir(bytes)->bytes. To display the filename to the user, use a function like:

    import sys

    def humanFilename(filename):
        # filename is a bytes object, as returned by os.listdir(bytes);
        # undecodable bytes are replaced by U+FFFD for display
        encoding = sys.getfilesystemencoding()
        return filename.decode(encoding, "replace")

See also http://bugs.python.org/issue3187

(3) Bytes environment variables

Python 3.0 only supports decodable variables in os.environ. Undecodable variables are skipped when os.environ is created, but the original variables still exist at the C level.

    $ A=$(echo -e "\xff") B=c ./python
    Python 3.1a0 (py3k:67973M, Dec 31 2008, 00:51:49)
    >>> import os
    >>> os.environ.get('A'), os.environ.get('B')
    (None, 'c')
    >>> retcode=os.system('echo -n $A|hexdump -C')
    00000000  ff  |.|
    00000001
    >>> retcode=os.system('echo -n $B|hexdump -C')
    00000000  63  |c|
    00000001
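
A side note for readers on later releases: the session above shows os.environ dropping the undecodable variable A. The following is only a minimal sketch of reading the raw value, assuming Python 3.2 or newer, where os.environb, os.fsencode() and os.supports_bytes_environ exist (none of them was available in 3.0). The helper name getenv_bytes is hypothetical.

```python
import os


def getenv_bytes(name, default=None):
    """Return the value of environment variable `name` as bytes.

    Assumes Python 3.2+; `getenv_bytes` is an illustrative name, not a
    standard library function.
    """
    if os.supports_bytes_environ:
        # POSIX: os.environb maps bytes keys to the raw bytes values.
        return os.environb.get(os.fsencode(name), default)
    # Platforms without a bytes environment (e.g. Windows):
    # fall back to encoding the str value.
    value = os.environ.get(name)
    return default if value is None else os.fsencode(value)
```

With a helper like this, a variable set as in the shell example would come back as a bytes object instead of being reported as missing.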
Discussion to support bytes environment variables:
http://mail.python.org/pipermail/python-dev/2008-December/083856.html

--
Victor Stinner aka haypo
http://www.haypocalc.com/blog/
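
Going back to item (2), here is a minimal sketch of how the bytes listing and the display helper could fit together. The names human_filename and list_for_display, and the optional encoding parameter, are assumptions made for illustration; only os.listdir(bytes) and sys.getfilesystemencoding() come from the draft itself.

```python
import os
import sys


def human_filename(filename, encoding=None):
    """Decode a bytes filename for display only.

    Undecodable bytes are replaced by U+FFFD, so the result must not be
    used to open the file; keep the original bytes for that.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    return filename.decode(encoding, "replace")


def list_for_display(directory=b"."):
    """Return (raw_bytes_name, printable_name) pairs for a directory."""
    # Passing bytes to os.listdir() makes it return bytes names.
    return [(raw, human_filename(raw)) for raw in os.listdir(directory)]
```

Keeping the raw bytes next to the printable form lets the program show something readable to the user while still passing the exact name back to the operating system.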
On Tue, Dec 30, 2008 at 6:49 PM, Victor Stinner <victor.stinner@haypocalc.com> wrote:
> Hi,
>
> Slowly, we get recurrent questions about Python3 and unicode. It's maybe time to start a FAQ? Here is an ugly draft to start it ;-)
Looks like good stuff! It would probably make a good addition to the meager porting docs in development on the wiki. [1]

...

[1] http://wiki.python.org/moin/PortingToPy3k

--
Regards,
Benjamin Peterson