[New-bugs-announce] [issue6988] shlex.split() converts unicode input to UCS-4 output with varying byte order

Bill Fenner report at bugs.python.org
Thu Sep 24 15:54:08 CEST 2009


New submission from Bill Fenner <fenner at gmail.com>:

In python 2.5, shlex handled unicode input fine:

Python 2.5.1 (r251:54863, Jun 15 2008, 18:24:51) 
[GCC 4.3.0 20080428 (Red Hat 4.3.0-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import shlex
>>> shlex.split( u'Hello, World!' )
['Hello,', 'World!']

In python 2.6, shlex turns unicode input into UCS-4 output, thus utterly
confusing execl:

Python 2.6 (r26:66714, Jun  8 2009, 16:07:29)
[GCC 4.4.0 20090506 (Red Hat 4.4.0-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import shlex
>>> shlex.split( u'Hello, World' )
['H\x00\x00\x00e\x00\x00\x00l\x00\x00\x00l\x00\x00\x00o\x00\x00\x00,\x00\x00\x00',
'\x00\x00\x00W\x00\x00\x00o\x00\x00\x00r\x00\x00\x00l\x00\x00\x00d\x00\x00\x00']

Even weirder, the two return strings have different byte order (see
'H\x00\x00\x00' vs. '\x00\x00\x00W'!)

----------
components: Library (Lib)
messages: 93074
nosy: fenner
severity: normal
status: open
title: shlex.split() converts unicode input to UCS-4 output with varying byte order
versions: Python 2.6

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue6988>
_______________________________________


More information about the New-bugs-announce mailing list