incorrect(?) shlex behaviour

Sun May 15 12:27:26 EDT 2005

bill wrote:
> It gets worse:
> >>> from shlex import StringIO
> >>> from shlex import shlex
> >>> t = shlex(StringIO("2>&1"))
> >>> while True:
> ...  b = t.read_token()
> ...  if not b: break
> ...  print b
> ...
> 2
> &
> 1                    <----------- where's the '>' !?
> >>> import shlex
> >>> print shlex.split("2>&1")
> ['2>&1']
>
> It strikes me that split should be behaving exactly the same way as
> read_token, but that may be a misunderstanding on my part of what
> split is doing.
>
> However, it is totally bizarre that read_token discards the '>'
> symbol in the string!  I don't know much about lexical analysis, but
> it strikes me that discarding characters is a bad thing.
From the docs:
split(s[, comments])
    Split the string s using shell-like syntax. If comments is False
(the default), the parsing of comments in the given string will be
disabled (setting the commenters member of the shlex instance to the
empty string). This function operates in POSIX mode. New in version
2.3.
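
To make that doc excerpt concrete, here is a small sketch of what the comments flag changes. It uses Python 3 syntax (an assumption on my part; the sessions in this thread are Python 2), and the sample command strings are made up for illustration.

```python
import shlex

# split() breaks on whitespace only, so a redirection stays in one piece.
redirect = shlex.split("2>&1")

# comments=False (the default): '#' is treated as ordinary text.
kept = shlex.split("echo hi # trailing note")

# comments=True: everything from '#' to the end of the line is dropped.
dropped = shlex.split("echo hi # trailing note", comments=True)

print(redirect)
print(kept)
print(dropped)
```

So split never tries to pick "2>&1" apart; it only decides where one whitespace-separated word ends and the next begins.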

Maybe looking at the string method split might help; shlex.split likewise breaks only on whitespace (outside quotes).
>>> "$(which sh)".split()
['$(which', 'sh)']
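
A quick side-by-side, again as a Python 3 sketch with made-up sample strings: on unquoted input the two splits agree, and they only part ways once quotes are involved.

```python
import shlex

unquoted = "$(which sh)"
quoted = 'echo "hello world"'

# No quotes: both split purely on whitespace.
same = unquoted.split() == shlex.split(unquoted)

# Quotes: str.split ignores them, shlex.split groups the quoted words.
by_str = quoted.split()
by_shlex = shlex.split(quoted)

print(same)
print(by_str)
print(by_shlex)
```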

From the docs:
read_token()
    Read a raw token. Ignore the pushback stack, and do not interpret
source requests. (This is not ordinarily a useful entry point, and is
documented here only for the sake of completeness.)

# Just like in my first post
>>> from StringIO import StringIO
>>> from shlex import shlex
>>> t = shlex(StringIO("2>&1"))
>>> t.get_token()
'2'
>>> t.get_token()
'>'
>>> t.get_token()
'&'
>>> t.get_token()
'1'
>>> t.get_token()
''
# Your way
>>> t = shlex(StringIO("2>&1"))
>>> t.read_token()
'2'
>>> t.read_token()
'&'
>>> t.read_token()
'1'
>>> t.read_token()
''
>>>
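
As far as I can tell from reading the shlex source, the '>' is not thrown away: when read_token hits it while building '2', it parks it on the instance's pushback stack, and only get_token ever looks there. The sketch below is Python 3 (an assumption; the sessions above are Python 2), and it peeks at the pushback attribute, which is an implementation detail rather than documented API.

```python
import shlex

# Normal entry point: get_token() checks the pushback stack first.
lex = shlex.shlex("2>&1")
via_get = []
while True:
    tok = lex.get_token()
    if tok == lex.eof:  # eof is '' in non-POSIX mode
        break
    via_get.append(tok)

# Raw entry point: read_token() never consults the pushback stack.
raw = shlex.shlex("2>&1")
via_read = []
while True:
    tok = raw.read_token()
    if not tok:
        break
    via_read.append(tok)

# Internal deque of the shlex instance (implementation detail).
leftover = list(raw.pushback)

print(via_get)
print(via_read)
print(leftover)
```

If my reading is right, via_read comes out without the '>' and leftover still holds it, which matches the "ignore the pushback stack" wording in the docs. That is why get_token is the one to call.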

Hth,
M.E.Farmer