[Tutor] Converting a String to a Tuple
Danny Yoo
dyoo at hkn.eecs.berkeley.edu
Sat Nov 5 01:51:41 CET 2005
On Fri, 4 Nov 2005, Carroll, Barry wrote:
> My UDP client is receiving responses from the server, and now I need to
> process them. A typical response string looks like this:
>
> "(0, ''),some data from the test system"
>
> The tuple represents the error code and message. If the command had
> failed, the response would look like this:
>
> "(-1, 'Error message from the test system')"
>
> I need to extract the tuple from the rest of the response string. I can do
> this using eval, like so:
>
> errtuple = eval(mytxt[:mytxt.find(')')+1])
>
> Is there another, more specific method for transforming a string into a
> tuple?
Hi Barry,
Since this seems to be such a popular request, here is sample kludgy code
that provides a parse() function that does the structure-building. This
parser doesn't do much error checking yet, and I apologize in
advance for that. But I just want to do something to make sure people
don't use eval() to extract simple stuff out of network traffic. *grin*
(In reality, we'd use a parser-generating tool like pyparsing to make the
code below simpler and to get good error messages.)
##########################################################################
"""Simple parsing of expressions. Meant to be a demo of how one could
turn strings into structures. If we were to do this for real, though,
we'd definitely use parser generator tools instead.
Main usage:
>>> parse("(0, ''),some data from the test system")
(0, '')
>>> parse("(-1, 'Error message from the test system')")
(-1, 'Error message from the test system')
"""
import re

stringRegex = re.compile(r"""
    '          # a single quote
    (          # followed by any number of
      [^'\\]   #   characters other than a quote or a backslash
      |        #   or
      \\.      #   a backslash escape, such as an escaped quote
    )*
    '          # and a closing quote
""", re.VERBOSE)

numberRegex = re.compile(r"""
    [+-]?      ## optional sign
    \d+        ## one or more digits
""", re.VERBOSE)
def tokenize(s):
    """Returns a list of tokens.

    Each token will be of the form: (tokenType, datum)
    with the tokenType in ['string', 'number', '(', ')', ',']

    Tokenizes as much as it can. When it first hits a non-token,
    will give up and return what it can.
    """
    tokens = []
    while True:
        s = s.lstrip()
        if not s:
            break
        stringMatch = stringRegex.match(s)
        numberMatch = numberRegex.match(s)
        if stringMatch:
            # drop the surrounding quotes and undo any \' escapes
            datum = stringMatch.group(0)[1:-1].replace("\\'", "'")
            tokens.append(('string', datum))
            s = s[stringMatch.end():]
        elif numberMatch:
            tokens.append(('number', int(numberMatch.group(0))))
            s = s[numberMatch.end():]
        elif s[0] in ['(', ')', ',']:
            tokens.append((s[0], None))
            s = s[1:]
        else:
            break
    return tokens

def parse(s):
    """Given a string s, parses out a single expression from s.
    The result may be a string, a number, or a tuple."""
    tokens = tokenize(s)
    return parseExpression(tokens)

def parseExpression(tokens):
    """Parses a single expression.
    An expression can either be a number, a string, or a tuple.
    """
    if not tokens:
        raise ValueError("Empty token list")
    firstToken = tokens[0]
    if firstToken[0] in ['number', 'string']:
        tokens.pop(0)
        return firstToken[1]
    elif firstToken[0] == '(':
        return parseTuple(tokens)
    else:
        raise ValueError("Don't know how to handle %r" % (firstToken,))

def parseTuple(tokens):
    """Parses a tuple expression.
    A tuple is a '(', followed by a bunch of comma separated
    expressions, followed by a ')'.
    """
    elements = []
    eat(tokens, '(')
    while True:
        if not tokens:
            raise ValueError("Expected either ',', an expression,"
                             " or ')', but exhausted token list")
        if tokens[0][0] in ['number', 'string', '(']:
            elements.append(parseExpression(tokens))
            if tokens and tokens[0][0] == ')':
                break
            else:
                # raises ValueError if the next token isn't a comma
                eat(tokens, ',')
        elif tokens[0][0] == ')':
            break
        else:
            raise ValueError("Don't know how to handle %r" %
                             (tokens[0],))
    eat(tokens, ')')
    return tuple(elements)

def eat(tokens, typeExpected):
    """Tries to eat a token of the given type, and returns its datum.
    If we can't, raises ValueError."""
    if not tokens:
        raise ValueError("Expected %r, but exhausted token list" %
                         (typeExpected,))
    token = tokens.pop(0)
    if token != (typeExpected, None):
        raise ValueError("Expected %r, but got %r" % (typeExpected, token))
    return token[1]
###########################################################################
Whew. That was a mouthful. *grin* But, again, that's because I'm cooking
almost everything from scratch. Parser-generating tools will make this a
lot simpler.
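Besides parser generators, later versions of Python offer another option: since Python 2.6 (released after this post), the standard library's ast.literal_eval() evaluates a string that contains only Python literals (numbers, strings, tuples, lists, dicts and so on) and raises an exception for anything else, such as a function call. It still has to be handed just the tuple, though, and slicing at the first ')' breaks as soon as the error message itself contains a ')'. Here's a small sketch; extract_status is a hypothetical helper name, and it assumes the response always begins with the tuple. It tries each ')' in turn as the possible end of the tuple:

```python
import ast


def extract_status(response):
    """Return the tuple literal at the start of a response string.

    Hypothetical helper: assumes the response begins with a tuple literal,
    optionally followed by ",<free-form data>".
    """
    end = response.find(')')
    while end != -1:
        try:
            # literal_eval accepts only literals; names and calls such as
            # __import__('os') make it raise ValueError instead of running
            value = ast.literal_eval(response[:end + 1])
        except (ValueError, SyntaxError, TypeError):
            value = None  # this ')' did not end a valid literal
        if isinstance(value, tuple):
            return value
        end = response.find(')', end + 1)
    raise ValueError("no tuple literal at start of %r" % (response,))
```

For example, extract_status("(-1, 'bad value)'),more data") returns (-1, 'bad value)'), where slicing at the first ')' would have produced the unparsable "(-1, 'bad value)".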
Anyway, let's see how this parse() function works:
######
>>> parse("(0, ''),some data from the test system")
(0, '')
>>> parse("(-1, 'Error message from the test system')")
(-1, 'Error message from the test system')
>>> parse("'question 1) can this handle embedded parens in strings?'")
'question 1) can this handle embedded parens in strings?'
>>> parse("('question 2) how about this?:', ((((1), 2), 3), 4), 5)")
('question 2) how about this?:', ((((1,), 2), 3), 4), 5)
######
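The tokenize() function above tries stringRegex, numberRegex, and the punctuation check one after another, and re-slices the remaining string after every token. A common alternative, sketched here as one possible design rather than a correction of the code above, folds all the token types into a single compiled pattern with named groups and scans forward with the pattern's match(string, pos) method, so the rest of the string is never copied. TOKEN_RE is a hypothetical name; the (tokenType, datum) pairs it produces have the same shape as tokenize()'s:

```python
import re

# One pattern, one named group per token type; whichever group matched
# tells us the token's type.
TOKEN_RE = re.compile(r"""
    \s*                                   # skip leading whitespace
    (?:
        (?P<string> '(?:[^'\\]|\\.)*' )   # single-quoted string, \-escapes allowed
      | (?P<number> [+-]?\d+ )            # optional sign, then digits
      | (?P<punct>  [(),] )               # '(' or ')' or ','
    )
""", re.VERBOSE)


def tokenize(s):
    """Return (tokenType, datum) pairs, stopping at the first non-token."""
    tokens = []
    pos = 0
    while True:
        m = TOKEN_RE.match(s, pos)
        if not m:
            break
        if m.group('string') is not None:
            tokens.append(('string', m.group('string')[1:-1]))
        elif m.group('number') is not None:
            tokens.append(('number', int(m.group('number'))))
        else:
            tokens.append((m.group('punct'), None))
        pos = m.end()
    return tokens
```

Because the string alternative stops at the first unescaped quote, a tuple with several strings, such as ('a', 'b'), yields two separate string tokens. This sketch leaves backslash escapes such as \' in the datum untouched.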
The worst thing that can happen when parse() is given an arbitrary string
is an exception, or a structure that is way too deep or large. Other than
that, parse() should be resistant to code-injection attacks, since it
doesn't do anything special: it's mostly just list/string manipulation and
recursion.
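Each level of parentheses costs parse() two nested calls (parseExpression() and then parseTuple()), so input nested on the order of 500 levels deep exceeds Python's default recursion limit of 1000 frames and raises RuntimeError (RecursionError on Python 3) rather than a ValueError. One way to turn that into an ordinary ValueError is to pass the current depth down and refuse to go past a fixed limit. The sketch below assumes tokens shaped like tokenize()'s output; parse_tokens and MAX_DEPTH are hypothetical names, the limit of 50 is arbitrary, and it is a condensed single-function variant of parseExpression() and parseTuple(), not a drop-in patch:

```python
MAX_DEPTH = 50  # hypothetical cap; pick whatever your protocol needs


def parse_tokens(tokens, depth=0):
    """Parse one expression from a list of (tokenType, datum) pairs.

    Consumes tokens from the front of the list and raises ValueError
    if tuples are nested deeper than MAX_DEPTH.
    """
    if not tokens:
        raise ValueError("Empty token list")
    kind, datum = tokens.pop(0)
    if kind in ('number', 'string'):
        return datum
    if kind != '(':
        raise ValueError("Unexpected token %r" % ((kind, datum),))
    if depth >= MAX_DEPTH:
        raise ValueError("Nesting deeper than %d levels" % MAX_DEPTH)
    elements = []
    while True:
        if not tokens:
            raise ValueError("Expected an expression or ')'")
        if tokens[0][0] == ')':
            tokens.pop(0)
            return tuple(elements)
        elements.append(parse_tokens(tokens, depth + 1))
        # after an element we need either ',' (more to come) or ')'
        if tokens and tokens[0][0] == ',':
            tokens.pop(0)
        elif not tokens or tokens[0][0] != ')':
            raise ValueError("Expected ',' or ')'")
```

Like the original parseTuple(), this treats every parenthesized group as a tuple, so three nested groups around 1 come back as (((1,),),).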
I hope this helps!