[Tutor] Converting a String to a Tuple

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Sat Nov 5 01:51:41 CET 2005



On Fri, 4 Nov 2005, Carroll, Barry wrote:

> My UDP client is receiving responses from the server, and now I need to
> process them.  A typical response string looks like this:
>
>     "(0, ''),some data from the test system"
>
> The tuple represents the error code and message.  If the command had
> failed, the response would look like this:
>
>     "(-1, 'Error message from the test system')"
>
> I need to extract the tuple from the rest of the response string.  I can do
> this using eval, like so:
>
>     errtuple = eval(mytxt[:mytxt.find(')')+1])
>
> Is there another, more specific method for transforming a string into a
> tuple?


Hi Barry,

Since this seems to be such a popular request, here is some kludgy sample
code that provides a parse() function to do the structure-building.  This
parser doesn't do much error checking yet, and I apologize in advance for
that.  But I just want to do something to make sure people don't use
eval() to extract simple stuff out of network traffic.  *grin*
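A stdlib footnote for later readers: Python 2.6 (released after this thread) added `ast.literal_eval()`, which evaluates only literal syntax such as numbers, strings and tuples, and raises an exception for anything else. The sketch below, written for Python 3, is just Barry's one-liner with `eval()` swapped out; `extractStatus` is a hypothetical name. It still cuts at the first ')', so it inherits the problem with error messages that contain parentheses, which the hand-written parser avoids.

```python
import ast

def extractStatus(mytxt):
    """Barry's one-liner with eval() swapped for ast.literal_eval().

    Like the original, it cuts the text at the first ')', so an error
    message that itself contains ')' will still trip it up.
    """
    return ast.literal_eval(mytxt[:mytxt.find(')') + 1])

print(extractStatus("(0, ''),some data from the test system"))  # (0, '')

# Anything that is not a plain literal, such as a function call, is
# rejected with ValueError instead of being executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as err:
    print("rejected:", err)
```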

(In reality, we'd use a parser-generating tool like pyparsing, which would
make the code below simpler and give good error messages.)


##########################################################################
"""Simple parsing of expressions.  Meant to be a demo of how one could
turn strings into structures.  If we were to do this for real, though,
we'd definitely use parser generator tools instead.

Main usage:

    >>> parse("(0, ''),some data from the test system")
    (0, '')

    >>> parse("(-1, 'Error message from the test system')")
    (-1, 'Error message from the test system')
"""

import re

stringRegex = re.compile(r"""
                          '               # a single quote
                          (?:             # followed by any number of
                              [^'\\]      # non-quote, non-backslash characters
                              |           # or
                              \\.         # backslash-escaped characters, like \'
                          )*
                          '               # and a closing quote
                          """, re.VERBOSE)

numberRegex = re.compile(r"""
                          [+-]?         ## optional sign
                          \d+           ## one or more digits
                          """, re.VERBOSE)

def tokenize(s):
    """Returns a list of tokens.
    Each token will be of the form: (tokenType, datum)
    with the tokenType in ['string', 'number', '(', ')', ',']
    Tokenizes as much as it can.  When it first hits a non-token,
    it gives up and returns what it has so far.
    """
    tokens = []
    while True:
        s = s.lstrip()
        if not s: break
        stringMatch = stringRegex.match(s)
        numberMatch = numberRegex.match(s)
        if stringMatch:
            # Drop the surrounding quotes and undo backslash escapes.
            body = stringMatch.group(0)[1:-1]
            tokens.append( ('string', re.sub(r"\\(.)", r"\1", body)) )
            s = s[stringMatch.end():]
        elif numberMatch:
            tokens.append( ('number', int(numberMatch.group(0))) )
            s = s[numberMatch.end():]
        elif s[0] in ['(', ')', ',']:
            tokens.append( (s[0], None) )
            s = s[1:]
        else:
            # Not the start of any token: stop here.
            break
    return tokens


def parse(s):
    """Given a string s, parses out a single expression from s.
    The result may be a string, a number, or a tuple."""
    tokens = tokenize(s)
    return parseExpression(tokens)


def parseExpression(tokens):
    """Parses a single expression.
    An expression can either be a number, a string, or a tuple.
    """
    if not tokens:
        raise ValueError("Empty token list")
    firstToken = tokens[0]
    if firstToken[0] in ['number', 'string']:
        tokens.pop(0)
        return firstToken[1]
    elif firstToken[0] == '(':
        return parseTuple(tokens)
    else:
        raise ValueError("Don't know how to handle %r" % (firstToken,))


def parseTuple(tokens):
    """Parses a tuple expression.
    A tuple is a '(', followed by a bunch of comma separated
    expressions, followed by a ')'.
    """
    elements = []
    eat(tokens, '(')
    while True:
        if not tokens:
            raise ValueError("Expected either ',', an expression,"
                             " or ')', but exhausted token list")
        if tokens[0][0] in ['number', 'string', '(']:
            elements.append(parseExpression(tokens))
            # After an element, expect either ')' or a separating ','.
            if tokens and tokens[0][0] == ')':
                break
            else:
                eat(tokens, ',')
        elif tokens[0][0] == ')':
            break
        else:
            raise ValueError("Don't know how to handle %r" %
                             (tokens[0],))
    eat(tokens, ')')
    return tuple(elements)


def eat(tokens, typeExpected):
    """Tries to eat a token of the given type, and returns its datum.
    If we can't, raises ValueError."""
    if not tokens:
        raise ValueError("Expected %r, but exhausted token list" %
                         (typeExpected,))
    token = tokens.pop(0)
    if token != (typeExpected, None):
        raise ValueError("Expected %r, but got %r" % (typeExpected,
                                                      token))
    return token[1]
###########################################################################


Whew.  That was a mouthful.  *grin* But, again, that's because I'm cooking
almost everything from scratch.  Parser-generating tools will make this a
lot simpler.
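Short of a full parser generator, the standard `re` module can already trim the tokenizer. The `re` documentation's "Writing a Tokenizer" example combines every token pattern into one regex with named groups and asks `match.lastgroup` which alternative fired. Here is a sketch of tokenize() rebuilt that way for Python 3; `tokenRegex` and `tokenizeCompact` are hypothetical names, and it assumes quoted strings may contain backslash-escaped characters such as `\'`.

```python
import re

# One pattern, one named group per token type, tried left to right.
tokenRegex = re.compile(r"""
      (?P<string>'(?:[^'\\]|\\.)*')   # single-quoted string; \' allowed inside
    | (?P<number>[+-]?\d+)            # optional sign, then digits
    | (?P<punct>[(),])                # one of the three punctuation tokens
    | (?P<space>\s+)                  # whitespace, skipped below
    """, re.VERBOSE)

def tokenizeCompact(s):
    """Returns (tokenType, datum) tuples, like tokenize().
    Stops at the first character that does not start a token."""
    tokens = []
    pos = 0
    while pos < len(s):
        m = tokenRegex.match(s, pos)
        if m is None:
            break
        kind = m.lastgroup          # name of the alternative that matched
        text = m.group(kind)
        if kind == 'string':
            # Drop the quotes and undo backslash escapes.
            tokens.append(('string', re.sub(r"\\(.)", r"\1", text[1:-1])))
        elif kind == 'number':
            tokens.append(('number', int(text)))
        elif kind == 'punct':
            tokens.append((text, None))
        pos = m.end()
    return tokens
```

It yields the same (tokenType, datum) tuples as tokenize(), so parseExpression() can consume its output unchanged.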


Anyway, let's see how this parse() function works:

######
>>> parse("(0, ''),some data from the test system")
(0, '')
>>> parse("(-1, 'Error message from the test system')")
(-1, 'Error message from the test system')
>>> parse("'question 1) can this handle embedded parens in strings?'")
'question 1) can this handle embedded parens in strings?'
>>> parse("('question 2) how about this?:', ((((1), 2), 3), 4), 5)")
('question 2) how about this?:', ((((1,), 2), 3), 4), 5)
######
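parse() hands back only the leading tuple and discards the ",some data from the test system" tail, while Barry's client needs both. Assuming every response header has exactly the `(integer, 'single-quoted string')` shape of Barry's two examples (an assumption about his protocol, not something the thread guarantees), a single anchored regex can split a response into code, message and data. `headerRegex` and `splitResponse` are hypothetical names; Python 3 is assumed.

```python
import re

# Assumed header shape: (integer, 'single-quoted message'), optionally
# followed by ",data".  Backslash escapes such as \' may appear in the message.
headerRegex = re.compile(r"""
    \(\s*
    (?P<code>[+-]?\d+)                 # error code
    \s*,\s*
    '(?P<message>(?:[^'\\]|\\.)*)'     # quoted message
    \s*\)
    (?:,(?P<data>.*))?                 # optional ",data" tail
    \Z
    """, re.VERBOSE | re.DOTALL)

def splitResponse(response):
    """Returns (code, message, data); data is '' when there is no tail."""
    m = headerRegex.match(response)
    if m is None:
        raise ValueError("Unrecognized response: %r" % (response,))
    message = re.sub(r"\\(.)", r"\1", m.group('message'))
    return int(m.group('code')), message, m.group('data') or ''

print(splitResponse("(0, ''),some data from the test system"))
# (0, '', 'some data from the test system')
print(splitResponse("(-1, 'Error message from the test system')"))
# (-1, 'Error message from the test system', '')
```

Because the message pattern only stops at an unescaped quote, a ')' inside the message is harmless, unlike with the `find(')')` slicing.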


The worst thing that might happen from parsing an arbitrary string with
parse() is an exception, or a structure that is way too deep or large.
Other than that, parse() should be resistant to code-injection attacks,
since it never evaluates its input as Python code: it's mostly just
list/string manipulation and recursion.
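On the "too deep" point: parseExpression() and parseTuple() call each other once per nesting level, so an input like `"(" * 5000` runs into Python's recursion limit (RecursionError in Python 3.5+, RuntimeError before that). One cheap defence, sketched here with assumed limits, is to check the token list before parsing it. `checkTokens`, `nestingDepth`, `MAX_TOKENS` and `MAX_DEPTH` are hypothetical names and values, not part of the code above.

```python
# Hypothetical limits; choose values that suit your protocol.
MAX_TOKENS = 1000
MAX_DEPTH = 20

def nestingDepth(tokens):
    """Deepest '(' nesting in a list of (tokenType, datum) tokens,
    the same shape tokenize() returns."""
    depth = deepest = 0
    for tokenType, _ in tokens:
        if tokenType == '(':
            depth += 1
            deepest = max(deepest, depth)
        elif tokenType == ')':
            depth -= 1
    return deepest

def checkTokens(tokens, maxTokens=MAX_TOKENS, maxDepth=MAX_DEPTH):
    """Raises ValueError for token lists that are too long or too deep;
    otherwise returns them unchanged."""
    if len(tokens) > maxTokens:
        raise ValueError("Too many tokens: %d > %d" % (len(tokens), maxTokens))
    depth = nestingDepth(tokens)
    if depth > maxDepth:
        raise ValueError("Nesting too deep: %d > %d" % (depth, maxDepth))
    return tokens

# This is the token list tokenize() produces for "(" * 5000.
hostile = [('(', None)] * 5000
try:
    checkTokens(hostile)
except ValueError as err:
    print(err)  # Too many tokens: 5000 > 1000
```

Inside parse(), the call would become `tokens = checkTokens(tokenize(s))`, so oversized input fails with a ValueError before any recursion starts.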


I hope this helps!


