Parsing strings (\n and \\)

Fredrik Lundh
Wed Jun 26 12:29:44 CEST 2002

Thomas Guettler wrote:

> Look at the two functions quote and unquote. I wrote them
> without regular expressions because I think it's faster.

faster to write, perhaps.

and faster to run, if you only use them on strings with no
more than 2-3 characters.

but if you use a different set of test strings with more ordinary
characters than escaped characters, e.g.

     strings = ['foo', '', '\\', ' ', '"', '\\"', '\\\\']
     strings = [(x+"spamspamspamspamspam")*10 for x in strings]

you'll find that a RE approach can be much faster.  the following
version is about four times faster than your code, under 2.2:

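import re

def re_quote(string, sub=re.compile(r"[\\\"]").sub):
    # prefix every backslash and double quote with a backslash
    def fixup(m):
        return "\\" + m.group(0)
    return sub(fixup, string)

def re_unquote(string, sub=re.compile(r"(?s)\\(.)|\\").sub):
    # "\x" matches the first alternative (group 1 is x); a lone
    # trailing backslash matches the second, so group 1 is None
    def fixup(m):
        ch = m.group(1)
        if ch is None:
            raise ValueError('Parse Error: Backslash at end of string')
        if ch not in '\\"':   # only \\ and \" are valid escapes
            raise ValueError('Parse Error: unsupported character after backslash')
        return ch
    return sub(fixup, string)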
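
To check the speed claim against the test strings above, here is a rough benchmark sketch. The original quote() isn't shown in this message, so loop_quote below is a hypothetical character-by-character stand-in for the non-RE approach; absolute timings (and the ratio between them) will vary with Python version and machine.

```python
import re
import timeit

# Hypothetical non-RE version: walk the string one character at a time.
# (A stand-in for the original quote(), which isn't shown here.)
def loop_quote(string):
    out = []
    for ch in string:
        if ch in '\\"':          # backslash or double quote...
            out.append("\\")     # ...gets a backslash prefix
        out.append(ch)
    return "".join(out)

# RE version: one pass, callback prefixes each match with a backslash.
def re_quote(string, sub=re.compile(r'[\\"]').sub):
    return sub(lambda m: "\\" + m.group(0), string)

# Test strings from the post: mostly ordinary characters.
strings = ['foo', '', '\\', ' ', '"', '\\"', '\\\\']
strings = [(x + "spamspamspamspamspam") * 10 for x in strings]

# Both versions must produce identical output before timing means anything.
assert all(loop_quote(s) == re_quote(s) for s in strings)

if __name__ == "__main__":
    for func in (loop_quote, re_quote):
        # time 200 passes over the whole list; results are machine-dependent
        seconds = timeit.timeit(lambda: [func(s) for s in strings], number=200)
        print("%-10s %.4f s" % (func.__name__, seconds))
```

The loop pays interpreter overhead for every character, while the RE version only calls back into Python for the few characters that actually need escaping, which is why the gap should widen as the share of ordinary characters grows.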


note the use of callbacks instead of substitution templates.  it's
usually faster (and in my opinion, also more pythonic) to use e.g.

    def fixup(m):
        return "spam %s %s" % m.group(1, 2)
    re.sub(pattern, fixup, string)

or, if you prefer lambdas:

    re.sub(pattern, lambda m: "spam %s %s" % m.group(1, 2), string)

than the re.sub non-standard interpolation syntax:

    re.sub(pattern, "spam \\1 \\2", string)
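
A quick way to see that the three spellings are interchangeable is to run each against the same input. The pattern and sample text below are hypothetical, chosen only so that there are two groups to refer to.

```python
import re

# Hypothetical pattern: "key=value" pairs, two groups per match.
pattern = r"(\w+)=(\w+)"
text = "a=1, b=2"

# 1. named callback function
def fixup(m):
    return "spam %s %s" % m.group(1, 2)   # m.group(1, 2) -> ('a', '1')

by_callback = re.sub(pattern, fixup, text)

# 2. the same callback written as a lambda
by_lambda = re.sub(pattern, lambda m: "spam %s %s" % m.group(1, 2), text)

# 3. substitution template with \1 and \2 backreferences
by_template = re.sub(pattern, "spam \\1 \\2", text)

print(by_callback)  # spam a 1, spam b 2
```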

(and where possible, it's also slightly faster to use m.groups() instead
of enumerating all the groups in the m.group() call.)
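
As a sketch of the m.groups() shortcut, assuming the same hypothetical two-group pattern: m.groups() returns every group as one tuple, so it can stand in for m.group(1, 2) when the template uses all the groups, in order. Any speed difference is small and version-dependent, so this only checks that both spellings give the same result.

```python
import re

pattern = re.compile(r"(\w+)=(\w+)")  # hypothetical two-group pattern

def fixup_explicit(m):
    # lists each group number explicitly
    return "spam %s %s" % m.group(1, 2)

def fixup_groups(m):
    # m.groups() returns all groups at once, e.g. ('a', '1')
    return "spam %s %s" % m.groups()

text = "a=1, b=2"
explicit = pattern.sub(fixup_explicit, text)
all_groups = pattern.sub(fixup_groups, text)
```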

ymmv, as usual.

