[XML-SIG] processing "special characters" efficiently

David Goodger dgoodger@bigfoot.com
Thu, 13 Apr 2000 18:47:45 -0400

> From: Craig.Curtin@wdr.com
> Date: Thu, 6 Apr 2000 15:57:43 -0500
> i'm looking for an efficient mechanism for filtering out
> XML special characters....

I don't know if you follow comp.lang.python, but Fredrik Lundh just posted
the solution to your problem. His book "(the eff-bot guide to) the standard
python library" looks to be a treasure trove of such examples. Enjoy!

David Goodger    dgoodger@bigfoot.com    Open-source projects:
 - The Go Tools Project: http://gotools.sourceforge.net
 (more to come!)

Fredrik Lundh <effbot@telia.com> posted to comp.lang.python:

Randall Hopper <aa8vb@yahoo.com> wrote:
> Is there a Python feature or standard library API that will get me less
> Python code spinning inside this loop?   re.multisub or equivalent? :-)

haven't benchmarked it, but I suspect that this approach
is more efficient:


# based on re-example-5.py

import re
import string

symbol_map = { "foo": "FOO", "bar": "BAR" }

def symbol_replace(match, get=symbol_map.get):
    return get(match.group(1), "")

symbol_pattern = re.compile(
    "(" + string.join(map(re.escape, symbol_map.keys()), "|") + ")"
    )

print symbol_pattern.sub(symbol_replace, "foobarfiebarfoo")
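
For the XML special characters asked about at the top of this thread, the same single-pass idea might look roughly like the sketch below. It is written for a current Python 3 interpreter rather than the 1.5-era code above, and the names xml_map, xml_pattern and xml_escape are made up for illustration; the table assumes the five predefined XML entities are the ones you want to replace.

```python
import re

# Assumed mapping for illustration: the five predefined XML entities.
xml_map = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&apos;",
}

# One alternation built from the escaped keys, compiled once up front.
xml_pattern = re.compile("|".join(map(re.escape, xml_map)))

def xml_escape(text):
    # Each match is looked up in the table, so the text is scanned only once
    # and an "&" produced by one replacement is never escaped a second time.
    return xml_pattern.sub(lambda m: xml_map[m.group(0)], text)

print(xml_escape('<a href="x">Tom & Jerry</a>'))
# prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```

Because everything happens in one pass, there is no need to worry about the order in which the characters are replaced, which is the usual trap with a chain of separate replace calls.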





Randall Hopper <aa8vb@yahoo.com> wrote:
> Thanks!  It's much more efficient.  The 140 seconds original running time
> was reduced to 11.6 seconds.  I can certainly live with that.

thought so ;-)

while you're at it, try replacing the original readline loop with:

    while 1:
        lines = fp.readlines(BUFFERSIZE)
        if not lines:
            break
        lines = string.join(lines, "")
        lines = re.sub(...)

where BUFFERSIZE is 1000000 or so...
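
A self-contained sketch of that chunked loop, again assuming a current Python 3 interpreter. The function name filter_stream and its parameters are hypothetical, and the foo/bar pattern only stands in for whatever substitution the elided re.sub(...) call performs.

```python
import io
import re

BUFFERSIZE = 1000000  # approximate size hint per chunk, as suggested above

def filter_stream(fp, out, pattern, replace):
    """Read fp in large batches of whole lines, substitute, write to out."""
    while True:
        # readlines(hint) returns complete lines totalling roughly `hint`
        # characters, so no line is ever split across two chunks.
        lines = fp.readlines(BUFFERSIZE)
        if not lines:
            break
        out.write(pattern.sub(replace, "".join(lines)))

# Usage with in-memory files standing in for the real input and output.
src = io.StringIO("foo one\nbar two\nfie three\n")
dst = io.StringIO()
filter_stream(src, dst, re.compile("foo|bar"), lambda m: m.group(0).upper())
print(dst.getvalue())
```

Since each chunk is made of whole lines, this is only safe as-is for patterns that never need to match across a line boundary.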