Most important text processing examples

Fri Jul 6 12:00:23 EDT 2001

For building dynamic web pages (generating, as opposed to parsing,
strings), I find tweaking Python's string substitution idioms to be
helpful. If you have a web template into which you want to drop some
content, standard Python practice would be to load a template that
looks like this:

template = """<html><head><title>%(title)s</title></head>
<body>%(body)s</body></html>"""

# ...then generate a dictionary that looks like:
values = {'title': 'This is my title',
          'body': 'This is the body of my page'}

# ...and then combine the two with:
html = template % values

# giving html ==
"""<html><head><title>This is my title</title></head>
<body>This is the body of my page</body></html>"""

If you fail to include either the 'title' or the 'body' key in the
dictionary, you will get a KeyError on the last step. When the target
has many tags, some of which are only going to be filled with text in
certain situations, keeping track of all the keys quickly becomes a
tedious chore. If you insist on doing it manually, your code will be
dominated by dictionary assignments. This is why so many of us have
written frameworks for generating web pages. I rely on a
dictionary-like object that returns an empty string if asked for a key
it doesn't have.

from collections import UserDict  # Python 2: from UserDict import UserDict

class NoKeyErrors(UserDict):
    def __init__(self, initial=None):
        UserDict.__init__(self, initial)
    def __getitem__(self, key):
        # Read self.data directly; UserDict.get() goes back through
        # __getitem__ in current Python and would recurse.
        return self.data.get(key, '')

html = template % NoKeyErrors({'body': 'This is the body of my page'})

# gives html ==
"""<html><head><title></title></head>
<body>This is the body of my page</body></html>"""

...rather than an error. In pages with a dozen or more possible keys,
this allows me to write code that focuses on the keys that will be used
in a given situation, rather than requiring every object to know about
the keys used by every other object that uses the same template. I used
to use another modification of UserDict:

class PartialStringSubstitution(UserDict):
    def __init__(self, initial=None):
        UserDict.__init__(self, initial)
    def __getitem__(self, key):
        # Missing keys come back as their own %(key)s tag.
        return self.data.get(key, '%(' + key + ')s')

which, if it fails to find a key, will re-insert the tag for the key.
This allows you to make multiple passes over a single template:

# early in the code
html = template % PartialStringSubstitution(
                           {'body': 'This is the body of my page'})
# giving html ==
"""<html><head><title>%(title)s</title></head>
<body>This is the body of my page</body></html>"""

# much later
html = html % {'title': 'This is my title'}

# which also results in html ==
"""<html><head><title>This is my title</title></head>
<body>This is the body of my page</body></html>"""
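
One thing to watch with multiple passes: any text dropped in during an
early pass is parsed again by the later pass, so a literal '%' in a value
can break the second substitution. The sketch below is only an
illustration of that pitfall and one possible workaround. The
escape_percent helper, the shortened template and the sample values are
all made up for the example, and the escaping shown only covers exactly
one later pass.

```python
from collections import UserDict  # Python 2: from UserDict import UserDict

class PartialStringSubstitution(UserDict):
    """Mapping that re-inserts the %(key)s tag for any missing key."""
    def __getitem__(self, key):
        return self.data.get(key, '%(' + key + ')s')

# Shortened, hypothetical template for the example.
template = "<title>%(title)s</title><body>%(body)s</body>"

def escape_percent(text):
    # Hypothetical helper: double each '%' so one later pass reads it
    # as a literal percent sign.
    return text.replace('%', '%%')

body = 'Sales rose 10% this year'

# Unescaped: the first pass works, but the second pass chokes on "10% t".
unsafe = template % PartialStringSubstitution({'body': body})
try:
    unsafe % {'title': 'Report'}
    second_pass_failed = False
except (TypeError, ValueError):
    second_pass_failed = True

# Escaped: the value survives the second pass intact.
safe = template % PartialStringSubstitution({'body': escape_percent(body)})
html = safe % {'title': 'Report'}
```

The same concern applies to a literal '%%' in the template itself, which
the first pass collapses to a single '%'.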

I only used this double-pass technique in one project, finding it
easier to update a single NoKeyErrors object with various auxiliary
dictionaries than to keep track of when I should make preliminary passes
with PartialStringSubstitution dictionaries and which keys they should
contain.
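
As a rough sketch of that single-object approach: several parts of a
program each contribute a small dictionary, and one substitution happens
at the end. The template, the dictionary names (page_info, content) and
the 'footer' key are invented for the illustration.

```python
from collections import UserDict  # Python 2: from UserDict import UserDict

class NoKeyErrors(UserDict):
    """Mapping that yields '' for any key it does not hold."""
    def __getitem__(self, key):
        return self.data.get(key, '')

# Hypothetical template with a tag ('footer') that nobody fills in.
template = "<h1>%(title)s</h1><p>%(body)s</p><p>%(footer)s</p>"

# Hypothetical auxiliary dictionaries, each built by a different part
# of the program that knows only about its own keys.
page_info = {'title': 'This is my title'}
content = {'body': 'This is the body of my page'}

values = NoKeyErrors()
values.update(page_info)
values.update(content)

# One substitution at the end; keys nobody supplied become ''.
html = template % values
```

No piece of code here needs to know that 'footer' exists; the tag simply
renders as an empty string.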

There are many ways to tackle this kind of string handling, and you can
probably do much better than my hacks. But if you want to write
reasonably modular code on large string-generating projects, you will
need to find some way of decoupling the variables required by one
situation from those required by all of the others.
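
For comparison, later versions of the standard library offer two ways to
get similar behavior. Both postdate the code above and are shown only as
one possible alternative: a dict subclass with a __missing__ method
(Python 2.5 and later) plays the role of NoKeyErrors, and
string.Template.safe_substitute (Python 2.4 and later) leaves unknown
placeholders in place much as PartialStringSubstitution does. The class
name BlankWhenMissing and the shortened templates are made up for the
sketch.

```python
from string import Template

class BlankWhenMissing(dict):
    """dict subclass; dict lookup calls __missing__ when a key is absent."""
    def __missing__(self, key):
        return ''

# Hypothetical shortened template using the %(name)s style.
template = "<title>%(title)s</title><body>%(body)s</body>"
blank_filled = template % BlankWhenMissing(body='This is the body of my page')

# string.Template uses $name placeholders instead of %(name)s.
dollar_template = Template("<title>$title</title><body>$body</body>")

# First pass: 'title' is unknown, so $title is left untouched.
first_pass = dollar_template.safe_substitute(body='This is the body of my page')

# Second pass, later on, fills in the remaining placeholder.
second_pass = Template(first_pass).safe_substitute(title='This is my title')
```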