[Python-Dev] PEP 215 redux: toward a simplified consensus?

Barry A. Warsaw barry@zope.com
Mon, 25 Feb 2002 12:41:26 -0500


I'm still woefully behind on my email since returning from vacation,
but I thought I'd revisit PEP 215 (string interpolation) a bit, given
some recent hacking and thinking about things we talked about at IPC10.

Background: PEP 215 has some interesting ideas, but IMHO it's more than
I'm comfortable with.  At IPC10, Guido described his rules for string
interpolation as they would be if his time machine were more powerful.
These follow some discussions we've had during various Zope sprints
about making the rules simpler for non-programmers to understand.
I've also been struggling with how error-prone %(var)s substitutions
can be in the thru-the-web Mailman strings, where they're supported.
Here's what I've come up with.

Guido's rules for $-substitutions are really simple:

1. $$ substitutes to just a single $

2. $identifier followed by non-identifier characters gets interpolated
   with the value of the 'identifier' key in the substitution
   dictionary.

3. To handle cases where the identifier is followed by identifier
   characters that aren't part of the key, ${identifier} is equivalent
   to $identifier.
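The three rules are small enough to apply in a single regular-expression
pass.  Here is a minimal sketch, assuming the substitution dictionary is
simply passed in; dollar_subst is a hypothetical name and is not one of
the functions in the module below.

```python
import re

# Hypothetical helper (not part of the module below): apply the three
# $-substitution rules in one pass.  The alternatives after the leading
# "$" match "$$", "${identifier}" and "$identifier" respectively.
_DOLLAR = re.compile(r'\$(?:(\$)|\{([_a-zA-Z]\w*)\}|([_a-zA-Z]\w*))')

def dollar_subst(template, mapping):
    """Interpolate a $-string using mapping (a sketch of Guido's rules)."""
    def replace(match):
        escaped, braced, bare = match.groups()
        if escaped is not None:
            return '$'                        # rule 1: $$ -> $
        return str(mapping[braced or bare])   # rules 2 and 3
    return _DOLLAR.sub(replace, template)

print(dollar_subst('$who owes ${amount}USD, that is $$$amount',
                   {'who': 'Tim', 'amount': 5}))
# prints: Tim owes 5USD, that is $5
```

In this sketch a $ that starts none of the three forms (e.g. $5) is left
alone, and an unknown name raises KeyError; the rules themselves don't
say what should happen in either case.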

And that's it.  For the sake of discussion, forget about where the
dictionary for string interpolation comes from.

I've hacked together 4 functions which I'm experimentally using to
provide these rules in thru-the-web string editing, and also for
sanity checking the strings as they're submitted.  I think there's a
fairly straightforward conversion between traditional %-strings and
these newfangled $-strings, and so two of the functions do the
conversions back and forth.

The other two functions attempt to return the set of all the
substitution variables found in either a %-string or a $-string.  I
match this against the list of known legal substitution variables, and
bark loudly if there's any mismatch.
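The check against the known legal names might look something like the
following sketch.  The pattern mirrors dre from the module below;
check_names and the Mailman-style variable names in the example are
hypothetical, and the policy (unknown names are an error, unused legal
names are merely reported) is an assumption.

```python
import re

# Mirrors dre in the module below: $$, $identifier or ${identifier}.
_DOLLAR_NAMES = re.compile(r'(\$\$)|\$([_a-z]\w*)|\$\{([_a-z]\w*)\}',
                           re.IGNORECASE)

def check_names(template, legal_names):
    """Hypothetical sanity check for a submitted $-string.

    Raises ValueError for names that aren't legal, and returns the set of
    legal names the template never uses.
    """
    # Each findall() hit is a 3-tuple; $$ matches leave both name groups empty.
    found = {bare or braced
             for escaped, bare, braced in _DOLLAR_NAMES.findall(template)
             if bare or braced}
    unknown = found - set(legal_names)
    if unknown:
        raise ValueError('unknown substitution variables: %s'
                         % ', '.join(sorted(unknown)))
    return set(legal_names) - found

print(check_names('Welcome to $listname, ${user}!',
                  ['listname', 'user', 'owner']))
# prints: {'owner'}
```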

The one interesting thing about %-to-$ conversion is that the regexp I
use makes the trailing `s' in %(var)s optional, so I can auto-correct
references where it's missing.  I think this was an idea that Paul
Dubois came up with during the lunch discussion.  It seems to work
well, and I can do a %-to-$-to-% round trip; if the strings at the two
ends are the same, then no `s's were missing, otherwise the conversion
auto-corrected something and I can issue a warning.
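If you want to know which references lost their `s', rather than just
that the round trip changed something, one swapped-in alternative is to
capture the optional `s' in its own group and report the names where it
is absent.  A sketch, assuming the same pattern as cre below; missing_s
is a hypothetical name.

```python
import re

# Same as cre in the module below, except that the optional trailing "s"
# is captured in its own group so we can tell when it was absent.
_PERCENT = re.compile(r'%\(([_a-z]\w*?)\)(s?)', re.IGNORECASE)

def missing_s(template):
    """Return the names whose %(name)s reference lacks the trailing s."""
    return [m.group(1) for m in _PERCENT.finditer(template)
            if not m.group(2)]

print(missing_s('%(one)s %(two)three %(four)seven'))
# prints: ['two']
```

Note that %(four)seven passes: the regexp takes its `s' as the
conversion character, just as the round trip in the session below does,
so a dropped `s' followed by text that itself begins with `s' can't be
detected by either check.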

This is all really proto-stuff, but I've done some limited testing and
it seems to work pretty well.  So without changing the language we can
play with $-strings using Guido's rules to see whether we like them, by
converting them explicitly to traditional %-strings and then doing the
mod-operator substitutions.
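As a sketch of that workflow, here is a single-pass re.sub variant of
the $-to-% conversion followed by the ordinary % operator.  This is an
alternative formulation, not the split-based to_percent below, and
dollar_to_percent is a hypothetical name.

```python
import re

# $$, ${identifier} or $identifier: the same three forms as Guido's rules.
_DOLLAR = re.compile(r'\$(?:(\$)|\{([_a-zA-Z]\w*)\}|([_a-zA-Z]\w*))')

def dollar_to_percent(template):
    """Rewrite a $-string as an equivalent %-string (hypothetical helper)."""
    def replace(match):
        escaped, braced, bare = match.groups()
        if escaped is not None:
            return '$'                          # $$ is a literal $
        return '%(' + (braced or bare) + ')s'   # both name forms -> %(name)s
    # Literal % signs must be doubled first, or the % operator would
    # treat them as conversion specifiers.
    return _DOLLAR.sub(replace, template.replace('%', '%%'))

fmt = dollar_to_percent('${pct}% of $$$total goes to $who')
print(fmt)   # prints: %(pct)s%% of $%(total)s goes to %(who)s
print(fmt % {'pct': 10, 'total': 50, 'who': 'Tim'})
# prints: 10% of $50 goes to Tim
```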

Hopefully I've extracted the right bits of code from my modules for
you to get the idea.  There may be bugs <wink>.

-Barry

-------------------- snip snip --------------------
import re

from string import digits
try:
    # Python 2.2
    from string import ascii_letters
except ImportError:
    # Older Pythons
    _lower = 'abcdefghijklmnopqrstuvwxyz'
    ascii_letters = _lower + _lower.upper()

# Search for %(identifier)s strings, except that the trailing s is optional,
# since that's a common mistake
cre = re.compile(r'%\(([_a-z]\w*?)\)s?', re.IGNORECASE)
# Search for $$, $identifier, or ${identifier}
dre = re.compile(r'(\$\$)|\$([_a-z]\w*)|\$\{([_a-z]\w*)\}', re.IGNORECASE)

IDENTCHARS = ascii_letters + digits + '_'
EMPTYSTRING = ''

# Utilities to convert between simplified $identifier substitutions and
# standard Python %(identifier)s substitutions.  Under the "Guido rules",
# $-strings map to %-strings like so:
#    $$ -> $  (a literal dollar sign)
#    $identifier -> %(identifier)s
#    ${identifier} -> %(identifier)s

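def to_dollar(s):
    """Convert from %-strings to $-strings."""
    # A lone $ is literal in a %-string but special in a $-string.
    s = s.replace('$', '$$')
    parts = cre.split(s)
    # Even indices are literal text; %% is an escaped % in a %-string but
    # just a plain % in a $-string.
    for i in range(0, len(parts), 2):
        parts[i] = parts[i].replace('%%', '%')
    # Odd indices are identifiers; use braces if identifier characters follow.
    for i in range(1, len(parts), 2):
        if parts[i+1] and parts[i+1][0] in IDENTCHARS:
            parts[i] = '${' + parts[i] + '}'
        else:
            parts[i] = '$' + parts[i]
    return EMPTYSTRING.join(parts)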


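def to_percent(s):
    """Convert from $-strings to %-strings."""
    # A lone % is literal in a $-string but special in a %-string.
    s = s.replace('%', '%%')
    parts = dre.split(s)
    # dre has three groups, so each match adds three entries (unmatched
    # groups are None) after the literal text that precedes it.
    for i in range(1, len(parts), 4):
        if parts[i] is not None:
            parts[i] = '$'                              # $$ -> $
        elif parts[i+1] is not None:
            parts[i+1] = '%(' + parts[i+1] + ')s'       # $identifier
        else:
            parts[i+2] = '%(' + parts[i+2] + ')s'       # ${identifier}
    return EMPTYSTRING.join(filter(None, parts))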


def dollar_identifiers(s):
    """Return the set (dictionary) of identifiers found in a $-string."""
    d = {}
    for name in filter(None, [b or c or None for a, b, c in dre.findall(s)]):
        d[name] = 1
    return d


def percent_identifiers(s):
    """Return the set (dictionary) of identifiers found in a %-string."""
    d = {}
    for name in cre.findall(s):
        d[name] = 1
    return d

-------------------- snip snip --------------------
Python 2.2 (#1, Dec 24 2001, 15:39:01) 
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import dollar
>>> dollar.to_dollar('%(one)s %(two)three %(four)seven')
'$one ${two}three ${four}even'
>>> dollar.to_percent(dollar.to_dollar('%(one)s %(two)three %(four)seven'))
'%(one)s %(two)sthree %(four)seven'
>>> dollar.percent_identifiers('%(one)s %(two)three %(four)seven')
{'four': 1, 'two': 1, 'one': 1}
>>> dollar.dollar_identifiers(dollar.to_dollar('%(one)s %(two)three %(four)seven'))
{'four': 1, 'two': 1, 'one': 1}