[Python-checkins] python/nondist/sandbox/string mod292.py,NONE,1.1

rhettinger at users.sourceforge.net rhettinger at users.sourceforge.net
Tue Sep 7 07:22:18 CEST 2004


Update of /cvsroot/python/python/nondist/sandbox/string
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv4905

Added Files:
	mod292.py 
Log Message:
Add a version using the % operator.

--- NEW FILE: mod292.py ---
r''' Doctests for PEP 292's string template functions

The return type is str when the template and every substituted value are str.
Any unicode component produces unicode output.  This matches the behavior of
other re and string operations:

>>> Template('the $xxx and') % dict(xxx='10')
'the 10 and'
>>> Template(u'the $xxx and') % dict(xxx='10')
u'the 10 and'
>>> Template('the $xxx and') % dict(xxx=u'10')
u'the 10 and'
>>> Template(u'the $xxx and') % dict(xxx=u'10')
u'the 10 and'


Non-string values are converted to the string type of the template:

>>> Template('the $xxx and') % dict(xxx=10)
'the 10 and'
>>> Template(u'the $xxx and') % dict(xxx=10)
u'the 10 and'


The ValueErrors are now more specific.  They include the line number and the
mismatched token:

>>> t = """line one
... line two
... the $@malformed token
... line four"""
>>> Template(t) % dict()
Traceback (most recent call last):
 . . .
ValueError: Invalid placeholder on line 3:  '@malformed'


Also, the re pattern was changed slightly to catch an important class of
language-specific errors in which a user writes a non-ASCII identifier.  The
previous implementation would match up to the first non-ASCII character and
then raise a KeyError if the abbreviated name was (hopefully) not found.  Now,
it raises a ValueError highlighting the problem identifier.  Note, we still
accept only Python identifiers but have improved error detection:

>>> t = u'Returning $ma\u00F1ana or later.'
>>> Template(t) % {}
Traceback (most recent call last):
 . . .
ValueError: Invalid placeholder on line 1:  u'ma\xf1ana'


Exercise safe substitution:

>>> SafeTemplate('$$ $name ${rank}') % dict(name='Guido', rank='BDFL')
'$ Guido BDFL'
>>> SafeTemplate('$$ $name ${rank}') % dict()
'$ $name ${rank}'
>>> SafeTemplate('$$ $@malformed ${rank}') % dict()
Traceback (most recent call last):
 . . .
ValueError: Invalid placeholder on line 1:  '@malformed'

'''



import re as _re

class Template:
    """A string class for supporting $-substitutions."""
    __slots__ = ['tstring']

    # Search for $$, $identifier, ${identifier}, and any bare $'s
    pattern = _re.compile(r"""
      \$(\$)|                       # Escape sequence of two $ signs
      \$([_a-z][_a-z0-9]*(?!\w))|   # $ and a Python identifier
      \${([_a-z][_a-z0-9]*)}|       # $ and a brace delimited identifier
      \$(\S*)                       # Catchall for ill-formed $ expressions
    """, _re.IGNORECASE | _re.VERBOSE | _re.UNICODE)
    # Pattern notes:
    #
    # The pattern for $identifier includes a negative lookahead assertion
    # to make sure that the identifier is not followed by a Unicode
    # alphanumeric character other than [_a-z0-9].  The idea is to make sure
    # not to partially match ill-formed identifiers containing characters
    # from other alphabets.  Without the assertion the Spanish word for
    # tomorrow "ma~nana" (where ~n is 0xF1) would improperly match "ma"
    # much to the surprise of the end-user (possibly a non-programmer).
    #
    # The catchall pattern has to come last because it captures non-space
    # characters after a dollar sign not matched by a previous group.  Those
    # captured characters make the error messages more informative.
    #
    # The substitution functions rely on the first three patterns matching
    # with a non-empty string.  If that changes, then change lines like
    # "if named" to "if named is not None".

    def __init__(self, tstring):
        self.tstring = tstring

    def __mod__(self, mapping):
        """A function for supporting $-substitutions."""
        template = self.tstring
        def convert(mo):
            escaped, named, braced, catchall = mo.groups()
            if named or braced:
                return '%s' % mapping[named or braced]
            elif escaped:
                return '$'
            lineno = len(template[:mo.start(4)].splitlines())
            raise ValueError('Invalid placeholder on line %d:  %r' %
                             (lineno, catchall))
        return self.pattern.sub(convert, template)

class SafeTemplate(Template):
    """A string class for supporting $-substitutions.

    This class is 'safe' in the sense that you will never get KeyErrors if
    there are placeholders missing from the interpolation dictionary.  In that
    case, you will get the original placeholder in the value string.
    """
    __slots__ = ['tstring']

    def __mod__(self, mapping):
        """A function for $-substitutions.

        This function is 'safe' in the sense that you will never get KeyErrors if
        there are placeholders missing from the interpolation dictionary.  In that
        case, you will get the original placeholder in the value string.
        """
        template = self.tstring
        def convert(mo):
            escaped, named, braced, catchall = mo.groups()
            if named:
                try:
                    return '%s' % mapping[named]
                except KeyError:
                    return '$' + named
            elif braced:
                try:
                    return '%s' % mapping[braced]
                except KeyError:
                    return '${' + braced + '}'
            elif escaped:
                return '$'
            lineno = len(template[:mo.start(4)].splitlines())
            raise ValueError('Invalid placeholder on line %d:  %r' %
                             (lineno, catchall))
        return self.pattern.sub(convert, template)

del _re

if __name__ == '__main__':
    import doctest
    print 'Doctest results: ', doctest.testmod()
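The pattern notes in mod292.py explain why the $identifier alternative carries a negative lookahead. The following is a minimal sketch of that effect, not part of the check-in. It assumes Python 3, where str patterns are Unicode-aware by default, and it reduces the pattern to two alternatives. The names WITH_LOOKAHEAD, WITHOUT_LOOKAHEAD, and first_match are hypothetical.

```python
import re

# Simplified, two-alternative versions of the pattern in mod292.py
# (assumption: Python 3, so \w is Unicode-aware without re.UNICODE).
WITH_LOOKAHEAD = re.compile(
    r"\$([_a-z][_a-z0-9]*(?!\w))|\$(\S*)", re.IGNORECASE)
WITHOUT_LOOKAHEAD = re.compile(
    r"\$([_a-z][_a-z0-9]*)|\$(\S*)", re.IGNORECASE)


def first_match(pattern, text):
    """Return (named, catchall) for the first $-expression in text."""
    return pattern.search(text).groups()


text = "Returning $ma\u00f1ana or later."

# With the lookahead, the identifier alternative refuses to stop in
# front of the n-tilde, so the catchall captures the whole word.
print(first_match(WITH_LOOKAHEAD, text))

# Without it, the identifier alternative silently matches only "ma".
print(first_match(WITHOUT_LOOKAHEAD, text))
```

With the lookahead the whole word lands in the catchall group, which is what lets the substitution function report the full offending placeholder instead of looking up the truncated key "ma".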


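The line number in the ValueError message comes from counting the lines of the template text that precede the ill-formed placeholder. A small sketch of that calculation in isolation, assuming Python 3; the helper name line_of is hypothetical and does not appear in mod292.py.

```python
def line_of(template, offset):
    """Return the 1-based number of the line that contains offset - 1."""
    # Mirrors len(template[:mo.start(4)].splitlines()) in mod292.py:
    # count the lines in everything before the offset.
    return len(template[:offset].splitlines())


template = "line one\nline two\nthe $@malformed token\nline four"

# In mod292.py the offset is mo.start(4), the first character after
# the dollar sign, so the prefix always ends with "$" and is never empty.
offset = template.index("@")
print(line_of(template, offset))  # prints 3
```

Because the prefix always ends with the dollar sign itself, the count never falls one short when the placeholder starts a line.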
