Draft PEP: string interpolation with backquotes

Oren Tirosh oren-py-l at hishome.net
Sun Dec 2 08:06:26 EST 2001


PEP: XXXX
Title: String interpolation with backquotes
Author: oren at hishome.net (Oren Tirosh)
Created: 2-Dec-2001


Abstract

    This document proposes a string interpolation feature for Python
    to allow easier string formatting.  The suggested syntax change is
    the introduction of a new 'i' prefix for strings that triggers the
    special interpretation of the backquote "`" character within a
    string.

    Example:

        i"X=`x`, Y=`calc_y(x)`."

Copyright

    This document is in the public domain.

Specification

    A new character prefix "i" is defined for strings.  This prefix
    precedes the "u" and "r" prefixes, if present.  The prefix "i"
    stands for "interpolation" or "in-line".  Within a string with an
    "i" prefix an expression enclosed in backquotes is converted into
    its string representation and embedded into the string. An empty
    interpolation ("``") is not allowed. The expression may be any 
    valid Python expression not containing the backquote character.  
    Since a backquoted expression can always be rewritten as a call
    to the repr() function, this does not actually limit the
    expressions that may be embedded.
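
    As an illustration, the example from the Abstract can be written
    in current Python with the "%" operator.  This is only a sketch:
    it assumes "%s"-style conversion, as the implementation notes in
    this document suggest, and the values of x and calc_y are
    stand-ins chosen for the example.

```python
# Stand-ins for the names used in the Abstract's example (assumed values).
def calc_y(x):
    return x * 2

x = 3

# Proposed form:   i"X=`x`, Y=`calc_y(x)`."
# Assumed equivalent using the existing "%" operator:
result = "X=%s, Y=%s." % (x, calc_y(x))
print(result)
```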

Rationale

    A similar proposal was made by Marnix Klooster in a python-list
    posting [1] without the "i" prefix.  Marnix noted that this is
    the way it is done in Python's ancestor ABC from which it inherits
    many features and design decisions.

    The most apparent difference between this proposal and a previous
    proposal by Ka-Ping Yee [2] is the use of backquotes rather than
    the '$' character.  Backquotes are familiar to Pythoneers as
    equivalent to the repr() function whereas the $ notation is alien 
    to Python. With backquotes there is only one interpolation format
    with simple rules compared to $ interpolation which uses a tricky
    algorithm to detect the end of the interpolation or the use of an
    alternative format with braces.

    A more significant but less apparent difference is that with this
    proposal the embedded expressions are not characters in a string -
    they are real Python expressions compiled into byte code and the
    validity of the interpolation syntax and the syntax of embedded
    expressions is fully checked at compile-time.
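
    The difference can be seen with the built-in compile() function.
    The sketch below uses a "%" expression as a stand-in for the
    proposed syntax, which current interpreters do not accept.  The
    file names passed to compile() are labels invented for this
    example.

```python
# A malformed expression hidden inside a string is not noticed when the
# enclosing source is compiled; it would fail only when eval() runs.
compile('eval("x +")', "<run-time style>", "eval")

# The same malformed expression in ordinary code position, which is where
# this proposal places embedded expressions, is rejected by the compiler.
try:
    compile('"X=%s" % (x +,)', "<compile-time style>", "eval")
    rejected = False
except SyntaxError:
    rejected = True

print(rejected)
```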

    This design does not sneak runtime evaluation or lazy evaluation
    into the language through the back door.  To get lazy-evaluated string
    interpolation the programmer may explicitly use a lambda function 
    consisting of a single interpolated string.
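
    A minimal sketch of the lambda idiom follows.  It is written with
    the "%" operator so that it runs without the "i" prefix; the names
    x and message are chosen for the example only.

```python
x = 1

# Proposed form:   message = lambda: i"X=`x`"
# Stand-in using the existing "%" operator:
message = lambda: "X=%s" % (x,)

x = 2
late = message()   # the embedded expression is evaluated at call time
print(late)
```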

Implementation notes

    Most of the logic of this proposal is in the tokenizer. The
    example above is broken down into the following tokens:

    <i"X=`>  - INTERPOLATE
    <x>      - NAME
    <`, Y=`> - INTFRAG (interpolation fragment)
    <calc_y> - NAME    
    <(>      - LPAR
    <x>      - NAME
    <)>      - RPAR
    <`.">    - DETERPOLATE

    An INTERPOLATE token instructs the compiler to start a tuple. An
    INTFRAG is similar to a comma separating items in a tuple and the
    DETERPOLATE token terminates the tuple.

    The code generated by this form of interpolation may be the same
    as that generated by the "%" operator.  The INTERPOLATE, INTFRAG,
    and DETERPOLATE tokens are concatenated together, any "%"
    characters in the string are replaced with "%%", the now-empty
    backquotes "``" are replaced with "%s".  Finally, the "%" operator
    is applied to the resulting string and the tuple.
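
    The construction of the format string might be sketched as
    follows.  The function build_format and its list-of-fragments
    argument are hypothetical, introduced for this illustration; the
    sketch works on the literal text between expressions directly
    instead of on concatenated tokens, which gives the same result.

```python
def build_format(fragments):
    """Build a "%"-style format string from the literal text that
    surrounds the embedded expressions (quotes and backquotes removed)."""
    # Literal "%" characters must be doubled so that "%" leaves them alone.
    escaped = [frag.replace("%", "%%") for frag in fragments]
    # Each gap between two fragments held one embedded expression.
    return "%s".join(escaped)


# Stand-ins for the names in the Abstract's example (assumed values).
def calc_y(v):
    return v + 1

x = 5

# Literal fragments of  i"X=`x`, Y=`calc_y(x)`."
fmt = build_format(["X=", ", Y=", "."])
result = fmt % (x, calc_y(x))
print(fmt)
print(result)
```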

    Correctly generating INTFRAG and DETERPOLATE tokens requires some
    stateful logic in the tokenizer. If the INTERPOLATE token started
    with a single quote, an INTFRAG may not contain an unescaped
    single quote and DETERPOLATE ends with a single quote.  Similar
    rules apply to interpolations with double quotes, triple single
    quotes and triple double quotes.

        State 0 - Default state. A backquote is a single character
            BACKQUOTE token.

        State 1 - backquote starts an INTFRAG or DETERPOLATE that ends
            with "'".

        State 2 - backquote starts an INTFRAG or DETERPOLATE that ends
            with '"'.

        State 3 - backquote starts an INTFRAG or DETERPOLATE that ends
            with "'''".

        State 4 - backquote starts an INTFRAG or DETERPOLATE that ends
            with '"""'.
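
    The fragment rules above can be sketched as a small scanner.  This
    is not the reference implementation; the function name and the
    EXPR and STRING token names are assumptions made for this
    example.  The sketch ignores the "u" and "r" prefixes and returns
    each embedded expression as unparsed text, where a real tokenizer
    would emit the expression's ordinary tokens.  The detected closing
    quote plays the role of states 1 to 4.

```python
def split_interpolation(source):
    """Split the source text of an i-prefixed string literal into
    (token_name, text) pairs."""
    # The longest matching opening quote decides how the literal must end.
    quote = next((q for q in ('"""', "'''", '"', "'")
                  if source.startswith(q, 1)), None)
    if not source.startswith("i") or quote is None:
        raise SyntaxError("expected an i-prefixed string literal")

    tokens = []
    start = 0                 # start of the fragment being collected
    pos = 1 + len(quote)      # first character after the opening quote
    name = "INTERPOLATE"      # token name of the fragment in progress
    while pos < len(source):
        if source[pos] == "\\":
            pos += 2          # skip a backslash-escaped character
        elif source[pos] == "`":
            # The fragment runs up to and including this backquote.
            tokens.append((name, source[start:pos + 1]))
            end = source.find("`", pos + 1)
            if end == -1:
                raise SyntaxError("unterminated interpolation")
            if end == pos + 1:
                raise SyntaxError("empty interpolation")
            tokens.append(("EXPR", source[pos + 1:end]))
            start = end       # the next fragment begins with a backquote
            pos = end + 1
            name = "INTFRAG"
        elif source.startswith(quote, pos):
            if pos + len(quote) != len(source):
                raise SyntaxError("text after end of string")
            # Without any interpolation this is an ordinary string.
            last = "DETERPOLATE" if tokens else "STRING"
            tokens.append((last, source[start:]))
            return tokens
        else:
            pos += 1
    raise SyntaxError("unterminated string")


for token in split_interpolation('i"X=`x`, Y=`calc_y(x)`."'):
    print(token)
```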

Reference implementation

    A reference implementation in the form of a preprocessor for Python
    source files is available at:

        http://www.tothink.com/python/interpp

    This preprocessor is based on a modified version of Ka-Ping Yee's
    tokenize.py module.

Security

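    String interpolation involving actual run-time parsing of a string
    opens many potential security holes, because text from an
    untrusted source may end up being evaluated as code.  In this
    proposal embedded expressions are compiled together with the
    surrounding source and no string is parsed at run-time, so this
    form of interpolation should be secure against this class of
    attacks.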
    
References

    [1] 1996/11/07 python-list posting (Marnix Klooster)
        http://groups.google.com/groups?group=comp.lang.python&selm=328195a1.1211700%40news.worldonline.nl

    [2] PEP 215, String Interpolation (Ka-Ping Yee)
        http://www.python.org/peps/pep-0215.html




