[Python-Dev] *Simpler* string substitutions

Guido van Rossum guido@python.org
Thu, 20 Jun 2002 17:21:24 -0400


[Paul]
> We will never come to a solution unless we agree on what, if any,
> the problem is.

[...eloquent argument, ending in...]

> But I am against adding "$" if half of Python programmers are going
> to use that and half are going to use %. $ needs to be a
> replacement. There should be one obvious way to solve simple
> problems like this, not two. I am also against adding it as a
> useless function buried in a module that nobody will bother to
> import.

Well argued.  Alex said roughly the same thing: let's not add $
while keeping %.

Adding a function for $-interpolation to a module would certainly keep
some projects (like web templating) from reinventing the wheel -- but
/F has shown that this particular wheel isn't hard to recreate.  I
would certainly recommend any project that offers substitution in
templates that are edited by non-programmers to use the $-based syntax
from Barry's PEP rather than Python's %(name)s syntax.  (In particular
I hope Python's i18n projects will use $ interpolation.)
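
To make "isn't hard to recreate" concrete, here is a minimal sketch of a
$-substitution helper built on a regular expression.  It is not /F's code
and not the PEP's reference implementation; the name dollar_sub and the
exact rules ($name, ${name}, and $$ for a literal $) are assumptions
chosen for illustration.

```python
import re

# Hypothetical pattern: matches "$$", "$name", or "${name}".
_DOLLAR = re.compile(
    r"\$(?:(\$)|([A-Za-z_][A-Za-z0-9_]*)|\{([A-Za-z_][A-Za-z0-9_]*)\})"
)

def dollar_sub(template, mapping):
    """Replace $name and ${name} with str(mapping[name]); $$ becomes $."""
    def replace(match):
        escaped, bare, braced = match.groups()
        if escaped:
            return "$"
        # A missing name raises KeyError rather than being silently skipped.
        return str(mapping[bare or braced])
    return _DOLLAR.sub(replace, template)
```

A call such as dollar_sub("Hello $name", {"name": "Paul"}) fills in the
value from the mapping; a "$" that is not followed by a name is left
untouched.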

Oren made a good point that Paul emphasized: the most common use case
needs interpolation from the current namespace in a string literal,
and expressions would be handy.  Oren also made the point that the
necessary parsing could (should?) be done at compile time.
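
As a rough illustration of doing the parsing ahead of time, the sketch
below splits a template into literal pieces and variable names once and
reuses the result for every rendering.  It only approximates the idea: a
real compile-time implementation would live in the Python compiler, and
the names parse and render are hypothetical.

```python
import re

_TOKEN = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")

def parse(template):
    """Split a template once into literal strings and variable names."""
    # With one capture group, re.split alternates: literal, name, literal, ...
    parts = _TOKEN.split(template)
    return parts[0::2], parts[1::2]

def render(parsed, namespace):
    """Fill a pre-parsed template from a namespace dict; no re-parsing."""
    literals, names = parsed
    out = [literals[0]]
    for name, literal in zip(names, literals[1:]):
        out.append(str(namespace[name]))
        out.append(literal)
    return "".join(out)
```

Inside a function, render(parsed, vars()) would pull the values from the
current local namespace.  This sketch handles plain names only, not
expressions.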

We already have many ways to do this:

- In some cases print is appropriate:

  def f(x, y):
      print "The sum of", x, "and", y, "is", x+y

- You can use string concatenation:

  def f(x, y):
      return "The sum of " + str(x) + " and " + str(y) + " is " + str(x+y)

- You can use % interpolation (with two variants: positional and
  by-name).  A problem is that you have to specify an explicit tuple
  or dict of values.

  def f(x, y):
      return "The sum of %s and %s is %s" % (x, y, x+y)

Note that the print version is the shortest, and IMO also the easiest
to read.  (Though some people might disagree and prefer the % version
because it separates the template from the data; it's not much
longer.)

- You could have an interpolation helper function:

  def i(*a):
      return "".join(map(str, a))

  so you could write this:

  def f(x, y):
      return i("The sum of ", x, " and ", y, " is ", x+y)

This comes closer in length to the print version.
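
The by-name variant of % interpolation is mentioned above but not shown.
A small sketch follows; the function names and the key "total" are made
up for the example.

```python
def f_named(x, y):
    # By-name variant: the values come from an explicit dict.
    return "The sum of %(x)s and %(y)s is %(total)s" % {
        "x": x, "y": y, "total": x + y,
    }

def f_locals(x, y):
    total = x + y
    # vars() with no arguments returns the local namespace as a dict,
    # which avoids spelling out the dict by hand.
    return "The sum of %(x)s and %(y)s is %(total)s" % vars()
```

Both return the same string.  The second form is shorter, but the
expression x+y first has to be bound to a name.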

IMO the attraction of the $ version is that it reduces the amount of
punctuation so that it becomes even shorter and clearer.  While I said
"shorter" several times above when comparing styles, I really meant
that as a shorthand for "shorter and clearer".  Even the print example
suffers from the fact that every interpolated value is separated from
the surrounding template by a comma and a string quote on both sides
-- that's a lot of visual clutter (not to mention stuff to type).

Maybe in Python 3.0 we will be able to write:

  def f(x, y):
      return "The sum of $x and $y is $(x+y)"

To me, it's a toss-up whether this looks better or worse than the ABC
version:

  def f(x, y):
      return "The sum of `x` and `y` is `x+y`"

but I do know that backticks have a poor reputation for being hard to
find on the keyboard (newbies don't even know they have it), hard to
distinguish in some fonts, and publishers often turn 'foo' into `foo',
making it hard to publish accurate documentation.  I think on some
European keyboards ` is a dead key, making it even harder to type.
Additionally, it's a symmetric operator, which makes it harder to
parse complex examples.

Now, how to get there (or somewhere similar) in Python 2.3?

PEP 215 solves it by using (yet) another string prefix character.  It
uses $, which to me looks a bit ugly; in this thread, someone proposed
using e, so you can do:

  def f(x, y):
      return e"The sum of $x and $y is $(x+y)"

That looks OK to me, especially if it can be combined with u and r to
create unicode and raw strings.

There are other possibilities:

  def f(x, y):
      return "The sum of \$x and \$y is \$(x+y)"

Alas, it's not 100% backwards compatible, and the \$ looks pretty bad.

Another one:

  def f(x, y):
      return "The sum of \(x) and \(y) is \(x+y)"

Still not 100% compatible, looks perhaps a bit better, but notice how
now every interpolation needs three punctuation characters: almost as
many as the print example.

Assuming that interpolating simple variables is relatively common, I
still like plain $ with something to tag the string as an
interpolation best.

PEP 292 is an attempt to do this *without* involving the parser:

  def f(x, y):
      return "The sum of $x and $y is $(x+y)".sub()

Downsides are that it invites using non-literals as formats, with all
the security aspects, and that its parsing happens at run-time (no big
deal IMO).
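
For reference, PEP 292's idea later reached the standard library as
string.Template (Python 2.4), which postdates this message and is spelled
differently from the .sub() method shown above.  A minimal sketch with
that class, assuming the sum is passed under the made-up name "total"
because Template has no expression syntax:

```python
from string import Template

def f(x, y):
    # The template is parsed at run time; values are passed explicitly.
    template = Template("The sum of $x and $y is $total")
    return template.substitute(x=x, y=y, total=x + y)
```

substitute() raises KeyError when a placeholder has no matching value.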

Now back to $ vs. %.  I think I can defend having both in the
language, but only if % is reduced to the positional version (classic
printf).  This would be used mostly to format numerical data with
fixed column width.  There would be very little overlap in use cases:
% always requires you to specify explicit values, while $ is always
followed by a variable name.
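
A sketch of the positional, printf-style use case described here, with
fixed column widths for numerical data.  The function name and the column
widths are arbitrary choices for the example.

```python
def format_row(name, count, price):
    # Left-justified 10-character name, 5-wide integer, 8-wide float
    # with two decimals: every row lines up in columns.
    return "%-10s %5d %8.2f" % (name, count, price)

rows = [("apples", 3, 0.5), ("kiwis", 12, 0.25)]
table = "\n".join(format_row(*row) for row in rows)
```

Each formatted row is exactly 25 characters wide, so the columns align
when the table is printed in a fixed-width font.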

(Yet another variant is from Tcl, which uses $variable but also
[expression].  In Python 3.0 this would become:

  def f(x, y):
      return "The sum of $x and $y is [x+y]"

But now you have three characters that need quoting, and we might as
well use \$ to quote a literal $ instead of $$.)

All options are still open.

--Guido van Rossum (home page: http://www.python.org/~guido/)