[Python-Dev] Re: *Simpler* string substitutions

François Pinard pinard@iro.umontreal.ca
20 Jun 2002 19:40:23 -0400


[Guido van Rossum]

> [...] All options are still open.

Thanks, Guido, for this summary of the various avenues.

These two points are worth underlining:

1) let's not add $ while keeping %.  [...] having both in the language, but
   only if % is reduced to the positional version

2) the necessary parsing could (should?) be done at compile time.

Here are other comments, some of which are related to internationalisation.

>       return "The sum of " + str(x) + " and " + str(y) + " is " + str(x+y)
>       return i("The sum of ", x, " and ", y, " is ", x+y)
>       print "The sum of", x, "and", y, "is", x+y

> Note that the print version is the shortest, and IMO also the easiest
> to read.

These are good for quick programs, and `print' is good for debugging.
But they are less appropriate whenever internationalisation is in the
picture, because translators work more easily and more precisely from a
whole sentence at once than from individual sentence fragments.

> [...] % interpolation (with two variants: positional and by-name).

The advantage of by-name interpolation for internationalisation is the
flexibility it gives for translators to reorganise the inserts.
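
A minimal sketch of that flexibility, using only the %-by-name syntax
Python already has.  The two templates and the names `x', `y' and `total'
are invented for illustration; the second template stands for a translation
that happens to mention the result first.

```python
# Hypothetical templates: the original wording, and a translation that
# moves the result to the front.  Only the template differs.
english = "The sum of %(x)s and %(y)s is %(total)s"
reordered = "%(total)s is the sum of %(x)s and %(y)s"

values = {"x": 2, "y": 3, "total": 2 + 3}

print(english % values)    # The sum of 2 and 3 is 5
print(reordered % values)  # 5 is the sum of 2 and 3

# With positional inserts the same reordering scrambles the values,
# because the tuple order is fixed in the program, not in the template.
positional = "%s is the sum of %s and %s"
print(positional % (2, 3, 5))  # 2 is the sum of 3 and 5
```

The translator only touches the template; the program text stays the same.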

>       return "The sum of `x` and `y` is `x+y`"
>       return "The sum of $x and $y is $(x+y)"
>       return "The sum of $x and $y is [x+y]"

Those three above might be a little too magical for Python.  Python ought
not to have interpolation on all double-quoted strings like shells or Perl
(and it should probably avoid deciding interpolability from whether the
delimiter is a single or double quote, even if shells or Perl do this).

>       return "The sum of \(x) and \(y) is \(x+y)"
>       return "The sum of \$x and \$y is \$(x+y)"
>       return e"The sum of $x and $y is $(x+y)"

> [...] I still like plain $ with something to tag the string as an
> interpolation best.

Those three are interesting, because they build on the escape syntax,
or prefix letters, which Python already has.  All these notations would
naturally accept `ur' prefix letters.  The shortest notation in the above is
the third, using the `e' prefix, because this is the one requiring the fewest
supplementary characters per interpolation.  This is really a big
advantage.  (A detail about the letter `e': is it the best letter to use?)

I also like the hidden suggestion that round parentheses are more readable
than braces, something that was already granted in Python through the
current %-by-name syntax.  In fact, `${name}' would be more acceptable
if Python also got at the same time `$(name)' as equivalent, and _also_
`%{name}format' as equivalent for `%(name)format'.  The simplest is surely
not to introduce braces at all.

As long as Python does not fully get rid of `%', I wonder if the last two
examples above could not be rewritten:

       return "The sum of \%x and \%y is \%(x+y)"
       return e"The sum of %x and %y is %(x+y)"

That would avoid introducing `$' while we already have `%'.  On the other
hand, it might be confusing to overload `%' too much, if one wants to mix
everything like in:

       return e"The sum of %x and %y is %%d" % (x+y)

This is debatable, and delicate.  Users already have to deal with how to
quote `\' and `%'.  Having to deal with `$' as well, in all combinations and
exceptional cases, adds a lot of things to consider.  Most of us easily
write shell scripts, yet we have difficulty properly writing or deciphering
a shell line using many quoting devices at once.  Python is progressively
climbing the same road.  It should stay simpler, all things considered.
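
The layering of escapes can already be seen with the existing `%' operator
alone.  This is only a sketch of the two-pass situation, written with
%-by-name for the first pass since the `e' prefix does not exist; the
variable names are invented.

```python
x, y = 2, 3

# "%%d" survives the first pass as "%d", to be filled by the second pass.
template = "The sum of %(x)s and %(y)s is %%d"

first_pass = template % {"x": x, "y": y}
print(first_pass)   # The sum of 2 and 3 is %d

second_pass = first_pass % (x + y)
print(second_pass)  # The sum of 2 and 3 is 5
```

The reader has to track which `%' belongs to which pass, which is exactly
the kind of bookkeeping that makes shell quoting hard.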

But I think the main problem in all these suggestions is how they interact
with internationalisation.  Surely:

       return _(e"The sum of %x and %y is %(x+y)")

cannot be right.  Interpolation has to be delayed until after translation,
not before, because, as you will agree, translators just cannot produce a
translation for all possible inserts.  I do not know what the solution is,
or what kind of elegant magic may be invented to give programmers all the
flexibility they still need in that area.  It is worth a good thought, and
we should not rush into a decision before this aspect has been carefully
analysed.
If other PEPs are necessary for addressing interactions between interpolation
and translation, these PEPs should be fully resolved before or concurrently
with the PEP on interpolation, and not pictured as independent issues.
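
To make the ordering problem concrete, here is a minimal sketch using the
existing %-by-name syntax.  The `CATALOGUE' dictionary and this `_' function
are hypothetical stand-ins for a real message catalogue, and the French
entry is only an example.

```python
# Stand-in for a translation catalogue, keyed by the untranslated template.
CATALOGUE = {
    "The sum of %(x)s and %(y)s is %(total)s":
        "La somme de %(x)s et %(y)s est %(total)s",
}

def _(msgid):
    """Return the translation of msgid, or msgid itself if none is known."""
    return CATALOGUE.get(msgid, msgid)

x, y = 2, 3
values = {"x": x, "y": y, "total": x + y}

# Interpolating first: the lookup key already contains 2, 3 and 5,
# so no catalogue entry can match and the text stays untranslated.
too_early = _("The sum of %(x)s and %(y)s is %(total)s" % values)
print(too_early)  # The sum of 2 and 3 is 5

# Translating first: the fixed template is looked up, then filled in.
delayed = _("The sum of %(x)s and %(y)s is %(total)s") % values
print(delayed)    # La somme de 2 et 3 est 5
```

An interpolation done at compile time, or as part of the string literal
itself, would behave like the first case unless some extra mechanism
intervenes.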

> [...]  There would be very little overlap in use cases: % always
> requires you to specify explicit values, while $ is always % followed
> by a variable name.

Yes, the suggestion of using `$(name:format)', whenever needed, is a good
one that should be retained, maybe as `%(name:format)', or maybe with `$'.
It means that the overlap would not be so little, after all.
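
A rough sketch of how `$(name:format)' could behave, not a proposal for a
real implementation.  The function `interpolate', its regular expression,
and the rule that a missing format defaults to `s' are all my assumptions.

```python
import re

# Matches $(name) or $(name:format); the format part is optional.
_PATTERN = re.compile(r"\$\((\w+)(?::([^)]*))?\)")

def interpolate(template, namespace):
    """Replace each $(name:format) by the value of name, formatted with %format."""
    def replace(match):
        name, fmt = match.group(1), match.group(2)
        # Reuse the existing % machinery; default to %s when no format is given.
        return ("%" + (fmt or "s")) % (namespace[name],)
    return _PATTERN.sub(replace, template)

result = interpolate(
    "The sum of $(x) and $(y) is $(total:05.1f)",
    {"x": 2, "y": 3, "total": 5},
)
print(result)  # The sum of 2 and 3 is 005.0
```

Once a format can ride along with the name, this covers much of what
`%(name)format' does today.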

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard