
[Barry A. Warsaw]
This PEP describes a simpler string substitution feature, also known as string interpolation. This PEP is "simpler" in two respects:
1. Python's current string substitution feature (commonly known as %-substitutions) is complicated and error prone. This PEP is simpler at the cost of less expressiveness.
2. PEP 215 proposed an alternative string interpolation feature, introducing a new `$' string prefix. PEP 292 is simpler than this because it involves no syntax changes and has much simpler rules for what substitutions can occur in the string.
For one, I do not like seeing `$' as a string prefix in Python, and wonder if we could not merely go with `%' as we always did in Python. At least, it keeps a kind of clear cut distance between Python and Perl. :-)
In addition, the rules for what can follow a % sign are fairly complex, while the usual application rarely needs such complexity.
This premise seems exaggerated to me. `%' as it stands is not that complex to understand. Moreover, many of us use `%' formatting a lot, so it is not so rare that the current `%' specification is useful.
1. $$ is an escape; it is replaced with a single $
Let's suppose we stick with `%', the above rule reduces to something already known.
3. ${identifier} [...]
We could use %{identifier} as meaning `%(identifier)s'. Clean. Simple.
2. $identifier [...]
This is where the difficulty lies. Since the PEP already suggests that ${identifier} is to be preferred over $identifier, why not go a bit further and drop rule 2 altogether? Or else, how do you justify that using it really makes things more legible?
Then, the whole proposal would reduce to adding %{identifier}, and instead of having `.sub()' methods or whatever, just stick with what we already have.
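Pinard's %{identifier} idea can be approximated without any language change: a small preprocessor rewrites each %{name} into %(name)s and leaves everything else, including %% escapes, to the existing % operator. This is a sketch in modern Python 3; `expand_braces` is a hypothetical helper name, not part of any proposal in this thread.

```python
import re

# Matches either a literal "%%" escape or a "%{identifier}" reference.
_BRACED = re.compile(r"%(?:%|\{([A-Za-z_][A-Za-z0-9_]*)\})")


def expand_braces(fmt):
    """Rewrite %{name} as %(name)s, leaving %% and all other % specs alone."""
    def replace(match):
        name = match.group(1)
        if name is None:          # the match was "%%": keep the escape intact
            return "%%"
        return "%(" + name + ")s"
    return _BRACED.sub(replace, fmt)


print(expand_braces("%{name} owns 100%% of %{thing}") % {"name": "Ann", "thing": "it"})
# -> Ann owns 100% of it
```

Matching "%%" first matters: without it, "%%{x}" would be misread as a literal percent sign followed by a %{x} reference.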
This would be a mild change instead of a whole new feature, and would keep Python a little more true to itself. The interpolation proposals I've seen so far have always looked a bit awkward and foreign.
I guess that merely adding %{identifier} would wholly satisfy the justifications given for the PEP (that is, giving people a means to avoid %()s as error prone), with a minimal impact on the current Python definition, and a bit less of a surprise. Python does not have to look like Perl to be useful, you know! :-)
Handling Missing Keys
This would be a non-issue, by the fact that %(identifier)s behaviour, for undefined identifier, is already what we want.
The mapping argument is optional; if it is omitted then the mapping is taken from the locals and globals of the context in which the .sub() method is executed.
This is an interesting idea. However, there are other contexts where the concept of a compound dictionary of all globals and locals would be useful. Maybe we could have some allvars() similar to globals() and locals(), and use `... % allvars()' instead of `.sub()'? So this would serve both string interpolation and other avenues.
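A rough sketch of what an allvars() helper might look like, assuming CPython's sys._getframe() (an implementation detail, not a language guarantee). The name allvars comes from the suggestion above and is hypothetical; the point is that it returns a plain dictionary, so it works with the existing % operator and needs no new formatting machinery. Locals shadow globals, mirroring ordinary name lookup; builtins are not included.

```python
import sys


def allvars():
    """Return a new dict of the caller's globals overlaid with its locals."""
    caller = sys._getframe(1)          # CPython-specific: the calling frame
    merged = dict(caller.f_globals)    # start from module-level names
    merged.update(caller.f_locals)     # locals win, as in ordinary lookup
    return merged


greeting = "Hello"


def greet(name):
    # "greeting" comes from the module globals, "name" from the locals.
    return "%(greeting)s, %(name)s!" % allvars()


print(greet("World"))   # Hello, World!
```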
I hope I have succeeded in expressing my feeling that we should try to keep string interpolation natural to what Python already is. We should not carelessly multiply paradigms.

For one, I do not like seeing `$' as a string prefix in Python, and wonder if we could not merely go with `%' as we always did in Python. At least, it keeps a kind of clear cut distance between Python and Perl. :-)
The $ means "substitution" in so many languages besides Perl that I wonder where you've been.
In addition, the rules for what can follow a % sign are fairly complex, while the usual application rarely needs such complexity.
This premise seems exaggerated to me. `%' as it stands is not that complex to understand. Moreover, many of us use `%' formatting a lot, so it is not so rare that the current `%' specification is useful.
I quite like the positional % substitution. I think %(...)s was a mistake -- what we really wanted was ${...}.
1. $$ is an escape; it is replaced with a single $
Let's suppose we stick with `%', the above rule reduces to something already known.
3. ${identifier} [...]
We could use %{identifier} as meaning `%(identifier)s'. Clean. Simple.
Confusing. The visual difference between () and {} is too small.
2. $identifier [...]
This is where the difficulty lies. Since the PEP already suggests that ${identifier} is to be preferred over $identifier, why not go a bit further and drop rule 2 altogether? Or else, how do you justify that using it really makes things more legible?
Less clutter. Compare
"My name is $name, I live in $country"
to
"My name is ${name}, I live in ${country}"
The {} add nothing but noise. We're copying this from the shell.
--Guido van Rossum (home page: http://www.python.org/~guido/)

[Guido van Rossum]
The $ means "substitution" in so many languages besides Perl that I wonder where you've been.
Of course, I've been elsewhere. But Python currently uses `%' for driving interpolation, and on this topic, I've been with Python, if you wonder :-).
I quite like the positional % substitution. I think %(...)s was a mistake -- what we really wanted was ${...}.
The distinction between %()s and %()r, recently introduced, has been useful. But with str() and repr(), only one of those is really necessary. But it gave the impression that the trend in Python is to make % stronger. The proposal of using $ as yet another formatting avenue makes it weaker.
Less clutter. Compare
"My name is $name, I live in $country"
to
"My name is ${name}, I live in ${country}"
The {} add nothing but noise. We're copying this from the shell.
Noise decreases legibility. So, maybe the PEP should not say that ${name} is to be preferred over $name? Or else, it should explain why.

The distinction between %()s and %()r, recently introduced, has been useful. But with str() and repr(), only one of those is really necessary. But it gave the impression that the trend in Python is to make % stronger. The proposal of using $ as yet another formatting avenue makes it weaker.
Language evolution doesn't always go into a straight line.
Less clutter. Compare
"My name is $name, I live in $country"
to
"My name is ${name}, I live in ${country}"
The {} add nothing but noise. We're copying this from the shell.
Noise decreases legibility. So, maybe the PEP should not say that ${name} is to be preferred over $name? Or else, it should explain why.
I agree that I see no reason to prefer ${name} (except when followed by another word character of course).
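Python's string.Template (added in Python 2.4 as the eventual outcome of PEP 292) follows this rule: $name ends at the first non-identifier character, so braces are needed only when the name runs straight into more word characters. A small illustration in modern Python 3; the values are invented for the example.

```python
from string import Template

values = {"name": "Paul", "country": "Canada", "item": "spam"}

# $name is enough when a non-identifier character (here "," or end of string) follows.
print(Template("My name is $name, I live in $country").substitute(values))

# Braces are needed only when more identifier characters follow directly.
print(Template("Two ${item}s, please").substitute(values))    # Two spams, please

# Without the braces the name is read as "items", which is not in the mapping.
try:
    Template("Two $items, please").substitute(values)
except KeyError as exc:
    print("missing key:", exc)

# $$ is the escape for a literal dollar sign.
print(Template("Price: $$5").substitute())                    # Price: $5
```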
--Guido van Rossum (home page: http://www.python.org/~guido/)

[Guido van Rossum]
I quite like the positional % substitution. I think %(...)s was a mistake -- what we really wanted was ${...}.
What is the advantage of curly braces over parens in this context?
+1 on the allvars() suggestion also.
-- Patrick K. O'Brien Orbtech ----------------------------------------------- "Your source for Python software development." ----------------------------------------------- Web: http://www.orbtech.com/web/pobrien/ Blog: http://www.orbtech.com/blog/pobrien/ Wiki: http://www.orbtech.com/wiki/PatrickOBrien -----------------------------------------------

I quite like the positional % substitution. I think %(...)s was a mistake -- what we really wanted was ${...}.
What is the advantage of curly braces over parens in this context?
Apart from Make, most $ substituters use ${...}, not $(...).
+1 on the allvars() suggestion also.
I have no idea what you are talking about. :-(
--Guido van Rossum (home page: http://www.python.org/~guido/)

"GvR" == Guido van Rossum guido@python.org writes:
GvR> Apart from Make, most $ substituters use ${...}, not $(...).
GNU Make allows either braces or parentheses; there's no difference between the two. So it's a pretty strong precedent in lots of Unix tools. GNU Make also uses the $$ escape.
-Barry

[Guido van Rossum]
+1 on the allvars() suggestion also.
I have no idea what you are talking about. :-(
François Pinard made the following suggestion and I think something along the lines of allvars() would be very handy, especially with the html stuff I've been doing lately:
This is an interesting idea. However, there are other contexts where the concept of a compound dictionary of all globals and locals would be useful. Maybe we could have some allvars() similar to globals() and locals(), and use `... % allvars()' instead of `.sub()'? So this would serve both string interpolation and other avenues.
-- Patrick K. O'Brien

[Guido van Rossum]
I quite like the positional % substitution. I think %(...)s was a mistake -- what we really wanted was ${...}.
What is the advantage of curly braces over parens in this context?
Apart from Make, most $ substituters use ${...}, not $(...).
I guess what I was really wondering is whether that advantage clearly outweighs some of the possible disadvantages. I'm not a fan of curly braces and I'll be sad to see more of them in Python. There's something refreshing about only having curly braces for dictionaries and parens everywhere else. And since the existing string substitution uses parens, why shouldn't the new one?
It won't surprise me that you've already considered all this and are fine with using curly braces here, but I just had to ask before it is a done deal. (And I promise I won't go on a boolean crusade and predict that curly braces will appear everywhere to the demise of the language. <wink>)
-- Patrick K. O'Brien

I'm not a fan of curly braces and I'll be sad to see more of them in Python.
This seems more emotional than anything else.
--Guido van Rossum (home page: http://www.python.org/~guido/)

[Guido van Rossum]
I'm not a fan of curly braces and I'll be sad to see more of them in Python.
This seems more emotional than anything else.
Definitely. And habit. Since I program mostly in Python I'm used to {} meaning dictionary and I'm used to typing parens everywhere else. Others who are used to ${} for string substitution in other contexts will be happy that you copied that syntax. I'm just trying to see if there is anything more substantial involved. Sounds like there isn't. And that's fine. I'll adapt. :-)
-- Patrick K. O'Brien

"PKO" == Patrick K O'Brien pobrien@orbtech.com writes:
PKO> I guess what I was really wondering is whether that advantage clearly outweighs some of the possible disadvantages. I'm not a fan of curly braces and I'll be sad to see more of them in Python. There's something refreshing about only having curly braces for dictionaries and parens everywhere else. And since the existing string substitution uses parens, why shouldn't the new one?
Personally, I wouldn't mind it if this syntax took a cue from the make program and accepted both $(name) and ${name} as alternatives to $name (with nested parenthesis/brace matching).
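A sketch of a substituter that accepts $name, ${name} and make-style $(name), plus the $$ escape. `sub` is a hypothetical function written for modern Python 3, and unlike Barry's suggestion it does not attempt nested parenthesis/brace matching; a lone $ that starts none of these forms is left untouched.

```python
import re

# One alternative per accepted form; named groups record which one matched.
_PATTERN = re.compile(r"""
    \$(?:
        (?P<escaped>\$)                          # $$ gives a literal $
      | \{(?P<braced>[A-Za-z_][A-Za-z0-9_]*)\}   # ${name}
      | \((?P<paren>[A-Za-z_][A-Za-z0-9_]*)\)    # $(name), as in make
      | (?P<named>[A-Za-z_][A-Za-z0-9_]*)        # $name
    )
""", re.VERBOSE)


def sub(template, mapping):
    """Substitute $name, ${name} and $(name) from mapping; $$ yields $."""
    def replace(match):
        if match.group("escaped") is not None:
            return "$"
        name = match.group("braced") or match.group("paren") or match.group("named")
        return str(mapping[name])      # a missing name raises KeyError, like %
    return _PATTERN.sub(replace, template)


print(sub("make-style $(CC), shell-style ${HOME}, plain $USER, cost $$5",
          {"CC": "gcc", "HOME": "/home/barry", "USER": "barry"}))
```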
-Barry

Patrick K. O'Brien wrote:
[Guido van Rossum]
I quite like the positional % substitution. I think %(...)s was a mistake -- what we really wanted was ${...}.
What is the advantage of curly braces over parens in this context?
It unambiguously spells that there is no format suffix char.
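The pitfall behind this answer is easy to demonstrate with the existing operator: in %(name)s the trailing conversion character is mandatory, and if it is forgotten, the characters that follow are silently parsed as a flag and a conversion type. A short demonstration in modern Python 3:

```python
from string import Template

data = {"name": 5}

# Intended "%(name)s is here", but the "s" was forgotten: the following
# space is read as a flag and "i" as an integer conversion.
print(repr("%(name) is here" % data))                       # ' 5s here'

# With a brace-delimited name there is no conversion suffix to forget.
print(repr(Template("${name} is here").substitute(data)))   # '5 is here'
```

No error is raised in the first case; the output is simply wrong, which is the kind of mistake the PEP calls error prone.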
+1 on the allvars() suggestion also.
me too.

Christian,
you seem to be contradicting yourself. First:
[someone]
+1 on the allvars() suggestion also.
[Christian]
me too.
and later:
[Christian]
The following statements are ordered by increasing hate.
1 - I do hate the idea of introducing a "$" sign at all.
2 - giving "$" special meaning in strings via a module
3 - doing it as a builtin function
4 - allowing it to address local/global variables
Doesn't 4 contradict your +1 on allvars()?
--Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Christian,
you seem to be contradicting yourself. First:
[someone]
+1 on the allvars() suggestion also.
[Christian]
me too.
and later:
[Christian]
The following statements are ordered by increasing hate.
1 - I do hate the idea of introducing a "$" sign at all.
2 - giving "$" special meaning in strings via a module
3 - doing it as a builtin function
4 - allowing it to address local/global variables
Doesn't 4 contradict your +1 on allvars()?
By no means. allvars() is something like locals() or globals(), just an explicit way to produce a dictionary of variables.
What I want to preserve is the distinction between arbitrary "%(name)s" or maybe "${name}" names and my local variables. Using locals() or allvars(), I can decide to *feed* the formatting expression with variable names. But the implementation of .sub() should not know anything about variables, the same way as % doesn't know about variables. Formatting is "by value", IMHO.
Furthermore, I'd like to thank Alex for his opinions, additions and adjustments to my post. I have to say that I always *am* emotional about such stuff, although I try hard not to be. But he hits the nail on the head better than I do.
cheers - chris

"CT" == Christian Tismer tismer@tismer.com writes:
CT> By no means. allvars() is something like locals() or globals(), just an explicit way to produce a dictionary of variables.
I'd be ok with something like allvars() and requiring a dictionary to the .sub() method, /if/ allvars() were a method on a frame object. I really, really do want to write in my i18n programs:
    def whereBorn(name):
        country = countryOfOrigin(name)
        return _('$name was born in $country')
I'd be fine if the definition of _() could reach into the frame of whereBorn() and give me a list of all variables, including ones in nested scopes. Actually, that'd be a lot better than what I do now (although truth be told, losing access to nested scoped variables is only a hypothetical limitation in the code I've written).
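A minimal sketch of the implicit style described here, assuming CPython's sys._getframe() and string.Template. The translation step is stubbed out: `SPANISH` is a hypothetical in-memory dictionary standing in for a real gettext catalog, and `countryOfOrigin` is likewise a stand-in.

```python
import sys
from string import Template

# Hypothetical Spanish catalog standing in for a compiled gettext .mo file.
SPANISH = {"$name was born in $country": "$name nació en $country"}


def _(message):
    """Translate message, then interpolate from the caller's variables."""
    caller = sys._getframe(1)              # CPython-specific: whereBorn's frame
    namespace = dict(caller.f_globals)
    namespace.update(caller.f_locals)      # locals shadow globals
    translated = SPANISH.get(message, message)
    return Template(translated).substitute(namespace)


def countryOfOrigin(name):                 # hypothetical stand-in
    return "Canada"


def whereBorn(name):
    country = countryOfOrigin(name)
    return _('$name was born in $country')


print(whereBorn("Paul"))                   # Paul nació en Canada
```

The call site stays exactly as short as in the example above; the cost is that _() can see every variable in the caller's scope.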
The feature would be useless to me if I had to pass some explicit dictionary into the _() method. It makes writing i18n code extremely tedious. Invariably, the unsafeness of an implicit dictionary happens when strings come from untrusted sources, and your .py file can't be considered untrusted. In those cases, creating an explicit dictionary for interpolation is fine, but they also tend not to overlap with i18n much.
-Barry

Barry A. Warsaw:
    def whereBorn(name):
        country = countryOfOrigin(name)
        return _('$name was born in $country')
...
The feature would be useless to me if I had to pass some explicit dictionary into the _() method. It makes writing i18n code extremely tedious.
I think you are overstating the problem here. The explicit bindings are a small increase over your current code as you are already creating an extra variable just to use the automatic binding. With explicit bindings:
    def whereBorn(name):
        return _('$name was born in $country',
                 name=name, country=countryOfOrigin(name))
The protection provided is not just against untrustworthy translators but also allows checking the initial language code. You can ensure all the interpolations are provided with values and all the provided values are used. It avoids exposing implementation details such as the names of local variables and can ensure that a more meaningful identifier in the local context of the string is available to the translator. For example, I may have some code that processes a command line argument which has multiple uses on different execution paths:
    _('$moduleName already exists', moduleName = arg)
    _('$searchString can not be found', searchString = arg)
Not making bindings explicit may mean that translators use other variables available at the translation point leading to unexpected failures when internal details are changed.
Neil

"NH" == Neil Hodgson nhodgson@bigpond.net.au writes:
>> The feature would be useless to me if I had to pass some explicit dictionary into the _() method. It makes writing i18n code extremely tedious.
NH> I think you are overstating the problem here.
Trust me, I'm not. Then again, maybe it's just me, or my limited experience w/ i18n'd source code, but being forced to pass in the explicit bindings is a big burden in terms of maintainability and readability.
NH> The explicit bindings are a small increase over your current code as you are already creating an extra variable just to use the automatic binding. With explicit bindings:
NH> def whereBorn(name):
NH>     return _('$name was born in $country',
NH>              name=name, country=countryOfOrigin(name))
More often than not, you already have the values you want to interpolate sitting in local variables for other uses inside the function. Notice how you've written `name' 5 times there? Try that with every other line of code and see if it doesn't get tedious. ;)
NH> The protection provided is not just against untrustworthy translators but also allows checking the initial language code. You can ensure all the interpolations are provided with values and all the provided values are used.
Yes, you could do that. Note that the actual interpolation function /does/ have access to a dictionary; it might contain more stuff than you want (making the second check impossible), but the first check could be done.
NH> It avoids exposing implementation details such as the names of local variables
This isn't an issue from a security concern, if the code is open source. And you should be picking meaningful local variable names anyway! Mine tend to be stuff like `subject', `listname', `realname'. I've yet to get a question about the meaning of an interpolation variable.
Actually, translators really need access to the source code anyway, and .po files usually contain references to the file and line number of the source string, and po-mode makes it easy for translators to locate the context and the purpose of the translation.
NH> and can ensure that a more meaningful identifier in the local context of the string is available to the translator. For example, I may have some code that processes a command line argument which has multiple uses on different execution paths:
NH> _('$moduleName already exists', moduleName = arg)
NH> _('$searchString can not be found', searchString = arg)
+1 on using explicit bindings or a dictionary when it improves clarity!
NH> Not making bindings explicit may mean that translators use other variables available at the translation point leading to unexpected failures when internal details are changed.
I18n'ing a program means you have to worry about a lot more things. If some local variable changed, I'd consider using an explicit binding to preserve the original source string, a change to which would force updated translations. Then again, you tend to get paranoid about changing /any/ source string, say to remove a comma, adjust whitespace, or fix a preposition. Any change means a dozen language teams have a new message they must translate (unless you can fix the translations mechanically on their behalf).
Another i18n approach altogether uses explicit message ids instead of using the source string as the implicit message id, but that has a whole 'nuther set of issues.
multi-lingual-ly y'rs, -Barry

[Barry A. Warsaw]
"NH" == Neil Hodgson nhodgson@bigpond.net.au
Another i18n approach altogether uses explicit message ids instead of using the source string as the implicit message id, but that has a whole 'nuther set of issues.
The `catgets' approach, as opposed to the `gettext' approach. I've seen some people with religious feelings in either direction.
Roughly said, `catgets' is faster, as you directly index the translation string without having to hash the original string first. It is also easier to translate single words or strings offering little translation context, as English ambiguities are resolved by using different message ids for the same text fragment.
On the other hand, `gettext' can be made nearly as fast as `catgets', only _if_ we use efficient hashing combined with proper caching. But the real advantage of `gettext' is that internationalised sources are more legible and easier to maintain, since the original string is shown in clear exactly where it is meant to be used.
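To make the contrast concrete, here is a toy model of the two lookup styles using plain Python dictionaries. Neither function is the real catgets or gettext API, and the catalogs and message ids are invented for illustration. The catgets style keys translations by an explicit id, so one English word can map to two different translations; the gettext style keys by the English source string itself, which keeps call sites readable but merges identical strings.

```python
# catgets style: the program refers to messages by numeric id.
CATGETS_DEFAULT = {1: "Open", 2: "Open"}    # 1: "Open" (verb), 2: "Open" (state)
CATGETS_FR = {1: "Ouvrir", 2: "Ouvert"}


def catgets_lookup(catalog, msg_id):
    """Index the catalog directly by id, falling back to the built-in English."""
    return catalog.get(msg_id, CATGETS_DEFAULT[msg_id])


# gettext style: the English source string is itself the lookup key.
GETTEXT_FR = {"Open": "Ouvrir"}


def gettext_lookup(catalog, message):
    """Hash the source string; untranslated messages fall back to English."""
    return catalog.get(message, message)


print(catgets_lookup(CATGETS_FR, 1), catgets_lookup(CATGETS_FR, 2))   # Ouvrir Ouvert
print(gettext_lookup(GETTEXT_FR, "Open"))   # Ouvrir: one key can have only one answer
```

GNU gettext later addressed this ambiguity with message contexts (msgctxt); Python 3.8 and later expose them as gettext.pgettext().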
A problem with both is that the implementations bundled with various systems are often weak or buggy, when they exist at all. Portability is notoriously difficult. Linux and GNU `gettext' rate rather nicely. But nothing is perfect.
[...] you tend to get paranoid about changing /any/ source string, say to remove a comma, adjust whitespace, or fix a preposition. Any change means a dozen language teams have a new message they must translate (unless you can fix the translations mechanically on their behalf).
This is why the responsibilities between maintainers and programmers ought to be well split. If the maintainer feels responsible for the work that string changes induce on the translation teams, comfort is lost. The maintainer should do their work in all freedom, and the problem of later reflecting tiny editorial changes into PO `msgstr' entries falls entirely to the translators, with the possible help of automatic tools. Translators should be prepared for such changes. If the split of responsibilities is not fully understood and accepted, internationalisation becomes much heavier in practice than it needs to be.
>> The feature would be useless to me if I had to pass some explicit dictionary into the _() method. It makes writing i18n code extremely tedious.
NH> I think you are overstating the problem here.
Trust me, I'm not. [...] being forced to pass in the explicit bindings is a big burden in terms of maintainability and readability.
NH> Not making bindings explicit may mean that translators use other variables available at the translation point leading to unexpected failures when internal details are changed.
I18n'ing a program means you have to worry about a lot more things. [...]
Internationalisation should not add a significant burden on the programmer. I mean, if there is something cumbersome in the internationalisation of a string, then there is something cumbersome in that string outside any internationalisation context.
If internationalisation really adds a significant burden, this is a signal that internationalisation has not been implemented well enough in the underlying language, or else, that it is not getting used correctly. I really think that internationalising of strings should be designed so it is a light activity and negligible burden for the maintainer. (And of course, translators should also get help in form of proper files and tools.)

"FP" == François Pinard pinard@iro.umontreal.ca writes:
FP> This is why the responsibilities between maintainers and programmers ought to be well split. If the maintainer feels responsible for the work that string changes induce on the translation teams, comfort is lost. The maintainer should do their work in all freedom, and the problem of later reflecting tiny editorial changes into PO `msgstr' entries falls entirely to the translators, with the possible help of automatic tools. Translators should be prepared for such changes. If the split of responsibilities is not fully understood and accepted, internationalisation becomes much heavier in practice than it needs to be.
Unfortunately, sometimes one person has to wear both hats and then we see the tension between the roles.
>> I18n'ing a program means you have to worry about a lot more things. [...]
FP> Internationalisation should not add a significant burden on the programmer. I mean, if there is something cumbersome in the internationalisation of a string, then there is something cumbersome in that string outside any internationalisation context.
It may not be a significant burden, once the infrastructure is in place and a rhythm is established, but it is still not zero. Little issues crop up all the time, like the fact that a message might have the same English phrase but need to be distinguished for proper translation in some other languages (gettext vs. catgets), or that the translation is slightly different depending on where the message is output (email, web, console), or dealing with localized formatting of numbers, dates, and other values. It's just stuff you have to keep in mind and deal with, but it's not insurmountable.
I think the current Python tools for i18n'ing are pretty good, and the bright side is that I'd still rather be developing an i18n'd program in Python than in just about any other language. One area that I think we could do better in is in support of localizing dates, currency, etc. Here, Stephan Richter is laying some groundwork in the Zope3 I18n project, possibly integrating IBM's ICU library into Python.
-Barry

I'm pretty negative on string interpolation, I don't see it as that useful or %()s as that bad. But obviously, many others do feel there is a problem.
I don't like the schism that $ vs. % would create. Nor do I like many other proposals. So here is yet another proposal:
* Add a new builtin function interp() (or some other name):
    def interp(format, uselocals=True, useglobals=True, dict={}, **kw)
* use % as the format character and allow optional () or {} around the name
* if this is acceptable, {name:format_modifiers} could be added in the future
Code would then look like this:
>>> x = 5
>>> print interp('x = %x')
x = 5
>>> print interp('x = %(x)')
x = 5
>>> print interp('x = %{x}')
x = 5
>>> print interp('y = %y')
NameError: name 'y' is not defined
>>> print interp('y = %y', dict={'y': 10})
y = 10
>>> print interp('y = %y', y=10)
y = 10
This form:
* eliminates any hint of $
* is similar to current % handling, but hopefully fixes the current deficiencies
* allows locals and/or globals to be used
* allows any dictionary/mapping to be used
* allows keywords
* is extensible to allow for formatting in the future
* doesn't require much extra typing or thought
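A rough, runnable approximation of the proposed interp(), written for Python 3 and assuming CPython's sys._getframe(). It accepts %name, %(name) and %{name} with an implicit string conversion, treats %% as a literal percent sign, and raises NameError for unknown names as in the session above. Two details are assumptions of this sketch rather than part of the proposal: the `dict` parameter is renamed `mapping` to avoid shadowing the builtin, and the precedence is keywords, then the mapping, then locals, then globals. Format modifiers are not implemented.

```python
import re
import sys

_ID = r"[A-Za-z_]\w*"
# "%%" escape, or %(name), %{name}, or bare %name.
_REF = re.compile(r"%(?:(%)|\((" + _ID + r")\)|\{(" + _ID + r")\}|(" + _ID + r"))")


def interp(format, uselocals=True, useglobals=True, mapping=None, **kw):
    caller = sys._getframe(1)                    # CPython-specific
    scope = {}
    if useglobals:
        scope.update(caller.f_globals)
    if uselocals:
        scope.update(caller.f_locals)
    scope.update(mapping or {})
    scope.update(kw)                             # keywords take precedence

    def replace(match):
        if match.group(1):                       # "%%" becomes "%"
            return "%"
        name = match.group(2) or match.group(3) or match.group(4)
        try:
            return str(scope[name])
        except KeyError:
            raise NameError("name %r is not defined" % name) from None
    return _REF.sub(replace, format)
```

Note that under this scheme %x means the variable x, no longer the hexadecimal conversion it denotes in today's % formatting.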
Now I'm sure everyone will tell me how awful this is. :-)
Neal
PS I'm -0 on this proposal. And I dislike the name interp.

[Barry A. Warsaw]
FP> This is why the responsibilities between maintainers and programmers ought to be well split.
Unfortunately, sometimes one person has to wear both hats and then we see the tension between the roles.
I have the same experience, having been for a good while the assigned French translator for the packages I was maintaining. But I was splitting my roles rather carefully, precisely to see where the tensions and problems lay, and then to work on improving how the involved parties interact.
>> I18n'ing a program means you have to worry about a lot more things. [...]
FP> Internationalisation should not add a significant burden on the programmer.
It may not be a significant burden, once the infrastructure is in place and a rhythm is established, but it is still not zero.
The Mailman effort has been especially courageous, as it had to address many problems on which we have not yet accumulated much experience, but which are inescapable in the long run. For example, I guess you had to take care of translating external HTML templates, considering some input aspects, allowing on-the-fly language selection, and of course, looking into more prosaic non-message "locale" concerns.

"FP" == François Pinard pinard@iro.umontreal.ca writes:
>> It may not be a significant burden, once the infrastructure is in place and a rhythm is established, but it is still not zero.
FP> The Mailman effort has been especially courageous, as it had to address many problems on which we have not yet accumulated much experience, but which are inescapable in the long run. For example, I guess you had to take care of translating external HTML templates, considering some input aspects, allowing on-the-fly language selection, and of course, looking into more prosaic non-message "locale" concerns.
Thanks, I think it's been valuable experience -- I certainly have learned a lot!
One of the most painful areas has in fact been the translating of HTML templates specifically because a template file is far too coarse a granularity. When I want to add a new widget to a template, I can usually figure out where to add it in say, the Spanish or French version, but it's nearly hopeless to try to add it to the Japanese version. :)
Here, I hope Fred, Stephan Richter, and my efforts at i18n'ing Zope3's Page Templates will greatly improve things. It's early going but it feels right. It would mean you essentially have one version of the template but you'd mark it up to designate the translatable messages, and I think you'd end up integrating those with your Python source catalogs (but maybe in a different domain?). I'm not quite sure how that would translate to plaintext templates (e.g. for email messages).
Input aspects are something neither MM nor Zope has (yet) adequately addressed. What I'm thinking of here are message footers in multiple languages or say, a job description in multiple languages. We'll have to address these down the road.
I've already mentioned about efforts in Zopeland for localizing non-message issues. On-the-fly language selection is something that I have had to deal with in MM, and Python's class-based gettext API is essential here, and works well. Zope3 and MM take slightly different u/i tacks, with Zope3 doing better browser language negotiation and MM allowing for explicit overrides in forms. Some combination of the two is probably where web-based applications want to head.
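The class-based gettext API mentioned here can be sketched as follows: instead of installing one global `_`, each request looks up a translations object for its own language. The domain name 'myapp' and the 'locale' directory are hypothetical. With fallback=True, gettext.translation() returns a NullTranslations object when no catalog is found, so this sketch runs (untranslated) even without any .mo files.

```python
import gettext

_cache = {}


def translator_for(language, domain="myapp", localedir="locale"):
    """Return (and cache) a translations object for one language."""
    if language not in _cache:
        _cache[language] = gettext.translation(
            domain, localedir=localedir, languages=[language], fallback=True)
    return _cache[language]


def render_greeting(language, name):
    # Per-request translation function, not a process-wide installed _().
    _ = translator_for(language).gettext
    return _("Welcome, %(name)s!") % {"name": name}


print(render_greeting("fr", "Barry"))   # untranslated unless a catalog exists
```

Because each call picks its own translations object, two requests in different languages can be served by the same process without touching global state.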
now-to-make-time-to-finish-MM2.1-ly y'rs, -Barry

[Doh! Forgot to send to the list as well - shouldn't try to use a computer when I have a cold]
Barry A. Warsaw:
Trust me, I'm not. Then again, maybe it's just me, or my limited experience w/ i18n'd source code, but being forced to pass in the explicit bindings is a big burden in terms of maintainability and readability.
My main experience in internationalization has been in GUI apps where there is often a strong separation between the localizable static text and the variable text. In dialogs you often have:
Static localized description: [Editable variable]
In my editor SciTE, which currently has about 15 translations, of the 177 localizable strings, only 9 are messages that require insertion of variables and all of those require only one variable. Most of the strings are menu or dialog items. Maybe I'm just stingy with messages :-)
On the largest sensibly internationalized project I have worked on (7 years old and with a maximum of 20 research/design/develop/test staff when I left), I would estimate that fewer than 50 messages required variable substitution.
The amount of effort that went into ensuring that the messages were accurate, meaningful and understandable outweighed by several orders of magnitude any typing or reading work.
Neil

Neil Hodgson wrote:
...
Not making bindings explicit may mean that translators use other variables available at the translation point leading to unexpected failures when internal details are changed.
Actually, I don't think that is the case. I think that the security implications of "_" are overstated.
    name = "Paul"
    country = "Canada"
    password = "jfoiejw"
    _('${name} was born in ${country}')
The "_" function can use a regular expression to determine that the original code used only "${name}" and "${country}". Then it can disallow access to ${password}
def _(origstring): orig_substitions = get_substitutions(origstring) translation = lookup_translation(origstring) translation_substitions = get_substitutions(translation_substitions) assert translation.substitutions == orig_substitutions
Paul Prescod

[Guido, quotes Christian]
The following statements are ordered by increasing hate.
1 - I do hate the idea of introducing a "$" sign at all.
2 - giving "$" special meaning in strings via a module
3 - doing it as a builtin function
4 - allowing it to address local/global variables
[and adds]
Doesn't 4 contradict your +1 on allvars()?
Since Christian's reply only increased the apparent contradiction, allow me to channel: they are ordered by increasing hate, but starting at the bottom. s/increasing/decreasing/ in his original, or s/hate/love/, and you can continue to read it in the top-down Dutch way <wink>.

Tim Peters wrote:
[Guido, quotes Christian]
The following statements are ordered by increasing hate.
1 - I do hate the idea of introducing a "$" sign at all.
2 - giving "$" special meaning in strings via a module
3 - doing it as a builtin function
4 - allowing it to address local/global variables
[and adds]
Doesn't 4 contradict your +1 on allvars()?
Since Christian's reply only increased the apparent contradiction, allow me to channel: they are ordered by increasing hate, but starting at the bottom. s/increasing/decreasing/ in his original, or s/hate/love/, and you can continue to read it in the top-down Dutch way <wink>.
Huh? Reading from top to bottom, as I used to, I see increasing numbers, which are in the same order as the "increasing hate" (not a linear function, but the same ordering).
4 - allowing it to address local/global variables is what I hate the most. This is in no contradiction to allvars(), which is simply a function that puts some variables into a dict, therefore liberating the interpolation from variable access.
Where is the problem, please?

[Tim]
Since Christian's reply only increased the apparent contradiction, allow me to channel: ...
[Christian Tismer]
Huh? Reading from top to bottom, as I used to, I see increasing numbers, which are in the same order as the "increasing hate" (not a linear function, but the same ordering).
4 - allowing it to address local/global variables is what I hate the most. This is in no contradiction to allvars(), which is simply a function that puts some variables into a dict, therefore liberating the interpolation from variable access.
Where is the problem, please?
I was warming up my awesome channeling powers for Guido's impending vacation, and all I can figure is that I must have left them parked in reverse the last time he came back. Nothing a 12-pack of Coke didn't cure, though! I channel that you'll graciously accept my apology <wink>.

Tim Peters wrote:
[Tim]
Since Christian's reply only increased the apparent contradiction, allow me to channel: ...
[Christian Tismer]
Huh? Reading from top to bottom, as I used to, I see increasing numbers, which are in the same order as the "increasing hate" (not a linear function, but the same ordering).
4 - allowing it to address local/global variables is what I hate the most. This is in no contradiction to allvars(), which is simply a function that puts some variables into a dict, therefore liberating the interpolation from variable access.
Where is the problem, please?
I was warming up my awesome channeling powers for Guido's impending vacation, and all I can figure is that I must have left them parked in reverse the last time he came back. Nothing a 12-pack of Coke didn't cure, though! I channel that you'll graciously accept my apology <wink>.
Wow! A TPA. Will stick it next to my screen :-)
Well, the slightly twisted content of that message shaded its correct logic, maybe.
Meanwhile, I'd like to drop that hate stuff and replace it with a little reasoning:
Let's name locals/globals/whatever as "program variables".
If there are program variables directly accessible inside strings to be interpolated, then I see possible abuse, if abusers manage to supply such a string in an unforeseen way. For that reason, I wanted to enforce that an explicit dictionary has to be passed as an argument, to remind the programmer that she is responsible for providing access.
But at that time, I wasn't considering compile-time string parsing. Compile time means the strings containing variable names are evaluated only once; they behave like constants and cannot be passed in by a later intruder. That sounds pretty cool, although I don't see how this fits with I18N, which needs to change strings at runtime. Maybe it is possible to parse the variable names out, replace them with some placeholders, and do the internationalization after that, still without giving the final product access to the variables.
Example (now also allowing functions):
name1 = "Felix"
age1 = 8
name2 = "Hannes"
age2 = 17
"My little son $name1 is $age1. $name2 is $(age2-age1) years older.".sub()
--> "My little son Felix is 8. Hannes is 9 years older."
This string might be translated under the hood into:
_ipol = {"x1": name1, "x2": age1, "x3": name2, "x4": (age2-age1)}
"My little son $x1 is $x2. $x3 is $x4 years older.".sub(_ipol)
This string is now safe for further processing.
Maybe the two forms should be syntactically different, but what I mean is a compile-time transformation that removes all real variable names in the first place.
interpolation-is-by-value-not-by-name - ciao - chris
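Christian's by-value idea can be tried out at runtime to see what the compiler would have to produce. The sketch below is only an illustration under stated assumptions: anonymize() and bind() are hypothetical names, the grammar is simplified to $$, $name and $(expr) without nested parentheses, and eval() stands in for the compiler evaluating each expression in the caller's scope, so it must never be fed untrusted templates.

```python
import re

# $$ (escape), $identifier, or $(expression without nested parentheses).
_TOKEN = re.compile(r'\$(?:(\$)|([A-Za-z_]\w*)|\(([^()]*)\))')


def anonymize(fmt):
    """Rewrite $-references into anonymous %(xN)s placeholders.

    Returns the %-format string and the extracted expression sources,
    in order of appearance.
    """
    exprs, out, pos = [], [], 0
    for m in _TOKEN.finditer(fmt):
        # Literal text: escape % so the result is a valid %-format string.
        out.append(fmt[pos:m.start()].replace('%', '%%'))
        if m.group(1):
            out.append('$')                      # "$$" -> literal "$"
        else:
            exprs.append(m.group(2) or m.group(3))
            out.append('%%(x%d)s' % len(exprs))  # e.g. "%(x1)s"
        pos = m.end()
    out.append(fmt[pos:].replace('%', '%%'))
    return ''.join(out), exprs


def bind(exprs, namespace):
    """Evaluate each extracted expression; keys are the anonymous names."""
    return {'x%d' % i: eval(src, {}, dict(namespace))
            for i, src in enumerate(exprs, 1)}


fmt, exprs = anonymize(
    'My little son $name1 is $age1. $name2 is $(age2-age1) years older.')
ns = {'name1': 'Felix', 'age1': 8, 'name2': 'Hannes', 'age2': 17}
print(fmt)                      # My little son %(x1)s is %(x2)s. %(x3)s is %(x4)s years older.
print(fmt % bind(exprs, ns))    # My little son Felix is 8. Hannes is 9 years older.
```

Only fmt, which no longer mentions any program variable, would be handed on to further processing such as _(); the names survive only in the separately compiled expressions.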

"CT" == Christian Tismer tismer@tismer.com writes:
CT> If there are program variables directly accessible inside strings to be interpolated, then I see possible abuse, if abusers manage to supply such a string in an unforeseen way.
For literal strings in .py files, the only way that's going to happen is if someone you don't trust is hacking your source code, /or/ if you have evil translators sneaking in bogus translation strings. The latter can be solved with a verification step over your message catalogs, while the former I leave as an exercise for the reader. :)
So still, I trust automatic interpolation of program vars for literal strings, but for strings coming from some other source (e.g. a web form), then yes, you obviously want to be explicit about the interpolation dictionary.
-Barry
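Barry's verification step over message catalogs could look roughly like this. It is a sketch assuming the catalog is available as a plain dict mapping each msgid to its translation (a real one would come from .po/.mo files), and it borrows the placeholder regex of string.Template, the class PEP 292 eventually became in Python 2.4.

```python
from string import Template


def placeholders(s):
    """Names referenced as $name or ${name}; "$$" escapes are skipped."""
    names = set()
    for m in Template.pattern.finditer(s):
        name = m.group('named') or m.group('braced')
        if name:
            names.add(name)
    return names


def verify_catalog(catalog):
    """Return the msgids whose translation uses a different set of names."""
    return sorted(msgid for msgid, msgstr in catalog.items()
                  if placeholders(msgstr) != placeholders(msgid))


# Hypothetical catalog: the second translation sneaks in $password.
catalog = {
    '$name was born in $country': '$name wurde in $country geboren',
    'Welcome, ${user}!': 'Willkommen, ${user}! Passwort: $password',
}
print(verify_catalog(catalog))  # ['Welcome, ${user}!']
```

Running such a check once per catalog update catches bogus translations before they ship, which is why it pairs well with automatic interpolation of literal strings.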

Barry A. Warsaw wrote:
"CT" == Christian Tismer tismer@tismer.com writes:
CT> If there are program variables directly accessible inside strings to be interpolated, then I see possible abuse, if abusers manage to supply such a string in an unforeseen way.
For literal strings in .py files, the only way that's going to happen is if someone you don't trust is hacking your source code, /or/ if you have evil translators sneaking in bogus translation strings. The latter can be solved with a verification step over your message catalogs, while the former I leave as an exercise for the reader. :)
So still, I trust automatic interpolation of program vars for literal strings, but for strings coming from some other source (e.g. a web form), then yes, you obviously want to be explicit about the interpolation dictionary.
From another reply:
def whereBorn(name):
    country = countryOfOrigin(name)
    return _('$name was born in $country')
OK, I'm all for it. For a couple of hours now, I've been riding the following horse:
- $name, $(name), $(any expr) is just fine
- all of this is compile-time stuff
The idea is: Resolve the variables at compile time. Don't provide the feature at runtime.
Here is a simple approach (I'm working on a more complicated one, too), assuming the "e" prefix character triggers expression extraction:
def whereBorn(name):
    country = countryOfOrigin(name)
    return _(e'$name was born in $country')
is accepted by the grammar, but turned into the equivalent of:
def whereBorn(name):
    country = countryOfOrigin(name)
    return _('%(x1)s was born in %(x2)s') % {"x1": name, "x2": country}
That is: The $ stuff is extracted, turning the fmt string into something anonymous. Your _() processes it, then the variables are formatted in. This turns the $ stuff completely into syntactic sugar. Any Python expression inside $() is allowed, it is compiled as if it were sitting inside the dict. I also believe it is a good idea to do the _() on the unexpanded string (as shown), since the submitted values are most probably hard to translate at all.
cheers - chris

Christian Tismer wrote:
...
OK, I'm all for it. For a couple of hours now, I've been riding the following horse:
- $name, $(name), $(any expr) is just fine
- all of this is compile-time stuff
....
I think you just described PEP 215. But what you're missing is that we need a compile time facility for its flexibility and simplicity but we also need a runtime facility to allow I18N.
I also believe it is a good idea to do the _() on the unexpanded string (as shown), since the submitted values are most probably hard to translate at all.
_ runs at runtime. If the interpolation is done at compile time then "_" is executed too late.
Paul Prescod

Paul Prescod wrote:
Christian Tismer wrote:
...
OK, I'm all for it. For a couple of hours now, I've been riding the following horse:
- $name, $(name), $(any expr) is just fine
- all of this is compile-time stuff
....
I think you just described PEP 215. But what you're missing is that we need a compile time facility for its flexibility and simplicity but we also need a runtime facility to allow I18N.
Are you sure you got what I meant? I want to compile the variable references away at compile time, resulting in an ordinary format string. This string is wrapped by the runtime _(), and the result is then interpolated with a dict.
I also believe it is a good idea to do the _() on the unexpanded string (as shown), since the submitted values are most probably hard to translate at all.
_ runs at runtime. If the interpolation is done at compile time then "_" is executed too late.
Compile time does no interpolation but a translation of the string into a different one, which is interpolated at runtime.
will-read-PEP215-anyway - chris

Christian Tismer wrote:
...
Are you sure you got what I meant? I want to compile the variable references away at compile time, resulting in an ordinary format string. This string is wrapped by the runtime _(), and the result is then interpolated with a dict.
How can that be?
Original expression:
_($"$foo")
Expands to:
_("%(x1)s"%{"x1": foo})
Standard Python order of operations will do the %-interpolation before the method call! You say that it could instead be
_("%(x1)s")%{"x1": foo}
But how would Python know to do that? "_" is just another function. There is nothing magical about it. What if the function was instead re.compile? In that case I would want to do the interpolation *before* the compilation, not after!
Are you saying that the "_" function should be made special and recognized by the compiler?
Paul Prescod

Paul Prescod wrote:
Christian Tismer wrote:
...
Are you sure you got what I meant? I want to compile the variable references away at compile time, resulting in an ordinary format string. This string is wrapped by the runtime _(), and the result is then interpolated with a dict.
How can that be?
Original expression:
_($"$foo")
Expands to:
_("%(x1)s"%{"x1": foo})
Standard Python order of operations will do the %-interpolation before the method call! You say that it could instead be
_("%(x1)s")%{"x1": foo}
But how would Python know to do that? "_" is just another function. There is nothing magical about it. What if the function was instead re.compile? In that case I would want to do the interpolation *before* the compilation, not after!
Are you saying that the "_" function should be made special and recognized by the compiler?
As you say it, it looks a little as if something special would be needed, right. I have no concrete idea. Somehow I'd want to express that a function is applied after compile time substitution, but before runtime interpolation.
Here is a simple idea; it is not very nice, but it could work:
Assume a "$" prefix, which does the interpolation in the way you said. Assume further a "%" prefix, which does it only halfway, returning a tuple: (modified string, dict). This tuple would be passed to _(), and it is _()'s decision to work this way:
def _(s):
    if isinstance(s, tuple):
        s, args = s
    else:
        args = None
    # ... processing s ...
    if args:
        return s % args
    else:
        return s
But this is a minor issue, I just wanted to tell what I think should happen, without giving an exact solution.
cheers - chris

Extended proposal at the end:
Paul Prescod wrote:
Christian Tismer wrote:
...
Are you sure you got what I meant? I want to compile the variable references away at compile time, resulting in an ordinary format string. This string is wrapped by the runtime _(), and the result is then interpolated with a dict.
How can that be?
Original expression:
_($"$foo")
Expands to:
_("%(x1)s"%{"x1": foo})
Standard Python order of operations will do the %-interpolation before the method call! You say that it could instead be
_("%(x1)s")%{"x1": foo}
But how would Python know to do that? "_" is just another function. There is nothing magical about it. What if the function was instead re.compile? In that case I would want to do the interpolation *before* the compilation, not after!
Are you saying that the "_" function should be made special and recognized by the compiler?
My idea has evolved into the following: Consider an interpolating object with the following properties (sketched by a class here):
class Interpol:
    def __init__(self, fmt, dic):
        self.fmt = fmt
        self.dic = dic
    def __repr__(self):
        return self.fmt % self.dic
Original expression:
_($"$foo")
Expands at compile time to:
_( Interpol("%(x1)s", {"x1": foo}) )
Having said that, it is now up to the function _() to test whether its argument is an Interpol or not. It can do something like this:
def _(arg):
    ...
    if isinstance(arg, Interpol):
        return _(arg.fmt) % arg.dic

# or, maybe cleaner, leaving the formatting action
# to the Interpol class:

def _(arg):
    ...
    if isinstance(arg, Interpol):
        return arg.__class__(_(arg.fmt), arg.dic)

# which then in turn will return the final string,
# if it is interrogated via str or repr.
ciao - chris

[Guido, quotes Christian]
The following statements are ordered by increasing hate.
1 - I do hate the idea of introducing a "$" sign at all.
2 - giving "$" special meaning in strings via a module
3 - doing it as a builtin function
4 - allowing it to address local/global variables
[and adds]
Doesn't 4 contradict your +1 on allvars()?
[Tim]
Since Christian's reply only increased the apparent contradiction, allow me to channel: they are ordered by increasing hate, but starting at the bottom. s/increasing/decreasing/ in his original, or s/hate/love/, and you can continue to read it in the top-down Dutch way <wink>.
template = [
    '$linenum - I do $feeling the idea of introducing the "$$" sign at all.',
    '$linenum - giving "$$" special meaning in strings via a module',
    '$linenum - doing it as a builtin function',
    '$linenum - allowing it to address local/global variables',
]

feeling = 'hate'
if 'Dutch' in options:
    feeling = 'love'
    template = template[::-1]  # cool new feature
print 'The following statements are ordered by increasing $feeling.'.sub()
for cnt, line in enumerate(template):  # cool new feature
    linenum = cnt+1  # still wish enumerate had an optional start arg
    print line.sub()  # aspiring cool new feature
'regnitteh dnomyar'[::-1]
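For comparison, Raymond's parody can actually run with string.Template (the class PEP 292 eventually produced, in Python 2.4), using explicit keyword bindings in place of the hypothetical .sub() method, and enumerate()'s start argument, which was added in Python 2.6. The render() function and its options parameter are illustrative stand-ins for the undefined options variable in the original.

```python
from string import Template

template = [
    '$linenum - I do $feeling the idea of introducing the "$$" sign at all.',
    '$linenum - giving "$$" special meaning in strings via a module',
    '$linenum - doing it as a builtin function',
    '$linenum - allowing it to address local/global variables',
]


def render(options):
    """Return the heading plus numbered lines; 'Dutch' reverses the order."""
    feeling = 'hate'
    lines = template
    if 'Dutch' in options:
        feeling = 'love'
        lines = template[::-1]
    heading = Template('The following statements are ordered by increasing $feeling.')
    out = [heading.substitute(feeling=feeling)]
    for linenum, line in enumerate(lines, 1):
        # Unused keywords are ignored; "$$" becomes a literal "$".
        out.append(Template(line).substitute(linenum=linenum, feeling=feeling))
    return out


for text in render(options=('Dutch',)):
    print(text)
```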

[Raymond Hettinger]
... 'regnitteh dnomyar'[::-1]
Is there any chance of ripping this out of the language before someone uses it for real? If not, strings need to grow a .reversed_title_case() method too.
it's-bad-enough-we-added-a-reversed_alternating_rot13-method-ly y'rs - tim

On 19 Jun 2002 at 22:30, Guido van Rossum wrote:
The $ means "substitution" in so many languages besides Perl that I wonder where you've been.
It doesn't mean anything in any language I *like*.
-- Gordon http://www.mcmillan-inc.com/

On 20 Jun 2002 at 15:34, Fredrik Lundh wrote:
gordon wrote:
The $ means "substitution" in so many languages besides Perl that I wonder where you've been.
It doesn't mean anything in any language I *like*.
not even in american?
Where $ means "dough", which is one letter different from "cough" and "tough"[1]?
the-world's-best-language-for-discussing-the-price-of-oranges-ly y'rs
-- Gordon http://www.mcmillan-inc.com/
[1] If you're old-fashioned enough, you can spell "plow" as "plough", too.

On Thu, 20 Jun 2002, Gordon McMillan wrote:
On 20 Jun 2002 at 15:34, Fredrik Lundh wrote:
gordon wrote:
The $ means "substitution" in so many languages besides Perl that I wonder where you've been.
It doesn't mean anything in any language I *like*.
not even in american?
Where $ means "dough", which is one letter different from "cough" and "tough"[1]?
Shouldn't it be: d'oh!
-Kevin
-- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com

"FP" == François Pinard pinard@iro.umontreal.ca writes:
FP> However, there are other contexts where the concept of a compound dictionary of all globals and locals would be useful. Maybe we could have some allvars() similar to globals() and locals(), and use `... % allvars()' instead of `.sub()'? So this would serve both string interpolation and other avenues.
Or maybe just make vars() do something more useful when no arguments are given?
In any event, allvars() or a-different-vars() is out of scope for this PEP. We'd use it if it was there, but I think it needs its own PEP, which someone else will have to champion.
-Barry
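François's allvars() is out of scope for the PEP, but a rough approximation is possible. This sketch is an assumption, not a proposed API: it is CPython-specific because it uses sys._getframe(), and it returns a snapshot, so later assignments are not reflected. For comparison, vars() called without arguments returns only the local namespace.

```python
import sys


def allvars():
    """Hypothetical helper: the caller's globals overlaid with its locals."""
    caller = sys._getframe(1)          # CPython-specific: the calling frame
    merged = dict(caller.f_globals)
    merged.update(caller.f_locals)     # a local shadows a global of the same name
    return merged


country = 'Canada'  # a module-level (global) variable


def whereBorn(name):
    # 'name' is local, 'country' is global; allvars() sees both.
    return '%(name)s was born in %(country)s' % allvars()


print(whereBorn('Paul'))  # Paul was born in Canada
```

This keeps the familiar `%' operator, which is the unification François argues for, at the cost of frame introspection.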

[Barry A. Warsaw]
FP> However, there are other contexts where the concept of a compound dictionary of all globals and locals would be useful. Maybe we could have some allvars() similar to globals() and locals(), and use `... % allvars()' instead of `.sub()'? So this would serve both string interpolation and other avenues.
Or maybe just make vars() do something more useful when no arguments are given?
I surely had the thought, but changing the meaning of an existing library function is most probably out of the question.
In any event, allvars() or a-different-vars() is out of scope for this PEP. We'd use it if it was there, but I think it needs its own PEP, which someone else will have to champion.
I do not see myself championing a PEP yet, I'm not sure the Python community is soft enough for my thin skin (not so thin maybe, but I really had my share of over-long discussions in other projects, I want some rest in these days).
On the other hand, the allvars() suggestion is right on point in my opinion. It is not a stand-alone suggestion; its goal was to stress that `.sub()' is too far from the `%' operator and looks like a random addition. The available formatting paradigms of Python, I mean those which are standard, should look a bit more unified, just to preserve overall elegance. If we want Python to stay elegant (which is the source of comfort and pleasure, these being the main goals of using Python after all), we have to seek elegance in each Python move.
At the risk of looking frenetic and heretical, I guess that `$' would become more acceptable in view of the preceding paragraph if we were introducing a `$' operator for driving `$' substitutions, the same as the `%' operator currently drives `%' substitutions. I'm not asserting that this is the direction to take, but I'm presenting this as an example of a direction that would be a bit less shocking, and which, through some unification, could somewhat salvage the messy aspect of having two formatting characters.
Saying that PEP 292 rejects an idea because this idea would require another PEP to be debated and accepted beforehand, and then rushing the acceptance of PEP 292 as it stands, is probably missing the point of the discussion. Each time such an argument is made, we lose vision and favour the blossoming of various Python features in random directions, which is not good in the long term for Python's self-consistency and elegance.

On Thu, Jun 20, 2002, François Pinard wrote:
[Barry A. Warsaw]
In any event, allvars() or a-different-vars() is out of scope for this PEP. We'd use it if it was there, but I think it needs its own PEP, which someone else will have to champion.
On the other hand, the allvars() suggestion is right on point in my opinion. It is not a stand-alone suggestion; its goal was to stress that `.sub()' is too far from the `%' operator and looks like a random addition. The available formatting paradigms of Python, I mean those which are standard, should look a bit more unified, just to preserve overall elegance. If we want Python to stay elegant (which is the source of comfort and pleasure, these being the main goals of using Python after all), we have to seek elegance in each Python move.
+1

"FP" == François Pinard pinard@iro.umontreal.ca writes:
FP> Saying that PEP 292 rejects an idea because this idea would require another PEP to be debated and accepted beforehand, and then rushing the acceptance of PEP 292 as it stands, is probably missing the point of the discussion.
I don't think there's /any/ danger of rushing acceptance of PEP 292. It may not even be accepted at all.
still-slogging-through-50-some-odd-messages-ly y'rs, -Barry
participants (14)
- Aahz
- barry@zope.com
- Christian Tismer
- Fredrik Lundh
- Gordon McMillan
- Guido van Rossum
- Kevin Jacobs
- Neal Norwitz
- Neil Hodgson
- Patrick K. O'Brien
- Paul Prescod
- pinard@iro.umontreal.ca
- Raymond Hettinger
- Tim Peters