Can we please have a better dict interpolation syntax?

I have just had the experience of writing a bunch of expressions of the form
  "create index %(table)s_lid1_idx on %(table)s(%(lid1)s)" % params
and found myself getting quite confused by all the parentheses and "s" suffixes. I would *really* like to be able to write this as
  "create index %{table}_lid1_idx on %{table}(%{lid1})" % params
which I find to be much easier on the eyes.
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+
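Greg's proposal treats `%{name}` as shorthand for `%(name)s`. To make that equivalence concrete, here is a sketch of it as a preprocessing step. `expand_braces` is a hypothetical helper, not part of Python, and the contents of `params` are invented for illustration:

```python
import re

# Matches an escaped "%%" or the proposed "%{name}" shorthand.
# Ordinary "%(name)s", "%d", etc. are left untouched.
_BRACE_RE = re.compile(r"%%|%\{([A-Za-z_]\w*)\}")


def expand_braces(fmt):
    """Rewrite hypothetical %{name} placeholders as standard %(name)s."""
    def repl(match):
        if match.group(1) is None:   # matched "%%": keep the literal percent
            return match.group(0)
        return "%(" + match.group(1) + ")s"
    return _BRACE_RE.sub(repl, fmt)


# Hypothetical values; the thread never shows what "params" contains.
params = {"table": "users", "lid1": "last_login"}
fmt = "create index %{table}_lid1_idx on %{table}(%{lid1})"

print(expand_braces(fmt))           # create index %(table)s_lid1_idx on %(table)s(%(lid1)s)
print(expand_braces(fmt) % params)  # create index users_lid1_idx on users(last_login)
```

Because the rewrite only produces the existing `%(name)s` form, anything needing a different conversion would still be written the old way.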

Wouldn't this be even better?
  "create index ${table}_lid1_idx on $table($lid1)" % params
--Guido van Rossum (home page: http://www.python.org/~guido/)
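Guido's syntax changes the meaning of `$` in existing strings, so it cannot simply be added to the `%` operator. For comparison only, here is a sketch of how the same template reads with `string.Template`, the class Python later added for PEP 292-style substitution (invoked through a method call rather than `%`). The values in `params` are hypothetical:

```python
from string import Template

# Hypothetical values; the thread never shows what "params" contains.
params = {"table": "users", "lid1": "last_login"}

# "$table(" needs no braces because "(" cannot continue an identifier;
# "${table}_lid1_idx" does, since "_" could.
tmpl = Template("create index ${table}_lid1_idx on $table($lid1)")
print(tmpl.substitute(params))  # create index users_lid1_idx on users(last_login)

# Without the braces the name would swallow the suffix:
try:
    Template("create index $table_lid1_idx").substitute(params)
except KeyError as exc:
    print("KeyError:", exc)     # KeyError: 'table_lid1_idx'
```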

Guido:
Wouldn't this be even better?
"create index ${table}_lid1_idx on $table($lid1)" % params
I wouldn't object to that. I'd have expected *you* to object to it, though, since it re-defines the meaning of "$" in an interpolated string. I was just trying to suggest something that would be backward-compatible.
Greg Ewing

Correct, my proposal can't be backward-compatible. :-( But somehow I think that, for various cultural reasons (not just Perl :-) $ is a better character to use for interpolation than % -- this is pretty arbitrary, but it seems that $foo is just much more common than %foo as a substitution indicator, across various languages. (% is more common for C-style format strings of course.) There have been many proposals in this area, even a PEP (PEP 215, which I don't like that much, despite its use of $). Many people have also implemented something along these lines, using a function to request interpolation (or using template files etc.), and using various things (from dicts to namespaces) as the source for names. Anyway, I think this is something that can wait until 3.0, and I'd rather not have too many discussions here at once, so I'd rather unhelpfully punt than take this on for real (also for the benefit of Brett, who has to sort through all of this for his python-dev summary). --Guido van Rossum

Guido:
I'm not asking for interpolation out of the current namespace or anything like that -- just a simple extension to the current set of formats for interpolating from a dict, that could be done right now without affecting anything. I'd be willing to supply a patch if it has some chance of being accepted. I agree that the more esoteric proposals are best left until later.
Greg Ewing

But adding to % interpolation makes it less likely that a radically different (and better) approach will be implemented, because the status quo will be closer to "good enough" without being "right". --Guido van Rossum

[Guido van Rossum]
Wouldn't this be even better? "create index ${table}_lid1_idx on $table($lid1)" % params
"Better" because it uses `$' instead of `%'? It is really a matter of taste and aesthetics, more than being "better" on technical grounds. Technically, the multiplication of aspects and paradigms goes against some unencumberance and simplicity, which made Python attractive to start with. We would loose something probably not worth the gain.
it seems that $foo is just much more common than %foo as a substitution indicator, across various languages.
Python has the right of being culturally distinct on some details. I see it as an advantage: when languages are too similar, some confusion arises between differences. The distinction actually helps.
Anyway, I think this is something that can wait until 3.0, and I'd rather not have too many discussions here at once,
OK, then. Enough said! :-) -- François Pinard http://www.iro.umontreal.ca/~pinard

On Thu, 2003-10-23 at 08:31, François Pinard wrote:
Better because the trailing type specifier on %-strings is extremely error prone (#1 cause of bugs for Mailman translators is/was leaving off the trailing 's'). Better because the rules for $-strings are simple and easy to explain. Better because the enclosing braces are optional, and unnecessary in the common case, making for much more readable template strings. And yes, better because it uses $ instead of %; it just seems that more people grok that $foo is a placeholder. -Barry
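To illustrate Barry's point about the trailing type character, here is a sketch of two ways a forgotten `s` can go wrong with `%` formatting, next to the equivalent `string.Template` placeholder. The dictionaries are hypothetical examples:

```python
from string import Template

counts = {"n": 5}

# Intended "%(n)s items", but the trailing "s" was forgotten.  The space
# is read as a conversion flag and "i" as the integer conversion, so the
# string formats without error and silently eats a letter.
print(repr("%(n) items" % counts))   # ' 5tems'

# When the next character is not a valid conversion, it fails at runtime.
names = {"name": "Barry"}
try:
    "Hello %(name)!" % names
except ValueError as exc:
    print("ValueError:", exc)

# A $-placeholder has no trailing type character to forget.
print(Template("$n items").substitute(counts))   # 5 items
```

The first case is the worrying one for translators: nothing raises, the output is just wrong.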

Hi again. Getting back to much older messages about `$'-interpolation. :-) [Barry Warsaw]
Such errors are usually caught much more solidly, for example, right in `msgfmt'. At least for C, and most likely for Python as well, it is kind of catastrophic for an application when its translation file has errors regarding formats. If a bad PO file crashes an application, maintainers and users will get mad at internationalisation. Forgetting a trailing `s' is one error, misspelling a variable name is another: both should absolutely be caught before the release of an internationalised application. Using $-strings instead of %-strings is far from adequately addressing the real problem. Translators, despite all their good will, cannot be trusted to never make errors, and they usually use `msgfmt' to validate their own PO files before transmission (or indirectly rely on robots for doing that validation for them). Whether they use $-strings or %-strings, validation is necessary.
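Pinard's point is that placeholders need validating whatever the syntax. Here is a sketch of a `msgfmt`-style consistency check for `$`-templates, assuming the same placeholder shapes Barry's `dstring` accepts. `placeholders` and `check_translation` are hypothetical helpers, not part of `msgfmt` or `gettext`:

```python
import re

# Same placeholder shapes as Barry's dstring: $$, $name and ${name}.
_PLACEHOLDER = re.compile(r"\$(?:(\$)|([_a-z]\w*)|\{([_a-z]\w*)\})", re.IGNORECASE)


def placeholders(template):
    """Return the set of names referenced by a $-template ($$ is skipped)."""
    names = set()
    for escaped, bare, braced in _PLACEHOLDER.findall(template):
        if not escaped:
            names.add(bare or braced)
    return names


def check_translation(msgid, msgstr):
    """Report placeholders that differ between an original and its translation."""
    src, dst = placeholders(msgid), placeholders(msgstr)
    problems = ["missing placeholder: " + n for n in sorted(src - dst)]
    problems += ["unknown placeholder: " + n for n in sorted(dst - src)]
    return problems


# A translator misspelled "listname":
print(check_translation("Welcome to $listname, ${user}!",
                        "Bienvenue sur $listnom, ${user} !"))
# ['missing placeholder: listname', 'unknown placeholder: listnom']
```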
Better because the rules for $-strings are simple and easy to explain.
I just read PEP 292 again. Since a mapping must always be provided, a lot of questions about how variables would be accessed out of local and global scopes disappear -- this simplifies things a lot. The PEP does not say what the behaviour of the substitution is when an identifier is not part of the mapping, and this is an important issue. At one place, the PEP says that `dstring' could be sub-classed to get different behaviour, like allowing dots for attribute access, but elsewhere, it also implies that within "${identifier}", "identifier" has to be an identifier, that is, no dots. And if "identifier" could contain special characters like dots or brackets, it does not say whether brackets may be nested or whether they have to balance (as they apparently and nicely do with `%' interpolation). It does not seem all that simple and easy to me. Granted, it could have been much more difficult.
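Pinard's questions about missing identifiers and dotted names did eventually get concrete answers. As a point of reference only, here is a sketch that assumes current Python 3 `string.Template` behaviour rather than anything in the draft PEP discussed here:

```python
from string import Template

t = Template("Hello $name, welcome to ${list}")

# substitute() is strict: a name missing from the mapping raises KeyError.
try:
    t.substitute(name="Fred")
except KeyError as exc:
    print("KeyError:", exc)            # KeyError: 'list'

# safe_substitute() leaves unknown placeholders untouched, much like
# Barry's safedict returning '${%s}' % key.
print(t.safe_substitute(name="Fred"))  # Hello Fred, welcome to ${list}

# With the default pattern a dotted name is not a valid placeholder at all.
try:
    Template("${user.name}").substitute({"user.name": "x"})
except ValueError as exc:
    print("ValueError:", exc)
```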
And yes, better because it uses $ instead of %; it just seems that more people grok that $foo is a placeholder.
Yet, users of `$' in other languages or scripts do not have to explicitly provide a mapping, so the similarity stays a bit superficial. But if this makes a few more users happy, and being in a library, stays away from the Python language, `$-strings' may indeed serve a purpose. -- François Pinard

On Thu, 2003-10-23 at 00:16, Guido van Rossum wrote:
There have been many proposals in this area, even a PEP (PEP 215, which I don't like that much, despite its use of $).
And PEP 292, which I probably should update. I should mention that $string substitutions are optional in Mailman 2.1, but they will be the only way to do it in Mailman 3. I've played a lot with various implementations of this idea, and below is the one I've currently settled on. Not all of the semantics may be perfect for core Python (i.e. never throw a KeyError), but this is all doable in modern Python, and for user-exposed templates, gets a +1000 in my book.
-Barry

```python
import re

# Search for $$, $identifier, or ${identifier}
dre = re.compile(r'(\${2})|\$([_a-z]\w*)|\${([_a-z]\w*)}', re.IGNORECASE)

EMPTYSTRING = ''


class dstring(unicode):
    def __new__(cls, ustr):
        ustr = ustr.replace('%', '%%')
        parts = dre.split(ustr)
        for i in range(1, len(parts), 4):
            if parts[i] is not None:
                parts[i] = '$'
            elif parts[i+1] is not None:
                parts[i+1] = '%(' + parts[i+1] + ')s'
            else:
                parts[i+2] = '%(' + parts[i+2] + ')s'
        return unicode.__new__(cls, EMPTYSTRING.join(filter(None, parts)))


class safedict(dict):
    """Dictionary which returns a default value for unknown keys."""
    def __getitem__(self, key):
        try:
            return super(safedict, self).__getitem__(key)
        except KeyError:
            return '${%s}' % key
```
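Barry's code targets Python 2 (`unicode`, two-argument `super`). Here is a minimal Python 3 port, assuming `str` in place of `unicode` and otherwise keeping his logic, with explanatory comments and a usage line added:

```python
import re

# Search for $$, $identifier, or ${identifier}
dre = re.compile(r'(\${2})|\$([_a-z]\w*)|\${([_a-z]\w*)}', re.IGNORECASE)


class dstring(str):
    """A $-template converted at construction time into a %-format string."""

    def __new__(cls, ustr):
        # Protect literal percent signs before introducing %(name)s specs.
        ustr = ustr.replace('%', '%%')
        parts = dre.split(ustr)
        # re.split with three groups yields: text, g1, g2, g3, text, ...
        for i in range(1, len(parts), 4):
            if parts[i] is not None:          # "$$" -> literal "$"
                parts[i] = '$'
            elif parts[i + 1] is not None:    # $identifier
                parts[i + 1] = '%(' + parts[i + 1] + ')s'
            else:                             # ${identifier}
                parts[i + 2] = '%(' + parts[i + 2] + ')s'
        # filter(None, ...) drops unmatched (None) groups and empty strings.
        return super().__new__(cls, ''.join(filter(None, parts)))


class safedict(dict):
    """Dictionary which returns a default value for unknown keys."""

    def __getitem__(self, key):
        try:
            return super().__getitem__(key)
        except KeyError:
            return '${%s}' % key


print(dstring('$who owes ${amount} USD, 100%') % safedict(who='Bob'))
# Bob owes ${amount} USD, 100%
```

Since `safedict` re-emits a missing key in braced form, both `$amount` and `${amount}` come back as `${amount}`.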

Greg> I would *really* like to be able to write this as
Greg> "create index %{table}_lid1_idx on %{table}(%{lid1})" % params
Greg> which I find to be much easier on the eyes.
What if lid1 is a float which you want to display with two digits past the decimal point?
I think we've been around the block on this one a few times. While %{foo} might be a convenient shorthand for %(foo)s, I don't think it saves enough space (one character) or stands out that much more ("{...}" instead of "(...)s") to make the addition worthwhile. In addition, you'd have to retain the current construct in cases where something other than simple string interpolation was required, in which case you also have the problem of having two almost identical ways to do dictionary interpolation.
Skip
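Skip's question is answered by the existing syntax: with dict-based `%` formatting, flags, width and precision go between the closing parenthesis and the conversion character, and a bare `%{name}` form would have no place for them. A small sketch with a hypothetical float value for `lid1`:

```python
# Hypothetical value for lid1, chosen to need rounding.
row = {"lid1": 3.14159}

# Width and precision sit between the closing ")" and the conversion character.
print("%(lid1).2f" % row)    # 3.14
print("%(lid1)8.2f" % row)   # "    3.14" (right-aligned in 8 columns)
print("%(lid1)s" % row)      # 3.14159
```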

On Thu, 2003-10-23 at 09:55, Skip Montanaro wrote:
What if lid1 is a float which you want to display with two digits past the decimal point?
BTW, I should mention that IMO, $-strings are great for end-user editable string templates, such as (in Mailman) things like translatable strings or message footer templates. But I also think the existing %-strings are just fine for programmers. I would definitely be opposed to complicating $-strings with any of the specialized and fine-grained control you have with %-strings. KISS and you'll have a great 99% solution, as long as you accept that the two substitution formats are aimed at different audiences. Then again, see my last post. I'm not sure anything needs to be added to core Python to support useful $-strings. Or maybe it can be implemented as a library module (or part of a 'textutils' package). -Barry

Barry Warsaw writes:
+1 on adding this as a module. I've managed to implement this a few times, and it would be nice to just import the same implementation from everywhere I needed it. One note: calling this "interpolation" (at least when describing it to end users) is probably a mistake; "substitution" makes more sense to people not ingrained in communities where it's called interpolation. It might be ok to call it interpolation for programmers, but... there's no need for two different names for it. ;-) -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation

On Thu, 2003-10-23 at 11:18, Fred L. Drake, Jr. wrote:
Wasn't there talk of a textutils package around the time of textwrap.py? Maybe add that for Py2.4?
Again +1 isn't strong enough. :) End users understand "substitution", they don't understand "interpolation". I've started to use the former everywhere now. -Barry

Barry> On Thu, 2003-10-23 at 09:55, Skip Montanaro wrote:
>> What if lid1 is a float which you want to display with two digits
>> past the decimal point?
Barry> BTW, I should mention that IMO, $-strings are great for end-user
Barry> editable string templates, such as (in Mailman) things like
Barry> translatable strings or message footer templates.
...
Barry> Then again, see my last post. I'm not sure anything needs to be
Barry> added to core Python to support useful $-strings. Or maybe it
Barry> can be implemented as a library module (or part of a 'textutils'
Barry> package).
+1. If it's not something programmers will use (most of the time, anyway) there's no need to build it into the language. If programmers like it, it's only another module to import. In addition, I'm fairly certain such a module could be made compatible with Python as far back as 1.5.2 without a lot of effort. You also have the freedom to make it much more flexible (use of templates and so forth) if it's in a separate module.
Skip

I have too much on my plate (spent too much on generator expressions lately :-). I am bowing out of the variable substitution discussion after noting that putting it in a module would be a great start (like for sets). --Guido van Rossum

On Thu, 2003-10-23 at 11:38, Guido van Rossum wrote:
I don't have time to do it, but once Someone figures out where to situate it, feel free to use my posted code, either verbatim or as a starting point. PSF donation, blah, blah, blah. -Barry

Guido van Rossum wrote:
This idea seemed to die for no apparent reason. Fred, Skip, and Barry all liked the idea of adding the string substitution code to a module (one idea for a name was textutils) and Guido obviously seems receptive to the idea. Do people feel like moving forward with a new module? -Brett

Skip Montanaro <skip@pobox.com>:
I disagree strongly -- I think it *does* stand out more clearly. The "s" on the end of "%(name)s" too easily gets mixed up with other alphanumeric stuff nearby. If it were just "%(name)" *without* the trailing "s" it wouldn't be nearly as bad, but unfortunately it can't be left off and remain backwards compatible.
What if lid1 is a float which you want to display with two digits past the decimal point?
Then I would use the existing construct -- I'm not suggesting that it be removed.
in which case you also have the problem of having two almost identical ways to do dictionary interpolation.
I don't see that as a big problem. To my mind, practicality beats purity here -- "%(name)s" is too awkward to be practical for routine use.
Greg Ewing

On Thu, Oct 23, 2003 at 02:36:41PM +1300, Greg Ewing wrote:
A while ago I proposed the following syntax for embedded expressions in strings, parsed at compile-time:
  "create index \{table}_lid1_idx on \{table}(\{lid1})"
And the equivalent runtime parsed version:
  r"create index \{table}_lid1_idx on \{table}(\{lid1})".cook(params)
testing-the-water-to-see-if-it's-PEP-time-ly yours,
Oren
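Oren's runtime form has no counterpart in Python. The sketch below assumes a plain hypothetical function `cook(template, params)` rather than a `str` method, and accepts only simple names. His compile-time form is close to the f-strings that Python 3.6 later added (PEP 498), which evaluate names from the enclosing scope:

```python
import re

# Matches a backslash, then {name}; cook() here is hypothetical.
_COOK_RE = re.compile(r"\\\{([A-Za-z_]\w*)\}")


def cook(template, params):
    """Replace each backslash-brace placeholder with str(params[name])."""
    return _COOK_RE.sub(lambda m: str(params[m.group(1)]), template)


# Hypothetical values for illustration.
table, lid1 = "users", "last_login"
params = {"table": table, "lid1": lid1}

print(cook(r"create index \{table}_lid1_idx on \{table}(\{lid1})", params))
# create index users_lid1_idx on users(last_login)

# The compile-time variant, written as a modern f-string:
print(f"create index {table}_lid1_idx on {table}({lid1})")
# create index users_lid1_idx on users(last_login)
```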

"create index \{table}_lid1_idx on \{table}(\{lid1})"
That looks horrible.
Greg Ewing

participants (8)
- Barry Warsaw
- Brett C.
- François Pinard
- Fred L. Drake, Jr.
- Greg Ewing
- Guido van Rossum
- Oren Tirosh
- Skip Montanaro