Binary Operator for New-Style String Formatting
Hello all,

For better or for worse, I have created a patch against the py3k trunk which introduces a binary operator '@' as an alternative syntax for the new string formatting system introduced by PEP 3101 ("Advanced String Formatting"). [1]

For common cases, this syntax should be as simple and as elegant as its deprecated [2] predecessor ('%'), while also ensuring that more complex use cases do not suffer needlessly.

I would just like to know whether this idea will float before submitting the patch on Roundup and going through the formal PEP process. This is my first foray into the internals of the Python core, and with any luck, I did not overlook any BDFL proclamations banning all new binary operators for string formatting. :-)

QUICK EXAMPLES

>>> "{} {} {}" @ (1, 2, 3)
'1 2 3'
>>> "foo {qux} baz" @ {"qux": "bar"}
'foo bar baz'

One of the main complaints of a binary operator in PEP 3101 was the inability to mix named and unnamed arguments:

    The current practice is to use either a dictionary or a tuple as the second argument, but as many people have commented ... this lacks flexibility.

To address this, a convention of having the last element of a tuple as the named arguments dictionary is introduced.

>>> "{} {qux} {}" @ (1, 3, {"qux": "bar"})
'1 bar 3'

Lastly, to print the repr() of a dictionary as an unnamed argument, one would have to append an additional dictionary so there is no ambiguity:

>>> "{}" @ {"foo": "bar"}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: tuple index out of range
>>> "{}" @ ({"foo": "bar"}, {})
"{'foo': 'bar'}"

Admittedly, these workarounds are less than clean, but the understanding is the '@' syntax would indeed be an alternative, so one could easily fall back to the str.format() method or the format() function.

IMPLEMENTATION

Code-wise, the grammar was edited per PEP 306 [3], and a function was introduced in unicodeobject.c as PyUnicode_FormatPrime (in the mathematical sense of A and A' -- I didn't fully understand or want to intrude upon the *_FormatAdvanced namespace).

The PyUnicode_FormatPrime function transforms the incoming arguments, i.e. the operands of the binary '@', and makes the appropriate do_string_format() call. Thus, I have reused as much code as possible.

I have done my development with git by using two branches: 'master' and 'subversion', the latter of which can be used to run 'svn update' and merge back into master. This way my code changes and the official ones going into the Subversion repository can stay separate, meanwhile allowing 'svn diff' to produce an accurate patch at any given time.

The code is available at:

http://github.com/jcsalterego/py3k-atsign/

The SVN patch [4] or related commit [5] are good starting points.

References:

[1] http://www.python.org/dev/peps/pep-3101
[2] http://docs.python.org/3.0/whatsnew/3.0.html
[3] http://www.python.org/dev/peps/pep-0306/
[4] http://github.com/jcsalterego/py3k-atsign/blob/master/py3k-atsign.diff
[5] http://github.com/jcsalterego/py3k-atsign/commit/5c8bdf72d9252cea78af2b78096...

Thanks,
--
Jerry Chen
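The operand handling described in the message above can be approximated in pure Python. This is only a sketch under stated assumptions, not the patch itself: AtStr is a hypothetical str subclass, and it relies on the __matmul__ hook that Python 3.5 and later provide for '@', whereas the patch edits the grammar and calls do_string_format() from C.

```python
class AtStr(str):
    """Hypothetical stand-in for the proposed str '@' operator."""

    def __matmul__(self, operand):
        # A bare dict supplies named arguments only.
        if isinstance(operand, dict):
            return self.format(**operand)
        # A tuple supplies positional arguments; a trailing dict,
        # if present, is treated as the named arguments.
        if isinstance(operand, tuple):
            if operand and isinstance(operand[-1], dict):
                *args, kwargs = operand
                return self.format(*args, **kwargs)
            return self.format(*operand)
        # Anything else is a single positional argument.
        return self.format(operand)


print(AtStr("{} {} {}") @ (1, 2, 3))                      # 1 2 3
print(AtStr("foo {qux} baz") @ {"qux": "bar"})            # foo bar baz
print(AtStr("{} {qux} {}") @ (1, 3, {"qux": "bar"}))      # 1 bar 3
print(AtStr("{}") @ ({"foo": "bar"}, {}))                 # {'foo': 'bar'}
```

As in the examples above, a bare dict on the right-hand side is always read as named arguments, so AtStr("{}") @ {"foo": "bar"} raises IndexError; the trailing empty dict is what marks the first dict as a positional value.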
On Sun, Jun 21, 2009 at 1:36 PM, Jerry Chen<j@3rdengine.com> wrote:
QUICK EXAMPLES
>>> "{} {} {}" @ (1, 2, 3)
'1 2 3'
>>> "foo {qux} baz" @ {"qux": "bar"}
'foo bar baz'
One of the main complaints of a binary operator in PEP 3101 was the inability to mix named and unnamed arguments:
The current practice is to use either a dictionary or a tuple as the second argument, but as many people have commented ... this lacks flexibility.
The other reason an operator was a pain is the order of operations:
>>> '{0}'.format(1 + 2)
'3'
>>> '%s' % 1 + 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot concatenate 'str' and 'int' objects
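The precedence point can be checked directly. A minimal sketch; the helper name percent_then_plus is made up for illustration. '%' binds tighter than '+', so the formatting happens first and the addition is then attempted on the resulting string. The '@' operator that Python gained in 3.5 sits at the same precedence level as '%', so a formatting operator spelled that way would presumably behave the same.

```python
def percent_then_plus(a, b):
    """Evaluate '%s' % a + b, which parses as ('%s' % a) + b."""
    try:
        return '%s' % a + b
    except TypeError:
        # The str produced by '%s' % a cannot be added to a non-str b.
        return None


print(percent_then_plus(1, 2))      # None: str + int fails
print(percent_then_plus(1, 'x'))    # 1x: the formatted string, then concatenation
print('%s' % (1 + 2))               # 3: parentheses are required with an operator
print('{0}'.format(1 + 2))          # 3: the call syntax already delimits the argument
```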
In general, I don't see any gain in introducing an operator for string formatting. What's the point? Maybe you save a few characters of typing, but it sure is easier to Google for "Python string format" than for "Python @".

A big -1 from me.

Steve
--
Where did you get the preposterous hypothesis?
Did Steve tell you that?
        --- The Hiphopopotamus
Hello,
For better or for worse, I have created a patch against the py3k trunk which introduces a binary operator '@' as an alternative syntax for the new string formatting system introduced by PEP 3101 ("Advanced String Formatting"). [1]
While many people find the new format() tedious to adapt to, I don't think adding a third formatting syntax will help us. Especially given this annoyance:
Lastly, to print the repr() of a dictionary as an unnamed argument, one would have to append an additional dictionary so there is no ambiguity:
>>> "{}" @ {"foo": "bar"}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: tuple index out of range
>>> "{}" @ ({"foo": "bar"}, {})
"{'foo': 'bar'}"

Regards

Antoine.
On Sun, 21 Jun 2009 at 12:36, Jerry Chen wrote:
For better or for worse, I have created a patch against the py3k trunk which introduces a binary operator '@' as an alternative syntax for the new string formatting system introduced by PEP 3101 ("Advanced String Formatting"). [1]
It seems to me that this topic is more appropriate for python-ideas.

That said, I'm -1 on it. The 'keywords as last item of tuple' reeks of code-smell to my nose, and I don't think you've addressed all of the reasons for why a method was chosen over an operator. Python has a tradition of having "one obvious way" to do something, so introducing an "alternative" syntax that you admit is sub-optimal does not seem to me to have enough benefit to justify breaking that design guideline.

Congratulations on your first foray into the core, though :)

--David
Jerry Chen wrote:
Hello all,
For better or for worse, I have created a patch against the py3k trunk which introduces a binary operator '@' as an alternative syntax for the new string formatting system introduced by PEP 3101 ("Advanced String Formatting"). [1]
For common cases, this syntax should be as simple and as elegant as its deprecated [2] predecessor ('%'), while also ensuring that more complex use cases do not suffer needlessly.
I would just like to know whether this idea will float before
The place to float trial balloons is the python-ideas list.
submitting the patch on Roundup and going through the formal PEP process. This is my first foray into the internals of the Python
Even if this particular idea is not accepted, I hope you learned from and enjoyed the exercise and will try other forays.
core, and with any luck, I did not overlook any BDFL proclamations banning all new binary operators for string formatting. :-)
QUICK EXAMPLES
>>> "{} {} {}" @ (1, 2, 3)
The only advantage of '@' over '.format' is fewer characters. I think it would be more useful to agitate to give 'format' a one char synonym such as 'f'.

One disadvantage of using an actual tuple rather than an arg quasi-tuple is that people would have to remember the trailing comma when printing one thing: '{}' @ (1,) rather than '{}' @ (a) == '{}' @ a. [If you say, 'Oh, then accept the latter', then there is a problem when a is a tuple!]
'1 2 3'
>>> "foo {qux} baz" @ {"qux": "bar"}
'foo bar baz'
One of the main complaints of a binary operator in PEP 3101 was the inability to mix named and unnamed arguments:
The current practice is to use either a dictionary or a tuple as the second argument, but as many people have commented ... this lacks flexibility.
To address this, a convention of having the last element of a tuple as the named arguments dictionary is introduced.
>>> "{} {qux} {}" @ (1, 3, {"qux": "bar"})
'1 bar 3'
Lastly, to print the repr() of a dictionary as an unnamed argument, one would have to append an additional dictionary so there is no ambiguity:
>>> "{}" @ {"foo": "bar"}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: tuple index out of range
>>> "{}" @ ({"foo": "bar"}, {})
"{'foo': 'bar'}"
This is another disadvantage -- to me a big one. Formatting is inherently an n-ary function whose args are one format and an indefinite number of objects to plug in. Packaging the remaining args into an object to convert the function to binary is problematical, especially in Python with its mix of positional and named args. Even without that, there is possible confusion between a package as an arg in itself and a package as a container of multiple args. The % formatting problem with tuple puns was one of the reasons to seek a replacement.

Terry Jan Reedy
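The tuple pun mentioned above is easy to reproduce with the existing '%' operator. A small sketch; the names old_style and point are illustrative only. A tuple on the right-hand side is always treated as the container of arguments, so a tuple that is meant as a single value has to be wrapped again, while the call syntax of str.format() has no such ambiguity.

```python
def old_style(value):
    """Format exactly one value with '%', wrapping it so a tuple is not unpacked."""
    return '%s' % (value,)


point = (1, 2)

try:
    '%s' % point              # point is unpacked into two arguments for one slot
    pun_failed = False
except TypeError:
    pun_failed = True

print(pun_failed)             # True
print(old_style(point))       # (1, 2)
print('{}'.format(point))     # (1, 2)
```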
Admittedly, these workarounds are less than clean, but the understanding is the '@' syntax would indeed be an alternative, so one could easily fall back to the str.format() method or the format() function.
IMPLEMENTATION
Code-wise, the grammar was edited per PEP 306 [3], and a function was introduced in unicodeobject.c as PyUnicode_FormatPrime (in the mathematical sense of A and A' -- I didn't fully understand or want to intrude upon the *_FormatAdvanced namespace).
The PyUnicode_FormatPrime function transforms the incoming arguments, i.e. the operands of the binary '@', and makes the appropriate do_string_format() call. Thus, I have reused as much code as possible.
I have done my development with git by using two branches: 'master' and 'subversion', the latter of which can be used to run 'svn update' and merge back into master. This way my code changes and the official ones going into the Subversion repository can stay separate, meanwhile allowing 'svn diff' to produce an accurate patch at any given time.
The code is available at:
http://github.com/jcsalterego/py3k-atsign/
The SVN patch [4] or related commit [5] are good starting points.
References:
[1] http://www.python.org/dev/peps/pep-3101
[2] http://docs.python.org/3.0/whatsnew/3.0.html
[3] http://www.python.org/dev/peps/pep-0306/
[4] http://github.com/jcsalterego/py3k-atsign/blob/master/py3k-atsign.diff
[5] http://github.com/jcsalterego/py3k-atsign/commit/5c8bdf72d9252cea78af2b78096...
Thanks,
Ah, the people have spoken!

On Sun, Jun 21, 2009 at 2:12 PM, Terry Reedy <tjreedy@udel.edu> wrote:
The place to float trial balloons is the python-ideas list.
I'll put this one to rest, and as mentioned, will direct any future suggestions to python-ideas instead of here. Most of the arguments against my proposal state there is little gain and much to lose (in terms of clarity or an "obvious way" to go about string formatting) -- and, I agree.
The only advantage of '@' over '.format' is fewer characters. I think it would be more useful to agitate to give 'format' a one char synonym such as 'f'.
str.f() would be a great idea.
One disadvantage of using an actual tuple rather than an arg quasi-tuple is that people would have to remember the trailing comma when printing one thing. '{}' @ (1,) rather than '{}' @ (a) == '{}' @ a. [If you say, 'Oh, then accept the latter', then there is a problem when a is a tuple!]
My code transforms both '{}' @ (a) and '{}' @ a to '{}'.format(a), but the problem you speak of is probably an edge case I haven't quite wrapped my head around. For what it's worth, I spent a bit of time trying to work out the syntactical quirks, including adapting the format tests in Lib/test/test_unicode.py to this syntax and ensuring all the tests passed. In the end though, it seems to be an issue of usability and clarity.
Formatting is inherently an n-ary function whose args are one format and an indefinite number of objects to plug in. Packaging the remaining args into an object to convert the function to binary is problematical, especially in Python with its mix of positional and named args. Even without that, there is possible confusion between a package as an arg in itself and a package as a container of multiple args. The % formatting problem with tuple puns was one of the reasons to seek a replacement.
Also (from R. David Murray):
That said, I'm -1 on it. The 'keywords as last item of tuple' reeks of code-smell to my nose, and I don't think you've addressed all of the reasons for why a method was chosen over an operator. Python has a tradition of having "one obvious way" to do something, so introducing an "alternative" syntax that you admit is sub-optimal does not seem to me to have enough benefit to justify breaking that design guideline.
Well stated (and everyone else).

Just one last note: I think my end goal here was to preserve the visual clarity and separation between format string and format parameters, as I much prefer:

    "%s %s %s" % (1, 2, 3)

over

    "{0} {1} {2}".format(1, 2, 3)

The former is a style I've grown accustomed to, and if % is indeed being slated for removal in Python 3.2, then I will miss it sorely (or... just get over it).

Thanks to everyone who has provided constructive criticism and great arguments.

Cheers,
--
Jerry Chen
For better or for worse, I have created a patch against the py3k trunk which introduces a binary operator '@' as an alternative syntax for the new string formatting system introduced by PEP 3101 ("Advanced String Formatting"). [1]
I'd like to join everybody else who said that this would be a change for the worse. This kind of syntax is one of the most prominent features of Perl.

$@~ly/your/s!,
Martin

I'm against syntax for this, for all the reasons stated by others.

Jerry Chen wrote:
Just one last note: I think my end goal here was to preserve the visual clarity and separation between format string and format parameters, as I much prefer:
"%s %s %s" % (1, 2, 3)
over
"{0} {1} {2}".format(1, 2, 3)
If it helps, in 3.1 and 2.7 this can be written as

    "{} {} {}".format(1, 2, 3)

I'm not sure it helps for "visual clarity", but it definitely makes the typing easier for simple uses.
The former is a style I've grown accustomed to, and if % is indeed being slated for removal in Python 3.2, then I will miss it sorely (or... just get over it).
I've basically come to accept that %-formatting can never go away, unfortunately. There are too many places where %-formatting is used, for example in logging Formatters. %-formatting either has to exist or it has to be emulated.

Although if anyone has any suggestions for migrating uses like that, I'm interested.

Eric.
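For the logging case specifically, one migration path became available after this thread: since Python 3.2, logging.Formatter accepts a style argument, so the layout string can use {}-style fields while call sites keep passing %-style messages. A minimal sketch; the logger name, file name and message below are made up for illustration.

```python
import logging

# The layout string uses {}-style fields because style="{" is given.
formatter = logging.Formatter("{levelname}: {message}", style="{")

# Build a record by hand so the example needs no handlers or global config.
record = logging.LogRecord(
    name="demo",
    level=logging.WARNING,
    pathname="demo.py",
    lineno=1,
    msg="disk at %d%%",   # the message itself is still merged with '%'
    args=(93,),
    exc_info=None,
)

line = formatter.format(record)
print(line)   # WARNING: disk at 93%
```

Note that only the Formatter layout changes style here; the message and its arguments are still combined with %-formatting, which is consistent with the point that '%' has to keep existing or be emulated.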
On Jun 21, 2009, at 5:40 PM, Eric Smith wrote:
I've basically come to accept that %-formatting can never go away, unfortunately. There are too many places where %-formatting is used, for example in logging Formatters. %-formatting either has to exist or it has to be emulated.
It'd possibly be helpful if there were builtin objects which forced the format style to be either newstyle or oldstyle, independent of whether % or format was called on it.

E.g.

    x = newstyle_formatstr("{} {} {}")
    x % (1,2,3) == x.format(1,2,3) == "1 2 3"

and perhaps, for symmetry:

    y = oldstyle_formatstr("%s %s %s")
    y.format(1,2,3) == y % (1,2,3) == "1 2 3"

This allows the format string "style" decision to be made external to the API actually calling the formatting function. Thus, it need not matter as much whether the logging API uses % or .format() internally -- that only affects the *default* behavior when a bare string is passed in.

This could allow for a controlled, staged migration towards the new format string format, with a long deprecation period for users to migrate:

1) introduce the above feature, and recommend in docs that people only ever use new-style format strings, wrapping the string in newstyle_formatstr() when necessary for passing to an API which uses % internally.

2) A long time later...deprecate str.__mod__; don't deprecate newstyle_formatstr.__mod__.

3) A while after that (maybe), remove str.__mod__ and replace all calls in Python to % (used as a formatting operator) with .format() so that the default is to use newstyle format strings for all APIs from then on.

James
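The two wrapper types proposed above can be sketched as str subclasses. This is an illustration under assumptions, not an existing builtin: the class names are taken from the message, while the handling of dict operands and the refusal to mix positional and named values are choices made here for the sketch.

```python
class newstyle_formatstr(str):
    """A {}-style format string that also answers to the % operator."""

    def __mod__(self, args):
        # Mirror '%' conventions: a dict means named values,
        # a tuple means positional values, anything else is one value.
        if isinstance(args, dict):
            return str.format(self, **args)
        if not isinstance(args, tuple):
            args = (args,)
        return str.format(self, *args)


class oldstyle_formatstr(str):
    """A %-style format string that also answers to .format()."""

    def format(self, *args, **kwargs):
        # '%' accepts either a tuple or a mapping, never both at once.
        if args and kwargs:
            raise TypeError("cannot mix positional and named values")
        return str.__mod__(self, kwargs if kwargs else args)


x = newstyle_formatstr("{} {} {}")
print(x % (1, 2, 3), "|", x.format(1, 2, 3))    # 1 2 3 | 1 2 3

y = oldstyle_formatstr("%s %s %s")
print(y.format(1, 2, 3), "|", y % (1, 2, 3))    # 1 2 3 | 1 2 3
```

An API that calls % internally can then be handed a newstyle_formatstr without knowing which style the caller prefers, which is the decoupling the message describes.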
participants (8)
- "Martin v. Löwis"
- Antoine Pitrou
- Eric Smith
- James Y Knight
- Jerry Chen
- R. David Murray
- Steven Bethard
- Terry Reedy