RE: [Python-Dev] Re: PEP 292 - Simpler String Substitutions

François Pinard writes:
[quotations from the PEP motivation] let me think that people will often forget to stringify numbers and various other value types,
I agree. I'd like to raise a small voice in favor of automatically calling str() on all arguments before interpolation. I'd rather type this:

    # ... code ...
    out.write( template("$foo\t$bar\t$baz\n") % locals() )

than this:

    # ... code ...
    out.write( template("$foo\t$bar\t$baz\n") %
               {'foo': str(foo), 'bar': str(bar), 'baz': str(baz)} )

-- Michael Chermside
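For comparison, here is a minimal sketch against the string.Template class as it later shipped in the standard library, written in Python 3 syntax. The `template(...) % mapping` spelling above was the draft form under discussion; the `substitute()` call and the helper name `render` below are not from this thread. The point is only that non-string values can be passed through unchanged and are stringified during substitution.

```python
from string import Template


def render(foo, bar, baz):
    # locals() here holds exactly foo, bar and baz; none of them
    # needs to be a string, because substitute() stringifies each value.
    return Template("$foo\t$bar\t$baz\n").substitute(locals())


line = render(1, 2.5, None)
```

With automatic stringification, the explicit `{'foo': str(foo), ...}` dictionary is unnecessary.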

+1 to automatic str() -- this is a common newbie mistake and I can't think of a case where you wouldn't want this to happen

On Mon, Aug 23, 2004 at 08:54:46AM -0700, Michael Chermside wrote:
| François Pinard writes:
| > [quotations from the PEP motivation] let me think that people will often
| > forget to stringify numbers and various other value types,
|
| I agree. I'd like to raise a small voice in favor of automatically
| calling str() on all arguments before interpolation. I'd rather
| type this:
|
|     # ... code ...
|     out.write( template("$foo\t$bar\t$baz\n") % locals() )
|
| than this:
|
|     # ... code ...
|     out.write( template("$foo\t$bar\t$baz\n") %
|                {'foo': str(foo), 'bar': str(bar), 'baz': str(baz)} )

[Clark C. Evans]
+1 to automatic str() -- this is a common newbie mistake and I can't think of a case where you wouldn't want this to happen
Apologies to Barry, but I'm +1 on auto-str() too. It's a string interpolation -- the user is explicitly asking for a string. If they made a mistake, it was in asking for a string to begin with, not in feeding it a non-string. The same applies to string.join(iterable), for that matter.

On Mon, Aug 23, 2004 at 12:32:39PM -0400, Tim Peters wrote:
| The same applies to string.join(iterable), for that matter.

This code-snippet is littered everywhere in my applications:

    string.join([str(x) for x in iterable])

It's tedious and makes code hard to read. Do we need a PEP to fix this?

Clark

This code-snippet is littered everywhere in my applications:
string.join([str(x) for x in iterable])
It's tedious and makes code hard to read. Do we need a PEP to fix this?
A PEP would be overkill. Still, it would be helpful to do PEP-like things such as providing a reference implementation, soliciting comments, and keeping an issue list.

A minor issue is that the implementation automatically shifts to Unicode upon encountering a Unicode string. So you would need to test for this before coercing to a string.

Also, join works in multiple passes. The proposal should be specific about where stringizing occurs. IIRC, you need the object length on the first pass, but the error handling and switchover to Unicode occur on the second.

Raymond

Quoting Raymond Hettinger <python@rcn.com>:
This code-snippet is littered everywhere in my applications:
string.join([str(x) for x in iterable])
It's tedious and makes code hard to read. Do we need a PEP to fix this?
A PEP would be overkill.
Still, it would be helpful to do PEP-like things such as providing a reference implementation, soliciting comments, and keeping an issue list.
A minor issue is that the implementation automatically shifts to Unicode upon encountering a Unicode string. So you would need to test for this before coercing to a string.
Perhaps have string join coerce to string, and Unicode join coerce to the separator's encoding. If we do that, the existing string->Unicode promotion code should handle the switch between the two join types.
Also, join works in multiple passes. The proposal should be specific about where stringizing occurs. IIRC, you need the object length on the first pass, but the error handling and switchover to Unicode occur on the second.
Having been digging in the guts of string join last week, I'm pretty sure the handover to the Unicode join happens on the first 'how much space do we need' pass (essentially, all of the work done so far is thrown away, and the Unicode join starts from scratch. If you know you have Unicode, you're better off using a Unicode separator to avoid this unnecessary work </tangent>).

We then have special casing of zero length and single item sequences, before dropping into the 'build the new string' loop.

By flagging the need for a 'stringisation' operation in the failed side of the 'PyUnicode_Check' that occurs during the first pass (to see if we should hand over to the Unicode join), we could avoid unnecessarily slowing the pure string cases.

To keep the speed of the pure-string case, we would need to guarantee that the sequence consists of only strings when we run through the final pass to build the new string. So we would need an optional second pass that constructs a new sequence, containing any strings from the original sequence, plus 'stringised' versions of the non-strings.

The final pass could remain as-is. The only possible difference is that it may be operating on the new 'stringised' sequence rather than the old one.

The alternative implementation (checking each item's type as it is added to the new string in the final pass) has the significant downside of slowing down the existing case of joining only strings.

Either implementation should still be a lot faster than ''.join([str(x) for x in seq]) though.

Time to go knock out some code, I think. . .

Cheers,
Nick.

P.S. "'\n'.join(locals().items())" sure would be pretty, though
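The pass structure described above can be sketched in pure Python. This is only an illustration of the idea, not the C implementation: the function name `join_with_str` is hypothetical, the code uses Python 3 (where there is no separate Unicode join to hand over to), and the zero-length and single-item special cases are left out.

```python
def join_with_str(sep, iterable):
    """Join items with sep, calling str() on any non-string item."""
    # Materialize anything that is not already a list or tuple,
    # since an iterator can only be walked once.
    items = iterable if isinstance(iterable, (list, tuple)) else tuple(iterable)

    # First pass: inspect types only, and flag whether any item
    # needs to be stringised.
    needs_str = any(not isinstance(item, str) for item in items)

    # Optional second pass: build a new sequence that keeps the
    # original strings and holds stringised versions of the rest.
    if needs_str:
        items = [item if isinstance(item, str) else str(item) for item in items]

    # Final pass is unchanged: an ordinary join over strings only.
    return sep.join(items)


pretty = join_with_str("\n", {"a": 1, "b": 2}.items())
```

When every item is already a string, the second pass is skipped and the only extra cost is the type check in the first pass, which is the property the message above is trying to preserve.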

ncoghlan@iinet.net.au wrote:
Quoting Raymond Hettinger <python@rcn.com>:
This code-snippet is littered everywhere in my applications:
string.join([str(x) for x in iterable])
It's tedious and makes code hard to read. Do we need a PEP to fix this?
A PEP would be overkill.
Still, it would be helpful to do PEP-like things such as providing a reference implementation, soliciting comments, and keeping an issue list.
A minor issue is that the implementation automatically shifts to Unicode upon encountering a Unicode string. So you would need to test for this before coercing to a string.
Perhaps have string join coerce to string, and Unicode join coerce to the separator's encoding. If we do that, the existing string->Unicode promotion code should handle the switch between the two join types.
The general approach is always to coerce to Unicode if strings and Unicode meet; very much like coercion to floats is done when integers and floats meet. Your suggestion would break this logic and make coercion depend on an argument.
Also, join works in multiple passes. The proposal should be specific about where stringizing occurs. IIRC, you need the object length on the first pass, but the error handling and switchover to Unicode occur on the second.
Having been digging in the guts of string join last week, I'm pretty sure the handover to the Unicode join happens on the first 'how much space do we need' pass (essentially, all of the work done so far is thrown away, and the Unicode join starts from scratch. If you know you have Unicode, you're better off using a Unicode separator to avoid this unnecessary work </tangent>).
It's just a simple length querying loop; there's no storage allocation or anything expensive happening there, so the "throw-away" operation is not expensive.

Aside: ''.join() currently only works for true sequences - not iterators. OTOH, the %-format operation does, which is why PyString_Format goes through some extra hoops to make sure the work already done is not dropped (indeed, it may not even be possible to reevaluate the arguments; think iterators here). We could probably add similar logic to ''.join() to have it also support iterators.

--
Marc-Andre Lemburg
eGenix.com Professional Python Services directly from the Source (#1, Aug 24 2004)

[M.-A. Lemburg]
... Aside: ''.join() currently only works for true sequences - not iterators.
>>> def gen():
...     for s in "sure", "it", "does":
...         yield s
...
>>> ' '.join(gen())
'sure it does'
>>> u' '.join(gen())
u'sure it does'
Every function implemented with PySequence_Fast() works with any iterable, although it's fastest if the input argument is a builtin list or tuple. For anything else (including list or tuple subclasses, and other "true sequences"), it materializes a temp tuple, via the iterator protocol.
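A rough Python model of the behaviour just described, for readers who do not want to open the C source. The helper name `sequence_fast` is hypothetical and only mirrors the description above (exact list or tuple used as-is, everything else drained into a temporary tuple); it is not the C API itself.

```python
def sequence_fast(obj):
    # Builtin list or tuple (exact type, not a subclass): use it directly.
    if type(obj) in (list, tuple):
        return obj
    # Anything else, including subclasses and generators, is
    # materialized into a temporary tuple via the iterator protocol.
    return tuple(obj)


def gen():
    for s in "sure", "it", "does":
        yield s


words = sequence_fast(gen())
joined = " ".join(words)
```

Because the generator has been drained into `words`, the join can make as many passes over the items as it needs.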

M.-A. Lemburg wrote:
ncoghlan@iinet.net.au wrote:
pass (essentially, all of the work done so far is thrown away, and the Unicode join starts from scratch. If you know you have Unicode, you're better off using a Unicode separator to avoid this unnecessary work </tangent>).
It's just a simple length querying loop; there's no storage allocation or anything expensive happening there, so the "throw-away" operation is not expensive.
Yes, my mistake. The tuple that string join creates when the argument is an iterable rather than a list or tuple isn't thrown away - it is given to PyUnicode_Join to work with (since the original iterator may not be able to be iterated a second time). The only work that is lost is that all of the type checks that have been done on prior elements will get repeated in the Unicode join code. So it's not as bad as I first thought - but it will still cost a few cycles. Regards, Nick.

[Clark C. Evans]
This code-snippet is littered everywhere in my applications:
string.join([str(x) for x in iterable])
It's tedious and makes code hard to read. Do we need a PEP to fix this?
You won't need a PEP to replace it with the similar code-snippet from my code:

    string.join(map(str, iterable))

Same thing in the end, but map reads quite well (perhaps even better than a listcomp) in applications as simple as this.
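A small runnable comparison of the two spellings, using the string method form and Python 3 syntax. The sample `values` list and the ", " separator are made up for illustration.

```python
values = [1, 2.5, None, "text"]

# map() applies str to each item lazily; join() consumes the result.
joined_map = ", ".join(map(str, values))

# The list comprehension spelling quoted above, for comparison.
joined_listcomp = ", ".join([str(x) for x in values])
```

Both produce the same string; the choice between them is purely one of readability.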

On Mon, 2004-08-23 at 12:32, Tim Peters wrote:
Apologies to Barry, but I'm +1 on auto-str() too. It's a string interpolation -- the user is explicitly asking for a string. If they made a mistake, it was in asking for a string to begin with, not in feeding it a non-string.
Should it be auto-unicode(), given that Template is derived from unicode? And if so, should we entertain the possibility of insanities like giving the user the ability to pass optional arguments to the unicode() call? If the answers to that are yes and no, that's fine with me. -Barry

On Tue, Aug 24, 2004, Barry Warsaw wrote:
On Mon, 2004-08-23 at 12:32, Tim Peters wrote:
Apologies to Barry, but I'm +1 on auto-str() too. It's a string interpolation -- the user is explicitly asking for a string. If they made a mistake, it was in asking for a string to begin with, not in feeding it a non-string.
Should it be auto-unicode(), given that Template is derived from unicode? And if so, should we entertain the possibility of insanities like giving the user the ability to pass optional arguments to the unicode() call? If the answers to that are yes and no, that's fine with me.
Here you go: yes and no

--
Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

"To me vi is Zen. To use vi is to practice zen. Every command is a koan. Profound to the user, unintelligible to the uninitiated. You discover truth everytime you use it." --reddy@lion.austin.ibm.com

On Tue, 2004-08-24 at 14:45, Aahz wrote:
Here you go: yes and no
Cool. Well Aahz, I'm taking that as the green light to check this stuff in. It doesn't seem like anybody else wants to test the patch beforehand, so in it goes. I'll also be checking in a de-string-ified sre module, docs, test cases, and an updated PEP 292.

-Barry

[Barry Warsaw, on auto-str()]
Should it be auto-unicode(), given that Template is derived from unicode? And if so, should we entertain the possibility of insanities like giving the user the ability to pass optional arguments to the unicode() call? If the answers to that are yes and no, that's fine with me.
unicode ... heh, I heard something about that in Java once. Did Python grow one of those too? I sure hope it doesn't get in the way of using God-given American strings! no-no-no-no-no-but-i'll-settle-for-yes-no-ly y'rs - tim

On Tue, 2004-08-24 at 23:00, Tim Peters wrote:
unicode ... heh, I heard something about that in Java once. Did Python grow one of those too? I sure hope it doesn't get in the way of using God-given American strings!
That's "Freedom Strings" to you, buddy.
participants (10): Aahz, Barry Warsaw, Clark C. Evans, M.-A. Lemburg, Michael Chermside, ncoghlan@iinet.net.au, Neil Schemenauer, Nick Coghlan, Raymond Hettinger, Tim Peters