textFromMap(seq , map=None , sep='' , ldelim='', rdelim='')

Hello,

A recommended idiom to construct a text from bits -- usually when the bits themselves are constructed by mapping on a sequence -- is to store the intermediate results and then join() them all at once. Since I discovered this idiom I find myself constantly using it, to the point of having a func doing that in my Python toolkit:

    def textFromMap(seq, map=None, sep='', ldelim='', rdelim=''):
        if map is None:
            return "%s%s%s" % (ldelim, sep.join(str(e) for e in seq), rdelim)
        else:
            return "%s%s%s" % (ldelim, sep.join(str(map(e)) for e in seq), rdelim)

Example use:

    class LispList(list):
        def __repr__(self):
            return textFromMap(self, repr, ' ', '(', ')')

    print(LispList([1, 2, 3]))    # --> (1 2 3)

Is there any similar routine in Python? If yes, please inform me off list and excuse the noise. Else, I wonder whether such a routine would be useful as a builtin, precisely since it is a common and recommended idiom. The issues with not having it, according to me, are that the expression is somewhat complicated and, more importantly, hardly tells the reader what it means & does -- even when "unfolded" into 2 or more lines of code:

    elements = (map(e) for e in seq)
    elementTexts = (str(e) for e in elements)
    content = sep.join(elementTexts)
    text = "%s%s%s" % (ldelim, content, rdelim)

There are 2 debatable choices in the func above:
* Unlike join(), it converts to str automagically.
* It takes optional delimiter parameters which complicate the interface (but are really handy for my common use cases :-)

Also, the map parameter is optional in case there is no mapping at all, which is more common if the func "string-ifies" itself.

If ever you find this proposal sensible, then what should be the routine's name? And where to integrate it in the language? I think there are at least 3 options:
1. A simple func: textFromMap(seq, ...)
2. A static method of str: str.fromMap(seq, ...)
3. A method for iterables (1): seq.textFromMap(...)
(I personally find the latter more correct semantically (2).)

What do you think?

Denis

(1) I don't know exactly what should be the top class, if any.
(2) I think the same about join: should be "seq.join(sep)" since for me the object on which the method applies is seq, not sep.

-- -- -- -- -- -- --
vit esse estrany ☣ spir.wikidot.com

On 2010-10-25, at 15:49 , spir wrote:
I really am not sure you gain so much over the current `sep.join(str(map(e)) for e in seq)`, even with the addition of ldelim and rdelim, which end up as argument soup/noise (5 arguments in the worst case is quite a lot). The name is also strange, and hints at needing function composition more than a new builtin.
This is also the choice of e.g. Ruby, but it has a severe limitation: Python doesn't have any `Iterable` type, yet `join` can be used with any iterable including generators or callable-iterators. Thus you cannot put it on the iterable or sequence, or you have to prepare some kind of iterable mixin. This issue might be solved/solvable via the new abstract base classes, but I'm not so sure about it (do you explicitly have to mix in an ABC to use its methods?). In fact, Ruby 1.8 does have that limitation (though it's arguably not the worst limitation ever): `Array#join` exists but not `Enumerable#join`. They tried to add `Enumerable#join` in 1.9.1 (though a fairly strange, recursive version of it), then took it out, then added it back again (or something, I got lost around there). And in any case, since there is no requirement for enumerable collections to mix Enumerable in, you can have enumerable collections with no join support.
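On the question of whether an ABC has to be mixed in explicitly, a minimal sketch may help. It assumes Python 3 and `collections.abc.Iterable`; `JoinableMixin` and `Numbers` are hypothetical names invented for the illustration, not anything that exists in the standard library. As far as I can tell, an object can be recognised as an `Iterable` without inheriting from it, but methods defined on a mixin only reach classes that actually inherit from that mixin:

```python
from collections.abc import Iterable


class JoinableMixin(Iterable):
    """Hypothetical mixin: gives join() to classes that inherit from it."""

    def join(self, sep=""):
        # Delegate to str.join, converting every element with str() first.
        return sep.join(str(item) for item in self)


class Numbers(JoinableMixin):
    """Opts in by inheriting from the mixin, so it gets join()."""

    def __init__(self, *values):
        self._values = values

    def __iter__(self):
        return iter(self._values)


joined = Numbers(1, 2, 3).join("-")

gen = (n for n in range(3))
gen_is_iterable = isinstance(gen, Iterable)  # recognised structurally via __iter__
gen_has_join = hasattr(gen, "join")          # but no method was added to it
```

So a generator passes the `isinstance` check yet gains no `join` method, which is the limitation described above.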

Hm. I suppose the need for this would be slightly mitigated if I understood why str.join does not try to convert the elements of the iterable it is passed to strs (and analogously for unicode). Does anyone know what the rationale for that is? lvh

On Tue, Oct 26, 2010 at 2:00 AM, Laurens Van Houtven <lvh@laurensvh.be> wrote:
To elaborate on Guido's answer, omitting automatic coercion makes it fairly easy to coerce via str, repr or ascii (as appropriate), or else to implicitly assert that all the inputs should be strings (or buffers) already. Once you put automatic coercion inside str.join, the last option becomes comparatively hard to do.

Note that easy coercion in str.join is one of the use cases that prompted us to keep map as a builtin though:

    sep.join(map(str, seq))
    sep.join(map(repr, seq))
    sep.join(map(ascii, seq))
    sep.join(seq)

The genexp equivalents are both slower and harder to read than the simple map invocations.

To elaborate on Terry's answer as well - when join was the function string.join, people often had trouble remembering whether the sequence or the separator argument came first. With the str method, while some people may find it odd to have the method invocation on the separator, they typically don't forget the order once they learn it for the first time.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
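The four spellings above can be tried directly. This is only a small Python 3 sketch with made-up sample data (`seq`, `sep` and the result names are assumptions for the illustration); it shows the explicit coercions working and the uncoerced call acting as an implicit type check:

```python
seq = [1, 2.5, None]
sep = ", "

# Explicit coercion: the caller picks the conversion function.
as_str = sep.join(map(str, seq))
as_repr = sep.join(map(repr, ["a", "b"]))
as_ascii = sep.join(map(ascii, ["é"]))   # non-ASCII characters come out escaped

# No coercion: join() refuses non-string items, which doubles as an assertion
# that every input is already a string.
try:
    sep.join(seq)
except TypeError:
    rejected_non_strings = True
else:
    rejected_non_strings = False
```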

Nick Coghlan wrote:
I don't feel your "the other way around" makes clear sense. The /split/ function depends on two string parameters, which allows a design choice as to which one should be the object when making it a method call. I have been burned more than once by internalizing that /join/ is a method on the separator, only to (re-)discover that such is *not* the case for the converse method /split/ - although it could be (and therefore should be, to minimize cognitive dissonance). IOW, instead of whining that there is no way to make join a method on what "we should" think of as the prominent object (ie the sequence/iterator) and then half-heartedly promoting sep.join as the solution, let's take the sep.join idiom seriously, together with its implication that a core object role for a string is to act as a separator. And let's then propagate that notion, to a *coherent* definition of split that makes it as well a method on the separator. Cheers, BB

Boris Borcic wrote:
And let's then propagate that notion, to a *coherent* definition of split that makes it as well a method on the separator.
Let's not. Splitting is not something that you do on the separator, it's something you do on the source string. I'm sure you wouldn't expect this:

    ":".find("key:value") => 3

Nor should we expect this:

    ":".split("key:value") => ["key", "value"]

You perform a search *on* the source string, not the target substring. Likewise you split the source string, not the separator.

-- Steven
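For reference, here is what the two orderings do in current Python 3. The variable names are only for the illustration; the point is that the "reversed" calls are legal but answer a different question (they look for the whole source inside the one-character string):

```python
source = "key:value"

# Method on the source string: search for / split on the separator.
found_at = source.find(":")
fields = source.split(":")

# Method on the separator: looks for "key:value" inside ":" instead.
reversed_find = ":".find(source)    # not found
reversed_split = ":".split(source)  # nothing to split on, ":" comes back whole
```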

On 2010-10-26, at 18:09 , Steven D'Aprano wrote:
As with joining, that's a completely arbitrary decision. Python's take is that you split on a source and join on a separator; most APIs I've seen so far agree on the former but not on the latter. And Python has an API which splits on the separator anyway: a compiled regex (re.RegexObject) has its own split() method, and it is not a value you can provide to str.split() as far as I know (whereas in Ruby String#split can take a string or a regex indifferently, so it's coherent in that it always splits on the source string, never on the separator).
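A short sketch of that asymmetry, assuming Python 3 (where the compiled-regex type is spelled `re.Pattern` rather than `re.RegexObject`). The pattern and sample text are made up for the illustration:

```python
import re

pattern = re.compile(r"[,;]\s*")   # the "separator" object: comma or semicolon
text = "a, b;c"

# Splitting is a method on the compiled separator; the source is the argument.
parts = pattern.split(text)

# str.split() does not accept a compiled pattern as its separator.
try:
    text.split(pattern)
except TypeError:
    str_split_accepts_regex = False
else:
    str_split_accepts_regex = True
```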

Steven D'Aprano wrote:
To be honest, my test for this type of question is how likely I am to find myself using the bound method outside of immediate method call syntax, and I'd say a specialized callable that will find specific content in whatever future argument is more likely than the converse callable that will find occurrences of whatever future argument in a fixed string. YMMV
To me, this sounds like giving too much weight to English language intuition. What really counts is not how it gets to be said in good English, but rather which variable/object/value, in the context of the action, tends to be the most stable focus of attention. And remember that most speakers of English as a second language never become fully comfortable with English prepositions. Cheers, BB
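The "bound method as a reusable callable" test can be made concrete. The sketch below assumes Python 3 and uses `operator.methodcaller` to emulate a split that is fixed on its separator; all the names are hypothetical and chosen only for the example:

```python
from operator import methodcaller

# join is bound to the separator, so the callable is reusable across sequences.
join_csv = ", ".join

# split is bound to the source, so fixing the separator needs a helper.
split_on_colon = methodcaller("split", ":")

# The natively available bound method fixes the source instead.
find_in_fixed_source = "key:value".find

rows = ["a:1", "b:2"]
pairs = [split_on_colon(row) for row in rows]
line = join_csv(["x", "y"])
position = find_in_fixed_source(":")
```

Whether the separator-fixed or the source-fixed callable is needed more often is exactly the judgment call under discussion; the sketch only shows that both are expressible today.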

On Wed, 27 Oct 2010 03:09:23 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
I completely share this view. Also, when one needs to split on multiple seps, repetitive seps, or even more complex separation schemes, it makes even less sense to see split applying on the sep, instead of on the string. Even less when splitting should remove empty parts generated by seps at both ends or repeated seps. Note that it's precisely what split() without sep does:
Finally, in any of such cases, join is _not_ a reverse function for split. split in the general case is not reversible because there is loss of information. Reversal is possible only with a pattern limited to a single sep, no (implicit) repetition, and keeping empty parts at ends. Very fine that Python's split semantics is so defined, but one cannot think of split as reversible in general (*).

Denis

(*) Similar rule: one cannot rewrite original code from an AST: there is loss of information. One can only write code in a standard form that has the same semantics (hopefully).

-- -- -- -- -- -- --
vit esse estrany ☣ spir.wikidot.com
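The loss of information can be seen with a tiny example. This is only an illustrative Python 3 sketch with a made-up input string, not the example from the original message: split() without a separator collapses runs of whitespace and trims the ends, so joining cannot restore the input, whereas splitting on one explicit separator keeps the empty parts and round-trips.

```python
text = "  a  b "

# No separator: whitespace runs collapse and the ends are trimmed.
words = text.split()
rejoined_words = " ".join(words)

# Single explicit separator: empty parts are kept, so nothing is lost.
pieces = text.split(" ")
rejoined_pieces = " ".join(pieces)
```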

spir wrote:
Pack behavior ! Where's the alpha male ? :)
Now that's a mighty strange argument, unless you think of /split/ as some sort of multimethod. I didn't mean to deprive you of your preferred Swiss army knife :) Obviously the algorithm must change according to the sort of "separation scheme". Isn't it then a natural anticipation to see the dispatch effected along the lines of Python's native object orientation ? Maybe though, this is a case of the user overstepping into the private coding business of language implementors. But on the user's own coding side, the more complex the "separation scheme", the more likely it is that code written to achieve it using /split/ is applied repeatedly to *changing* input "source string"s. That in turn would justify binding the action name /split/ more tightly to the relatively stable "separation scheme" than to the relatively unstable "source string".
/split/ currently behaves as it does currently, sure. If it was bound on the separator, s.split() could naturally be written ''.split(s) - so what's your point ? As I told Johnson, deeming ''.join(seqofstr) better-looking than sum(seqofstr) entails promotion of aesthetic sense in favor of ''.split...
repetition, and keeping empty parts at ends. Very fine that python's split semantics is so defined, one cannot think at split as reversible in general (*).
Now that's gratuitous pedantry ! Note that given

    f = sep.join
    g = lambda t : t.split(sep)

it is true that

    g(f(g(x))) == g(x)

and

    f(g(f(y))) == f(y)

for whatever values of sep, x, and y that do not provoke any exception. That covers all natural use cases with the notable exception of s.split(), iow sep=None. That is clearly enough to justify calling, as I did, /split/ the "converse" of /join/ (note the order, sep.join applied first, which eliminates sep=None as a use case).

And iirc, the mathematical notion that best fits the idea is not that of http://en.wikipedia.org/wiki/Inverse_function but that of http://en.wikipedia.org/wiki/Adjoint_functors

Cheers, BB
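The two identities can be checked on sample values. The sketch below assumes Python 3, uses a comma as sep, and picks x and y arbitrarily (including an element of y that itself contains the separator); it is a spot check, not a proof:

```python
sep = ","
f = sep.join


def g(t):
    # Same role as the lambda above: split a string on the fixed separator.
    return t.split(sep)


x = "a,,b,"            # a source string with empty parts
y = ["a,b", "", "c"]   # a list whose first element contains the separator

identity_on_strings = g(f(g(x))) == g(x)
identity_on_lists = f(g(f(y))) == f(y)

# A plain round trip is not the identity: "a,b" is split further.
round_trip_changes_y = g(f(y)) != y
```

The last line is why "converse" rather than "inverse" is the claim being made: the composed functions stabilise after one application even though a single round trip can change the value.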

On 10/25/2010 9:49 AM, spir wrote:
'map' is a bad parameter name as it 1. reuses the builtin name and 2. uses it for a parameter (the mapped function) of that builtin. ...
(2) I think the same about join: should be "seq.join(sep)" since for me the object on which the method applies is seq, not sep.
The two parameters for the join function are a string and an iterable of strings. There is no 'iterable of strings' class, so that leaves the string class to attach it to as a method. (It once *was* just a function in the string module before it and other functions were so attached.) The fact that the function produces a string is another reason it should be a string method. Ditto for bytes and iterable of bytes. -- Terry Jan Reedy
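Taking the naming objection on board, the helper from the original message could be respelled roughly as follows. This is a Python 3 sketch only: `text_from_items` and the `func` parameter are hypothetical names, and defaulting `func` to `str` instead of `None` is my own simplification, not something proposed in the thread.

```python
def text_from_items(seq, func=str, sep="", ldelim="", rdelim=""):
    """Apply func to each item, stringify, join with sep, wrap in delimiters."""
    # 'func' replaces the original 'map' parameter so the builtin is not shadowed.
    return ldelim + sep.join(str(func(item)) for item in seq) + rdelim


class LispList(list):
    def __repr__(self):
        return text_from_items(self, repr, " ", "(", ")")


flat = repr(LispList([1, 2, 3]))
nested = repr(LispList([LispList([1]), 2]))

# The bytes counterpart mentioned above: join lives on the bytes separator too.
joined_bytes = b", ".join([b"a", b"b"])
```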

participants (8)
- Boris Borcic
- Guido van Rossum
- Laurens Van Houtven
- Masklinn
- Nick Coghlan
- spir
- Steven D'Aprano
- Terry Reedy