Enhancing iterator objects with map, filter, reduce methods

Hi, I'd like to revive this thread after what seemed like a decade and see where this might take us 😃 I like the idea that the OP suggested, but I'd like to extend it to all iterator objects (range_iterators, list_iterators, etc.).

📌Idea

Iterables to expose the .__iter__() method as .iter(). Iterators to implement 'transformational' functions like map, filter, flat_map, 'reductional' functions like reduce, sum, join, and 'evaluate' functions like to_list and to_set.

    (
        [1,2,3].iter()
        .map(lambda x: x+1)
        .filter(lambda x: x % 2 == 0)
        .to_list()
    )

📌Why?

1. It is readable in a manner that is procedural and dataflow-centric. At one glance, it is easy to reason about how our data gets transformed: we start with the subject, our data, then we perform a sequence of transformations. The previous way of doing this:

    list(
        filter(
            lambda x: x%2==0,
            map(
                lambda x: x+1,
                iter([1,2,3]))))

incurs a lot of cognitive overload. We could separate them into different lines:

    it = iter([1,2,3])
    it = map(lambda x: x+1, it)
    it = filter(lambda x: x%2==0, it)
    list(it)

but I would argue that there is a lot of repetition.

2. There are existing communities which deal with frameworks that focus on data transformations. This includes PySpark and pandas.

3. It is conventional: many modern languages like Rust, Scala, JavaScript and Kotlin have a similar API.

4. The existing map and filter APIs are not so commonly used (this is the impression I have); list comprehensions apparently read as more Pythonic on StackOverflow posts. This extended iterator API could possibly 'revive' the use of lazy evaluation.

5. This API is 'flat', rather than the 'nested' map, reduce and filter. (I guess we can argue that "Flat is better than nested"?)

📌Why not?

1. Method chaining is uncommon and not really well-liked; that is the impression I get from the community. Could somebody please explain why?

2. A lot of 'lambda' keywords. Agreed. (However, this would tie in well with the alternate lambda syntax proposal https://mail.python.org/archives/list/python-ideas@python.org/thread/QQUS35S... .)

📌On list comprehension vs. method chaining

I don't think the aim of this API should be to replace list comprehension and the like. Rather it offers programmers an alternative pattern to transform and reason about their data.
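[For reference: each proposed method already has a rough stdlib spelling today. The mapping below is an editorial sketch, not part of the proposal:]

    from functools import reduce
    from itertools import chain

    it = iter([1, 2, 3])                    # [1,2,3].iter()
    it = map(lambda x: x + 1, it)           # .map(...)
    it = filter(lambda x: x % 2 == 0, it)   # .filter(...)
    print(list(it))                         # .to_list()  ->  [2, 4]

    # flat_map is roughly map + chain.from_iterable:
    print(list(chain.from_iterable(map(lambda x: [x, x], [1, 2]))))  # [1, 1, 2, 2]
    # reduce already lives in functools:
    print(reduce(lambda a, b: a + b, [2, 4]))                        # 6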

On Tue, Nov 23, 2021 at 1:18 AM Remy <raimi.bkarim@gmail.com> wrote:
Here's the equivalent as a list comprehension, which I think looks better than either of the above:

    [x + 1 for x in [1,2,3] if x % 2 == 0]
Comprehensions ARE lazily evaluated. If you use to_list at the end (or call list(it), either way), that's equivalent to a list comp; if you don't, it's equivalent to a generator expression. Either way, Python doesn't construct each intermediate list before moving on to the next one.
> 📌On list comprehension vs. method chaining: I don't think the aim of this API should be to replace list comprehension and the like. Rather it offers programmers an alternative pattern to transform and reason about their data.
It's not too hard to create your own dataflow class if you want one. It can start with any arbitrary iterable, and then have your map and filter methods just the same. Cool trick: you can even call your class iter! :)

    class iter:
        _get_iterator = iter  # snapshot the original builtin

        def __init__(self, basis):
            self.basis = self._get_iterator(basis)

        def map(self, func):
            return type(self)(map(func, self.basis))

        # etc

        def __iter__(self):
            return self

        def __next__(self):
            return next(self.basis)

You should then be able to method-chain onto your iter constructors. Personally, I wouldn't use this sort of thing (comprehensions are far easier to work with IMO), but if you want it, the power is in your hands.

ChrisA
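[Filling in Chris's "# etc" might look something like this; a sketch in the spirit of his class, not his actual code. Note that inside the method bodies, names like filter and list still resolve to the builtins, since the class namespace is not searched from within methods:]

    # Added to the body of the `iter` class above:

    def filter(self, func):
        # `filter` here is the builtin, not this method.
        return type(self)(filter(func, self.basis))

    def to_list(self):
        return list(self.basis)

With those added, the OP's example chain works:

    print(
        iter([1, 2, 3])              # `iter` is now the class above
        .map(lambda x: x + 1)
        .filter(lambda x: x % 2 == 0)
        .to_list()
    )  # prints [2, 4]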

Chris Angelico writes:
One thing I noticed when implementing this class (yours is better, so I'm not posting mine :-) is that you and I both implemented .map in the obvious way for this use case, but the map function takes multiple iterables. On the other hand, filter takes only one iterable argument. Obviously, you can implement multifilter:

    def multifilter(func, *iterables):
        return filter(lambda x: func(*x), zip(*iterables))

I think generalizing to this is a YAGNI, since it's so simple. Also, returning an iterable of tuples may not be the right thing. That is, you might want it to return a tuple of iterables, but that would be messy to implement, and in general can't be done space-efficiently, I think. This apparently is a case of "no one ever needed it."

Changing map to take a sequence of iterables is a non-starter, since that would be backward incompatible. There's also implementing zip's strict argument, e.g.:

    def zippymap(func, *iterables, strict=False):
        return map(lambda x: func(*x), zip(*iterables, strict=strict))

and corresponding zippymappers for any other mappers (including filter). This seems like it might be a useful extension to the functions in the stdlib, for the same reason that it's useful for zip itself. Even though it's so easy to implement in terms of zip, it would be more discoverable as a documented argument to the functions.

Comments?

Steve
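[For concreteness, here is how those helpers behave; the examples are editorial, not Steve's:]

    # Keep pairs where the first element is smaller than the second:
    print(list(multifilter(lambda x, y: x < y, [1, 5, 3], [2, 4, 6])))
    # [(1, 2), (3, 6)]

    # Element-wise sums, zip-style:
    print(list(zippymap(lambda x, y: x + y, [1, 2], [10, 20])))
    # [11, 22]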

On Tue, Nov 23, 2021 at 7:47 PM Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
Extending map to more than just one function and one iterable is done in different ways by different languages. I don't think it necessarily needs to be implemented the same way in a pipeline as it is in a stand-alone map function; the pipeline is, by its nature, working with a single iterable, so if its map method took anything more than a single argument, it could just as easily be interpreted as "pass these arguments to the function" (so you could write `iter(x).map(int, 16)` to convert hex strings to integers) rather than as extra iterables. Which is why I kept it simple and didn't allow more args :)
Agreed, not really a lot of point.
I think you're right there.
Given that I don't actually want a pipeline like this, I'm not the best one to ask, but I would strongly favour ultra-simple APIs. ChrisA

Chris Angelico writes:
On Tue, Nov 23, 2021 at 7:47 PM Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
Ah, but I changed the subject here. Sorry about not making that clear. This isn't a method on a dataflow, it would be a change to map itself.

Steve

On Wed, Nov 24, 2021 at 12:39 AM Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
Oh, oh, gotcha. That may be worth doing, yeah. It doesn't make a lot of sense in the pipeline form, but map() as it currently is could benefit from that. Prior comment withdrawn as it was responding to what you weren't saying :) ChrisA

Chris Angelico wrote:
That's not equivalent. You produce [3] instead of [2, 4]. So you rather proved that the proposal does have merit, as it's apparently easy to get the list comprehension wrong. Actually equivalent list comprehension:

    [x for x in [1,2,3] for x in [x + 1] if x % 2 == 0]

Or spread across lines:

    [x
     for x in [1,2,3]
     for x in [x + 1]
     if x % 2 == 0]

On Fri, Dec 24, 2021 at 10:22:29AM -0000, Stefan Pochmann wrote:
Chris Angelico wrote:
Or simpler still:

    [x + 1 for x in [1,2,3] if x % 2 != 0]

Or we can use the mighty walrus:

    [y for x in [1, 2, 3] if (y:=x+1) % 2 == 0]

(although y leaks out of the comprehension). Using a hypothetical pipeline syntax with an even more hypothetical arrow-lambda syntax:

    [1, 2, 3] | map(x=>x+1) | filter(a=>a%2) | list
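[For the record, both comprehensions do match the pipeline's [2, 4], and the walrus target really does leak; a quick REPL check, editorial:]

    >>> [x + 1 for x in [1, 2, 3] if x % 2 != 0]
    [2, 4]
    >>> [y for x in [1, 2, 3] if (y := x + 1) % 2 == 0]
    [2, 4]
    >>> y  # the walrus binding escapes into the enclosing scope
    4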

Stefan Pochmann writes:
Steven D'Aprano wrote:
But you didn't, really. As-is would use genexps to "inline" the map and filter calls:

    list(x for x in (y + 1 for y in [1, 2, 3]) if x % 2 == 0)

which in this case is much harder to get wrong, and still reads as well as, if not better than, the original "chained methods of dataflow object" idiom. Sure, it's a little verbose, but I think it's time for proponents to find persuasive examples, preferably not having nice genexp implementations (and in the stdlib). I don't recall if Steven said it or I'm inferring that he'd like the whole proposal better if there were a persuasive "pipeline" syntax proposal with it, but that's where I am. I don't find "method chaining as pipeline" to be an attractive syntax. That's a big IMO FYI, of course YMMV.

Yet another Steve

Stephen J. Turnbull wrote:
> But you didn't, really.
Yes I did, really. Compare (in fixed-width font):

                 (                                   [x
    source         [1,2,3].iter()                     for x in [1,2,3]
    increment        .map(lambda x: x+1)              for x in [x + 1]
    if even          .filter(lambda x: x % 2 == 0)    if x % 2 == 0]
                     .to_list()
                 )

Same source and two transformations, executed in the same order, written in the same order. Steven executes the filter before the increment, keeps odds instead of evens, and wrote the increment even before the source. You do have the same steps (increment and keep evens) as the original and execute them in the same order, but you *wrote* the first transformation *before* the source, nested instead of flat, which reads inside-out in zig-zag fashion. Not that bad with so few transformations, but if we want to do a few more, it'll likely get messy. While the OP's and mine will still read straight from top to bottom.

Stefan Pochmann writes:
> Stephen J. Turnbull wrote:
> > But you didn't, really.
> Yes I did, really. Compare (in fixed-width font):
I had no trouble reading the Python as originally written. Obviously you wrote a comprehension that gets the right answer, and uses the bodies of the lambdas verbatim. The point is that you focus on the lambdas, but what I'm interested in is the dataflows (the implicitly constructed iterators). In fact you also created a whole new subordinate data flow that doesn't exist in the original (the [x+1]). I haven't checked but I bet that a complex comprehension in your style will need to create a singleton iterable per source element for every mapping except the first. One point in favor of doing this calculation with chained iterators is to avoid creating garbage. The nested genexp I proposed creates the same iterators as the proposed method chain, and iterates them the same way, implicitly composing the functions in a one-pass algorithm.
We are well-used to reading parenthesized expressions, though. Without real-world examples, I don't believe the fluent idiom has enough advantages over comprehensions and genexps to justify support in the stdlib, especially given that it's easy to create your own dataflow objects. We don't have a complete specification for a generic facility to be put into the stdlib, except the OP's most limited proposal to add iter, map, filter, and to_list methods to iterators (the first and last of which are actually pointless). But I don't think that would get support from the core devs. It's also not obvious to me that the often awkward comprehension syntax that puts the element-wise transformation first isn't frequently optimal. In log(gdp) for gdp in gdpseries economists don't really care about the dataflow, as it's the same in many many cases. We care about the log transformation, as that's what differentiates this model from others. So putting the transformation before the data source makes a lot of sense for readability (in what is admittedly a case I chose to make the point). I'll grant that putting the source ("x in iterable") between the mapping ("f(x) for") and the filter ("if g(x)") does create readability issues for nested genexps like the one I suggested, if there are more than one or two such filters.
> Not that bad with so few transformations, but if we want to do a few more, it'll likely get messy.
If it were up to me (it isn't, but I represent at least a few others in this), "likely" doesn't cut it. I mean, I already admitted that as a *possibility*. We want to see a specification, and real applications that benefit *more* from a generic facility like that proposed than they would from application-specific dataflow objects. Steve

Stephen J. Turnbull wrote:
> In fact you also created a whole new subordinate data flow that doesn't exist in the original (the [x+1]). I bet that a complex comprehension in your style will need to create a singleton iterable per source element for every mapping except the first.
I don't think so. Sounds like you missed that `for x in [x + 1]` is now treated as `x = x + 1` not only by humans but also by Python; see the first item here:
https://docs.python.org/3/whatsnew/3.9.html#optimizations

From disassembling my expression (in Python 3.10):

    Disassembly of <code object <listcomp> at 0x00000207F8D599A0, file "<dis>", line 1>:
      1           0 BUILD_LIST               0
                  2 LOAD_FAST                0 (.0)
            >>    4 FOR_ITER                14 (to 34)
                  6 STORE_FAST               1 (x)
                  8 LOAD_FAST                1 (x)
                 10 LOAD_CONST               0 (1)
                 12 BINARY_ADD
                 14 STORE_FAST               1 (x)
                 16 LOAD_FAST                1 (x)
                 18 LOAD_CONST               1 (2)
                 20 BINARY_MODULO
                 22 LOAD_CONST               2 (0)
                 24 COMPARE_OP               2 (==)
                 26 POP_JUMP_IF_FALSE        2 (to 4)
                 28 LOAD_FAST                1 (x)
                 30 LIST_APPEND              2
                 32 JUMP_ABSOLUTE            2 (to 4)
            >>   34 RETURN_VALUE

That's *more* efficient than your separate iterators having to interact, not less. I also tried timing it, with `source = [1,2,3] * 10**6`. Mine took 1.15 seconds to process that, yours took 1.75 seconds. Turning your outer one into a list comp got it to 1.58 seconds, still much slower.
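[Stefan didn't post his timing harness; a setup along these lines would reproduce the comparison (editorial guess; absolute numbers vary by machine and Python version):]

    import timeit

    source = [1, 2, 3] * 10**6

    def flat_comp():
        # Stefan's flat comprehension with the optimized inner loop.
        return [x for x in source for x in [x + 1] if x % 2 == 0]

    def nested_genexp():
        # Stephen's nested genexp version.
        return list(x for x in (y + 1 for y in source) if x % 2 == 0)

    print(timeit.timeit(flat_comp, number=10))
    print(timeit.timeit(nested_genexp, number=10))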

On Mon, Dec 27, 2021 at 07:19:12PM -0000, Stefan Pochmann wrote:
I'm afraid that is doubly wrong. Humans still read `for x in [x + 1]` as a loop, because that's what it is. So says at least this human. And Python the language makes no promise about that being optimized into a simple assignment. Only CPython 3.9 and above does so. Other implementations may or may not do so, and being a mere optimization, it could be removed at any time without notice if it were found to be interfering with some other feature or more powerful optimization.

Optimizations are great, but we must be careful not to treat them as language features unless they are documented as such. None is a singleton and always will be, so it is safe (and recommended!) to test `is None`. But 211 may or may not be a singleton, and so testing `is 211` is risky, even if it happens to work for you under some circumstances. Caching of small ints is an implementation-dependent optimization, not a language feature. And so is the comprehension inner-loop speed-up. You can rely on it being fast if you like, but that ties you to a specific implementation and version.

-- Steve

Stefan Pochmann writes:
OK, I missed that change.
> That's *more* efficient than your separate iterators having to interact, not less.
And more efficient than the separate iterators that would be created by method chaining in Chris's implementation, AIUI. Whose side are you on here? ;-)

Jokes aside, my opinion matters almost not at all, but I think a lot of the core devs, and in particular the SC, will want to see multiple examples of existing production code that uses alternative idioms such as comprehensions, generators, and mapping functions that would be improved by the proposed change. I doubt you'll find many in the stdlib, because Guido and most other core devs have never been fans of map and filter (indeed, reduce was even demoted from a builtin to functools in Python 3).

Then, unless you can present implementations of .map and .filter that transform

    iter(iterable).map(f).filter(g)

to

    (x for x in iterable for x in [f(x)] if g(x))

as optimized by Python 3.9+, you'll have to deal with the argument that even though "readability counts", the better performance of some of the other idioms reduces the applicability of the method-chaining idiom quite a lot in production code.

Steve

On Tue, Dec 28, 2021 at 12:49:13AM +0900, Stephen J. Turnbull wrote:
I believe that Serhiy has optimized the case where a comprehension loops over a singleton list or tuple. If you go back to the pre-walrus-operator discussion (PEP 572), one of the alternatives was to use a second loop:

    [func(y) for x in items for y in [x+1] if condition(y)]
    #                       ^^^^^^^^^^^^^^ inner loop

The second loop was not mentioned in the PEP, but it was discussed during the mega-threads (note plural). If my recollection serves me correctly, at some point Serhiy optimized the inner loop away. If you inspect the output of:

    import dis
    dis.dis("[y for x in items for y in [x]]")

you will see that the list [x] is never actually created, and the for y loop is turned into just an assignment to y. But this is a CPython implementation detail, not a language promise, so it may not apply to other Python implementations.

[...]
> We are well-used to reading parenthesized expressions, though.
Just because we're used to them doesn't make them easy to read. If only 14th century mathematicians had discovered reverse Polish notation, instead of using + as a short-hand for the Latin "et" (and), all those stupid internet memes arguing about the value of 6÷2(2+1) (which is ambiguous in standard maths notation, valid answers are 1 or 9) would be obsolete. We wouldn't need brackets around expressions, parsers would be much simpler, and big complex expressions with lots of function calls would be much easier to understand. On the other hand, slice notation (which is nice) would seem bizarre. And currying in Haskell would be much harder.
My brain can't parse that sentence. Are you for it or against that "often awkward comprehension syntax"?
Julia (if I recall correctly) has a nice syntax for automatically turning any function or method into an element-wise function:

    # Take the log of one value.
    log(gdp)

    # Take the log of each value in the series.
    log.(gdpseries)

-- Steve

On Mon, Dec 27, 2021 at 4:07 PM Steven D'Aprano
> Julia (if I recall correctly) has a nice syntax for automatically turning any function or method into an element-wise function:
And numpy has an even easier one:

    np.log(a_scalar)
    np.log(an_array)

I'm only being a little bit silly. In fact, array-oriented operations are really nifty, a heck of a lot easier to parse than map, or comprehensions, etc. Filtering can be a bit ugly, but not too bad.

MATLAB has a duplicate set of operators: the matrix ones and element-wise ones. We struggled for years with how to do that in Python, until someone realized that matrix multiplication is the only actually useful matrix operation. Hence @ and now we're good.

-CHB

-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Mon, Dec 27, 2021 at 10:15:05PM -0800, Christopher Barker wrote:
Ah, but numpy has to use their own special log function that does something like this:

    # Obviously just pseudocode
    def numpy.log(obj):
        if obj is a scalar:
            return log(obj)  # Scalar version.
        else:
            # Apply the log function to every element of the
            # vector, array or matrix.
            elements = [log(x) for x in obj]
            return type(obj)(elements)

and has to repeat this boilerplate for every single function that operates on both scalars and vectors etc. Whereas Julia allows you to write the scalar log function and then *automatically* apply it to any vector, with no extra code, just by using the "dot-call" syntax:

    log.(obj)

https://docs.julialang.org/en/v1/manual/functions/#man-vectorized

And it works with operators too.

-- Steve

On Tue, Dec 28, 2021 at 1:15 AM Christopher Barker <pythonchb@gmail.com> wrote:
I have an @elementwise decorator I use for teaching decorators. I could dig it up, but everyone here could write it too. The main work is simply returning the same kind of collection that was passed in (as opposed to, e.g. always a list). But that's 2 lines of work. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
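[David didn't post his decorator, so the following is an editorial sketch of what an @elementwise decorator could look like; the real work is indeed the type(collection)(...) round-trip that rebuilds the same kind of collection:]

    from functools import wraps

    def elementwise(func):
        """Lift a scalar function to apply element-wise over a collection."""
        @wraps(func)
        def wrapper(collection):
            # Rebuild the same kind of collection that was passed in.
            return type(collection)(func(x) for x in collection)
        return wrapper

    @elementwise
    def double(x):
        return x * 2

    print(double([1, 2, 3]))   # [2, 4, 6]
    print(double((1, 2, 3)))   # (2, 4, 6)
    print(double({1, 2, 3}))   # {2, 4, 6}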

On Tue, Dec 28, 2021 at 5:31 AM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
and numpy has a vectorize() function, which, I only just realized, can be used as a decorator -- as long as you're happy with the defaults.
> Ah, but numpy has to use their own special log function that does something like this:
>     # Obviously just pseudocode
>     def numpy.log(obj):
>         if obj is a scalar:
>             return log(obj)  # Scalar version.
>         else:
>             # Apply the log function to every element of the
>             # vector, array or matrix.
>             elements = [log(x) for x in obj]
>             return type(obj)(elements)

Well, yes, numpy provides special functions, but they look more like this:

    def numpy.log(obj):
        obj = np.asarray(obj)
        return np._log(obj)

where np._log is written in C. (Yes, np.vectorize does indeed wrap the input function in a loop.)

Anyway, the point is that numpy works by having an nd-array as a first-class object. I suppose it's "only" for performance reasons that that's necessary, but it's why having a special notation to vectorize any function wouldn't be that helpful. It doesn't have to check "is this a scalar?", because ndarrays can be any (well, up to 32) dimensionality: a scalar, 1D, 2D, etc. And I'm a bit confused as to why Julia needs that -- it's also based on arrays-as-first-class-objects, but I haven't looked at Julia for ages.

Having said that, I do think that the vectorized approach makes for more readable, and less error-prone, code for a large class of problems. I often use numpy when performance is a non-issue. In fact, numpy is slower than "pure python" for very small arrays, but I still use it. So having a built-in way to do vectorized operations would be pretty cool.

I've often thought that a "numpython" interpreter would be pretty nifty -- it would essentially make ndarrays builtins, so that you could apply all sorts of nifty optimizations at run time. But I've never fleshed out that idea.

-CHB

-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
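[Regarding np.vectorize as a decorator, mentioned above: it does work, though as noted it is just a Python-level loop, not a C-speed ufunc. A quick editorial check:]

    import numpy as np

    @np.vectorize
    def clip_to_unit(x):
        # A scalar function, lifted over arrays by np.vectorize.
        return min(max(x, 0.0), 1.0)

    print(clip_to_unit(0.5))                # 0.5
    print(clip_to_unit([-1.0, 0.25, 7.0]))  # [0.   0.25 1.  ]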

On Wed, Dec 29, 2021 at 4:54 AM Christopher Barker <pythonchb@gmail.com> wrote:
I'm not sure about that.
If you pass a scalar to log(), you get back a scalar, not an array. Whereas asarray would return an array with no dimensions. Not sure how significant that is, but it does still distinguish between values and arrays. ChrisA

On Tue, Dec 28, 2021 at 10:19 AM Chris Angelico <rosuav@gmail.com> wrote:
Well, yes, there is an odd wart in numpy: there are zero-dimensional arrays, and there are scalars, which are almost, but not quite, the same:

    In [81]: arr = np.array(3.0)

    In [82]: arr.shape
    Out[82]: ()

    In [83]: scalar = np.float64(3.0)

    In [84]: scalar.shape
    Out[84]: ()

    In [85]: len(arr)
    TypeError: len() of unsized object

    In [86]: len(scalar)
    TypeError: object of type 'numpy.float64' has no len()

    In [87]: 5.0 * arr
    Out[87]: 15.0

    In [88]: 5.0 * scalar
    Out[88]: 15.0

(tracebacks trimmed) But they are almost interchangeable -- the reason the distinction is there at all is for some internal numpy structural / legacy reasons. I think it boils down to numpy scalars being more-or-less interchangeable with the built-in Python number types (https://numpy.org/neps/nep-0027-zero-rank-arrarys.html).

Note that if you index into an ndarray, you get one less dimension, until you get to zero dimensions, which is a scalar:

    In [98]: arr2 = np.ones((3,4))

    In [99]: arr2.shape
    Out[99]: (3, 4)

    In [100]: arr1 = arr2[0]

    In [101]: arr1.shape
    Out[101]: (4,)

    In [102]: arr0 = arr1[0]

    In [103]: arr0.shape
    Out[103]: ()

    In [104]: arr0[0]
    IndexError: invalid index to scalar variable.

Anyway, the point is that the distinction between scalar and array is the same as between 1D array and 2D array, and that individual functions and operators don't need to be different for operating on different dimensions of arrays.

> Not sure how significant that is, but it does still distinguish between values and arrays.
While numpy is strictly making a distinction between the *type* of scalar values and arrays, it isn't making a distinction in the *concept* of a scalar value. In the context of this thread, I think the relevant point is that "array-oriented" operations are another syntactical way of expressing operations on collections of data, as opposed to the various ways to spell looping, which is what map() and comprehensions are.

-CHB

-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Steven D'Aprano writes:
> I believe that Serhiy has optimized the case where a comprehension loops over a singleton list or tuple.
Yeah, I missed that.
> > We are well-used to reading parenthesized expressions, though.
> Just because we're used to them doesn't make them easy to read.
I didn't say that. I said that it's not clear to me that that (as yet unproposed) change has big enough advantages to be added to the stdlib, and being used to parentheses is one reason for that.
Why not both? Sometimes it's awkward, sometimes it's optimal.
> Julia (if I recall correctly) has a nice syntax for automatically turning any function or method into an element-wise function:
Sure, but syntax is not (yet) what we're talking about here.

On Sun, 26 Dec 2021 at 14:19, Steven D'Aprano <steve@pearwood.info> wrote:
What is the pipeline syntax like, indeed? It looks as if your ``|`` is an operator which produces callable objects, e.g.,

    [1, 2, 3] | map

such that calling it like

    [1, 2, 3] | map(x=>x+1)

will be equivalent to

    map(x=>x+1, [1, 2, 3])

except that <an object> | list is apparently supposed to be a list without being called. But you might have actually meant to call it like

    [1, 2, 3] | map(x=>x+1) | filter(a=>a%2) | list()

to get the list. The character ``|`` is OK with [1, 2, 3], but it's already given a meaning as an operator, e.g., with {1, 2, 3}. Can the syntax separate the different uses? I suppose it may be a new operator that you want.

I thought some people had already essentially proposed an operator version of functools.partial, although [1, 2, 3] | map will not exactly be equivalent to partial(map, [1, 2, 3]), because it's not map([1, 2, 3], x=>x+1) that you want. You want the arguments to be in a different order, but that's the only difference.

Best regards,
Takuo Matsuoka

On 23/11/21 3:15 am, Remy wrote:
> Iterators to implement 'transformational' functions like map, filter, flat_map, 'reductional' functions like reduce, sum, join, and 'evaluate' functions like to_list and to_set.
This would place a burden on all iterators to implement a large and complex interface. This goes directly against the philosophy of Python protocols, which is to be as minimal as possible. Do one thing, and do it well. It would also be a huge backwards-incompatible change. And where do you stop? You've picked an arbitrary subset of things one might want to do with an iterator. Why those particular ones? What about the contents of the itertools module? Should they be included too? Why or why not? -- Greg

Hi Remy, On Mon, Nov 22, 2021 at 02:15:08PM -0000, Remy wrote:
> Hi, I'd like to revive this thread after what seemed like a decade and see where this might take us 😃
Reviving old threads from a decade ago is fine, if something has changed. Otherwise we're likely just going to repeat the same things that were said a decade ago. Has anything changed in that time? If not, then your only hope is that people's sense of what is Pythonic code has changed.

Python is a purely object-oriented language that does not enforce, or even prefer, object-oriented syntax. We prefer procedural syntax (functions) in many cases, especially for protocols. What I mean by this is:

- all values in Python are objects; there are no "unboxed" or machine values;
- but we don't force everything to use "object.method" syntax when functions are better.

We even have a FAQ for why Python is designed this way:
https://docs.python.org/3/faq/design.html#why-does-python-use-methods-for-so...

You might also get some insight from a classic critique of Java:
https://steve-yegge.blogspot.com/2006/03/execution-in-kingdom-of-nouns.html

Note that, *technically*, Python is a "kingdom of nouns" language too. Everything we have, including functions and methods and even classes, is an object. But you will rarely see a `ThingDoer.do` method when a `do` function is sufficient.

You want to write:

    mylist.iter()

But that's just a different spelling of:

    iter(mylist)

Instead of:

    any_iterable.to_list()

just write:

    list(any_iterable)

`list` already knows how to iterate over any object which provides either of the two iterable protocols:

- the iterator protocol;
- or the sequence protocol;

so your classes just need to implement one protocol or the other, and they will *automatically and for free* get the functional equivalent of all of your methods:

- to_list, to_tuple, to_set, to_frozenset, iter, map, reduce, filter...

without any extra work, or bloating your class' namespace and documentation with dozens or hundreds or thousands of methods that you don't use. And most importantly, since you cannot possibly anticipate every possible container class and functional process, you don't have to worry about all the methods you left out:

- mapreduce, union, product, to_deque ...

Since you cannot provide a dot-method for every possible function your consumers may want, you are never going to eliminate procedural syntax:

    # Why is there no to_rainbox_forest method???
    obj = rainbox_forest(obj)

so you might as well embrace it and stop fighting it. Protocol-based functions are better than methods. Any function that operates on an iterable object will automatically Just Work.

-- Steve
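[To make the "automatically and for free" point concrete, a minimal editorial illustration: implement just the old sequence protocol and every iterable-consuming function works:]

    class Squares:
        """Implements only __getitem__ -- the sequence protocol."""
        def __init__(self, n):
            self.n = n
        def __getitem__(self, i):
            if i >= self.n:
                raise IndexError
            return i * i

    sq = Squares(4)
    print(list(sq))                       # [0, 1, 4, 9]
    print(set(sq))                        # {0, 1, 4, 9}
    print(max(sq))                        # 9
    print([x for x in sq if x % 2 == 0])  # [0, 4]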

In my previous post, I suggested that the status quo:

    iter(myobj)

is superior to the suggested method-based syntax:

    myobj.iter()

I stand by that. But I will give one exception, and suggest that so long as we don't have a good syntax for it, this request will never go away for long: function and method chaining.

Procedural/function syntax for chains of function calls sucks. It is too verbose (heavy on parentheses) and written backwards:

    print(sort(filter(transform(merge(extract(data)), args))))

To understand it, you have to read forward to find the *last* function call, which is actually the *first* call, then read backwards. An alternative that works with mutable data is verbose and expensive in vertical real estate:

    data.extract()
    data.merge()
    data.transform(args)
    data.filter()
    data.sort()
    data.print()

There is a powerful design pattern to fix this, that works great with immutable data and functions:
https://martinfowler.com/articles/collection-pipeline/

Shells such as bash have an excellent syntax for this:

    data | extract | merge | transform args | filter | sort | print

Method chaining is good too:

    data.extract().merge().transform(args).filter().sort().print()

except for the downsides discussed previously. It would be very, very nice if we had syntactic sugar for that chain of function calls that would work on general functions and methods. A long time ago, I wrote a helper class to do that:
https://code.activestate.com/recipes/580625-collection-pipeline-in-python/?i...

Heavy data-processing frameworks and libraries like Pandas already use method chaining extensively. It would be great if we could chain function calls.

-- Steve

(FYI, I am both 'Remy' and 'Raimi bin Karim'; I don't know how that happened.)

📌Goal

Based on the discussion in the past few days, I'd like to circle back to my first post to refine the goal of this proposal: to improve readability of chaining lazy functions (map, filter, etc.) for iterables. This type of chaining is otherwise known as the collection pipeline pattern (thank you Steve for the article by Martin Fowler). Also, the general sentiment I am getting from this thread is that chaining function calls is unreadable.

📌Not plausible

Extending the iterobject, based on previous discussions.

📌Proposed implementation

Earlier in the thread, Chris proposed a custom class for this kind of pipeline. But what if we exposed this as a Python module in the standard library, parking it under the group of functional programming modules? https://docs.python.org/3/library/functional.html

📜 Lib/iterpipeline.py (adapted from Chris's snippet)

    class pipeline:
        def __init__(self, iterable):
            self.__iterator = iter(iterable)

        def __iter__(self):
            return self.__iterator

        def __next__(self):
            return next(self.__iterator)

        def map(self, fn):
            self.__iterator = map(fn, self.__iterator)
            return self

        def filter(self, fn):
            self.__iterator = filter(fn, self.__iterator)
            return self

        def flatten(...):
            ...
        ...

📜 client_code.py

    from iterpipeline import pipeline

    (
        pipeline([1,[2,3],4])
        .flatten(…)
        .map(…)
        .filter(…)
        .reduce(…)
    )

📌Design

At first sight it might seem ridiculous, because all we are doing is reusing builtin methods and functions from itertools. But that is exactly what the iterpipeline module offers: a higher-level API for the itertools module that allows users to construct a more fluent collection pipeline. The con of this design is, of course, a bloated class, which Steve previously mentioned.

📌Up for discussion

* Naming
* Implementation of the pipeline class
* How to evaluate the pipeline: list(…) or to_list(…)
* What methods to offer in the API and where to stop (we don't have to implement everything)

📌On being Pythonic

I don't think we can say if it's Pythonic, because filter(map(…, …), …) wasn't really a fair fight. But an indication of likeability lies largely in libraries for data processing like PySpark. There are other method-chaining functional programming libraries that have also gained popularity, like https://github.com/EntilZha/PyFunctional.

📌On the collection pipeline pattern

Because the collection pipeline pattern is more accessible now, I believe it would be a fresh perspective for Python programmers on how they view their data, and how to get to the final result. It becomes an addition to their current toolbox for data flow, which is currently list comprehensions and for-loops.

📌On relying on 3rd party libraries instead

Personally, this kind of response would make me a little sad. I started out this proposal because I feel strongly about this: I don't want my fellow Python programmers to be missing out on this alternative way of reasoning about their data transformations. I learnt about this pattern the hard way. After Python, I picked up JavaScript and Kotlin at work, and then Rust as a hobby. Then I learnt PySpark. And I realised that these languages and frameworks had something in common: a fluent pipeline pattern. It just feels different to reason about your data in a sequential manner, rather than in jumbled-up clauses (no offence, I love list comprehension!). And then it hit me: I had actually never thought about my data in this manner in Python.
As a language that is commonly used for data processing in this era, Python is missing out on this feature. So this is more of a heartfelt note rather than an objective one: I would love my fellow Python programmers to be exposed to this mental model, and that could only be done by implementing it in the standard library.
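[For what it's worth, the Lib/iterpipeline.py sketch above already runs once the elided flatten method is dropped; a concrete round-trip using only its map and filter methods, editorial:]

    result = (
        pipeline([1, 2, 3, 4])
        .map(lambda x: x + 1)
        .filter(lambda x: x % 2 == 0)
    )
    # list() works because __iter__ hands back the underlying iterator.
    print(list(result))  # [2, 4]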

On Sat, Nov 27, 2021 at 1:39 AM Raimi bin Karim <raimi.bkarim@gmail.com> wrote:
I'm not certain that being in the standard library exposes Python programmers to something. Without looking up any references, which of these can you do with just the Python standard library (shelling out to external programs doesn't count)?

* Get the dimensions of your terminal window
* Connect to a PostgreSQL database
* Extract login information from an FTP URL
* Hash and verify passwords with bcrypt
* Build an SMTP server
* Build an FTP server
* Parse a WAV file
* Parse an MP3 file
* Change the value of 3
* Build an MSI file (Windows installer)
* Convert a file from UTF-7 to UTF-8
* Enumerate all Unicode characters that represent the digit 6

More importantly: if you didn't know that one of these was possible, would spending time writing Python code have exposed you to it? For instance (sorry, spoilers), the last one is most definitely possible, but if you were parsing input, would you think to check for anything other than the ASCII character '6'? (A sketch of that one follows below.)

Once you've thought of something, it's easy to think "it'd be cool if Python already had this". But the exact set of iteration tools that *you* want is probably not the same as the set of iteration tools that someone else wants. It would be extremely hard for the stdlib to have the perfect set of tools available, and as soon as something isn't available, you have to appeal to the core devs to add it, then wait for a release of Python that includes it.

In contrast, you can simply use your own pipeline class without synchronizing with anyone else. You can choose what to call it, what tools to make available (and can add more as you find the need), and can make your own decisions about style, like whether it should be "pipe(x) | filter(...) | sort(...)" or "pipe(x).filter(...).sort(...)" or even "pipe(x) @filter(...) @sort(...)". There's no need to convince anyone else of what you think is right - you can just go ahead and do it!

BTW, if you want more iteration tools, there are plenty to choose from. The more-itertools library has a ton of really cool stuff. Should they also be available in the pipeline? If you're making your own, then absolutely yes, you can have them available. But if it's part of the standard library, it can't depend on an external module like that.

ChrisA
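[Since it's spoiled already, one stdlib-only way to do that last item; sketch editorial:]

    import unicodedata

    # Every character whose Unicode digit value is 6, e.g. '6', '٦', '۶', '६', ...
    sixes = [chr(cp) for cp in range(0x110000)
             if unicodedata.digit(chr(cp), None) == 6]
    print(len(sixes), sixes[:8])

    # This matters for input validation, because int() accepts all of them:
    print(int("٦"))  # 6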

Just a note here: On Fri, Nov 26, 2021 at 6:37 AM Raimi bin Karim <raimi.bkarim@gmail.com> wrote:
> to improve readability of chaining lazy functions (map, filter, etc.) for iterables.
I think there is a slight misperception here. I've seen the term lazy used a couple times, and at least once in contrast to list comprehensions. However, there are, of course, generator comprehensions (AKA generator expressions) which are also lazy. So this is about syntax, not capability. Another note: I'm not recommending it, but we could add a bunch of things to the Iterator ABC, and then it could be available everywhere. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On 11/26/2021 1:59 PM, Christopher Barker wrote:
Is that true? I'm genuinely curious. I have lots of code which is the logical equivalent of:

    class Foo:
        def __init__(self, s):
            self.s = s
            self.idx = -1

        def __iter__(self):
            return self

        def __next__(self):
            self.idx += 1
            if self.idx < len(self.s):
                return self.s[self.idx]
            raise StopIteration()

Would adding something to the Iterator ABC really also add it to my class Foo?

Eric

On 27/11/21 11:34 am, Eric V. Smith wrote:
> Would adding something to the Iterator ABC really also add it to my class Foo?
No, your class would need changing to inherit from the Iterator ABC. This is a big problem in general with "just add it to the ABC" ideas. A huge number of existing classes don't inherit from the relevant ABC because there currently isn't any need to do so. Fundamentally, Python is not organised around the concept of ABCs the way some other languages such as Java are. Instead, it's organised around informal protocols. They're informal in the sense that there's no need to inherit from a particular class in order to conform -- you just implement the required methods and you're good to go. As a consequence, there is strong pressure to keep the number of required methods to a minimum. It also means that adding required methods to a protocol late in the life of the language is effectively impossible. -- Greg

On Fri, Nov 26, 2021 at 4:12 PM Eric V. Smith <eric@trueblade.com> wrote:
No, but if you subclass from the ABC it would.

Python ABCs are a mysterious beast. Python itself is mostly duck typed, so your example above is a full-fledged iterator. In this case, ABCs serve primarily as formal documentation. But they are a bit more than that, as some of them provide non-abstract functionality that you can get by subclassing from them. And some type-checking systems will check for the presence of attributes in the ABCs, so that, for instance, your example would type check as an Iterator.

-CHB
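[In fact, collections.abc.Iterator already recognizes Eric's Foo structurally, with no inheritance or registration; a quick editorial check:]

    from collections.abc import Iterator

    # Foo is Eric's class from above: it defines __iter__ and __next__,
    # which is all Iterator's __subclasshook__ looks for.
    print(issubclass(Foo, Iterator))          # True
    print(isinstance(Foo("abc"), Iterator))   # True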
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Fri, 26 Nov 2021 at 14:39, Raimi bin Karim <raimi.bkarim@gmail.com> wrote:
I'm somewhat ambivalent about this pattern. Sometimes I find it readable and natural, other times it doesn't fit my intuition for the problem domain. I do agree that helping people gain familiarity with different approaches and ways of expressing a computation is a good thing.

I get your point that putting this functionality in a 3rd party library might not "expose" it as much as you want. In fact, I'd be pretty certain that something like this probably already exists on PyPI, but I wouldn't know how to find it. However, just because that doesn't provide the exposure you're suggesting, doesn't mean that it "could only be done by implementing it in the standard library". This isn't a technical problem, it's much more of a teaching and evangelisation issue.

Building a library and promoting it via blogs, social media, demonstrations, etc., is a much better way of getting people interested. Showcasing the approach in an application that lots of people use is another (Pandas, for example, shows off the "fluent" style of chained method calls, which some people love and some hate, that's very similar to your proposal here). It's a lot of work, though, and not the type of work that a programmer is necessarily good at. Many great libraries are relatively obscure, because the author doesn't have the skills/interest/luck to promote them.

What you *do* get from inclusion in the stdlib is a certain amount of "free publicity" - the "What's new" notices, people discussing new features, the general sense of "official sanction" that comes from stdlib inclusion. Those are all useful in promoting a new style - but you don't get them just by asking, the feature needs to qualify for the stdlib *first*, and the promotion is more a "free benefit" after the fact. And in any case, as others have mentioned, even being in the stdlib isn't guaranteed visibility - there's lots of stuff in the stdlib that gets overlooked and/or ignored.

Sorry - I don't have a good answer for you here. But I doubt you'll find anyone who would be willing to help you champion this for the stdlib.

Paul

It's supported with several syntaxes in macropy (https://pypi.org/project/MacroPy/), but I remember seeing it in a more serious (for lack of a better term) package too, I just can't remember which one.

E

Ah yes, it's pipeop! https://pypi.org/project/pipeop/

E

On Fri, Nov 26, 2021 at 07:40:43PM +0000, Paul Moore wrote:
I don't think that even the most mad shell-scripting fanboi would say that pipelining is the One True software pattern and we should solve all problems that way :-)

Collection pipelines are a natural and readable solution to *some* problems, not all, but in Python code it is difficult to use collection pipelines, so we end up writing the code backwards using function-call syntax:

    # The algorithm: filter, process, sort, print
    # data | filter something | process arg | sort | print
    # In Python:
    print(sort(process(filter(data, something), arg)))
> I get your point that putting this functionality in a 3rd party library might not "expose" it as much as you want.
I fear that we don't yet have a good sense of the right syntax for this and how it will interact with the rest of Python syntax. As much as I love pipelines, I think that it's too early to push for this feature in the language. We need more experiments with syntax and functionality first.

To me, the most natural syntax looks like this:

    value | function *args, **kwargs

equivalent to `function(value, *args, **kwargs)`, but of course we've already used the pipe for bitwise-or and set union. `>>` would be another equally good operator. I don't really like `|>` as an operator. If we were to invent a new operator, I'd prefer `->`.

We can experiment by providing a wrapper class that takes a function, method or other callable and gives them a `__ror__` method for the pipe, and a `.p` (for partial) method for the partial application:

    value | Wrapper(function).p(*args, **kwargs)
    --> partial(function, *args, **kwargs)(value)

That's sufficient for experimentation (a rough sketch follows below), but needing that wrapper adds too much friction. Ultimately nobody is going to use this idiom to its full potential until it works with arbitrary functions and callables without the wrapper.

And by experiment, I don't mean experiment in the stdlib. I think that, like pathlib, this needs a few years of development outside the stdlib to mature.
https://www.python.org/dev/peps/pep-0428/
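[Such a wrapper is only a few lines. The sketch below is editorial; the Wrapper and .p names are Steven's, the body is guessed, and it follows his partial() description, so pre-bound arguments come before the piped value:]

    from functools import partial

    class Wrapper:
        """Make a callable usable on the right-hand side of `|`."""
        def __init__(self, func):
            self.func = func

        def p(self, *args, **kwargs):
            # Pre-bind arguments; the piped value is passed last.
            return Wrapper(partial(self.func, *args, **kwargs))

        def __ror__(self, value):
            # value | Wrapper(f)  ==  f(value)
            return self.func(value)

    print([3, 1, 2] | Wrapper(sorted))                  # [1, 2, 3]
    print([3, 1, 2] | Wrapper(sorted).p(reverse=True))  # [3, 2, 1]
    print([1, 2, 3] | Wrapper(map).p(lambda x: x + 1) | Wrapper(list))  # [2, 3, 4]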
I would be willing to help design an API and write a PEP but I would *not* champion it for 3.11. I think that a premature attempt to add it to the language would doom it to premature rejection. -- Steve

On Sat, Nov 27, 2021 at 02:58:07AM -0000, Raimi bin Karim wrote:
> This syntactic sugar imo is powerful because it's not limited to iterables but generalises to possibly any object.
Indeed. There is no reason to limit pipelines to only collections, any object which can be transformed in any way will work.
> But I guess since method chaining (for collection pipeline) is more commonplace across many languages, it might be easier to catch on.
We should be careful about the terminology. Method chaining and pipelining are related, but independent, design patterns or idioms.

Method chaining or cascading, also called fluent interfaces, relies on calling a chain of methods, usually of the same object:

    obj.method().foo().bar().baz().foobar()

This is very common in Python libraries like pandas, and in immutable classes like str, but not for mutable builtins like list. So it is very simple to implement chaining in your own classes, by having your methods either return a new instance, or return self. Just don't return None and you'll probably be fine :-)

Pipelining involves calling a sequence of independent functions, not necessarily methods:

    obj | func | spam | eggs | cheese | aardvark

In mathematical notation, using the ring operator for function composition, that is the same as:

    (func∘spam∘eggs∘cheese∘aardvark)(obj)

In concatenative languages like Factor or Forth, you would write it in reverse Polish notation (no operator required):

    obj func spam eggs cheese aardvark

compared to regular function notation, where functions are written in prefix order, which puts them in the reverse of execution order:

    aardvark(cheese(eggs(spam(func(obj)))))

Even though they logically go together in some ways, let's be careful not to confuse these two idioms.

Note that chaining, pipelining and function composition go together very well:

    (obj.method().foo() | func∘spam | eggs).bar().baz()

executes from left to right, exactly as it is written. (Assuming that the ring operator has a higher precedence than the pipe operator; otherwise you can use parentheses.) Contrast how well that reads from left to right compared to:

    eggs(spam(func(obj.method().foo()))).bar().baz()

where the order of execution starts in the middle, runs left to right for a bit, then back to the middle, runs right to left, then jumps to the end and runs left to right again.

-- Steve
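[The "return self" half of this is easy to try out today; a tiny fluent class, editorial example rather than anything from the thread:]

    class FluentList:
        """A list wrapper whose mutating methods return self, enabling chains."""
        def __init__(self, items):
            self.items = list(items)

        def map(self, fn):
            self.items = [fn(x) for x in self.items]
            return self

        def filter(self, pred):
            self.items = [x for x in self.items if pred(x)]
            return self

        def sort(self):
            self.items.sort()
            return self

    print(
        FluentList([3, 1, 2])
        .map(lambda x: x + 1)
        .filter(lambda x: x % 2 == 0)
        .sort()
        .items
    )  # [2, 4]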

I like the "obj -> func1 -> func2" idiom if func1 and func2 take only one argument. If func1 takes two arguments (arg1, arg2), it would be like the following:

    obj -> func1(arg2) -> func2

Suppose you want to insert the returned object not as a first argument. We could do something like the following:

    obj -> func1(arg1, ?) -> func2

The returned object will be captured in ?, which here happens to be the second argument. If ? is the first positional argument and we only have one ? to pass, it can be omitted. That is,

    obj -> func1(arg2) -> func2

is equivalent to

    obj -> func1(?, arg2) -> func2(?)

If you don't see the "?", you can always assume the returned object is used as the first argument, because each chained function needs at least one "?", explicitly or implicitly. This implies chained functions need to take at least one argument, which makes sense because we want them to transform the data we pass in. If func1 takes zero arguments, then

    obj -> func1 -> func2

will throw this error:

    TypeError: func1() takes 0 positional arguments but 1 was given

which is the implicit ?.

We can use chaining/piping with keyword arguments:

    obj -> func1(arg1, arg2=?) -> func2

Suppose our only_kw_func signature is like this:

    def only_kw_func(*, arg1, arg2): ...

Then the following will work:

    obj -> only_kw_func(arg1=?, arg2=arg2) -> func2

We can probably pass only part of the returned object with pattern matching. If obj is a tuple of three elements, then the third element will be the returned object in the following expression:

    obj -> only_kw_func(arg1=(_, _, ?), arg2=arg2) -> func2

The "?" captures what we want to pass. If the pattern does not match, a missing-argument error will occur. You can pass the returned object, or part of it using pattern matching, in multiple arguments, because each chained function needs at least one "?" explicitly or implicitly. But now you cannot omit ? for the first positional argument, or a missing-argument error will occur:

    obj -> func1(?, ?) -> func2

obj will be captured twice in func1: once for arg1 and again for arg2.

Let's do a contrived example. Say our obj is (1, 2, [1, 2, 3]):

    def func1(x: int, y: int, seq: Sequence[int]) -> bool:
        return (x in seq) and (y in seq)

    def func2(contains: bool) -> str:
        if contains:
            return "Success"
        else:
            return "Failure"

    obj -> func1((?, _, _), (_, ?, _), (_, _, ?)) -> func2

    (1, 2, [1, 2, 3]) => func1(1, 2, [1, 2, 3]) => func2(True) => "Success"

Abdulla

Forget about pattern matching in the previous email. The ? should always refer to the whole passed object. You can further manipulate the passed/returned object. Consider the following:

    [1.32, 1.1, 1.4] -> map(round, ?) -> map(operator.mul, ?, (? -> list -> len) * [2]) -> tuple

    [1.32, 1.1, 1.4]
    => map(round, [1.32, 1.1, 1.4])
    => map(operator.mul, Map(1, 1, 1), (Map(1, 1, 1) => list(Map(1, 1, 1)) => len([1, 1, 1]) => 3) * [2])
    => tuple(Map(2, 2, 2))
    => (2, 2, 2)

Another example:

    [1, 2, 3] -> len -> operator.add(3)

is equivalent to

    [1, 2, 3] -> operator.add(len(?), 3)

Of course the first option looks better.

Maybe pattern matching should be flagged with something (maybe "m? pattern", as in (m)atch the passed obj (?) with this pattern), and if it encounters "??", that will be captured and passed for that particular argument:

    [1, 2, 3] -> operator.add(m? [_, ??, _], 2)
    # match [1, 2, 3] with [_, ??, _] and pass the captured ?? to the argument

    [1, 2, 3] => operator.add(2, 2) => 4

The previous email's examples for pattern matching would look like this. Obj is a tuple with three elements; the third element will be passed for arg1 below:

    obj -> only_kw_func(arg1=m? (_, _, ??), arg2=arg2) -> func2

The other example would be like the following. Say our obj is (1, 2, [1, 2, 3]):

    def func1(x: int, y: int, seq: Sequence[int]) -> bool:
        return (x in seq) and (y in seq)

    def func2(contains: bool) -> str:
        if contains:
            return "Success"
        else:
            return "Failure"

    obj -> func1(m? (??, _, _), m? (_, ??, _), m? (_, _, ??)) -> func2

    (1, 2, [1, 2, 3]) => func1(1, 2, [1, 2, 3]) => func2(True) => "Success"

Finally:

    obj -> func(m? ??)

is equivalent to

    obj -> func(?)

Raimi bin Karim writes:
The thing is, the reason such a module is needed at all is that Guido decided ages ago that mutating methods should return None, and in particular they don't return self. I'm not sure why he did that, you'd have to ask him, but we respect his intuition enough that to get this in, it would help to have answers to some of the following questions in advance:

1. Is dataflow/fluent programming distinguishable from whatever it was that Guido didn't like about method chaining idioms? If so, how?

2. Is the method chaining syntax preferable to an alternative operator?

3. Is there an obvious choice for the implementation? Specifically, there are at least three possibilities:
   a. Restrict it to mutable sequences, and do the transformations in place.
   b. Construct nested iterators and listify the result only if desired.
   c. Both.

4. Is this really so tricky that the obvious implementation of the iterator approach (Chris's) needs to be in the stdlib with tons of methods on it, or does it make more sense to have applications write one with the specific methods needed for the application? Or perhaps, instead of creating a generic class prepopulated with methods, maybe this should be a factory function which takes a collection of mapping functions and adds them to the dataflow object on the fly? (A sketch of that idea follows after this list.)

5. map() and zip() take multiple iterables. Should this feature handle those cases? Note that the factory function approach allows the client to make this decision for themselves.

6. What are the names that you propose for the APIs? They need to indicate the implementations, since there are various ways to implement.
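[The factory-function idea in point 4 might look something like this; an editorial sketch under its own assumptions, not a worked-out proposal:]

    def make_pipeline_class(**transforms):
        """Build a dataflow class exposing exactly the given transformations.

        Each transform takes (iterator, *args) and returns a new iterator.
        """
        def __init__(self, iterable):
            self._it = iter(iterable)

        def __iter__(self):
            return self._it

        namespace = {"__init__": __init__, "__iter__": __iter__}
        for name, transform in transforms.items():
            def method(self, *args, _t=transform):
                self._it = _t(self._it, *args)
                return self
            namespace[name] = method
        return type("Pipeline", (), namespace)

    Pipeline = make_pipeline_class(
        map=lambda it, f: map(f, it),
        filter=lambda it, f: filter(f, it),
    )
    print(list(Pipeline([1, 2, 3]).map(lambda x: x + 1).filter(lambda x: x % 2 == 0)))
    # [2, 4]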
I'm with Chris on this. My experience with responding to people on mailing lists is that very few read the documentation until they need to solve a problem that way, and then they read the part that solves their problem, only. Heck, I'm the kind of person who kept a copy of Python Essential Reference at my bedside for a couple years, and *I* don't know half of what's in the stdlib any more. I don't really think putting it in the stdlib will have the promotional effect you hope. As for "only way," I think _Dataflow Programming in Python_ and _Fluent Programming in Python_ are great titles for books. Maybe you could write one of those? I'm half-joking, of course, because writing a book is not something anyone should impose on somebody else. But you do have the passion part down already. :-) (And don't forget to have a cute animal for the cover if you go O'Reilly for the publisher!)

On Sat, Nov 27, 2021 at 10:52 PM Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
Pipeline programming is usually about constructing new objects, not mutating existing ones, whereas "fluent programming" can also mean method chaining (one of the other Steves explained it better than I can, search up in the thread for D'Aprano). The policy of not returning self is, from my understanding, to clarify which methods return a new thing and which mutate the existing one. It makes this sort of distinction very obvious:

    # Works correctly
    by_name = sorted(people, key=lambda p: p.name)

    # Clearly gives a borked result
    by_name = people.sort(key=lambda p: p.name)

If list.sort() returned self after mutation, the second one would appear to work, but would leave the original list in a different order.
I'm sure they already exist! Actually, if you keep an eye on Humble Bundle, they often have (electronic editions of) books on sale. In fact, there's a Python bundle right now, though none of the books have those exact titles.
I've never understood the correlation of animal to title. For instance, "Using Asyncio in Python" has a frog, "Flask Web Development" has a dog, and "Think Python" has a bird. But hey, "Introducing Python" and "High Performance Python" both have snakes on the covers, so that's a plus. Maybe it's to remind us not to judge a book by its cover or something :) ChrisA

On Sun, Nov 28, 2021 at 3:06 AM Stephen J. Turnbull < stephenjturnbull@gmail.com> wrote:
yup.
Which is what I would expect for a *sort* *method*. But I guess others might not think about it and be unpleasantly surprised.
Lists could have both a sort() (sort in place) and a sorted() (return a new list) method. In fact, I have done exactly that in some of my code (well, not sort, but the same idea: one method to mutate, one to return a new copy). I would like that, and it would be great to support a fluent interface in that way. Who among us has not chained string methods together?

    a_str.strip().upper() ...

However, we are kind of stuck, because we have a current interface to both the builtins and the ABCs that we don't want to break. And it's been Python's philosophy from the beginning to get generalized behavior from functions, not methods (e.g. len()) -- so we have a sorted() function, but not a sorted() method on the Sequence protocol.

Hmm -- I just had a whacky idea. As pointed out, adding a bunch of chaining methods to the Iterator ABC isn't really helpful, as it would A) potentially break / override existing classes that happen to use the same names, and B) not provide anything for iterators that don't subclass from the ABC. But we *could* define a new ChainedIterator ABC. Anyone could then subclass it to get the chaining behavior, and we could (potentially) use it for many of the iterators in the stdlib. I'm not sure I'd even support this, but it's an idea; a sketch of what I mean follows below.

-CHB

-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
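[A minimal sketch of that ChainedIterator idea; the names and implementation here are editorial, just to make the shape concrete:]

    from collections.abc import Iterator

    class ChainableIterator(Iterator):
        """Subclass and supply __next__; the chaining methods come for free."""
        def map(self, fn):
            return _Chained(map(fn, self))

        def filter(self, fn):
            return _Chained(filter(fn, self))

    class _Chained(ChainableIterator):
        def __init__(self, it):
            self._it = it

        def __next__(self):
            return next(self._it)

    chained = _Chained(iter([1, 2, 3])).map(lambda x: x + 1).filter(lambda x: x % 2 == 0)
    print(list(chained))  # [2, 4]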

On Mon, Nov 29, 2021 at 6:01 AM Christopher Barker <pythonchb@gmail.com> wrote:
I wouldn't support it. One of the great things about Python is that it's fairly easy to teach this progression:

1) Here's how you can do stuff in a generic way
2) Here's how you can make your own thing fit that generic way
3) Now your thing can be used by anyone else using #1

For example:

1) 1 + 2, "sp" + "am"
2) __add__ (don't worry about teaching __radd__ initially)
3) Demo() + 3

Or:

1) @lru_cache
   def foo(x, y)
2) def newdeco(f)
3) @newdeco
   def foo(x, y)

Or use one with another:

1) with open(x) as f: ...
2) @contextlib.contextmanager
   def spaminate(x)
3) with spaminate(x) as spam: ...

And this works because the simple approach IS the way to completely fit in with the protocol. Most Python protocols consist of *one* dunder method on a class. Some require two, or there are a couple that interact with each other (__eq__ and __hash__, for instance), but still, two methods isn't huge.

Requiring that someone subclass an ABC as well as defining __iter__ would increase that complexity. Sometimes - in fact, often - you won't notice (because regular iteration still works), and if you subclass something else that subclasses it, no problem. But if you subclass something that happens to have a method name that collides, now you have a problem, and the order of inheritance will matter.

But hey. Ideas are great, and figuring out a way to work within the existing structure would be awesome. I have a suspicion it's not possible, though, and it might just be that pipeline programming will end up needing syntactic support.
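For instance, a minimal class for step 2 of the first progression (the class name Demo is just illustrative):

class Demo:
    def __init__(self, value=0):
        self.value = value

    def __add__(self, other):
        # make Demo fit the "+" protocol
        return Demo(self.value + other)

print((Demo() + 3).value)  # -> 3

ChrisA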

On Sun, Nov 28, 2021 at 11:28 AM Chris Angelico <rosuav@gmail.com> wrote:
<snip of good examples of protocol-based duck typing> Sure, but I'm not sure it's THAT much harder to say:

If you want to make an iterator, define a __next__ method. If you want to make a ChainableIterator, derive from abc.ChainableIterator -- it will give you some nifty stuff for free.

And some of the current ABCs do provide some out-of-the-box functionality right now. It would take someone other than me to explore this idea further; I'm not sure I even want it. But it struck me that one of the reasons we don't want to add functionality to existing ABCs is that it would suddenly make other types that currently conform to the ABC, but don't derive from it, no longer valid. And with the proliferation of type checking, that would be a mess. But if we create a new ABC that extends an existing protocol, it wouldn't break anything.

-CHB

--
Christopher Barker, PhD (Chris)
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Sun, Nov 28, 2021 at 04:40:15PM -0800, Christopher Barker wrote:
Sure, but I"m not sure it's THAT much harder to say:
If you want to make an iterator, define a __next__ method.
Just for the record, that's not what makes an iterator. That makes a half-and-half something almost but not quite an iterator :-) To be an iterator, your object needs: 1. a `__next__` method which returns the next value; 2. and an `__iter__` method which returns self. Also we do often use the term "iterator" informally to refer to objects which have an `__iter__` method which returns an iterator.
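For example, a minimal class meeting both requirements:

class CountdownIterator:
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self          # requirement 2: __iter__ returns self

    def __next__(self):      # requirement 1: __next__ returns the next value
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

print(list(CountdownIterator(3)))  # -> [3, 2, 1]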
If you want to make a ChainableIterator, derive from abc.ChainableIterator -- it will give you some nifty stuff for free.
Pythonistas: "Duck typing for the win! You shouldn't care about inheritance, protocols are better!" Also Pythonistas: "Just inherit from X." *wink*
I think that is correct.
But if we create a new ABC that extends an existing protocol, it wouldn't break anything.
Except that way leads to a thousand ABCs, which is another sort of chaos:

- Container
- Container_With_Spam
- Container_With_Spam_Eggs
- Container_With_Spam_Eggs_DoubleSpam_and_Spam
- Container_With_Spam_Eggs_DoubleSpam_and_Spam_and_Cheese
- Container_With_Spam_Eggs_DoubleSpam_and_Spam_and_Cheese_Without_Eggs

-- Steve

On Sun, Nov 28, 2021, 8:59 PM Steven D'Aprano
That's not quite right. An iterator only needs .__next__(), and an iterable only needs .__iter__(). Returning self is a convenient, and probably the most common, way of creating an object that is both. But exceptions exist, and remain iterators and/or iterables.

According to https://docs.python.org/3/glossary.html#term-iterator and https://docs.python.org/3/library/stdtypes.html#typeiter, iterators must implement the __iter__ method.

On Sun, Nov 28, 2021, 11:43 PM Paul Bryan <pbryan@anode.ca> wrote:
From your first link: CPython implementation detail: CPython does not consistently apply the
requirement that an iterator define __iter__().
That said, I don't think the description at the link is very good. Anyway, it's different from what I teach, and also different from how Python actually behaves. E.g.:
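Something like this sketch of Foo and Bar (the original classes were snipped; this is reconstructed from how they're described later in the thread):

class Foo:
    # Iterable but not an iterator: __iter__ returns a new
    # iterator rather than self, and there is no __next__.
    def __iter__(self):
        return iter([1, 2, 3])

class Bar:
    # Has __next__ but no __iter__, so `for x in Bar()` fails
    # even though next(Bar()) works.
    def __init__(self):
        self.n = 0

    def __next__(self):
        self.n += 1
        if self.n > 3:
            raise StopIteration
        return self.n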
Or anyway, what would you call `bar := Bar()` if not "an iterator"?!

On Mon, Nov 29, 2021, 12:16 AM Paul Bryan <pbryan@anode.ca> wrote:
And the second link?
Same comments, basically. But the more germane thing is that even assuming a class has both .__next__() and .__iter__(), it is perfectly reasonable for the latter to return something other than `self`. The Foo and Bar classes are slightly contrived, but I've written production code where e.g. `iter(thing)` returns a new `thing.__class__` instance rather than self.

1. Noted: Python's for statement will happily iterate over an object that only implements __next__.
2. The documentation is pretty clear on what is expected of an iterator: implement __next__ and __iter__.
3. It is perfectly reasonable for __iter__ to return something other than self; the documentation already reflects this.
4. If you believe the documentation is in error or the requirement should be relaxed, then further discussion is probably warranted.

On Mon, Nov 29, 2021, 12:33 AM Paul Bryan <pbryan@anode.ca> wrote:
Not so much "in error" as "could be clarified." Another couple words like "usually desirable" ... for an iterator to implement .__iter__() would cover it. Maybe I'll make a PR on the docs. I agree that an iterator that doesn't implement .__iter__() is usually unnecessarily a PITA. Especially since "return self" is *usually* a perfectly good implementation.

There was just a bpo on this: a discussion of exactly what an Iterator is. I'm pretty sure the conclusion was that in order to be an Iterator, an object needs to have __iter__. I'm not totally sure that it has to return self, but that is the expected behavior, so that it can be re-entrant. Of course, Python being Python, if in a given use case you don't need __iter__, you don't have to have it. Just like passing anything with an appropriate read() method into json.load() will work, but that doesn't make it a File.
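For example (a small sketch; FakeFile is a made-up name, but json.load really only needs .read()):

import json

class FakeFile:
    # duck-typed: json.load just calls .read() on what you give it
    def __init__(self, text):
        self.text = text

    def read(self):
        return self.text

print(json.load(FakeFile('{"spam": 1}')))  # -> {'spam': 1}

-CHB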
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Sun, Nov 28, 2021 at 10:35:34PM -0800, Christopher Barker wrote:
Of course, Python being Python, if in a given use case, you don’t need __iter__, you don’t have to have it.
[cynicism=on] If you think your iterator doesn't need `__iter__`, just wait, and you will find that it does. [cynicism=off] This is Python. We will defend your right to create a class that uses `__eq__` to delete files and `__add__` to multiply :-)
Just like the fact that you can pass anything with an appropriate read() method into json.load() will work, but that doesn’t make it a File.
Exactly. -- Steve

On Sun, Nov 28, 2021 at 09:33:17PM -0800, Paul Bryan wrote:
1. Noted: Python's for statement will happily iterate over an object that only implements __next__.
That is not correct.
2. The documentation is pretty clear on what is expected of an iterator: implement __next__ and __iter__.
That goes back to the PEP introducing the iterator protocol in version 2.1. This is not something new.
3. It is perfectly reasonable for __iter__ to return something other than self; the documentation already reflects this.
Right. Your object can return anything it likes from `__iter__`. But then the object isn't an iterator itself, it is an iterable. That is perfectly fine, many wonderful objects that are iterable aren't iterators (e.g. lists, strings, dicts, sets, etc). -- Steve

On Mon, Nov 29, 2021 at 12:11:43AM -0500, David Mertz, Ph.D. wrote:
That comment is newly added to the documentation; it wasn't there in 3.9: https://docs.python.org/3.9/glossary.html#term-iterator and I don't think it should have been added. Rather than muddying the waters with a comment that CPython doesn't obey its own rules, I would rather we fixed the broken iterators so that they weren't broken. For Python classes, all it needs is a one-line method. For C classes, I presume it's a bit more complex, but not that much. Does anyone know of any builtin or stdlib iterators that fail to implement `__iter__`? I haven't been able to find any -- all the obvious examples do (map, filter, reversed, zip, generators, list iterators, set iterators, etc).
The inconvenient truth here is that if you have an object that defines only `__next__`, you **cannot** iterate over it directly!
How is that an iterator when it doesn't support iteration? You can, sometimes, get away with such a broken iterator if you iterate over it *indirectly*, that is, you have a second class with an `__iter__` method which returns a BrokenIterator instance:

class Container(object):
    def __iter__(self):
        return BrokenIterator()

Now you can iterate over a Container instance:

for i in Container():
    print(i)
    break

but the moment that somebody tries to use that as an actual iterator, for example by skipping the first element:

it = iter(Container())
next(it, None)  # discard the first element
for i in it:
    print(i)

it will blow up in their face. Would-be iterators that supply only `__next__` are broken. If you go back to the original PEP that introduced iterators in Python 2.1, it is clear:

"Classes can define how they are iterated over by defining an __iter__() method; this should take no additional arguments and return a valid iterator object. A class that wants to be an iterator should implement two methods: a next() method that behaves as described above, and an __iter__() method that returns self."

https://www.python.org/dev/peps/pep-0234/

The reference manual is correct. I quote:

"One method needs to be defined for container objects to provide iterable support: ..." (that would be `__iter__`).

"The iterator objects themselves are required to support the following two methods, which together form the iterator protocol: ..." (and they would be `__iter__` returning self, and `__next__`).

https://docs.python.org/3/library/stdtypes.html#iterator-types
That said, I don't think the description at the link is very good. Anyway, it's different from what I teach,
Then you are teaching it wrong. Sorry.
Then Foo instances are *iterable* but they are not iterators.
And Bar instances are broken iterators. [...]
Or anyway, what would you call `bar := Bar()` if not "an iterator?!
A broken iterator which cannot be iterated over directly. -- Steve

Hi Stephen,

Stephen J. Turnbull wrote:
> 1. Is dataflow/fluent programming distinguishable from whatever it was that Guido didn't like about method chaining idioms? If so, how?

Are you referring to this: https://mail.python.org/pipermail/python-dev/2003-October/038855.html? He mentioned (if I may summarize) (1) familiarity with the API and (2) making mistakes. In fluent programming (at least in the implementation that I suggested for iterators), it is easy to make mistakes. E.g.

pipeline([1,2,3]).reduce(lambda a,b: a+b).map(lambda x: x+1)

fails because some methods reduce, therefore returning a non-sequence object instead of self. But with all due respect, you _do_ need to be familiar with the API to use it, so I don't see why (1) could be an issue. And with familiarity, you would make fewer mistakes.

> 2. Is the method chaining syntax preferable to an alternative operator?

I don't have an answer to this. I personally like method chaining, and this is also because the handful of languages that I'm familiar with use method chaining.

> 3. Is there an obvious choice for the implementation? Specifically, there are at least three possibilities: a. Restrict it to mutable sequences, and do the transformations in place. b. Construct nested iterators and listify the result only if desired. c. Both.

The choice for this implementation is to replace chaining function calls from the itertools module (incl. map and filter):

list(starmap(..., filter(..., chain(..., map(..., ...)))))

with something similar, but read from left to right instead. And because the functions from the itertools module take in any iterable (regardless of mutability), the implementation should also do the same, which is (b) in your list.

> 4. Is this really so tricky that the obvious implementation of the iterator approach (Chris's) needs to be in the stdlib with tons of methods on it, or does it make more sense to have applications write one with the specific methods needed for the application? Or perhaps instead of creating a generic class prepopulated with methods, maybe this should be a factory function which takes a collection of mapping functions, and adds them to the dataflow object on the fly?

I think this boils down to the itertools module (I was thinking about it over the weekend). I find that the itertools module and some builtins like map and filter don't do themselves justice when chained together. It's okay for one or two function calls. But the design made it seem like it was never meant to be chained together (or was it?). Attempts to do so lead to code that must be read from right to left, making it an awkward API for transforming collections (which most of us might agree on). If it was indeed built for one or two function calls, then I would argue that it's not really a usable or practical module, because a lot of the time we perform not just one or two but multiple transformations on collections.

So, to answer this question, I don't think the issue is whether the implementation is so tricky that the stdlib should do it. Rather, *our* itertools module itself is tricky to use, because fundamentally its design is not user-friendly, or rather limiting to the users. And this is a problem. Head over to StackOverflow and most people wouldn't recommend using it. It's not well-liked (except maybe by Lisp-ers). That's most probably because of what I mentioned in the previous paragraph.

What does this mean for us? I think it's a good opportunity for us to rethink the design to make it more usable. Hence, I'm putting the onus on us (stdlib), instead of relying on 3rd-party libraries to improve on it.

As a proposal to improve the design, I suggested above a higher-level API for the itertools module that says "oh, you want to use the itertools module? yeah, it's a low-level module that is not meant to be used directly, so here's a higher-level API you can use instead." The implementation doesn't have to be method chaining, because I'm generally proposing a higher-level API.

Now, I've stated that the usability of the itertools module is a problem pretty much in a matter-of-fact manner, and put it on us to rework it. But what does everyone else think about this? Do you share the same concerns too?

> 5. map() and zip() take multiple iterables. Should this feature handle those cases? Note that the factory function approach allows the client to make this decision for themselves.

I would say no for map and yes for zip, viewing it from the perspective of the underlying iterator. The .map() instance method only refers to the underlying iterator, so it should only take a function that will transform every element in the underlying iterator. For zip, we can take multiple iterables because we are zipping them with the underlying iterator. But this is my opinion and is a detail that we can come to a consensus on later.

> 6. What are the names that you propose for the APIs? They need to indicate the implementations since there are various ways to implement.

I propose the names be similar to those in the builtins + itertools:

map (map_every to indicate a different implementation? though not conventional), filter, reduce, starmap, starfilter, zip, enumerate

some from the Itertools Recipes section that might be more common:

flatten, nth, take

some 'reductional' ones:

reduce, sum, all, any, min, max, join (for string iterators)

some hybrids:

flat_map, filter_map

and:

for_each (returning None, though this is a for-loop).

Hi,

All apologies if it has been clarified earlier, but if you dislike nested method calls, what is wrong with operating on generators, as in the example below?
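For instance, a minimal sketch with chained generator expressions:

```pycon
>>> it = (x + 1 for x in [1, 2, 3])      # map step
>>> it = (x for x in it if x % 2 == 0)   # filter step
>>> list(it)
[2, 4]
```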
This covers chains of maps and filters, and everything provided by itertools. Granted, it does not offer lazy reduce and sort, but these would not be very hard to implement, and since they need to consume the whole iterator anyway, it's not clear how useful they would be.

Best,
E

On Sun, Nov 28, 2021 at 9:30 AM Evpok Padding <evpok.padding@gmail.com> wrote:
Back in the day Python had map, filter, and reduce. Then comprehensions were added, which are another way to express map and filter in one go. I really like comprehensions. But since then, there has been an informal movement, if you can call it that, back to map and filter. Itertools is great, but it's not all that compatible with comprehensions :-( I'm a bit confused as to why comprehensions seem to have become second-class citizens myself. Granted, deeply nesting multiple comprehensions is pretty ugly, so a fluent interface to nesting transformations on iterables does have its appeal. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Mon, Nov 29, 2021 at 5:54 AM Christopher Barker <pythonchb@gmail.com> wrote:
Citation needed? What informal movement has there been, and where? The only problem with itertools and comprehensions is that you get code written in two different ways. I don't recall seeing anyone say "don't use a comprehension if you're going to then use itertools.tee on it" or anything like that. Maybe I just don't hang out in the right places? ChrisA

On Sun, Nov 28, 2021 at 11:14 AM Chris Angelico <rosuav@gmail.com> wrote:
maybe even "informal" movement was too strong. It's a trend I've observed. At least on this list and others I follow -- maybe not in the broader community. One example was a proposal a while back on this list for a "groupby" in itertools (it kind of withered on the vine, like many good ideas here), I suggested that it be made comprehension friendly, and some others including the OP) didn't see the point. Maybe that's not representative, but it's a trend I've noticed. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Sun, Nov 28, 2021, 7:49 PM Christopher Barker
I liked Michael Selik's idea. But I think there were a few edge cases where you, and I, and Mike, and others each had slightly different ideas of the best behavior. I don't really make much of the fact that no one actually got as far as a PEP (both Mike and I had changing job situations during that discussion period; I happen to know because I know him outside this list).

On Sun, Nov 28, 2021 at 10:52:03AM -0800, Christopher Barker wrote:
Back in the day Python had map, filter, and reduce.
And still does :-)
I don't think there is a movement as such. I think that in the excitement of comprehensions, people rejected "confusing" functional idioms like map and filter and especially reduce (which got relegated from a builtin to a function in functools). But as comprehensions became commonplace and boring, and more people became aware of functional idioms from other languages, we've become more comfortable with them. Besides, map is often shorter:

map(len, strings)
(len(s) for s in strings)

and people like shorter code.
Itertools is great, but it's not all that compatible with comprehensions :-(
That's an odd thing to say. itertools operates on pretty much any iterable, comprehensions are iterables. What do you think itertools is lacking when it comes to comprehensions?
I'm a bit confused as to why comprehensions seem to have become second-class citizens myself.
I don't think they are.
Granted, deeply nesting multiple comprehensions is pretty ugly, so a fluent interface to nesting transformations on iterables does have its appeal.
Indeed. -- Steve

On Sun, Nov 28, 2021 at 05:28:58PM +0000, Evpok Padding wrote:
Nothing, and everything :-) Of course sometimes code is clearer when it is split over multiple statements, using named variables to hold intermediate results. But often code is *less* clear when it is split over multiple statements. That's why we have expressions: y = sqrt(3*(x+2) - 5) rather than being forced to perform one operation at a time: tmp = x + 2 tmp = 3*tmp tmp = tmp - 5 y = sqrt(tmp) and why we can chain methods of immutable objects: text.strip().lower().expandtabs().encode('utf-8') instead of being forced to perform one operation at a time. But beyond some certain level of complexity, it may be helpful to separate a complex expression into temporary steps. That's not going to change. Sometimes code is better when it is more terse, and sometimes when it is more verbose. An expressive language allows us to make the choice ourselves.
I don't understand how this thread, which is about pipelines of functions, has become fixated on *iterators*. You don't need lazy iterators to operate with a pipeline of functions, or chained methods. We often use method chains with non-lazy strings, and bash or other shells use pipelines of non-lazy functions on collections of data. Pipelines are syntax, not functionality. With pipe syntax, you can pipe anything into a function, whether it is a concrete collection, a lazy iterator, or a single atomic value. -- Steve

On Sat, Nov 27, 2021 at 08:50:37PM +0900, Stephen J. Turnbull wrote:
Or read the FAQs :-) https://docs.python.org/3/faq/design.html#why-doesn-t-list-sort-return-the-s...
And yet it is indisputable that chained methods are useful even for methods which modify the object they work on. Look at pandas: https://towardsdatascience.com/the-unreasonable-effectiveness-of-method-chai... If you format your code nicely, it even looks like a sequence of procedure calls: https://tomaugspurger.github.io/method-chaining so you have a choice whether to simulate Pascal or Java while still being 100% Pythonic :-) https://duckduckgo.com/?q=pandas+fluent+interface
We already use method chaining on immutable strings. It's only when it comes to mutable objects that it is possible to confuse the two situations:

- the method modifies the object in place, and returns self;
- versus the method returns a new, modified, copy of the object.

Pandas uses the first style (I think; corrections welcome). Strings use the second, because they have no choice.
2. Is the method chaining syntax preferable to an alternative operator?
What do you mean, a different operator? Are you suggesting we should have two operators for method lookups? That might even be workable, if we had a rule: obj.method follows the regular descriptor protocol when doing attribute lookups, and (let's say...): obj::method wraps the method object with a proxy, so that: proxy(args) calls the original method, and then returns obj instead of whatever the method returned.
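To make that concrete, a rough emulation of that proxy in today's Python (_ChainProxy is a made-up name standing in for whatever obj::method would evaluate to):

class _ChainProxy:
    def __init__(self, obj, name):
        self._obj = obj
        self._method = getattr(obj, name)

    def __call__(self, *args, **kwargs):
        self._method(*args, **kwargs)   # call for effect, discard the result
        return self._obj                # return obj for further chaining

# e.g. the hypothetical "mylist::sort()::append(99)" would behave like:
mylist = [3, 1, 2]
_ChainProxy(_ChainProxy(mylist, 'sort')(), 'append')(99)
print(mylist)  # -> [1, 2, 3, 99]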
a. is obviously impossible, since strings already support method chaining, as do any other methods that return something other than None. And even those that return None can chain methods, if you are careful about which methods you call!

>>> mylist = []
>>> mylist.sort().__bool__().as_integer_ratio()
(0, 1)

So it's not that we can't use chained methods on lists. It's just that we can't use them to do anything *useful*.

I don't understand your option b. What do nested iterators have to do with chaining methods? You can chain methods on any object or objects so long as the methods return something useful with methods that you want to call. They're not restricted to iterators. Since most iterators don't have many methods, it's not clear to me that iterators are even a little bit relevant. Maybe I have missed something. (It's been a looooong thread and I admit I have skimmed a few posts.)

-- Steve

Steven D'Aprano writes:
And yet it is indisputable that chained methods are useful even for methods which modify the object they work on. Look at pandas:
Guido disputed that it was useful *enough*. My point was advice to the proponent on getting his proposal adopted (despite the fact that I personally think it's a YAGNI), not a characterization of universal best practice.
No, I'm suggesting that pipelines could have an alternative syntax using a different operator. This probably isn't really feasible since (unless we actually added syntax) it would require some sort of contortion or additional boilerplate to handle non-iterator arguments.
a. is obviously impossible, since strings already support method chaining,
The OP wasn't talking about general method chaining, and neither was I.

On Mon, Nov 29, 2021 at 5:15 AM Steven D'Aprano <steve@pearwood.info> wrote:
Pandas returns SOMETHING when a method is called. Maybe it's self, maybe it's after mutation, maybe it's a new DataFrame, Grouper, Series, etc. Pandas goes out of its way not to promise what gets returned. Indeed, that answer often changes between minor, or even micro, versions of Pandas. I'm not sure I love that fact, but so it is. As a consequence, the Pandas developers have made a "soft deprecation" of the `inplace=True` parameter that most methods take. I still like that style better, often, but they really want to avoid guaranteeing which operations will or won't save memory in the implementation, and feel that the flag can mislead users.
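For example, with pandas' documented sort_values (behavior as of recent versions):

import pandas as pd

df = pd.DataFrame({"name": ["Carol", "Alice", "Bob"]})
new_df = df.sort_values("name")                 # returns a new, sorted DataFrame
result = df.sort_values("name", inplace=True)   # mutates df, returns None

--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.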

On Mon, Nov 29, 2021 at 2:12 AM Steven D'Aprano <steve@pearwood.info> wrote: Since most iterators don't have many methods, it's not clear to me that
iterators are even a little bit relevant.
I think you just answered your own question. Since iterators in general don't have methods, they cannot be chained. I believe the OP was suggesting that they have some methods so that they could be chained. There are two tricks here: 1) What methods to add? There are literally an infinite number of possibilities. 2) There are multiple ways to create iterators; how does one make these methods universal?

-CHB

--
Christopher Barker, PhD (Chris)
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Tue, Nov 30, 2021 at 4:39 AM Christopher Barker <pythonchb@gmail.com> wrote:
They cannot be chained using method lookups. One of the proposals is to have a different form of chaining, which passes the preceding object as a first parameter.
Both can be solved if the construct gets syntactic support rather than type support. For instance, if this:

1 |> add(2)

is exactly equivalent to this:

add(1, 2)

then neither the iterator nor the consumer needs to be aware of the new protocol. I don't like that syntax, though, and this will live or die on a good syntax.
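You can already emulate the semantics (though not the spelling) with a tiny wrapper; a sketch, where `pipe` is a made-up helper name:

from operator import add

class pipe:
    # `x | pipe(f, *args)` evaluates f(x, *args), roughly what
    # the hypothetical `x |> f(*args)` would mean
    def __init__(self, func, *args, **kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs

    def __ror__(self, value):
        return self.func(value, *self.args, **self.kwargs)

print(1 | pipe(add, 2) | pipe(pow, 3))  # add(1, 2) == 3, then pow(3, 3) == 27

ChrisA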

How about using the typing return arrow -> to indicate the return of the preceding goes as a first parameter in the function: 1 -> add(2) I don’t like the fact this is used only as a first parameter. What if you want the preceding output to go as a second parameter? Abdulla Sent from my iPhone

On Wed, Dec 1, 2021 at 9:09 AM Abdulla Al Kathiri <alkathiri.abdulla@gmail.com> wrote:
How about using the typing return arrow -> to indicate the return of the preceding goes as a first parameter in the function: 1 -> add(2)
That's a possibility. The same arrow then means "this function produces that value" in a definition, and "this value then goes into that function" in an expression.
I don’t like the fact this is used only as a first parameter. What if you want the preceding output to go as a second parameter? Abdulla
Then write it the classic way. The vast majority of the time, you'll want it to be the first parameter. It's like how dot chaining of methods only works for the return value becoming the 'self' parameter - yes, that's incredibly restrictive, but in real world situations, that's easily the most common form needed. Otherwise you end up with a much more complicated proposal with far less beauty to it. ChrisA

On Tue, Nov 30, 2021 at 2:17 PM Chris Angelico <rosuav@gmail.com> wrote:
I don’t like the fact this is used only as a first parameter. What if you want the preceding output to go as a second parameter?
<snip>
in real world situations, that's easily the most common form needed.
Exactly -- the goal here (to me) is to have an easy and intuitive way to chain operations -- not to have a new way to call functions in the general case. - CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Thanks for the clarification. Yeah, I agree it will look ugly if we use it not as a first argument many times in a row. But what if there are one or two functions in the middle that don't play along, meaning they take the parameter we are interested in as a second or third parameter, or as a keyword-only parameter, and it would make really good sense to chain those calls? I believe an option should be given to the programmer to do this, but it shouldn't be encouraged a lot. _ could be used as a placeholder or anything really.

1 -> add(2) -> pow(3, _)

is equivalent to

1 -> add(_, 2) -> pow(3, _)

_ is where your output goes into. _ as a first argument can be omitted.

On Wed, Dec 1, 2021 at 6:14 PM Abdulla Al Kathiri <alkathiri.abdulla@gmail.com> wrote:
It's always possible to make a proposal more general by making it less clean. Is the loss of generality from "must pipe into the first argument" (or "must pipe into last argument", pick whichever one you want to go with) actually as restrictive as you think? People don't tend to write add() and pow() like this, because we have operators. With real-world code, how commonly is this actually going to be a problem? ChrisA

Sometimes when I write complicated regular expressions, I like to break a big pattern into smaller patterns until I reach what I want. The re module functions take the text as a second argument. Furthermore, you can't pass the Match object along directly; you need to extract the group/text from it. If we have a placeholder for the passed argument, we could do both: 1) put it wherever we want, not necessarily first, and 2) further manipulate it so it becomes a valid argument.

number = text -> re.search(pattern1, _) -> re.search(pattern2, _.group(0)) -> re.search(pattern3, _.group(1)) -> float(_.group(0))

Yeah, that can look ugly, I agree (here the placeholder is intentionally overused). But the person can still write a flowing chained operation if they want, or if it's easier for them to understand. Or just do it the traditional way ...

text1 = re.search(pattern1, text).group(0)
text2 = re.search(pattern2, text1).group(1)
text3 = re.search(pattern3, text2).group(0)
number = float(text3)

On Wed, 1 Dec 2021 at 19:38, Abdulla Al Kathiri <alkathiri.abdulla@gmail.com> wrote:
It looks like you could just write as follows, without using any placeholder:

number = text -> (pattern1 -> re.search)().group(0) \
              -> (pattern2 -> re.search)().group(1) \
              -> (pattern3 -> re.search)().group(0) \
              -> float()

but anyway, a question arises: once you start chaining applications of functions from the right, might you start wanting assignment (expressions) to the right (of an object on the lhs to a variable on the rhs)? I think that could be easier with chained applications from the right for the reader of the code.

Best regards,
Takuo Matsuoka

Abdulla Al Kathiri writes:
Function definitions or lambdas are cheap.
1 -> add(2) -> pow(3, _) equivalent to 1 -> add(_, 2) -> pow(3, _). _ is where your output goes into. _ as a first argument can be omitted.
def rpow(x, y):
    return pow(y, x)

1 -> add(2) -> rpow(3)

But in many cases it won't be necessary, because you will be defining functions for the pipeline, rather than fitting the pipeline to preexisting functions.

Yeah, very good points! As long as you design the functions to be used in the pipeline, then we're fine. However, you might force the programmer to write unnecessary functions (almost boilerplate code for switching arguments around) based on already-existing optimized functions scattered everywhere, just so they fit the pipeline, unless you do them on the fly (lambda functions), which has its drawbacks as well.

1 -> add(2) -> ((y, x) => pow(x, y))(3)

is way less readable than

1 -> add(2) -> pow(3, _)

The above is just an example. It could be written better using operators: 3 ** (1 + 2).

Abdulla

Sent from my iPhone

On Tue, 30 Nov 2021 at 04:55, Chris Angelico <rosuav@gmail.com> wrote:
Along with such an operator form of functools.partial:

```
x |> f  # equivalent to partial(f, x) as a callable object
```

I would also like x *|> f or something for partial(f, *x), and x **|> f or something for partial(f, **x).

Assuming that the interpretation of the operator is going to be controlled by a new dunder method (logically to be provided by the built-in class 'object' for general objects, I understand), another, close possibility for enabling application of functions from the right (not conflicting with the other) may be (just) a method (again provided by the class 'object') through which any function can be applied on the object, say

```
x.apply(function, *args, **kwargs)
#
# ---> to be equivalent to function(*args, x, **kwargs)
# or functools.partial(function, *args, **kwargs)(x)
```

(so partial(operator.methodcaller, "apply") would be a restricted version of partial). Instead of the class 'object', a specialized class can provide such a method, and can of course be created by a user. However, since not every function returns an instance of that class, a bit more complication will then be involved in chaining applications of arbitrary functions through the method, as well as the applications of other methods. Thus, I think it would be simplest if the universal base class provided such a method.

Finally, we can also imagine "unpacking" versions of the above:

```
x.apply1(function, *args, **kwargs)  # ---> function(*args, *x, **kwargs)
x.apply2(function, *args, **kwargs)  # ---> function(*args, **x, **kwargs)
```

(as well as perhaps something more flexible...).
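A minimal sketch of the user-defined-class version of .apply() (Applicable and Data are made-up names):

class Applicable:
    # sketch of the proposed object.apply()
    def apply(self, function, /, *args, **kwargs):
        return function(*args, self, **kwargs)

class Data(Applicable, list):
    pass

print(Data([1, 2, 3]).apply(sum))              # sum([1, 2, 3]) -> 6
print(list(Data([1, 2, 3]).apply(map, str)))   # map(str, [1, 2, 3]) -> ['1', '2', '3']

Best regards,
Takuo Matsuoka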


Chris Angelico writes:
One thing I noticed when implementing this class (yours is better, so I'm not posting mine :-) is that you and I both implemented .map in the obvious way for this use case, but the map function takes multiple iterables. On the other hand, filter takes only one iterable argument. Obviously, you can implement multifilter:

def multifilter(func, *iterables):
    return filter(lambda x: func(*x), zip(*iterables))

I think generalizing to this is a YAGNI, since it's so simple. Also, returning an iterable of tuples may not be the right thing. That is, you might want it to return a tuple of iterables, but that would be messy to implement, and in general can't be done space-efficiently, I think. This apparently is a case of "no one ever needed it."

Changing map to take a sequence of iterables is a non-starter, since that would be backward incompatible. There's also implementing zip's strict argument, e.g.,

def zippymap(func, *iterables, strict=False):
    return map(lambda x: func(*x), zip(*iterables, strict=strict))

and corresponding zippymappers for any other mappers (including filter). This seems like it might be a useful extension to the functions in the stdlib, for the same reason that it's useful for zip itself. Even though it's so easy to implement in terms of zip, it would be more discoverable as a documented argument to the functions.
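A quick sanity check of the sketches above:

from operator import add

print(list(zippymap(add, [1, 2, 3], [10, 20, 30])))            # -> [11, 22, 33]
print(list(multifilter(lambda a, b: a < b, [1, 5], [2, 3])))   # -> [(1, 2)]

Comments?

Steve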

On Tue, Nov 23, 2021 at 7:47 PM Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
Extending map to more than just one function and one iterable is done in different ways by different languages. I don't think it necessarily needs to be implemented the same way in a pipeline as it is in a stand-alone map function; the pipeline is, by its nature, working with a single iterable, so if its map method took anything more than a single argument, it could just as easily be interpreted as "pass these arguments to the function" (so you could >>iter(x).map(int, 16)<< to convert hex to decimal) rather than as extra iterables. Which is why I kept it simple and didn't allow more args :)
Agreed, not really a lot of point.
I think you're right there.
Given that I don't actually want a pipeline like this, I'm not the best one to ask, but I would strongly favour ultra-simple APIs. ChrisA

Chris Angelico writes:
On Tue, Nov 23, 2021 at 7:47 PM Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
Ah, but I changed the subject here. Sorry about not making that clear. This isn't a method on a dataflow, it would be a a change to map itself. Steve

On Wed, Nov 24, 2021 at 12:39 AM Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
Oh, oh, gotcha. That may be worth doing, yeah. It doesn't make a lot of sense in the pipeline form, but map() as it currently is could benefit from that. Prior comment withdrawn as it was responding to what you weren't saying :) ChrisA

Chris Angelico wrote:
That's not equivalent. You produce [3] instead of [2, 4]. So you rather proved that the proposal does have merit, as it's apparently easy to get the list comprehension wrong. An actually equivalent list comprehension:

[x for x in [1,2,3] for x in [x + 1] if x % 2 == 0]

Or spread across lines:

[x
 for x in [1,2,3]
 for x in [x + 1]
 if x % 2 == 0]

On Fri, Dec 24, 2021 at 10:22:29AM -0000, Stefan Pochmann wrote:
Chris Angelico wrote:
Or simpler still: [x + 1 for x in [1,2,3] if x % 2 != 0] Or we can use the mighty walrus: [y for x in [1, 2, 3] if (y:=x+1) % 2 == 0] (although y leaks out of the comprehension). Using a hypothetical pipeline syntax with an even more hypothetical arrow-lambda syntax: [1, 2, 3] | map(x=>x+1) | filter(a=>a%2) | list

Stefan Pochmann writes:
Steven D'Aprano wrote:
But you didn't, really. As-is would use genexps to "inline" the map and filter calls: list(x for x in (y + 1 for y in [1, 2, 3]) if x % 2 == 0) which in this case is much harder to get wrong, and still reads as well as, if not better than, the original "chained methods of dataflow object" idiom. Sure, it's a little verbose, but I think it's time for proponents to find persuasive examples, preferably not having nice genexp implementations (and in the stdlib). I don't recall if Steven said or I'm inferring that he'd like the whole proposal better if there were a persuasive "pipeline" syntax proposal with it, but that's where I am. I don't find "method chaining as pipeline" to be an attractive syntax. That's a big IMO FYI, of course YMMV. Yet another Steve

Stephen J. Turnbull wrote:
But you didn't, really.
Yes I did, really. Compare (in fixed-width font):

              (                                   [x
source         [1,2,3].iter()                      for x in [1,2,3]
increment      .map(lambda x: x+1)                 for x in [x + 1]
if even        .filter(lambda x: x % 2 == 0)       if x % 2 == 0]
               .to_list()
              )

Same source and two transformations, executed in the same order, written in the same order. Steven executes the increment before the filter, keeps odds instead of evens, and wrote the increment even before the source. You do have the same steps (increment and keep evens) as the original and execute them in the same order, but you *wrote* the first transformation *before* the source, nested instead of flat, which reads inside-out in zig-zag fashion. Not that bad with so few transformations, but if we want to do a few more, it'll likely get messy. While the OP's and mine will still read straight from top to bottom.

Stefan Pochmann writes:
Stephen J. Turnbull wrote:
But you didn't, really.
Yes I did, really. Compare (in fixed-width font):
I had no trouble reading the Python as originally written. Obviously you wrote a comprehension that gets the right answer, and uses the bodies of the lambdas verbatim. The point is that you focus on the lambdas, but what I'm interested in is the dataflows (the implicitly constructed iterators). In fact you also created a whole new subordinate data flow that doesn't exist in the original (the [x+1]). I haven't checked but I bet that a complex comprehension in your style will need to create a singleton iterable per source element for every mapping except the first. One point in favor of doing this calculation with chained iterators is to avoid creating garbage. The nested genexp I proposed creates the same iterators as the proposed method chain, and iterates them the same way, implicitly composing the functions in a one-pass algorithm.
We are well-used to reading parenthesized expressions, though. Without real-world examples, I don't believe the fluent idiom has enough advantages over comprehensions and genexps to justify support in the stdlib, especially given that it's easy to create your own dataflow objects. We don't have a complete specification for a generic facility to be put into the stdlib, except the OP's most limited proposal to add iter, map, filter, and to_list methods to iterators (the first and last of which are actually pointless). But I don't think that would get support from the core devs. It's also not obvious to me that the often awkward comprehension syntax that puts the element-wise transformation first isn't frequently optimal. In log(gdp) for gdp in gdpseries economists don't really care about the dataflow, as it's the same in many many cases. We care about the log transformation, as that's what differentiates this model from others. So putting the transformation before the data source makes a lot of sense for readability (in what is admittedly a case I chose to make the point). I'll grant that putting the source ("x in iterable") between the mapping ("f(x) for") and the filter ("if g(x)") does create readability issues for nested genexps like the one I suggested, if there are more than one or two such filters.
Not that bad with so few transformations, but if we want to do a few more, it'll likely get messy.
If it were up to me (it isn't, but I represent at least a few others in this), "likely" doesn't cut it. I mean, I already admitted that as a *possibility*. We want to see a specification, and real applications that benefit *more* from a generic facility like that proposed than they would from application-specific dataflow objects. Steve

Stephen J. Turnbull wrote:
> In fact you also created a whole new subordinate data flow that doesn't exist in the original (the [x+1]). I bet that a complex comprehension in your style will need to create a singleton iterable per source element for every mapping except the first.
I don't think so. Sounds like you missed that `for x in [x + 1]` is now treated as `x = x + 1` not only by humans but also by Python; see the first item here: https://docs.python.org/3/whatsnew/3.9.html#optimizations

From disassembling my expression (in Python 3.10):

Disassembly of <code object <listcomp> at 0x00000207F8D599A0, file "<dis>", line 1>:
  1           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                14 (to 34)
              6 STORE_FAST               1 (x)
              8 LOAD_FAST                1 (x)
             10 LOAD_CONST               0 (1)
             12 BINARY_ADD
             14 STORE_FAST               1 (x)
             16 LOAD_FAST                1 (x)
             18 LOAD_CONST               1 (2)
             20 BINARY_MODULO
             22 LOAD_CONST               2 (0)
             24 COMPARE_OP               2 (==)
             26 POP_JUMP_IF_FALSE        2 (to 4)
             28 LOAD_FAST                1 (x)
             30 LIST_APPEND              2
             32 JUMP_ABSOLUTE            2 (to 4)
        >>   34 RETURN_VALUE

That's *more* efficient than your separate iterators having to interact, not less. I also tried timing it, with `source = [1,2,3] * 10**6`. Mine took 1.15 seconds to process that, yours took 1.75 seconds. Turning your outer one into a list comp got it to 1.58 seconds, still much slower.

On Mon, Dec 27, 2021 at 07:19:12PM -0000, Stefan Pochmann wrote:
I'm afraid that is doubly wrong. Humans still read `for x in [x + 1]` as a loop, because that's what it is. So says at least this human. And Python the language makes no promise about that being optimized into a simple assignment. Only CPython 3.9 and above does so. Other implementations may or may not do so, and being a mere optimization, it could be removed at any time without notice if it were found to be interfering with some other feature or more powerful optimization. Optimizations are great, but we must be careful not to treat them as language features unless they are documented as such. None is a singleton and always will be, so it is safe (and recommended!) to test `is None`. But 211 may or may not be a singleton, and so testing `is 211` is risky, even if it happens to work for you under some circumstances. Caching of small ints is an implementation-dependent optimization, not a language feature. And so is the comprehension inner-loop speed-up. You can rely on it being fast if you like, but that ties you to a specific implementation and version.

-- Steve

Stefan Pochmann writes:
OK, I missed that change.
That's *more* efficient than your separate iterators having to interact, not less.
And more efficient than the separate iterators that would be created by method chaining in Chris's implementation, AIUI. Whose side are you on here? ;-) Jokes aside, my opinion matters almost not at all, but I think a lot of the core devs, and in particular the SC, will want to see multiple examples of existing production code that uses alternative idioms such as comprehensions, generators, and mapping functions that would be improved by the proposed change. I doubt you'll find many in the stdlib, because Guido and most other core devs have never been fans of map and filter (reduce even got demoted from a builtin to functools). Then, unless you can present implementations of .map and .filter that transform

iter(iterable).map(f).filter(g)

to

(x for x in iterable for x in [f(x)] if g(x))

as optimized by Python 3.9+, you'll have to deal with the argument that even though "readability counts", the better performance of some of the other idioms reduces the applicability of the method-chaining idiom quite a lot in production code.

Steve

On Tue, Dec 28, 2021 at 12:49:13AM +0900, Stephen J. Turnbull wrote:
I believe that Serhiy has optimized the case where a comprehension loops over a singleton list or tuple. If you go back to the pre-walrus operator discussion (PEP 572), one of the alternatives was to use a second loop:

[func(y) for x in items for y in [x+1] if condition(y)]
                        ^^^^^^^^^^^^^^ inner loop

The second loop was not mentioned in the PEP, but it was discussed during the mega-threads (note plural). If my recollection serves me correctly, at some point Serhiy optimized the inner loop away. If you inspect the output of:

import dis
dis.dis("[y for x in items for y in [x]]")

you will see that the list [x] is never actually created, and the for y loop is turned into just an assignment to y. But this is a CPython implementation detail, not a language promise, so it may not apply to other Python implementations.

[...]
We are well-used to reading parenthesized expressions, though.
Just because we're used to them doesn't make them easy to read. If only 14th century mathematicians had discovered reverse Polish notation, instead of using + as a short-hand for the Latin "et" (and), all those stupid internet memes arguing about the value of 6÷2(2+1) (which is ambiguous in standard maths notation, valid answers are 1 or 9) would be obsolete. We wouldn't need brackets around expressions, parsers would be much simpler, and big complex expressions with lots of function calls would be much easier to understand. On the other hand, slice notation (which is nice) would seem bizarre. And currying in Haskell would be much harder.
My brain can't parse that sentence. Are you for it or against that "often awkward comprehension syntax"?
Julia (if I recall correctly) has a nice syntax for automatically turning any function or method into an element-wise function:

# Take the log of one value.
log(gdp)

# Take the log of each value in the series.
log.(gdpseries)

-- Steve

On Mon, Dec 27, 2021 at 4:07 PM Steven D'Aprano
Julia (if I recall correctly) has a nice syntax for automatically turning any function or method into an element-wise function:
And numpy has an even easier one:

np.log(a_scalar)
np.log(an_array)

I'm only being a little bit silly. In fact, array-oriented operations are really nifty, and a heck of a lot easier to parse than map, or comprehensions, etc. Filtering can be a bit ugly, but not too bad. MATLAB has a duplicate set of operators: the matrix ones and the element-wise ones. We struggled for years with how to do that in Python -- until someone realized that matrix multiplication is the only actually useful matrix operation. Hence @, and now we're good.

-CHB

--
Christopher Barker, PhD (Chris)
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Mon, Dec 27, 2021 at 10:15:05PM -0800, Christopher Barker wrote:
Ah, but numpy has to use their own special log function that does something like this:

# Obviously just pseudocode
def numpy.log(obj):
    if obj is a scalar:
        return log(obj)  # Scalar version.
    else:
        # Apply the log function to every element of the
        # vector, array or matrix.
        elements = [log(x) for x in obj]
        return type(obj)(elements)

and has to repeat this boilerplate for every single function that operates on both scalars and vectors etc. Whereas Julia allows you to write the scalar log function and then *automatically* apply it to any vector, with no extra code, just by using the "dot-call" syntax:

log.(obj)

https://docs.julialang.org/en/v1/manual/functions/#man-vectorized

And it works with operators too.

-- Steve

On Tue, Dec 28, 2021 at 1:15 AM Christopher Barker <pythonchb@gmail.com> wrote:
I have an @elementwise decorator I use for teaching decorators. I could dig it up, but everyone here could write it too. The main work is simply returning the same kind of collection that was passed in (as opposed to, e.g., always a list). But that's 2 lines of work.
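Something like this sketch (the details are my guess at what the teaching version looks like):

from functools import wraps

def elementwise(func):
    # apply func to each element, returning the same kind of
    # collection that was passed in
    @wraps(func)
    def wrapper(collection):
        return type(collection)(func(x) for x in collection)
    return wrapper

@elementwise
def square(x):
    return x * x

print(square([1, 2, 3]))   # -> [1, 4, 9]
print(square((1, 2, 3)))   # -> (1, 4, 9)

--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.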

On Tue, Dec 28, 2021 at 5:31 AM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
and numpy has a vectorize() function, which, I only just realized, can be used as a decorator -- as long as you're happy with the defaults.
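[Editor's sketch of that decorator usage; log1p_ is a made-up name, and numpy's own np.log1p would of course be faster:]

import math
import numpy as np

@np.vectorize                  # wraps the scalar function in an elementwise loop
def log1p_(x):
    return math.log(1 + x)

log1p_([0.0, math.e - 1])      # -> array([0., 1.])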
Ah, but numpy has to use their own special log function that does something like this:
# Obviously just pseudocode
def numpy.log(obj):
    if obj is a scalar:
        return log(obj)  # Scalar version.
    else:
        # Apply the log function to every element of the
        # vector, array or matrix.
        elements = [log(x) for x in obj]
        return type(obj)(elements)

well, yes, numpy provides special functions, but they look more like this:

def numpy.log(obj):
    obj = np.asarray(obj)
    return np._log(obj)

where np._log is written in C. (Yes, np.vectorize does indeed wrap the input function in a loop.) Anyway, the point is that numpy works by having an nd-array as a first-class object -- I suppose it's "only" for performance reasons that that's necessary, but it's why having a special notation to vectorize any function wouldn't be that helpful. It doesn't have to check "is this a scalar?", because ndarrays can be of any dimensionality (well, up to 32) -- a scalar, 1D, 2D, etc. And I'm a bit confused as to why Julia needs that -- it's also based on arrays as first-class objects, but I haven't looked at Julia for ages. Having said that, I do think that the vectorized approach makes for more readable, and less error-prone, code for a large class of problems. I often use numpy when performance is a non-issue. In fact, numpy is slower than "pure python" for very small arrays, but I still use it. So having a built-in way to do vectorized operations would be pretty cool. I've often thought that a "numpython" interpreter would be pretty nifty -- it would essentially make ndarrays builtins, so that you could apply all sorts of nifty optimizations at run time. But I've never fleshed out that idea. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Wed, Dec 29, 2021 at 4:54 AM Christopher Barker <pythonchb@gmail.com> wrote:
I'm not sure about that.
If you pass a scalar to log(), you get back a scalar, not an array. Whereas asarray would return an array with no dimensions. Not sure how significant that is, but it does still distinguish between values and arrays. ChrisA

On Tue, Dec 28, 2021 at 10:19 AM Chris Angelico <rosuav@gmail.com> wrote:
well, yes, there is an odd wart in numpy: there are zero-dimensional arrays, and there are scalars, which are almost, but not quite, the same:

In [81]: arr = np.array(3.0)
In [82]: arr.shape
Out[82]: ()
In [83]: scalar = np.float64(3.0)
In [84]: scalar.shape
Out[84]: ()
In [85]: len(arr)
TypeError: len() of unsized object
In [86]: len(scalar)
TypeError: object of type 'numpy.float64' has no len()
In [87]: 5.0 * arr
Out[87]: 15.0
In [88]: 5.0 * scalar
Out[88]: 15.0

But they are almost interchangeable -- the reason the distinction is there at all is for some internal numpy structural / legacy reasons. I think it boils down to numpy scalars being more-or-less interchangeable with the built-in python number types (https://numpy.org/neps/nep-0027-zero-rank-arrarys.html). Note that if you index into an ndarray, you get one less dimension, until you get to zero dimensions, which is a scalar:

In [98]: arr2 = np.ones((3,4))
In [99]: arr2.shape
Out[99]: (3, 4)
In [100]: arr1 = arr2[0]
In [101]: arr1.shape
Out[101]: (4,)
In [102]: arr0 = arr1[0]
In [103]: arr0.shape
Out[103]: ()
In [104]: arr0[0]
IndexError: invalid index to scalar variable.

Anyway, the point is that the distinction between scalar and array is the same as between 1D array and 2D array, and that individual functions and operators don't need to be different for operating on different dimensions of arrays. Not sure how significant that is, but it does still distinguish between
values and arrays.
While numpy strictly makes a distinction between the *types* of scalar values and arrays, it isn't making a distinction in the *concept* of a scalar value. In the context of this thread, I think the relevant point is that "array-oriented" operations are another syntactical way of expressing operations on collections of data, as opposed to the various ways to spell looping, which is what map() and comprehensions are. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Steven D'Aprano writes:
I believe that Serhiy has optimized the case where a comprehension loops over a singleton list or tuple.
Yeah, I missed that.
We are well-used to reading parenthesized expressions, though.
Just because we're used to them doesn't make them easy to read.
I didn't say that. I said that it's not clear to me that that (as yet unproposed) change has big enough advantages to be added to the stdlib, and being used to parentheses is one reason for that.
Why not both? Sometimes it's awkward, sometimes it's optimal.
Julia (if I recall correctly) has a nice syntax for automatically turning any function or method into an element-wise function:
Sure, but syntax is not (yet) what we're talking about here.

On Sun, 26 Dec 2021 at 14:19, Steven D'Aprano <steve@pearwood.info> wrote:
What is the pipeline syntax like, indeed? It looks as if your ``|`` is an operator which produces callable objects, e.g.

[1, 2, 3] | map

such that calling it like

[1, 2, 3] | map(x=>x+1)

will be equivalent to

map(x=>x+1, [1, 2, 3])

except that <an object> | list is apparently supposed to be a list without being called. But you might have actually meant to call it like

[1, 2, 3] | map(x=>x+1) | filter(a=>a%2) | list()

to get the list. The character ``|`` is OK with [1, 2, 3], but it's already given a meaning as an operator, e.g. with {1, 2, 3}. Can the syntax separate the different uses? I suppose it may be a new operator that you want. I thought some people had already essentially proposed an operator version of functools.partial, although [1, 2, 3] | map would not exactly be equivalent to partial(map, [1, 2, 3]), because it's not map([1, 2, 3], x=>x+1) that you want. You want the arguments to be in a different order, but that's the only difference. Best regards, Takuo Matsuoka
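[Editor's sketch: one way to play with this reading today is a small class whose __ror__ receives the left operand of |. Pipe, map_, filter_ and list_ are hypothetical names, not Takuo's proposal.]

class Pipe:
    def __init__(self, func):
        self.func = func
    def __ror__(self, value):      # called for: value | pipe_object
        return self.func(value)

def map_(f):
    return Pipe(lambda it: map(f, it))

def filter_(f):
    return Pipe(lambda it: filter(f, it))

list_ = Pipe(list)

[1, 2, 3] | map_(lambda x: x + 1) | filter_(lambda a: a % 2) | list_
# -> [3]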

On 23/11/21 3:15 am, Remy wrote:
Iterators to implement 'transformational' functions like map, filter, flat_map, 'reductional' functions like reduce, sum, join, and 'evaluate' functions like to_list and to_set.
This would place a burden on all iterators to implement a large and complex interface. This goes directly against the philosophy of Python protocols, which is to be as minimal as possible. Do one thing, and do it well. It would also be a huge backwards-incompatible change. And where do you stop? You've picked an arbitrary subset of things one might want to do with an iterator. Why those particular ones? What about the contents of the itertools module? Should they be included too? Why or why not? -- Greg

Hi Remy, On Mon, Nov 22, 2021 at 02:15:08PM -0000, Remy wrote:
Hi, I'd like to revive this thread after what seemed like a decade and see where this might take us 😃
Reviving old threads from a decade ago is fine, if something has changed. Otherwise we're just going to repeat the same things that were said a decade ago. Has anything changed in that time? If not, then your only hope is that people's sense of what is Pythonic code has changed. Python is a purely object-oriented language that does not enforce, or even prefer, object-oriented syntax. We prefer procedural syntax (functions) in many cases, especially for protocols. What I mean by this is: - all values in Python are objects; there are no "unboxed" or machine values; - but we don't force everything to use "object.method" syntax when functions are better. We even have a FAQ for why Python is designed this way: https://docs.python.org/3/faq/design.html#why-does-python-use-methods-for-so... You might also get some insight from a classic critique of Java: https://steve-yegge.blogspot.com/2006/03/execution-in-kingdom-of-nouns.html Note that, *technically*, Python is a "kingdom of nouns" language too. Everything we have, including functions and methods and even classes, is an object. But you will rarely see a `ThingDoer.do` method when a `do` function is sufficient. You want to write:

mylist.iter()

But that's just a different spelling of:

iter(mylist)

Instead of:

any_iterable.to_list()

just write:

list(any_iterable)

`list` already knows how to iterate over any object which provides either of the two iterable protocols:

- the iterator protocol;
- or the sequence protocol;

so your classes just need to implement one protocol or the other, and they will *automatically and for free* get the functional equivalent of all of your methods:

- to_list, to_tuple, to_set, to_frozenset, iter, map, reduce, filter...

without any extra work, or bloating your class' namespace and documentation with dozens or hundreds or thousands of methods that you don't use. And most importantly, since you cannot possibly anticipate every possible container class and functional process, you don't have to worry about all the methods you left out:

- mapreduce, union, product, to_deque ...

Since you cannot provide a dot-method for every possible function your consumers may want, you are never going to eliminate procedural syntax:

# Why is there no to_rainbow_forest method???
obj = rainbow_forest(obj)

so you might as well embrace it and stop fighting it. Protocol-based functions are better than methods. Any function that operates on an iterable object will automatically Just Work. -- Steve
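[Editor's sketch illustrating the two protocols Steven names; Squares and Letters are made-up classes:]

class Squares:                  # provides the iterator protocol
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return (i * i for i in range(self.n))

class Letters:                  # provides only the old sequence protocol
    def __getitem__(self, i):
        if i < 3:
            return "abc"[i]
        raise IndexError

list(Squares(4))   # -> [0, 1, 4, 9]
list(Letters())    # -> ['a', 'b', 'c']
set(Squares(4))    # the same classes work with set(), tuple(), max(), ...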

In my previous post, I suggested that the status quo:

iter(myobj)

is superior to the suggested method-based syntax:

myobj.iter()

I stand by that. But I will give one exception, and suggest that so long as we don't have a good syntax for it, this request will never go away for long: function and method chaining. Procedural/function syntax for chains of function calls sucks. It is too verbose (heavy on parentheses) and written backwards:

print(sort(filter(transform(merge(extract(data)), args))))

To understand it, you have to read forward to find the *last* function call, which is actually the *first* call, then read backwards. An alternative that works with mutable data is verbose and expensive in vertical real estate:

data.extract()
data.merge()
data.transform(args)
data.filter()
data.sort()
data.print()

There is a powerful design pattern to fix this, which works great with immutable data and functions: https://martinfowler.com/articles/collection-pipeline/ Shells such as bash have an excellent syntax for this:

data | extract | merge | transform args | filter | sort | print

Method chaining is good too:

data.extract().merge().transform(args).filter().sort().print()

except for the downsides discussed previously. It would be very, very nice if we had syntactic sugar for that chain of function calls that would work on general functions and methods. A long time ago, I wrote a helper class to do that: https://code.activestate.com/recipes/580625-collection-pipeline-in-python/?i... Heavy data-processing frameworks and libraries like Pandas already use method chaining extensively. It would be great if we could chain function calls. -- Steve

(Fyi I am both 'Remy' and 'Raimi bin Karim', I don't know how that happened).

📌Goal
Based on the discussion in the past few days, I’d like to circle back to my first post to refine the goal of this proposal: to improve readability of chaining lazy functions (map, filter, etc.) for iterables. This type of chaining is otherwise known as the collection pipeline pattern (thank you Steve for the article by Martin Fowler). Also, the general sentiment I am getting from this thread is that chaining function calls is unreadable.

📌Not plausible
Extending the iterobject, based on previous discussions.

📌Proposed implementation
Earlier in the thread, Chris proposed a custom class for this kind of pipeline. But what if we exposed this as a Python module in the standard library, parking it under the group of functional programming modules? https://docs.python.org/3/library/functional.html

📜 Lib/iterpipeline.py (adapted from Chris's snippet)

class pipeline:
    def __init__(self, iterable):
        self.__iterator = iter(iterable)
    def __iter__(self):
        return self.__iterator
    def __next__(self):
        return next(self.__iterator)
    def map(self, fn):
        self.__iterator = map(fn, self.__iterator)
        return self
    def filter(self, fn):
        self.__iterator = filter(fn, self.__iterator)
        return self
    def flatten(...):
        ...
    ...

📜 client_code.py

from iterpipeline import pipeline

(
    pipeline([1, [2, 3], 4])
    .flatten(…)
    .map(…)
    .filter(…)
    .reduce(…)
)

📌Design
At first sight it might seem ridiculous, because all we are doing is reusing builtin functions and the itertools module. But that is exactly what the iterpipeline module offers — a higher-level API for the itertools module that allows users to construct a more fluent collection pipeline. The downside of this design is, of course, a bloated class, which Steve previously mentioned.

📌Up for discussion
* Naming
* Implementation of the pipeline class
* How to evaluate the pipeline: list(…) or to_list(…)
* What methods to offer in the API, and where we stop (we don't have to implement everything)

📌On being Pythonic
I don’t think we can say if it’s Pythonic, because filter(map(…, …), …) wasn’t really a fair fight. But an indication of likeability lies largely in libraries for data processing like PySpark. There are other method-chaining functional programming libraries that have also gained popularity, like https://github.com/EntilZha/PyFunctional.

📌On the collection pipeline pattern
Because the collection pipeline pattern is more accessible now, I believe it would be a fresh perspective for Python programmers on how they view their data, and how to get to the final result. It becomes an addition to their current toolbox for data flow, which is currently list comprehensions and for-loops.

📌On relying on 3rd party libraries instead
Personally, this kind of response would make me a little sad. I started out this proposal because I feel strongly about this — I don’t want my fellow Python programmers to miss out on this alternative way of reasoning about their data transformations. I learnt about this pattern the hard way. After Python, I picked up JavaScript and Kotlin at work. And then Rust as a hobby. Then I learnt PySpark. And I realised that these languages and frameworks had something in common — a fluent pipeline pattern. It just feels different to reason about your data in a sequential manner, rather than in jumbled-up clauses (no offence, I love list comprehension!). And then it hit me — I actually never thought about my data in this manner in Python.
As a language that is commonly used for data processing in this era, Python is missing out on this feature. So this is more of a heartfelt note than an objective one — I would love my fellow Python programmers to be exposed to this mental model, and that could only be done by implementing it in the standard library.
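[Editor's sketch of a concrete call chain through the pipeline class above; flatten/reduce are skipped since the snippet elides their bodies:]

result = list(
    pipeline([1, 2, 3, 4])
    .map(lambda x: x + 1)          # 2, 3, 4, 5 (lazily)
    .filter(lambda x: x % 2 == 0)
)
# result == [2, 4]; nothing runs until list() drains the pipeline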

On Sat, Nov 27, 2021 at 1:39 AM Raimi bin Karim <raimi.bkarim@gmail.com> wrote:
I'm not certain that being in the standard library exposes Python programmers to something. Without looking up any references, which of these can you do with just the Python standard library (shelling out to external programs doesn't count)?

* Get the dimensions of your terminal window
* Connect to a PostgreSQL database
* Extract login information from an FTP URL
* Hash and verify passwords with bcrypt
* Build an SMTP server
* Build an FTP server
* Parse a WAV file
* Parse an MP3 file
* Change the value of 3
* Build an MSI file (Windows installer)
* Convert a file from UTF-7 to UTF-8
* Enumerate all Unicode characters that represent the digit 6

More importantly: if you didn't know that one of these was possible, would spending time writing Python code have exposed you to it? For instance (sorry, spoilers), the last one is most definitely possible, but if you were parsing input, would you think to check for anything other than the ASCII character '6'? Once you've thought of something, it's easy to think "it'd be cool if Python already had this". But the exact set of iteration tools that *you* want is probably not the same as the set of iteration tools that someone else wants. It would be extremely hard for the stdlib to have the perfect set of tools available, and as soon as something isn't available, you have to appeal to the core devs to add it, then wait for a release of Python that includes it. In contrast, you can simply use your own pipeline class without synchronizing with anyone else. You can choose what to call it, what tools to make available (and can add more as you find the need), and can make your own decisions about style, like whether it should be "pipe(x) | filter(...) | sort(...)" or "pipe(x).filter(...).sort(...)" or even "pipe(x) @filter(...) @sort(...)". There's no need to convince anyone else of what you think is right - you can just go ahead and do it! BTW, if you want more iteration tools, there are plenty to choose from. The more-itertools library has a ton of really cool stuff. Should they also be available in the pipeline? If you're making your own, then absolutely yes, you can have them available. But if it's part of the standard library, it can't depend on an external module like that. ChrisA
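[Editor's aside: the "digit 6" teaser really is stdlib-only territory. A sketch, scanning the whole code space with unicodedata:]

import sys
import unicodedata

# Every character whose decimal-digit value is 6 (e.g. '6', '٦', '६', ...)
sixes = [chr(cp) for cp in range(sys.maxunicode + 1)
         if unicodedata.decimal(chr(cp), None) == 6]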

Just a note here: On Fri, Nov 26, 2021 at 6:37 AM Raimi bin Karim <raimi.bkarim@gmail.com> wrote:
to improve readability of chaining lazy functions (map, filter, etc.) for iterables.
I think there is a slight misperception here. I've seen the term lazy used a couple times, and at least once in contrast to list comprehensions. However, there are, of course, generator comprehensions (AKA generator expressions) which are also lazy. So this is about syntax, not capability. Another note: I'm not recommending it, but we could add a bunch of things to the Iterator ABC, and then it could be available everywhere. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
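[Editor's sketch of that point, reusing the thread's running example:]

nums = [1, 2, 3]
lazy = (x + 1 for x in nums if (x + 1) % 2 == 0)  # generator expression: nothing runs yet
list(lazy)                                        # evaluation happens here -> [2, 4]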

On 11/26/2021 1:59 PM, Christopher Barker wrote:
Is that true? I'm genuinely curious. I have lots of code which is the logical equivalent of:

class Foo:
    def __init__(self, s):
        self.s = s
        self.idx = -1
    def __iter__(self):
        return self
    def __next__(self):
        self.idx += 1
        if self.idx < len(self.s):
            return self.s[self.idx]
        raise StopIteration()

Would adding something to the Iterator ABC really also add it to my class Foo? Eric

On 27/11/21 11:34 am, Eric V. Smith wrote:
Would adding something to the Iterator ABC really also add it to my class Foo?
No, your class would need changing to inherit from the Iterator ABC. This is a big problem in general with "just add it to the ABC" ideas. A huge number of existing classes don't inherit from the relevant ABC because there currently isn't any need to do so. Fundamentally, Python is not organised around the concept of ABCs the way some other languages such as Java are. Instead, it's organised around informal protocols. They're informal in the sense that there's no need to inherit from a particular class in order to conform -- you just implement the required methods and you're good to go. As a consequence, there is strong pressure to keep the number of required methods to a minimum. It also means that adding required methods to a protocol late in the life of the language is effectively impossible. -- Greg

On Fri, Nov 26, 2021 at 4:12 PM Eric V. Smith <eric@trueblade.com> wrote:
No, but if you subclassed from the ABC it would. Python ABCs are a mysterious beast. Python itself is mostly duck typed, so your example above is a full-fledged Iterator. In this case, ABCs serve primarily as formal documentation. But they are a bit more than that, as some of them provide non-abstract functionality that you can get by subclassing from them. And some type-checking systems will check for the presence of the attributes in the ABCs, so that, for instance, your example would type check as an Iterator. -CHB
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
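[Editor's note: in fact isinstance() against the Iterator ABC already passes for Eric's class without any inheritance, because collections.abc.Iterator defines a structural __subclasshook__:]

from collections.abc import Iterator

class Foo:                     # same shape as Eric's class; no ABC inheritance
    def __iter__(self):
        return self
    def __next__(self):
        raise StopIteration

isinstance(Foo(), Iterator)    # -> True, via Iterator.__subclasshook__
issubclass(Foo, Iterator)      # -> True as well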

On Fri, 26 Nov 2021 at 14:39, Raimi bin Karim <raimi.bkarim@gmail.com> wrote:
I'm somewhat ambivalent about this pattern. Sometimes I find it readable and natural, other times it doesn't fit my intuition for the problem domain. I do agree that helping people gain familiarity with different approaches and ways of expressing a computation, is a good thing. I get your point that putting this functionality in a 3rd party library might not "expose" it as much as you want. In fact, I'd be pretty certain that something like this probably already exists on PyPI, but I wouldn't know how to find it. However, just because that doesn't provide the exposure you're suggesting, doesn't mean that it "could only be done by implementing it in the standard library". This isn't a technical problem, it's much more of a teaching and evangelisation issue. Building a library and promoting it via blogs, social media, demonstrations, etc, is a much better way of getting people interested. Showcasing the approach in an application that lots of people use is another (Pandas, for example, shows off the "fluent" style of chained method calls, which some people love and some hate, that's very similar to your proposal here). It's a lot of work, though, and not the type of work that a programmer is necessarily good at. Many great libraries are relatively obscure, because the author doesn't have the skills/interest/luck to promote them. What you *do* get from inclusion in the stdlib is a certain amount of "free publicity" - the "What's new" notices, people discussing new features, the general sense of "official sanction" that comes from stdlib inclusion. Those are all useful in promoting a new style - but you don't get them just by asking, the feature needs to qualify for the stdlib *first*, and the promotion is more a "free benefit" after the fact. And in any case, as others have mentioned, even being in the stdlib isn't guaranteed visibility - there's lots of stuff in the stdlib that gets overlooked and/or ignored. Sorry - I don't have a good answer for you here. But I doubt you'll find anyone who would be willing to help you champion this for the stdlib. Paul

It's supported with several syntaxes in macropy ( https://pypi.org/project/MacroPy/) but I remember seeing it in a more serious (for lack of a better term) package too, I just can't remember which one. E On Fri, 26 Nov 2021 at 19:41, Paul Moore <p.f.moore@gmail.com> wrote:

Ah yes, it's pipeop! https://pypi.org/project/pipeop/ On Fri, 26 Nov 2021 at 22:39, Evpok Padding <evpok.padding@gmail.com> wrote:

On Fri, Nov 26, 2021 at 07:40:43PM +0000, Paul Moore wrote:
I don't think that even the most mad shell-scripting fanboi would say that pipelining is the One True software pattern and we should solve all problems that way :-) Collection pipelines are a natural and readable solution to *some* problems, not all, but in Python code it is difficult to use collection pipelines, so we end up writing the code backwards using function-call syntax.

# The algorithm: filter, process, sort, print
# data | filter something | process arg | sort | print
# In Python:
print(sort(process(filter(data, something), arg)))
I get your point that putting this functionality in a 3rd party library might not "expose" it as much as you want.
I fear that we don't yet have a good sense of the right syntax for this and how it will interact with the rest of Python syntax. As much as I love pipelines, I think that it's too early to push for this feature in the language. We need more experiments with syntax and functionality first. To me, the most natural syntax looks like this:

value | function *args, **kwargs

equivalent to `function(value, *args, **kwargs)`, but of course we've already used the pipe for bitwise-or and set union. `>>` would be another equally good operator. I don't really like `|>` as an operator. If we were to invent a new operator, I'd prefer `->`. We can experiment by providing a wrapper class that takes a function, method or other callable and gives them a `__ror__` method for the pipe, and a `.p` (for partial) method for the partial application:

value | Wrapper(function).p(*args, **kwargs)
--> partial(function, *args, **kwargs)(value)

That's sufficient for experimentation, but needing that wrapper adds too much friction. Ultimately nobody is going to use this idiom to its full potential until it works with arbitrary functions and callables without the wrapper. And by experiment, I don't mean experiment in the stdlib. I think that, like pathlib, this needs a few years of development outside the stdlib to mature. https://www.python.org/dev/peps/pep-0428/
I would be willing to help design an API and write a PEP but I would *not* champion it for 3.11. I think that a premature attempt to add it to the language would doom it to premature rejection. -- Steve
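[Editor's sketch of the wrapper class Steven describes; the Wrapper name and .p method come from his message, the body is an assumption that follows his partial() spelling:]

from functools import partial

class Wrapper:
    def __init__(self, func):
        self.func = func
    def p(self, *args, **kwargs):
        # value | W(f).p(*a, **kw)  -->  partial(f, *a, **kw)(value)
        return Wrapper(partial(self.func, *args, **kwargs))
    def __ror__(self, value):          # triggered by: value | wrapper
        return self.func(value)

data = [3, 1, 2]
data | Wrapper(sorted)                                       # -> [1, 2, 3]
data | Wrapper(sorted).p(reverse=True)                       # -> [3, 2, 1]
data | Wrapper(filter).p(lambda x: x > 1) | Wrapper(list)    # -> [3, 2]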

On Sat, Nov 27, 2021 at 02:58:07AM -0000, Raimi bin Karim wrote:
This syntactic sugar imo is powerful because it's not limited to iterables but generalises to possibly any object.
Indeed. There is no reason to limit pipelines to only collections, any object which can be transformed in any way will work.
But I guess since method chaining (for collection pipeline) is more commonplace across many languages, it might be easier to catch on.
We should be careful about the terminology. Method chaining and pipelining are related, but independent, design patterns or idioms. Method chaining or cascading, also called fluent interfaces, relies on calling a chain of methods, usually of the same object:

obj.method().foo().bar().baz().foobar()

This is very common in Python libraries like pandas, and in immutable classes like str, but not for mutable builtins like list. So it is very simple to implement chaining in your own classes, by having your methods either return a new instance, or return self. Just don't return None and you'll probably be fine :-)

Pipelining involves calling a sequence of independent functions, not necessarily methods:

obj | func | spam | eggs | cheese | aardvark

In mathematical notation, using the ring operator for function composition, that is the same as:

(aardvark∘cheese∘eggs∘spam∘func)(obj)

In concatenative languages like Factor or Forth, you would write it in reverse Polish notation (no operator required):

obj func spam eggs cheese aardvark

compared to regular function notation, where functions are written in prefix order, which puts them in the reverse of execution order:

aardvark(cheese(eggs(spam(func(obj)))))

Even though they logically go together in some ways, let's be careful not to confuse these two idioms. Note that chaining, pipelining and function composition go together very well:

(obj.method().foo() | spam∘func | eggs).bar().baz()

executes from left to right, exactly as it is written. (Assuming that the ring operator has a higher precedence than the pipe operator; otherwise you can use parentheses.) Contrast how well that reads from left to right compared to:

eggs(spam(func(obj.method().foo()))).bar().baz()

where the order of execution starts in the middle, runs left to right for a bit, then back to the middle, runs right to left, then jumps to the end and runs left to right again. -- Steve
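[Editor's sketch of the chaining half of that terminology; Report is a hypothetical class. Each method returns self, which is what lets the calls cascade:]

class Report:
    def __init__(self, rows):
        self.rows = list(rows)
    def sort(self):
        self.rows.sort()
        return self            # returning self is what makes chaining work
    def head(self, n):
        self.rows = self.rows[:n]
        return self

Report([3, 1, 2]).sort().head(2).rows   # -> [1, 2]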

I like the "obj -> func1 -> func2” idiom If func1 and func2 take only one argument. If func1 takes two arguments (arg1, arg2), it would be like the following: obj -> func1(arg2) -> func2. Suppose you want to insert the returned object not as a first argument. We could do something like the following: obj -> func1(arg1, ?) -> func2. The returned object will be captured in ?, which happens to be the second argument. If ? is the first positional argument and we only have one ? to pass, it can be omitted. That is, obj -> func1(arg2) -> func2 is equivalent to obj -> func1(?, arg2) -> func2(?) If you don’t see the “?”, you can always assume the returned object is used as the first argument because each chained function needs at least one “?” explicitly or implicitly. This implies chained functions need to take at least one argument which makes sense because we want them to transform the data we pass in. If func1 takes zero argument, then … obj -> func1 -> func2 will throw this error TypeError: func1() takes 0 positional arguments but 1 was given, which is the implicit ?. We can use chaining/piping with keyword arguments. obj -> func1(arg1, arg2=?) -> func2 Suppose our only_kw_func signature like this: def only_kw_func(*, arg1, arg2): … , then the following will work. obj -> only_kw_func(arg1=?, arg2=arg2) -> func2 We can probably pass only part of the returned object with pattern matching. If obj is a tuple of three elements, then the third element will be the returned object in the following expression: obj -> only_kw_func(arg1=(_, _, ?), arg2=arg2) -> func2. The “?” captures what we want to pass. If the pattern does not match, a missing argument error will occur. You can pass the returned object or part of it using pattern matching in multiple arguments because each chained function needs at least one “?” explicitly or implicitly. But now you cannot omit ? for the first positional argument. Or, missing argument error will occur. obj -> func1(?, ?) -> func2. obj will be captured twice in func1; one for arg1 and another for arg2. Let’s do a contrived example. Let’s say our obj is (1, 2, [1, 2, 3]) def func1(x: int, y: int, seq: Sequence[int]) -> bool: return (x in seq) and (y in seq) def func2(contains: bool) -> str: if contains: return “Success” else: return “Failure” obj -> func1((?, _, _), (_, ?, _), (_, _, ?)) -> func2 (1, 2, [1, 2, 3]) => func1(1, 2, [1, 2, 3]) => func2(True) => “Success” Abdulla

Forget about pattern matching in the previous email. The ? should always refer to the whole passed object. You can further manipulate the passed/returned object. Consider the following:

[1.32, 1.1, 1.4] -> map(round, ?) -> map(operator.mul, ?, (? -> list -> len) * [2]) -> tuple

[1.32, 1.1, 1.4]
=> map(round, [1.32, 1.1, 1.4])
=> map(operator.mul, Map(1, 1, 1), (Map(1, 1, 1) => list(Map(1, 1, 1)) => len([1, 1, 1]) => 3) * [2])
=> tuple(Map(2, 2, 2))
=> (2, 2, 2)

Another example:

[1, 2, 3] -> len -> operator.add(3)

is equivalent to

[1, 2, 3] -> operator.add(len(?), 3)

Of course the first option looks better. Maybe the pattern matching should be flagged with something (maybe "m? pattern", as in (m)atch the passed object (?) with this pattern), and if it encounters "??", that will be captured and passed for that particular argument:

[1, 2, 3] -> operator.add(m? [_, ??, _], 2)
# match [1, 2, 3] with [_, ??, _] and pass the captured ?? to the argument
[1, 2, 3] => operator.add(2, 2) => 4

The previous email's examples for pattern matching would look like this. obj is a tuple with three elements; the third element will be passed for arg1 below:

obj -> only_kw_func(arg1=m? (_, _, ??), arg2=arg2) -> func2

The other example would be like the following: say our obj is (1, 2, [1, 2, 3]):

def func1(x: int, y: int, seq: Sequence[int]) -> bool:
    return (x in seq) and (y in seq)

def func2(contains: bool) -> str:
    if contains:
        return "Success"
    else:
        return "Failure"

obj -> func1(m? (??, _, _), m? (_, ??, _), m? (_, _, ??)) -> func2
(1, 2, [1, 2, 3]) => func1(1, 2, [1, 2, 3]) => func2(True) => "Success"

obj -> func(m? ??) is equivalent to obj -> func(?)

Raimi bin Karim writes:
The thing is, the reason such a module is needed at all is that Guido decided ages ago that mutating methods should return None, and in particular they don't return self. I'm not sure why he did that, you'd have to ask him, but we respect his intuition enough that to get it in, it would help to have answers to some of the following questions in advance:

1. Is dataflow/fluent programming distinguishable from whatever it was that Guido didn't like about method chaining idioms? If so, how?

2. Is the method chaining syntax preferable to an alternative operator?

3. Is there an obvious choice for the implementation? Specifically, there are at least three possibilities:
   a. Restrict it to mutable sequences, and do the transformations in place.
   b. Construct nested iterators and listify the result only if desired.
   c. Both.

4. Is this really so tricky that the obvious implementation of the iterator approach (Chris's) needs to be in the stdlib with tons of methods on it, or does it make more sense to have applications write one with the specific methods needed for the application? Or perhaps instead of creating a generic class prepopulated with methods, maybe this should be a factory function which takes a collection of mapping functions and adds them to the dataflow object on the fly?

5. map() and zip() take multiple iterables. Should this feature handle those cases? Note that the factory function approach allows the client to make this decision for themselves.

6. What are the names that you propose for the APIs? They need to indicate the implementations, since there are various ways to implement.
I'm with Chris on this. My experience with responding to people on mailing lists is that very few read the documentation until they need to solve a problem that way, and then they read only the part that solves their problem. Heck, I'm the kind of person who kept a copy of Python Essential Reference at my bedside for a couple of years, and *I* don't know half of what's in the stdlib any more. I don't really think putting it in the stdlib will have the promotional effect you hope for. As for "only way," I think _Dataflow Programming in Python_ and _Fluent Programming in Python_ are great titles for books. Maybe you could write one of those? I'm half-joking, of course, because writing a book is not something anyone should impose on somebody else. But you do have the passion part down already. :-) (And don't forget to have a cute animal for the cover if you go with O'Reilly as the publisher!)