
(FYI, I am both 'Remy' and 'Raimi bin Karim'; I don't know how that happened.)

📌 Goal

Based on the discussion in the past few days, I'd like to circle back to my first post to refine the goal of this proposal: to improve the readability of chaining lazy functions (map, filter, etc.) over iterables. This type of chaining is otherwise known as the collection pipeline pattern (thank you Steve for the article by Martin Fowler). Also, the general sentiment I am getting from this thread is that chaining function calls is unreadable.

📌 Not plausible

Extending the `iter` object, based on previous discussions.

📌 Proposed implementation

Earlier in the thread, Chris proposed a custom class for this kind of pipeline. But what if we exposed this as a Python module in the standard library, parking it under the group of functional programming modules? https://docs.python.org/3/library/functional.html

📜 Lib/iterpipeline.py (adapted from Chris's snippet)

```python
class pipeline:
    def __init__(self, iterable):
        self.__iterator = iter(iterable)

    def __iter__(self):
        return self.__iterator

    def __next__(self):
        return next(self.__iterator)

    def map(self, fn):
        self.__iterator = map(fn, self.__iterator)
        return self

    def filter(self, fn):
        self.__iterator = filter(fn, self.__iterator)
        return self

    def flatten(...):
        ...

    ...
```

📜 client_code.py

```python
from iterpipeline import pipeline

(
    pipeline([1, [2, 3], 4])
    .flatten(…)
    .map(…)
    .filter(…)
    .reduce(…)
)
```

📌 Design

At first sight it might seem ridiculous, because all we are doing is reusing builtin functions and the itertools module. But that is exactly what the iterpipeline module offers: a higher-level API for itertools that allows users to construct a more fluent collection pipeline. The con of this design is, of course, a bloated class, which Steve previously mentioned.

📌 Up for discussion

* Naming
* Implementation of the pipeline class
* How to evaluate the pipeline: list(…) or to_list(…)
* What methods to offer in the API, and where we stop (we don't have to implement everything)

📌 On being Pythonic

I don't think we can say whether it's Pythonic, because filter(map(…, …), …) wasn't really a fair fight. But an indication of likeability lies largely in libraries for data processing like PySpark. Other method-chaining functional programming libraries have also gained popularity, such as https://github.com/EntilZha/PyFunctional.

📌 On the collection pipeline pattern

Because the collection pipeline pattern is more accessible now, I believe it would give Python programmers a fresh perspective on how they view their data and how they get to the final result. It becomes an addition to their current toolbox for data flow, which today consists of list comprehensions and for-loops.

📌 On relying on 3rd-party libraries instead

Personally, this kind of response would make me a little sad. I started this proposal because I feel strongly about it: I don't want my fellow Python programmers to miss out on this alternative way of reasoning about their data transformations.

I learnt about this pattern the hard way. After Python, I picked up JavaScript and Kotlin at work, and then Rust as a hobby. Then I learnt PySpark. And I realised that these languages and frameworks had something in common: a fluent pipeline pattern. It just feels different to reason about your data in a sequential manner rather than in jumbled-up clauses (no offence, I love list comprehensions!). And then it hit me: I had actually never thought about my data in this manner in Python. For a language that is so commonly used for data processing in this era, Python is missing out on this feature.

So this is more of a heartfelt note than an objective one: I would love my fellow Python programmers to be exposed to this mental model, and that can only be done by implementing it in the standard library.
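To make the proposed shape concrete, here is a minimal runnable sketch of such a pipeline. Note the flatten and reduce implementations are my own assumptions for illustration (one-level flattening of lists/tuples, and reduce as a terminal operation that consumes the iterator); the proposal deliberately leaves their exact signatures open for discussion.

```python
from functools import reduce as functools_reduce


class pipeline:
    """Fluent wrapper over lazy iterator transformations (illustrative sketch)."""

    def __init__(self, iterable):
        self._iterator = iter(iterable)

    def __iter__(self):
        return self._iterator

    def map(self, fn):
        # Lazily apply fn to every element; nothing is consumed yet.
        self._iterator = map(fn, self._iterator)
        return self

    def filter(self, fn):
        # Lazily keep only elements for which fn is true.
        self._iterator = filter(fn, self._iterator)
        return self

    def flatten(self):
        # Hypothetical: flatten one level; non-list items pass through unchanged.
        def _flat(it):
            for item in it:
                if isinstance(item, (list, tuple)):
                    yield from item
                else:
                    yield item

        self._iterator = _flat(self._iterator)
        return self

    def reduce(self, fn, *initial):
        # Hypothetical terminal operation: consumes the iterator and
        # returns a single value.
        return functools_reduce(fn, self._iterator, *initial)


# Reads top to bottom: flatten, scale, keep large values, sum.
result = (
    pipeline([1, [2, 3], 4])
    .flatten()
    .map(lambda x: x * 10)
    .filter(lambda x: x > 10)
    .reduce(lambda a, b: a + b)
)
# → 90  (20 + 30 + 40)
```

For comparison, the equivalent with nested builtins reads inside-out: `functools_reduce(lambda a, b: a + b, filter(lambda x: x > 10, map(lambda x: x * 10, flat)))`, which is exactly the readability gap this proposal targets.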