Mailman 3 sequence.apply(function) - Python-ideas


sequence.apply(function)

anatoly techtonik

Sept. 1, 2012

1:29 a.m.

Idea: Apply a function to every element of a sequence and return a new sequence. It's more pythonic than map(), because it clearly works only as a list method. -- anatoly t.


Steven D'Aprano

September 2012

1:56 a.m.

On 01/09/12 16:29, anatoly techtonik wrote:

...

I think you mean "less pythonic".

-1

We already have map, and it works lazily on any iterable. Why do we need something less efficient and more limited?

-- Steven
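A minimal sketch of the point about laziness, assuming Python 3 (where map() returns an iterator). The helper `traced_double` and the list `calls` are illustrative names only, used to show when the function actually runs.

```python
# Python 3: map() returns a lazy iterator and accepts any iterable.
calls = []  # records every argument the function is actually called with


def traced_double(x):
    """Double x and remember that we were called (illustrative helper)."""
    calls.append(x)
    return 2 * x


lazy = map(traced_double, [1, 2, 3])
calls_before = list(calls)   # still empty: nothing has been computed yet

first = next(lazy)           # computes only the first element
calls_after_one = list(calls)

# The same builtin works on a generator expression, not just on a list.
upper = list(map(str.upper, (c for c in "abc")))
```

Nothing is computed until an element is requested, and the input does not have to be a list.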

Ned Batchelder

7 a.m.

On 9/1/2012 2:29 AM, anatoly techtonik wrote:

...

Python 2 has itertools.imap, and Python 3 has map, both of which do exactly what you want. --Ned.

...

Guido van Rossum

12:06 p.m.

On Sat, Sep 1, 2012 at 8:29 AM, anatoly techtonik <techtonik@gmail.com> wrote:

...

It's less Pythonic, because every sequence-like type (not just list) would have to reimplement it.

Similar things get proposed for iterators (e.g. it1 + it2, it[:n], it[n:]) regularly and they are (and should be) rejected for the same reason.

-- Guido van Rossum (python.org/~guido)
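As a rough illustration of the reimplementation argument: a single generic function already covers every iterable type, whereas a method would have to be added to each type separately. `Countdown` below is a hypothetical sequence-like class invented for this sketch; it defines no apply() method and still works with map().

```python
class Countdown:
    """Hypothetical sequence-like type: only __len__ and __getitem__."""

    def __init__(self, start):
        self.start = start

    def __len__(self):
        return self.start

    def __getitem__(self, index):
        if not 0 <= index < self.start:
            raise IndexError(index)
        return self.start - index


def double(x):
    return x * 2


# One builtin, many types; none of them had to implement anything new.
results = {
    "list": list(map(double, [3, 2, 1])),
    "tuple": list(map(double, (3, 2, 1))),
    "range": list(map(double, range(3, 0, -1))),
    "Countdown": list(map(double, Countdown(3))),
}
```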

Yuval Greenfield

4:55 p.m.

On Sat, Sep 1, 2012 at 8:06 PM, Guido van Rossum <guido@python.org> wrote:

...

Python causes some confusion because some things are methods and others builtins. Is there a PEP or rationale that defines what goes where? Yuval Greenfield

Antoine Pitrou

5:02 p.m.

On Sun, 2 Sep 2012 00:55:39 +0300 Yuval Greenfield <ubershmekel@gmail.com> wrote:

...

When something only applies to a single type or a couple of types, it is a method. When it is generic enough, it is a builtin. Of course there are grey areas but that's the basic idea. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

Nick Coghlan

9:14 p.m.

On Sun, Sep 2, 2012 at 8:02 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:

...

Yes, it comes down to the fact that we are *very* reluctant to impose required base classes (I believe the only ones currently enforced anywhere are object, BaseException and str - everything else should fall back to a protocol method, ABC or interface specific registration mechanism. Most interfaces that used to require actual integer objects are now using operator.index, or one of its C API equivalents).

In Python, we also actively discourage "reopening" classes to add new methods (this is mostly a cultural thing, though - the language doesn't actually contain any mechanism to stop you by default, although it's possible to add such enforcement via metaclasses).

Thus, protocols are born which define "has this behaviour", rather than "is one of these". That's why we have the len() builtin and associated __len__() protocol to say "taking the length of this object is a meaningful operation" rather than mandatory inheritance from a Container class that has a ".len()" method.

They're most obviously beneficial when there are *multiple* protocols that can be used to implement a particular behaviour. For example, with iter(), the __iter__ protocol is only the first option tried. If that fails, then it will instead check for __getitem__ and if that exists, return a standard sequence iterator instead. Similarly, reversed() checks for __reversed__ first, and then checks for __len__ and __getitem__, producing a reverse sequence iterator in the latter case.

Similarly, next() was moved from a standard method to a builtin function in 3.x. Why? Mainly to add the "if not found, return this default value" behaviour. That kind of thing is much easier to add when the object is only handling a piece of the behaviour, with additional standard mechanisms around it (in this case, optionally returning a default value when StopIteration is thrown by the iterator).

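The fallbacks described above can be sketched with a small class. `Deck` is a hypothetical type made up for this example; it defines only __len__ and __getitem__, with no __iter__ or __reversed__, so the builtins have to use their secondary protocols.

```python
class Deck:
    """Hypothetical type that defines only __len__ and __getitem__."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, index):
        # Raises IndexError past the end, which ends fallback iteration.
        return self._cards[index]


deck = Deck(["ace", "king", "queen"])

size = len(deck)                  # uses __len__
forward = list(iter(deck))        # no __iter__: falls back to __getitem__
backward = list(reversed(deck))   # no __reversed__: uses __len__ + __getitem__

it = iter(deck)
drained = [next(it), next(it), next(it)]
# The builtin supplies the default; the iterator itself knows nothing about it.
sentinel = next(it, "no more cards")
```

The class implements only the pieces it needs, and the builtins add the surrounding behaviour.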
Generators are another good illustration of the principle: For iter() and next(), they follow the standard protocol and rely on the corresponding builtins. However, g.send() and g.throw() require deep integration with the interpreter's eval loop. There's currently no way to implement either of those behaviours as an ordinary type, thus they're exposed as ordinary methods, since they're genuinely generator specific.

As to *why* this is a good thing: procedural APIs encourage low coupling. Yes, object oriented programming is a good way to scale an application architecture up to more complicated problems. The issue is with fetishising OOP to the point where you disallow the creation of procedural APIs that hide the OOP details. That approach sets a minimum floor to the complexity of your implementations, as even if you don't *need* the power of OOP, you're forced to deal with it because the language doesn't offer anything else, and that way lies Java. There's a reason Java is significantly more popular on large enterprise projects than it is in small teams - it takes a certain, rather high, level of complexity for the reasons behind any of that boilerplate to start to become clear :)

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
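For readers unfamiliar with the generator-specific methods mentioned in this message, here is a small sketch of the split between the generic builtin (next()) and the methods (send(), throw()). The generator `running_total` is an invented example, not something from the thread.

```python
def running_total():
    """Yield a running sum of sent values; a thrown ValueError resets it."""
    total = 0
    while True:
        try:
            value = yield total
        except ValueError:
            total = 0          # reset when ValueError is thrown in
        else:
            total += value


gen = running_total()
start = next(gen)                    # generic protocol: advance to first yield
after_five = gen.send(5)             # generator-specific: resume with a value
after_ten = gen.send(10)
after_reset = gen.throw(ValueError)  # generator-specific: raise at the yield
```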

Yuval Greenfield

3:50 a.m.

On Sun, Sep 2, 2012 at 5:14 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

...

Thanks, that's some interesting reasoning. Maybe I'm old fashioned but I like running dir(x) to find out what an object can do, and the wall of double underscores is hard to read.

Perhaps we could add to the inspect module a "dirprotocols" function which returns a list of builtins that can be used on an object. I see that the builtins are listed in e.g. help([]) but on user defined classes it might be less obvious.

Maybe we could just add a dictionary:

    inspect.special_methods = {'__len__': len,
                               '__getitem__': 'x.__getitem__(y) <==> x[y]',
                               '__iter__': iter,
                               ... }

and then dirprotocols would be easy to implement.

Yuval
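One possible reading of this proposal, as a sketch only: neither `dirprotocols` nor a `special_methods` table exists in the inspect module, so both names below are hypothetical, and the table is a small hand-picked subset described with plain strings rather than the mixed values suggested above.

```python
# Hypothetical table: special method name -> how the protocol is spelled.
# Not part of the inspect module; a hand-picked subset for illustration.
SPECIAL_METHODS = {
    "__len__": "len(x)",
    "__iter__": "iter(x)",
    "__getitem__": "x[y]",
    "__reversed__": "reversed(x)",
    "__next__": "next(x)",
    "__contains__": "y in x",
}


def dirprotocols(obj):
    """Return the spellings of protocols that type(obj) implements directly.

    Looks on the type, as the interpreter does for special methods. It does
    not model fallbacks such as iter() using __getitem__.
    """
    cls = type(obj)
    return [usage for name, usage in SPECIAL_METHODS.items()
            if hasattr(cls, name)]


class Box:
    """Hypothetical user-defined class with a single protocol."""

    def __len__(self):
        return 0


list_protocols = dirprotocols([])
box_protocols = dirprotocols(Box())
```

Here `list_protocols` contains every entry except next(x), and `box_protocols` contains only len(x).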
