On May 15, 2020, at 21:25, Steven D'Aprano email@example.com wrote:
On Fri, May 15, 2020 at 01:00:09PM -0700, Christopher Barker wrote:
I know you winked there, but frankly, there isn't a clear most Pythonic API here. Surely you do'nt think PYhton should have no methods?
That's not what I said. Of course Python should have methods -- it's an OOP language after all, and it's pretty hard to have objects unless they have behaviour (methods). Objects with no behaviour are just structs.
But seriously, and this time no winking, Python's design philosophy is very different from that of Java and even Ruby and protocols are a hugely important part of that. Python without protocols wouldn't be Python, and it would be a much lesser language.
[Aside: despite what the Zen says, I think *protocols* are far more important to Python than *namespaces*.]
I agree up to this point. But what you’re missing is that Python (even with stdlib stuff like pickle/copy and math.floor) has only a couple dozen protocols, and hundreds and hundreds of methods.
Some things should be protocols, but not everything should, or even close. Very few things should be protocols. More to the point, things should be protocols if and only if they have a specific reason to be a protocol. For example:
1. You need something more complicated than just a single straightforward call, like the fallback behavior for __contains__ and __iter__ with “old-style sequences”, or the whole pickle__getnewargs_ex__ and friends, or __add__ vs. __radd__.
2. Syntax, especially operator overloading, like __contains__ and __add__.
3. The function is so ubiquitously important that you don’t want anything else using the same name for different meanings, like __len__.
(There are probably other good reasons.)
When you have a reason like this, you should design a protocol. But when you don’t, dot syntax is the default. And it’s not just complexity, or “too many builtins” (after all, pickle.dump and math.ceil aren’t builtins). It’s that dot syntax gives you built-in disambiguation that function call syntax doesn’t. If I have a sequence, xs.index(x) has an obvious meaning. But index(xs, x) would not, because means too many different things (in fact, we already have an __index__ protocol that does one of those different things), and it’s not like len where one of those meanings is so fundamental that we a actually want to discourage all the others.
As I said elsewhere, I think we probably can’t have dot syntax in this case for other reasons. But that _still_ doesn’t necessarily mean we need a protocol. If we need to be able to override behavior but we can’t have dot syntax, *that* might be a good reason for a protocol, but either of those on its own is not a good reason, only the combination.
It’s worth comparing C++, where “free functions are part of a class’s interface”. They don’t spell their protocols with underscores, or call them protocols, but they idea is all over the place. x+y tries x.operator+(y) plus various fallbacks. The way you get an iterator is begin(xs) which by default calls xs.begin() so that’s the standard place to customize it but there are fallbacks. Converting a C to a D tries (among other things) both C::operator D() and D::D(C). And so on. But, unlike Python, they don’t try to distinguish what is and isn’t a protocol; the dogma is basically that everything should be a protocol if it possibly can be. Which doesn’t work. They keep trying to solve the compiler-ambiguity problem by adding features like argument-dependent lookup, and almost adding D’s uniform call syntax every 3 years, but none of that will ever solve the human-ambiguity problem. Things like + and begin and swap belong at the top level because they should always mean the same thing even if they have to be implemented differently, but things like draw should be methods because they mean totally different things on different types, and even if the compiler can tell which one is meant, even if an IDE can help you, deck.draw(5) vs. shape.draw(ctx) is still more readable than draw(deck, 5) vs. draw(shape, ctx). Ultimately, it’s just as bad as Java; it just goes too far in the opposite direction, which is still too far, and that’s what always happens when you’re looking for a perfect and simple dogma that applies to both iter and index so you never have to think about design.
Python tends to use protocol-based top-level functions:
len, int, str, repr, bool, iter, list
etc are all based on *protocols*, not inheritance.
The most notable counter-example to that was `iterator.next` which turned out to be a mistake and was changed in Python 3 to become a protocol based on a dunder.
No, the most notable counter examples are things like insert, extend, index, count, etc. on sequences; keys, items, update, setdefault, etc. on mappings; add, isdisjoint, etc. on sets; real, imag, etc. on numbers; send, throw, and close on generators… not to mention the dozens of public methods on string and bytes-like types. None of these things are functions that call protocol dunders, they’re all (still) methods that you call directly (or data attributes, in a few cases). And that’s as it should be.
Also, inheritance isn’t even relevant here. List doesn’t inherit index from anywhere. Duck typing already solves the problem of unnecessary inheritance; if that’s the only thing you’re trying to avoid, you don’t need a protocol.
That's not to say that methods aren't sometimes appropriate, or that there may not be grey areas where we could go either way. But in general, the use of protocols is such a notable part of Python, and so unusual in other OOP languages,
And yet, there are still far, far more methods than protocols even in Python.
Metaclasses are also a notable part of Python; most OO languages don’t have them, and even In Smalltalk (which I think is the language that mainstreamed the idea) they’re not as powerful and flexible as in Python. But that doesn’t mean most classes in Python should have a custom metaclass. Even things like namedtuple and module that at first glance seem like a case for metaclass often don’t need them. They’re used whenever they’re useful, not whenever possible. In the same way, Python uses protocols whenever they’re useful, not whenever possible.
that it trips up newcomers often enough that there is a FAQ about it:
Notice that this question is specifically about the difference between len, which is a protocol, and index, which is not. Even though they’re both things that all sequences support, that different sequences have to implement in different ways, etc., none of that is a sufficient reason to be a protocol.
There is a *lot* of hate for Python's use of protocols, especially among people who have drunk the "not real object oriented" Koolaid
Sure. Notice that C++, PHP, and Go get a lot of the same hate. The difference is that Python doesn’t deserve it, and it’s worth looking at why. In Python terms, C++ tries to make everything you could conceivably need to treat as a protocol work that way; PHP and Go are completely haphazard and arbitrary (and most of the things that look like protocols aren’t actually hookable except by a couple of special classes with compiler support); Python makes most of the things you’d want to be protocols work like protocols and most things you don’t, not. Unless you’re willing to learn to think Pythonically, it’s hard to believe that could work—there are an infinite number of things that you could in theory want to hook that you can’t—but in reality it’s a lot easier to identify the things you actually need and get 95% of the way there than to get 100% of what anyone could need in theory. (And then there’s always the incredibly dynamic runtime for when you really, really need some of that last 5%.) It sucks that some people will never get that (and it’s funny that so many of them like Unix…), but for the rest of us, Python works great.
And that’s why we have to approach these questions by looking for the tradeoffs and how they apply to this instance, not looking for the rule that will tell us the answer without having to think. A general argument that protocols can be useful is not an argument that any particular X should be a protocol.