[Python-ideas] Another attempt at a sum() alternative: the concatenation protocol
Andrew Barnert
abarnert at yahoo.com
Tue Jul 23 09:24:56 CEST 2013
On Jul 19, 2013, at 4:16, Sergey <sergemp at mail.ru> wrote:
> On Jul 17, 2013 David Mertz:
>
>>> Imagine a type, that somehow modifies items that it stores, removes
>>> duplicates, or sorts them, or something else, e.g.:
>>> class aset(set):
>>>     def __add__(self, other):
>>>         return self|other
Why would you do that? When sets were added, there was a long discussion about what operator to use for union, and | was chosen over + because + would misleadingly imply concatenation.
>>> Now we have a code:
>>> list_of_sets = [ aset(["item1","item2","item3"]) ] * 1000
>>> [...]
>>> for i in sum(list_of_sets, aset()):
>>>     deal_with(i)
>>>
>>> If you replace `sum` with `chain` you get something like:
>>> for i in chain.from_iterable(list_of_sets):
>>>     deal_with(i)
>>>
>>> Which works! (that's the worst part) but produces WRONG result!
No, it's not the wrong result. Nobody in his right mind would expect a function called "chain" to union a bunch of iterables; they'd expect it to chain a bunch of iterables. Which is exactly what it does.
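To make the difference concrete, here is a minimal sketch. It assumes plain built-in sets rather than the aset class from the example, and a shorter list than the original 1000 entries:

```python
from itertools import chain

# Three identical sets, standing in for list_of_sets from the example above.
list_of_sets = [{"item1", "item2", "item3"}] * 3

# chain.from_iterable yields every element of every set, duplicates included.
chained = list(chain.from_iterable(list_of_sets))

# A union collapses the duplicates into a single set.
unioned = set().union(*list_of_sets)

print(len(chained))  # 9
print(len(unioned))  # 3
```

Both results are what their names promise: one is a chain, the other is a union.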
>>
>> In this example you can use:
>>
>> aset(chain(*list_of_sets))
>>
>> This gives the same answer with the same big-O runtime.
>
> Sure, that's why I called it "error-prone" replacement.
> When you have a code like:
>>> for i in sum(list_of_sets, aset()):
>>>     deal_with(i)
> You have pretty much no place for error.
>
> Well, it would be much better, if it was just:
>>> for i in sum(list_of_sets):
>>>     deal_with(i)
> but for historical reasons we already have second parameter,
> so we have to deal with it.
It's not just historical reasons. It's the only way you can handle a potentially empty iterable. With reduce, it's an error to call it with an empty iterable and no start value; with sum, because it's about summing numbers rather than about general folding, you get 0. But there's no third alternative in a dynamically typed language.
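A short sketch of the empty-iterable behavior described above, using Python 3 spellings (functools.reduce and operator.add); the variable names are only for illustration:

```python
from functools import reduce
import operator

empty = []

# sum() falls back to its default start value, the integer 0.
total = sum(empty)

# reduce() has nothing to return for an empty iterable with no initial value.
try:
    reduce(operator.add, empty)
    reduce_failed = False
except TypeError:
    reduce_failed = True

# With an explicit start value, both simply return that value.
summed_lists = sum(empty, [])
folded_lists = reduce(operator.add, empty, [])
```

Neither function can guess that the caller wanted an empty list, an empty set, or anything else, which is why the start value has to be passed explicitly.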
> And now some newbie tries to use chain. So she does:
>>> for i in chain(list_of_sets):
>>>     deal_with(i)
> oops, does not work. Ah, missing star (you miss it yourself!)
>>> for i in chain(*list_of_sets):
>>>     deal_with(i)
> works, but incorrectly. Ok, let's hope that our newbie was careful
> enough with tests and noticed, that it does not do what it should.
> She reads the tutorial again, and notices that the example there was
> like:
> all_elems = list(chain(*list_of_lists))
> So she tries:
>>> for i in list(chain(*list_of_sets)):
>>>     deal_with(i)
This is a mistake right off the bat, and shows a fundamental misunderstanding of iterables. It's the exact same problem we always see with people writing "for i in list(my_str)" to iterate characters, or "for i in list(my_file)" to iterate lines. People will presumably run into it and learn that list(iterable) gives you the same iteration as iterable itself before they get to chain. But if not, this is as good a time to learn as any.
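As a small illustration (the sample values here are made up), wrapping an iterable in list() changes nothing about what a for loop sees:

```python
from itertools import chain

my_str = "abc"

# Iterating the string directly and iterating list(my_str) visit the same items.
direct = [ch for ch in my_str]
via_list = [ch for ch in list(my_str)]

# The same holds for chain: list() only materializes what chain already yields.
list_of_lists = [[1, 2], [3], [4, 5]]
from_chain = [x for x in chain(*list_of_lists)]
from_list_of_chain = [x for x in list(chain(*list_of_lists))]
```

The only effect of the extra list() call is to build a temporary list before the loop starts.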
> Nope, still wrong. Just in case she tries to remove a star, that she
> doesn't understand anyway:
>>> for i in list(chain(list_of_sets)):
>>>     deal_with(i)
> Still no go. So after all these attempts she asks someone smart and
> finally gets the correct code:
>>> for i in aset(chain(list_of_sets)):
>>>     deal_with(i)
This isn't really a good solution. It may work, but if you want to union a bunch of sets, you shouldn't try to spell it as chaining iterables into a set constructor. For example:
for i in union(list_of_sets):
for i in aset().union(*list_of_sets):
If you really want to write it as an expression over the 2-element union operator, you can:
for i in reduce(aset.union, list_of_sets, aset()):
But really, as with many such uses of reduce, this is probably more readable as a loop. Especially when you consider that there is no reason this needs to be an expression inside the for loop. So:
bigset = aset()
for i in list_of_sets:
    bigset |= i
for i in bigset:
    deal_with(i)
All of these make it clear that we're creating the union of a bunch of sets. Note that in mathematical notation, you'd use a big U with the set of sets, not a sigma.
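For comparison, here is a rough sketch of the three spellings side by side. It assumes plain built-in sets instead of aset, and union() is a hypothetical helper, not something in the standard library:

```python
from functools import reduce


def union(sets):
    """Hypothetical helper: return the union of an iterable of sets."""
    result = set()
    for s in sets:
        result |= s
    return result


list_of_sets = [{1, 2}, {2, 3}, {3, 4}]

by_loop = union(list_of_sets)                       # explicit loop
by_method = set().union(*list_of_sets)              # variadic set.union
by_reduce = reduce(set.union, list_of_sets, set())  # fold over the binary union
```

All three produce the same set, and all three also work when list_of_sets is empty, because each starts from an explicit empty set.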
More generally, you're trying to make it possible for people to write looping code without understanding looping. This is silly.
Chain is a function for chaining iterables. If that's not what you want, don't use it.
Meanwhile, if your hypothetical newbie created the aset class himself, he's not a newbie--novices don't know how to create classes that implement the Iterable and Sequence protocols.
If he is at the stage where he's learning about that, it's a good time to learn that he's implemented an incorrect class. On the other hand, if he's using a class created by someone else, this will teach him that the class is buggy. Either way, the right way for him to use a class that misleadingly acts like a sequence even though it isn't is to stop using the class, or use it very carefully. For a newbie, the first option is the right answer.
> As I said, `chain` is a nice feature for smart people. But it is
> neither good for beginners, nor obvious, nor it's good as a sum
> replacement.
It's not good as a sum replacement, because it doesn't do the same thing. One sums numbers, the other chains iterables. Why should either one be a good replacement for the other?
Needless to say, neither of the two is good as a union replacement. So what?
>> It's possible to come up with more perverse customizations where
>> this won't hold. But I think all of them involve redefining
>> __add__ as something with little relation to its normal meaning.
>> Odd behavior in those cases is to be expected.
>
> Hah. Easy. Even for commonly used type — for strings:
> str(chain(*list_of_strings))
> it does not work.
>
> So we have:
>
> * chain(*list_of_something)
> may be correct or may be not
No, it's always correct. If you want to iterate over a list of strings, this does exactly what you want.
> * something(chain(*list_of_something))
> may be correct or may be not
This is not something you should generally want to do.
Remember that the whole point of the iteration protocols is that you generally don't care what type you're iterating over. And when you do care, you usually want to build a collection of some specific type out of an iterable, again without caring about the original type. You want a list, or a blist.sortedlist, or whatever, and it doesn't matter that what was passed in was a list, a tuple, or something else.
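A minimal sketch of that idea, with made-up sample data: the inputs can be any mix of iterables, and the caller picks the output type. The str.join line is my own addition, on the assumption that concatenation is what the str(chain(...)) example was after:

```python
from itertools import chain

# Inputs of four different types; chain does not care which.
mixed = [[3, 1], (2,), {5}, range(2)]

# Build whatever collection you actually want from the chained items.
flat = list(chain.from_iterable(mixed))
ordered = sorted(chain.from_iterable(mixed))

# Chaining strings yields their characters, which is what iterating a string means.
list_of_strings = ["ab", "cd"]
chars = list(chain(*list_of_strings))

# Concatenating strings is a separate operation with its own spelling.
joined = "".join(list_of_strings)
```

In each case the result type is chosen at the point of construction, not inferred from whatever happened to be passed in.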