Resolve append/add API inconsistency between set and list

Currently, it is not possible to use a single method name to add an item to a collection that might be a set or a list. There have been other requests for a "push" method to be a complement to "pop", so if that were added to both set and list (synonym for <set>.add and <list>.append) that would solve the issue. Additionally, there have been other requests for an operator for adding to a collection, and in some other languages such as C++ and Ruby, "<<" is used for the same or similar purposes. Perhaps both "push" and "__lshift__" could be added as synonymous operations.
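
To make the proposal concrete, here is a minimal sketch of how the two suggested spellings would look from user code. The subclasses `PushList` and `PushSet` and the helper `collect_evens` are hypothetical names used only for illustration; the built-in `list` and `set` have neither a `push` method nor `<<` support.

```python
class PushList(list):
    """Hypothetical list subclass with the proposed spellings."""

    def push(self, item):
        self.append(item)

    def __lshift__(self, item):
        self.push(item)
        return self  # returning self allows chaining: c << 1 << 2


class PushSet(set):
    """Hypothetical set subclass with the same two spellings."""

    def push(self, item):
        self.add(item)

    def __lshift__(self, item):
        self.push(item)
        return self


def collect_evens(numbers, into):
    """Add every even number to `into`, whichever kind of collection it is."""
    for n in numbers:
        if n % 2 == 0:
            into.push(n)
    return into
```

With these definitions, `collect_evens` accepts either collection without knowing which one it received, which is the behaviour the proposal is asking for.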

Failing fast is good, but it is also a very common case to want to add an item to a collection regardless of whether it is set-like or sequence-like. Does the fact that the `append` and `add` methods would still exist mitigate your objection? Call `append` when expecting a list, call `add` when expecting a set, or call `push` or use `<<` when either of those would be appropriate targets.

On Oct 13, 2019, at 12:20, Steve Jorgensen <stevej@stevej.name> wrote:
Failing fast is good, but it is also a very common case to want to add an item to a collection regardless of whether it is set-like or sequence-like.
Does the fact that the `append` and `add` methods would still exist mitigate your objection? Call `append` when expecting a list, call `add` when expecting a set, or call `push` or use `<<` when either of those would be appropriate targets.
Is there any reason this has to be a method? You can write a function that does this today:

    import collections.abc

    def push(collection, value):
        if isinstance(collection, collections.abc.MutableSet):
            collection.add(value)
        elif isinstance(collection, collections.abc.MutableSequence):
            collection.append(value)
        else:
            raise TypeError(f"Don’t know how to push to '{type(collection).__name__}' objects")

This works with existing code today. And it allows you to customize it to handle not-quite-sequence types like a circular buffer that you found off PyPI without modifying those types. If you expect to do a lot of that, `singledispatch` might be better than explicit type-switching code. Or, if you have a lot of your own classes that you can modify that aren’t quite sequences but want to be pushable, you can even add your own protocol method that it checks for first.
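
One possible way to extend that function to a not-quite-sequence type without touching the type itself is sketched below. `RingBuffer` is a hypothetical stand-in for a third-party circular buffer, and registering it as a virtual `MutableSequence` is an assumption made for brevity; adding another `elif` branch to `push` would work just as well.

```python
import collections.abc


def push(collection, value):
    """Same type-switching function as in the message above."""
    if isinstance(collection, collections.abc.MutableSet):
        collection.add(value)
    elif isinstance(collection, collections.abc.MutableSequence):
        collection.append(value)
    else:
        raise TypeError(
            f"Don't know how to push to '{type(collection).__name__}' objects"
        )


class RingBuffer:
    """Hypothetical stand-in for a third-party circular buffer."""

    def __init__(self, size):
        self.size = size
        self.items = []

    def append(self, value):
        self.items.append(value)
        if len(self.items) > self.size:
            self.items.pop(0)  # drop the oldest entry


# Assumption: declaring RingBuffer a virtual MutableSequence is acceptable
# here, even though it implements only append(). The class is not modified.
collections.abc.MutableSequence.register(RingBuffer)

buffer = RingBuffer(size=2)
for n in (1, 2, 3):
    push(buffer, n)  # buffer.items now holds only the two newest values
```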

It seems pretty trivial to create your own push utility function:

    APPEND = "append"
    ADD = "add"

    def push(pushable, value):
        try:
            (getattr(t := type(pushable), APPEND, None) or getattr(t, ADD))(pushable, value)
        except AttributeError:
            raise TypeError(f"{t.__qualname__} object has no attributes {APPEND} or {ADD}")

The benefits of the core language keeping the usage of sets and lists utterly separate have been demonstrated over and over again. Bringing them together at the language feature level definitely seems like a mistake.

--- Ricky.
"I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

On Sun, Oct 13, 2019 at 3:24 PM Steve Jorgensen <stevej@stevej.name> wrote:

IMO, the problem with the "you can create your own" solution is that this is such a common thing that it seems like it should simply be present rather than having a vast number of applications and libraries each implementing their own unique solutions.

I have definitely hit this difficulty many times. The part of the code that "just puts something in the collection" doesn't need to care conceptually about the kind of collection. Taking the data back out somewhere else more often needs to worry about order, efficiency, concurrency, etc.

But as I see other comments, I think an external function makes more sense. Sure, list and set could grow a common method. But then what about deque or queue? What about the excellent third-party sortedcontainers package? Or a pandas Series? Do all of these need to grow that same method?

I think a callable that also provided a way to register how to "just put the item in the container" would be more flexible. This could be expanded by users only as far as was useful to them. Sufficiently refined, I can imagine wanting such a thing in the collections module itself. So maybe an API like:

    just_add_it.register(set, set.add)
    just_add_it.register(list, list.append)
    just_add_it.register(Queue, Queue.put)
    just_add_it.register(MyCollection, lambda stuff, item: ...)

    # Later
    just_add_it(the_things, item)

Then a user can decide where to weigh issues like sets not allowing mutable objects or queues being finite size.

On Sun, Oct 13, 2019, 5:22 PM Steve Jorgensen <stevej@stevej.name> wrote:

I like the way you're thinking. In any case, I think there should be some standard pattern for this implemented in the standard library. Perhaps, in addition to using registration, implement a `__push__` method convention, and then the generic function for pushing an item to a collection would look for `__push__`, `push`, `append`, or `add` and try the first of those that it finds on an object of a type that is not registered.

On Oct 13, 2019, at 14:50, Steve Jorgensen <stevej@stevej.name> wrote:
Perhaps, in addition to using registration, implement a `__push__` method convention, and then the generic function for pushing an item to a collection would look for `__push__`, `push`, `append`, or `add` and try the first of those that it finds on an object of a type that is not registered.
This is _also_ trivial to do with singledispatch. Just change the base function to look up the protocol:

    @singledispatch
    def just_add_it(collection, value):
        try:
            collection.push(value)
        except AttributeError:
            raise TypeError("whatever string you want")

There are a few different variations you can do with the lookup and error handling, depending on whether you want to exactly simulate normal special method lookup, or be a little more lenient. Any types you register overloads for will use those overloads; any other types will use the `push` method protocol if it exists, and it’ll raise whatever exception you want otherwise.
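
One of the more lenient variations might look like the sketch below. It tries the four names from the quoted suggestion in order for any type that has no registered overload. The lookup order, the error message, and the `Journal` class are assumptions for illustration, not an agreed design.

```python
from functools import singledispatch
from queue import Queue

# Names tried, in order, for types without a registered overload.
_FALLBACK_NAMES = ("__push__", "push", "append", "add")


@singledispatch
def just_add_it(collection, value):
    # Look the method up on the type, loosely imitating special-method lookup.
    for name in _FALLBACK_NAMES:
        method = getattr(type(collection), name, None)
        if method is not None:
            method(collection, value)
            return
    raise TypeError(
        f"don't know how to add to '{type(collection).__name__}' objects"
    )


# Registered overloads take precedence over the fallback lookup.
just_add_it.register(Queue, Queue.put)


class Journal:
    """Hypothetical class that opts in through the __push__ convention."""

    def __init__(self):
        self.entries = []

    def __push__(self, entry):
        self.entries.append(entry)
```

Looking the name up on the type rather than the instance means an attribute set on a single object is ignored, which is closer to how Python treats special methods.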

On Sun, Oct 13, 2019 at 05:42:21PM -0400, David Mertz wrote:
And that's why I am skeptical that there is "very common" code that doesn't care about whether it has a list or a set. We surely almost always will care when we're taking the data out again, because of ordering and efficiency etc. If you require a set for the data-output side, why would you accept a list for the data-input side when you won't be able to use it further on? And vice versa. In the absence of any concrete examples demonstrating that this is useful, I think this is an over-generalisation of little practical use. -- Steven

Steven D'Aprano wrote:
Having come from Ruby to Python, this seems like an odd statement to me. It always felt very natural that there is one way to add something to a simple collection, and the distinction between an "array" and a set is irrelevant in that regard. Of course, just because I'm used to that as a natural thing doesn't make it right for Python.

Thinking about examples, the ones that immediately come to mind are mainly collectors: objects passed into a method to record things that will be examined afterward by the caller. In those cases, what I'm talking about could actually be handled (and with more versatility) by having the called method expect a callable, since it will only do one thing with the object that is passed in. The caller would then pass in `<some-set>.add` or `<some-list>.append` in that case (or whatever else is appropriate). That, of course, is simply a coding pattern and doesn't require any changes to the standard lib.

Having said all of that, I have a feeling that there are categories of example that I'm not thinking of right now. Maybe others will chime in with such examples, or to agree that there are none (or extremely few).

On Mon, Oct 14, 2019 at 11:26 AM Steve Jorgensen <stevej@stevej.name> wrote:
Sets are somewhat like lists, but also somewhat like dictionaries. A set is a list that doesn't allow duplicates, or it's a dictionary where the values are irrelevant. If you think of a set as more list-like, then it's logical to "set.append(x)"; if you think of it as more dict-like, you should "set[x]=True". It's not quite either, which is why a generic "stick this thing in that collection" doesn't really exist. Though a set.__setitem__() method might be helpful here. If you set an item to any truthy value, it is added to the set; set it to a falsy value and it will be discarded (without error if it wasn't there, so assignment in this way is idempotent). That would make them more similar to collections.Counter, which can be seen as a multiset of sorts. ChrisA
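
A minimal sketch of the item-assignment idea, written as a subclass because the built-in `set` has no `__setitem__`. The name `FlagSet` is hypothetical and exists only for illustration.

```python
class FlagSet(set):
    """Hypothetical set subclass sketching the suggested item assignment."""

    def __setitem__(self, element, present):
        if present:
            self.add(element)      # any truthy value adds the element
        else:
            self.discard(element)  # falsy removes it; no error if absent


flags = FlagSet()
flags["a"] = True
flags["a"] = 1      # assigning again changes nothing, so this is idempotent
flags["b"] = 0      # discarding a missing element is silently ignored
```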

On Mon, Oct 14, 2019 at 5:41 PM Serhiy Storchaka <storchaka@gmail.com> wrote:
It doesn't guarantee that if you do any set-like operations, though.

    from collections import Counter

    c = Counter()
    c[1] = 0
    c |= Counter()  # a no-op if thinking set-wise
    assert 1 not in c

So if we're talking analogies with sets, there's no inherent problem with setting-to-zero meaning "remove". ChrisA

On Oct 13, 2019, at 17:25, Steve Jorgensen <stevej@stevej.name> wrote:
Thinking about examples, the ones that immediately come to mind are mainly collectors: objects passed into a method to record things that will be examined afterward by the caller. In those cases, what I'm talking about could actually be handled (and with more versatility) by having the called method expect a callable, since it will only do one thing with the object that is passed in. The caller would then pass in `<some-set>.add` or `<some-list>.append` in that case (or whatever else is appropriate).
That, of course, is simply a coding pattern and doesn't require any changes to the standard lib.
I like the way C++ handles this, actually. The C++ notion of “Iterator” is a lot more complicated than the Python notion of “Iterator”, and usually, this is a huge pain, but occasionally, it allows for some nice things that you can’t do in Python.

One of those things is that collectors don’t take a collection to add to, they take an “output Iterator”. The caller decides what to pass in. If you’ve got a vector (like a Python list), that’s usually a back inserter (which just appends to the end). If you’ve got an ordered set or a hash set, it’s usually a set inserter. If you’ve got a deque, it’s often a back inserter, but it might be a front inserter instead. If you’ve got a threaded queue, it’s a synchronized back inserter, or maybe a cooperatively-lock-free one. If you’ve got a double-linked list, it could be a back inserter, a front inserter, or even a splice inserter at a particular location. And so on. The burden to choose is on the caller of the collector, who knows what the right thing to do is for its use of its collection, rather than on the collector implementation, which is generic and stupid.

Anyway, this doesn’t come up that often in Python, but when it does, passing a callable, as you suggest, is usually the best solution. So, I think you’ve just found some of those cases, and found the best solution to them relatively easily, without needing anything to be changed. Which implies that Python is fine as it is. :)
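
The callable-passing pattern can be sketched in a few lines. The collector `collect_errors` and its sample data are hypothetical; the point is only that the caller chooses how results are stored by passing a bound method.

```python
from collections import deque


def collect_errors(records, emit):
    """Call emit(message) for each blank record; the caller chooses emit."""
    for number, record in enumerate(records, start=1):
        if not record.strip():
            emit(f"record {number} is empty")


records = ["alpha", "", "beta", "  "]

as_list, as_set, newest_first = [], set(), deque()
collect_errors(records, as_list.append)           # keep order and duplicates
collect_errors(records, as_set.add)               # keep distinct messages only
collect_errors(records, newest_first.appendleft)  # roughly a "front inserter"
```

The collector never needs to know what kind of container, if any, sits behind `emit`.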

On Oct 13, 2019, at 14:42, David Mertz <mertz@gnosis.cx> wrote:
This is exactly what `singledispatch` already gives you:

    from functools import singledispatch

    @singledispatch
    def just_add_it(collection, value):
        raise TypeError("blah blah")

Now you can explicitly register methods like this:

    just_add_it.register(set, set.add)
    just_add_it.register(list, list.append)
    just_add_it.register(Queue, Queue.put)

And, while you can do the same thing with lambda, you can also write it as a decorator:

    @just_add_it.register(MyCollection)
    def _(stuff, item):
        # ....
        ...

… which is more flexible and often more readable, especially for functions whose whole point is a side effect. And you can even use type annotations to make it even simpler:

    @just_add_it.register
    def _(stuff: MyCollection, item):
        # ....
        ...

More generally, both you and Steve Jorgensen seem to be proposing a bunch of things that already exist. It’s worth taking the time to figure out what’s already there before suggesting changes.
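
A runnable variant of the same idea, sketched under one assumption that goes beyond the message above: it registers against the abstract base classes in `collections.abc` rather than the concrete `set` and `list`, so that types such as `deque` are covered without separate registrations. `Tally` is a hypothetical user-defined collection.

```python
import collections.abc
from collections import deque
from functools import singledispatch
from queue import Queue


@singledispatch
def just_add_it(collection, value):
    raise TypeError(f"can't add to '{type(collection).__name__}' objects")


# singledispatch understands ABCs, so these cover set, list, deque, and more.
just_add_it.register(collections.abc.MutableSet, lambda c, v: c.add(v))
just_add_it.register(collections.abc.MutableSequence, lambda c, v: c.append(v))
just_add_it.register(Queue, Queue.put)


class Tally:
    """Hypothetical user-defined collection that counts what it is given."""

    def __init__(self):
        self.counts = {}


@just_add_it.register
def _(tally: Tally, item):
    # The annotation on the first parameter selects the dispatch type.
    tally.counts[item] = tally.counts.get(item, 0) + 1
```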

On Sun, Oct 13, 2019, 6:12 PM Andrew Barnert
Sort of. I proposed the spelling as .register() because single dispatch was the obvious way to implement it. The `just_add_it()` function would be a new thing, whether in a personal toolkit or in the collections module. Hopefully with a better name than my placeholder. But yeah, given that your three line implementation is basically feature complete, probably there's no reason it needs to live in stdlib. My point was really just that a function could be more flexible than a method added to every collection type, or even than a protocol.

On Oct 13, 2019, at 15:25, David Mertz <mertz@gnosis.cx> wrote:
Sort of. I proposed the spelling as .register() because single dispatch was the obvious way to implement it.
Ah, sorry; I misread that as implying that you wanted to build something like singledispatch but less flexible, rather than just wanting to use it. Now that I read it more carefully… I agree 100%. :)

On Sun, Oct 13, 2019 at 07:20:29PM -0000, Steve Jorgensen wrote:
Is that correct though? To the best of my memory, I've never wanted to add an item to a list-or-set in 15 years, nor have I seen code in practice that does so. I don't think it is "very common", but I would like to see some examples of your code, and third-party libraries, which do this. The problem is that lists are sequences, and sets are not. Sequences are not very set-like, and sets are not very list-like. We even use different terminology to describe their contents: lists contain *items* while sets contain *elements*. So it isn't clear to me under what circumstances you would expect either one. I think that I will need to see some of these very common examples of wanting to add an element to a set-or-list before I can take this proposal too seriously. -- Steven

On 10/13/2019 5:57 PM, Steven D'Aprano wrote:
On a few occasions I have modified code that used lists to instead use sets. Generally I was porting very old code, before sets were built in, so lists had been used. However, sets were more appropriate when I was refactoring. It would have saved me some hassle had lists and sets worked the same way as far as adding members. But I can't say it's very common. I don't think I've ever seen code that wanted to use duck-typing for adding an element/item to a container. I'm sure it exists, but probably is not very common. Eric

On Mon, Oct 14, 2019 at 9:09 AM Steven D'Aprano <steve@pearwood.info> wrote:
I don't often actually want to add to list-or-set, but I do sometimes have half-written code using one or the other, and then a future design constraint makes me rethink my choice of data type. Oh, now I have to change how I'm adding those items. It'd be helpful to have the API more consistent, but I don't actually need it to be polymorphic per se. ChrisA

OK, quick search reveals some code in my actual codebase that illustrates the concept. This isn't code I was actually looking at this week, but it is in one of the company repos.

    def form_to_kwarg(form):
        """Converts the form into the kwarg used in creates and updates.

        If the form isn't validated, returns an empty dictionary.
        """
        if form.validate():
            attrs_kwarg = {}
            keys = get_column_names()
            keys.remove('cost')
            keys.remove('id')
            keys.append('client_id')
            keys.append('ratelimit')
            for attr in keys:
                attrs_kwarg[attr] = getattr(form, attr).data
            attrs_kwarg['measurements'] = slugs_to_string(attrs_kwarg[
                'measurements'])
            return attrs_kwarg
        else:
            return None

This put a certain thumb on the scale since I searched for .append() which will automatically only find lists. Or maybe it's a deque. Or something else. I don't actually know for sure (but I kinda think the programmers who did this wouldn't have used something more "exotic").

So who knows what `get_column_names()` gives me. I can see on the next line that we remove one of them. But that could be either a set or a list (or some other things). Then there are the .append()s that give it away... but what if those hadn't been there and I just discovered the need to "just_add_it()" to address the current issue?

So then I loop through this 'keys' thing, whatever it is. That could be any iterable at that point, but in particular any collection. Indeed, there *is* a determinate answer to what kind of thing keys is. But while I'm looking at just this function, I might well not know which type that is... and usually I don't want to think about it. And maybe the implementation of `get_column_names()` will change later to use a different collection. I shouldn't need to worry about that.

In this example, moreover, it sure looks like all the items inside the collection are immutable (strings), and it sure does suggest that they each occur once but that I don't care about the order (since I'm using those items as keys to a new dictionary, and duplicates would likely cause bugs). Btw, yes I also see that the docstring doesn't match the behavior :-).

-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Mon, Oct 14, 2019 at 1:34 AM David Mertz <mertz@gnosis.cx> wrote:
I guess the (Python) language is pretty consistent in this particular case that it calls the same things same and the different things different. E.g. `remove` and `pop` have the same meaning for a list or a set, but `append` does not. What would "generic" `add` defined for a list-or-set do exactly on the list? Append, insert, and if the latter, where exactly? In your code, if you do not care about the type and just need one common 'add' then you either are using sets (even if they are not Python sets, but lists, queues, etc. you are using them as sets), or you need to specify, whether this 'add' actually appends, or inserts and then it is no longer just a generic 'add'. Richard

On Oct 13, 2019, at 15:29, Chris Angelico <rosuav@gmail.com> wrote:
I sometimes run into that too, but usually it turns out to be a sign that what I really wanted was a generator function, and then passing the result of that function to the list or set constructor or to a list or set comprehension. Especially since it’s also usually around the time I realize that the 5-line loop I thought I had to write is 20 lines long and not even complete yet, so it’s calling out for refactoring into a function anyway. When for whatever reason it can’t be written that way, that’s often a sign that I need a collector function that takes an adder function, as the OP suggested later in the thread, rather than a function that just builds and returns a collection. The cases where it really is just a matter of wanting to change some too-small-to-refactor loop to build a set rather than a list, there’s probably only one append to change anyway. I’m sure that doesn’t cover 100% of all the times this comes up, but I think it covers so many of them that the need for a unified push function is very rare. If it had come up more than once or twice, I would have written that singledispatch push implementation long ago, but I’ve never thought of a need for it until this thread.
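
The generator-function refactoring described above might look like the following sketch. The function `overdue_ids` and the invoice data are hypothetical; the point is that the loop body no longer mentions `append` or `add` at all, and the caller picks the container at the call site.

```python
def overdue_ids(invoices, today):
    """Yield the id of each invoice whose due day has passed."""
    for invoice in invoices:
        if invoice["due"] < today:
            yield invoice["id"]


invoices = [
    {"id": 7, "due": 3},
    {"id": 9, "due": 8},
    {"id": 7, "due": 1},
]

in_order = list(overdue_ids(invoices, today=5))  # keeps order and repeats
unique = set(overdue_ids(invoices, today=5))     # keeps distinct ids only
```

Switching from a list to a set later is then a one-word change where the result is built, not an edit inside the loop.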

On Sun, Oct 13, 2019, 6:12 PM Steven D'Aprano:
Really?! I've seen it several times this week. Maybe we mean something slightly different though. My real world is looking at some function in an existing codebase that handles a nested data structure. At some leaf I can tell I'm handling a bunch of scalars by looking at the code (i.e. a loop or a reduction). Somewhere far upstream that data was added to the structure. And somewhere downstream we might do something sensitive to the data type. But right here, it's just "a bunch of stuff." I rarely see the "bunch" being genuinely polymorphic, but I also just don't want or need to think about which collection it is to edit this code.... Except I do if I want to "just add item". Moreover, it's not uncommon to want to optimize the upstream choice of container and need as little refactoring downstream as possible. But yes, set and list are not generically interchangeable, of course. Maybe the stuff really needs to have mutables. Maybe it needs to allow repeats. Refactoring isn't just drop in replacement, and one needs to think of such issues.

On Sun, Oct 13, 2019 at 06:41:40PM -0400, David Mertz wrote:
My real world is looking at some function in an existing codebase that handles a nested data structure.
That rules out sets, since sets aren't hashable and cannot be inserted into a set.
Let me see if I understand your workflow:

1. upstream: you build up a nested data structure, adding elements as needed
2. midstream: you treat the nested data structure as "a bunch of stuff"
3. downstream: you pull the nested data structure apart

In the midstream, what are you doing to this bunch of stuff? That's the critical point. It seems to me that there's not very much you can do (apart from printing it) without knowing something about the structure. The upstream and the downstream will care about the structure, and will care about the distinction between sets and lists. Upstream needs to know that adding a set will fail, but appending a set will be fine:

    list.append(myset)  # fine, sets can go into lists
    set.add(myset)      # fails, because sets aren't hashable

Giving list-or-set a common "push/add/append" method won't help you here, you still need to distinguish between pushing to the end of a list and pushing into an unordered set. If midstream is adding stuff, then it's not really midstream, it's part of upstream. If midstream *isn't* adding stuff, then this proposed push/add/append common method is irrelevant to midstream.
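
The difference can be checked directly. This is a small sketch with made-up variable names; the `frozenset` line shows the usual workaround when a set of sets really is wanted.

```python
inner = {1, 2}

outer_list = []
outer_list.append(inner)  # fine: lists accept unhashable elements

outer_set = set()
try:
    outer_set.add(inner)  # raises TypeError: a mutable set is unhashable
except TypeError as exc:
    failure = str(exc)

outer_set.add(frozenset(inner))  # an immutable frozenset is hashable
```
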
Can you add a set? If you append an element, will you introduce a duplicate? You surely care about this, which means you're not really indifferent to whether "bunch-of-stuff" is a list or a set. I might have misunderstood your use-case, but so far I don't see that this is a convincing case for a common method. I will admit that there's a tiny bit of convenience: you can write "bunch.push()" without thinking about what you are pushing into (is it a set? a list? nested or flat? a custom class?), and whether it will succeed and whether it will mess up your data structure. So "convenience" is a benefit of this common method. But I think that argument from convenience is a weak argument. And a convenience function in your own module is probably sufficient, rather than burdening every Python user with having to learn another (redundant) method.
Moreover, it's not uncommon to want to optimize the upstream choice of container and need as little refactoring downstream as possible.
There is that, but you have to balance the convenience of not having to mechanically change method names (don't you have an IDE with a refactor command for that?^1 ) versus the benefit of having two different operations (add to a set versus append to a list) having different, self-explanatory names. ^1 I don't, so I'm not entirely unsympathetic to this argument. -- Steven

I think the real code sample I located is pretty typical. Look at that other post.

On Sun, Oct 13, 2019, 9:31 PM Steven D'Aprano <steve@pearwood.info> wrote:
Let me see if I understand your workflow:
Well... "My workflow" is "Get a bunch of code that was written years before I was at my company, and enhance it or fix it without breaking everything else." I don't think my workflow is uncommon. :-) But in terms of dataflow, your characterization is accurate.
1. upstream: you build up a nested data structure, adding elements as needed
Yes. Obviously there are constraints and differences between different collections. But the example I found is both commonplace and allows us not to worry about more of that. If you have a collection of distinct strings, a bunch of different collections can hold them. `for thing in collection` works for various data types. Collections of numbers are similar. `sum(the_numbers)` is happy with anything iterable, for example.
(don't you have an IDE with a refactor command for that?^1 )
^1 I don't, so I'm not entirely unsympathetic to this argument.
I really don't! It would be cool to have some keystrokes that identified everything that might be affected if I changed 'foo' from a list to a set... but it seems subtle. Everywhere foo might occur in call chains through every renaming as a formal parameter or otherwise. My hunch is this is simpler than solving the halting problem, but it doesn't feel easy. Maybe tools do it. Even getting CTAGS integrated with vim would help somewhat.

On Sun, Oct 13, 2019 at 09:53:41PM -0400, David Mertz wrote: [in response to my comment]
I'm not an expert, but I understand that "Change variable name" is a well-known, and solved, problem for refactoring tools. Certainly Martin Fowler's books act as if it's just a menu command away. If a tool can change variable names, it can change attribute names, and if it can do that, it can change method names too, so I expect this is a solved problem for people who have sufficiently advanced IDEs and/or refactoring tools. Bicycle Repair Man hasn't had an update for 16 years, but it claims to be able to rename methods. https://pypi.org/project/bicyclerepair/ So all you need do is downgrade your work projects to Python 2.3, and your problems are all solved! *wink* -- Steven

On 13/10/2019 20:20, Steve Jorgensen wrote:
Failing fast is good, but it is also a very common case to want to add an item to a collection regardless of whether it is set-like or sequence-like.
I have to say that for "a very common case" it's something that I've never needed. Never. Seriously. -- Rhodri James *-* Kynesim Ltd

Failing fast is good, but it is also a very common case to want to add an item to a collection regardless of whether it is set-like or sequence-like. Does the fact that the `append` and `add` methods would still exist mitigate your objection? Call `append` when expecting a list, call `add` when expecting a set, or call `push` or use `<<` when either of those would be appropriate targets.

On Oct 13, 2019, at 12:20, Steve Jorgensen <stevej@stevej.name> wrote:
Failing fast is good, but it is also a very common case to want to add an item to a collection regardless of whether it is set-like or sequence-like.
Does the fact that the `append` and `add` methods would still exist mitigate your objection? Call `append` when expecting a list, call `add` when expecting a set, or call `push` or use `<<` when either of those would be appropriate targets.
Is there any reason this has to be a method? You can write a function that does this today: def push(collection, value): if isinstance(collection, collections.abc.MutableSet): collection.add(value) elif isinstance(collection, collections.abc.MutableSequence): collection.append(value) else: raise TypeError(f"Don’t know how to push to '{type(collection).__name__}' objects") This works with existing code today. And it allows you to customize it to handle not-quite-sequence types like a circular buffer that you found off PyPI without modifying those types. If you expect to do a lot of that, `singledispatch` might be better than explicit type-switching code. Or, if you have a lot of your own classes that you can modify that aren’t quite sequences but want to be pushable, you can even add your own protocol method that it checks for first.

It seems pretty trivial to create your own push utility function: APPEND = "append" ADD = "add" def push(pushable, value): try: (getattr(t:=type(pushable), APPEND, None) or getattr(t, ADD)) (pushable, value) except AttributeError: raise TypeError(f"{t.__qualname__} object has no attributes {APPEND} or {ADD}") The benefits of the core language keeping the usage of sets and lists utterly separate has been demonstrated over and over again. Bringing them together at the language feature level definitely seems like a mistake. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler On Sun, Oct 13, 2019 at 3:24 PM Steve Jorgensen <stevej@stevej.name> wrote:

IMO, the problem with the "you can create your own" solution is that this is such a common thing that it seems like it should simply be present rather than having a vast number of applications and libraries each implementing their own unique solutions.

I have definitely hit this difficulty many times. The part of the code that "just puts something in the collection" doesn't need to care conceptually about the kind of collection. Taking the data back out somewhere else more often needs to worry about order, efficiency, concurrency, etc. But as I see other comments, I think an external function makes more sense. Sure, list and set could grow a common method. But then what about deque or queue? What about the excellent third part sortedcontainers package? Or a Panda Series? Do all of these need to grow that same method? I think more flexible would be a callable that also provided a way to register how to "just put the item in the container." This could be expanded by users only as far as was useful to them. Sufficiently refined, I can imagine wanting such a thing in the collections module itself. So maybe an API like: just_add_it.register(set, set.add) just_add_it.register(list, list.append) just_add_it.register(Queue, Queue.put) just_add_it.register(MyCollection, lambda stuff, item: ...) # Later just_add_it(the_things, item) Then a user can decide where to weigh issues like sets not allowing mutable objects or queues being finite size. On Sun, Oct 13, 2019, 5:22 PM Steve Jorgensen <stevej@stevej.name> wrote:

I like the way you're thinking. In any case, I think there should be some standard pattern for this implemented in the standard library. Perhaps, in addition to the using registration, implement a `__push__` method convention, and then the generic function for pushing an item to a collection would look for `__push__`, `push`, `append`, or `add` and try the first of those that it finds on an object of a type that is not registered.

On Oct 13, 2019, at 14:50, Steve Jorgensen <stevej@stevej.name> wrote:
Perhaps, in addition to the using registration, implement a `__push__` method convention, and then the generic function for pushing an item to a collection would look for `__push__`, `push`, `append`, or `add` and try the first of those that it finds on an object of a type that is not registered.
This is _also_ trivial to do with singledispatch. Just change the base function to look up the protocol: @singledispatch def just_add_it(collection, value): try: collection.push(value) except AttributeError: raise TypeError(whatever string you want) There are a few different variations you can do with the lookup and error handling, depending on whether you want to exactly simulate normal special method lookup, or be a little more lenient. Any types you register overloads for will use those overloads; any other types will use the `push` method protocol if it exists, and it’ll raise whatever exception you want otherwise.

On Sun, Oct 13, 2019 at 05:42:21PM -0400, David Mertz wrote:
And that's why I am skeptical that there is "very common" code that doesn't care about whether it has a list or a set. We surely almost always will care when we're taking the data out again, because of ordering and efficiency etc. If you require a set for the data-output side, why would you accept a list for the data-input side when you won't be able to use it further on? And vice versa. In the absense of any concrete examples demonstrating that this is useful, I think this is an over-generalisation of little practical use. -- Steven

Steven D'Aprano wrote:
Having come from Ruby to Python, this seems like an odd statement to me. It always felt very natural that there is one way to add something to a simple collection, and the distinction between an "array" and a set is irrelevant in that regard. Of course, just because I'm used to that as a natural thing doesn't make it right for Python. Thinking about examples, the ones that immediately come to mind are mainly collectors — objects passed into a method to record things that will be examined afterward by the caller. In those cases, what I'm talking about could actually be handled (actually with more versatility) by having the called method expect a callable though, since it will only do one thing with the object that is passed in. The caller would then pass in `<some-set>.add`, `<some.list>.append` in that case (or whatever else is appropriate). That, of course, is simply a coding pattern and doesn't require any changes to the standard lib. Having said all of that, I have a feeling that there are categories of example that I'm not thinking of right now. Maybe others will chime in with such examples — or to agree there there are none (or extremely few).

On Mon, Oct 14, 2019 at 11:26 AM Steve Jorgensen <stevej@stevej.name> wrote:
Sets are somewhat like lists, but also somewhat like dictionaries. A set is a list that doesn't allow duplicates, or it's a dictionary where the values are irrelevant. If you think of a set as more list-like, then it's logical to "set.append(x)"; if you think of it as more dict-like, you should "set[x]=True". It's not quite either, which is why a generic "stick this thing in that collection" doesn't really exist. Though a set.__setitem__() method might be helpful here. If you set an item to any truthy value, it is added to the set; set it to a falsy value and it will be discarded (without error if it wasn't there, so assignment in this way is idempotent). That would make them more similar to collections.Counter, which can be seen as a multiset of sorts. ChrisA
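The `__setitem__` idea can be tried out today in a subclass. The sketch below uses a hypothetical class name, `FlagSet`; the built-in `set` has no such method, and this is only one possible reading of the suggestion.

```python
class FlagSet(set):
    """Hypothetical set subclass; the built-in set has no __setitem__."""

    def __setitem__(self, element, present):
        if present:
            self.add(element)
        else:
            self.discard(element)  # no error if the element is absent


s = FlagSet()
s["a"] = True
s["a"] = True    # idempotent: still a single element
s["b"] = False   # discarding a missing element is silent
```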

On Mon, Oct 14, 2019 at 5:41 PM Serhiy Storchaka <storchaka@gmail.com> wrote:
It doesn't guarantee that if you do any set-like operations, though.

    c = Counter()
    c[1] = 0
    c |= Counter()  # a no-op if thinking set-wise
    assert 1 not in c

So if we're talking analogies with sets, there's no inherent problem with setting-to-zero meaning "remove". ChrisA

On Oct 13, 2019, at 17:25, Steve Jorgensen <stevej@stevej.name> wrote:
Thinking about examples, the ones that immediately come to mind are mainly collectors: objects passed into a method to record things that will be examined afterward by the caller. In those cases, what I'm talking about could actually be handled (with more versatility) by having the called method expect a callable, since it will only do one thing with the object that is passed in. The caller would then pass in `<some-set>.add` or `<some-list>.append` in that case (or whatever else is appropriate).
That, of course, is simply a coding pattern and doesn't require any changes to the standard lib.
I like the way C++ handles this, actually. The C++ notion of “Iterator” is a lot more complicated than the Python notion of “Iterator”, and usually, this is a huge pain, but occasionally, it allows for some nice things that you can’t do in Python. One of those things is that collectors don’t take a collection to add to, they take an “output Iterator”. The caller decides what to pass in. If you’ve got a vector (like a Python list), that’s usually a back inserter (which just appends to the end). If you’ve got an ordered set or a hash set, it’s usually a set inserter. If you’ve got a deque, it’s often a back inserter, but it might be a front inserter instead. If you’ve got a threaded queue, it’s a synchronized back inserter, or maybe a cooperatively-lock-free one. If you’ve got a double-linked list, it could be a back inserter, a front inserter, or even a splice inserter at a particular location. And so on. The burden to choose is on the caller of the collector, who knows what the right thing to do is for its use of its collection, rather than on the collector implementation, which is generic and stupid. Anyway, this doesn’t come up that often in Python, but when it does, passing a callable, as you suggest, is usually the best solution. So, I think you’ve just found some of those cases, and found the best solution to them relatively easily, without needing anything to be changed. Which implies that Python is fine as it is. :)
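As a rough Python analogue of the output-iterator idea, the collector below takes a callable and the caller chooses the insertion behavior. The function name `collect_errors` and the sample data are made up for illustration.

```python
from collections import deque


def collect_errors(lines, emit):
    """Hypothetical collector: reports each matching line through `emit`."""
    for number, line in enumerate(lines, start=1):
        if "ERROR" in line:
            emit((number, line))


lines = ["ok", "ERROR disk", "ok", "ERROR disk"]

as_list = []
collect_errors(lines, as_list.append)            # like a back inserter

as_set = set()
collect_errors(lines, as_set.add)                # like a set inserter

newest_first = deque()
collect_errors(lines, newest_first.appendleft)   # like a front inserter
```

The collector never needs to know which container it is feeding; that decision stays with the caller.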

On Oct 13, 2019, at 14:42, David Mertz <mertz@gnosis.cx> wrote:
This is exactly what `singledispatch` already gives you:

    from functools import singledispatch

    @singledispatch
    def just_add_it(collection, value):
        raise TypeError(...)  # blah blah

Now you can explicitly register methods like this:

    just_add_it.register(set, set.add)
    just_add_it.register(list, list.append)
    just_add_it.register(Queue, Queue.put)

And, while you can do the same thing with lambda, you can also write it as a decorator:

    @just_add_it.register(MyCollection)
    def _(stuff, item):
        # ....

… which is more flexible and often more readable, especially for functions whose whole point is a side effect. And you can even use type annotations to make it even simpler:

    @just_add_it.register
    def _(stuff: MyCollection, item):
        # ....

More generally, both you and Steve Jorgensen seem to be proposing a bunch of things that already exist. It’s worth taking the time to figure out what’s already there before suggesting changes.
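For reference, those registrations can be assembled into one runnable sketch. `MyCollection` here is a hypothetical stand-in class, and the error message is an arbitrary choice; the annotation-based form of `register` requires Python 3.7 or later.

```python
from functools import singledispatch
from queue import Queue


@singledispatch
def just_add_it(collection, value):
    raise TypeError(f"can't add to {type(collection).__name__!r} objects")


# Unbound methods already have the (collection, value) signature.
just_add_it.register(set, set.add)
just_add_it.register(list, list.append)
just_add_it.register(Queue, Queue.put)


class MyCollection:
    """Hypothetical user-defined container."""

    def __init__(self):
        self.data = {}


@just_add_it.register
def _(stuff: MyCollection, item):
    # Dispatch type is taken from the annotation on the first parameter.
    stuff.data[item] = True
```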

On Sun, Oct 13, 2019, 6:12 PM Andrew Barnert
Sort of. I proposed the spelling as .register() because single dispatch was the obvious way to implement it. The `just_add_it()` function would be a new thing, whether in a personal toolkit or in the collections module. Hopefully with a better name than my placeholder. But yeah, given that your three line implementation is basically feature complete, probably there's no reason it needs to live in stdlib. My point was really just that a function could be more flexible than a method added to every collection type, or even than a protocol.

On Oct 13, 2019, at 15:25, David Mertz <mertz@gnosis.cx> wrote:
Sort of. I proposed the spelling as .register() because single dispatch was the obvious way to implement it.
Ah, sorry; I misread that as implying that you wanted to build something like singledispatch but less flexible, rather than just wanting to use it. Now that I read it more carefully… I agree 100%. :)

On Sun, Oct 13, 2019 at 07:20:29PM -0000, Steve Jorgensen wrote:
Is that correct though? To the best of my memory, I've never wanted to add an item to a list-or-set in 15 years, nor have I seen code in practice that does so. I don't think it is "very common", but I would like to see some examples of your code, and third-party libraries, which do this. The problem is that lists are sequences, and sets are not. Sequences are not very set-like, and sets are not very list-like. We even use different terminology to describe their contents: lists contain *items* while sets contain *elements*. So it isn't clear to me under what circumstances you would expect either one. I think that I will need to see some of these very common examples of wanting to add an element to a set-or-list before I can take this proposal too seriously. -- Steven

On 10/13/2019 5:57 PM, Steven D'Aprano wrote:
On a few occasions I have modified code that used lists to instead use sets. Generally I was porting very old code, from before sets were built in, so lists had been used. However, sets were more appropriate when I was refactoring. It would have saved me some hassle had lists and sets worked the same way as far as adding members. But I can't say it's very common. I don't think I've ever seen code that wanted to use duck-typing for adding an element/item to a container. I'm sure it exists, but probably is not very common. Eric

On Mon, Oct 14, 2019 at 9:09 AM Steven D'Aprano <steve@pearwood.info> wrote:
I don't often actually want to add to list-or-set, but I do sometimes have half-written code using one or the other, and then a future design constraint makes me rethink my choice of data type. Oh, now I have to change how I'm adding those items. It'd be helpful to have the API more consistent, but I don't actually need it to be polymorphic per se. ChrisA

OK, quick search reveals some code in my actual codebase that illustrates the concept. This isn't code I was actually looking at this week, but it is in one of the company repos.

    def form_to_kwarg(form):
        """Converts the form into the kwarg used in creates and updates.

        If the form isn't validated, returns an empty dictionary.
        """
        if form.validate():
            attrs_kwarg = {}
            keys = get_column_names()
            keys.remove('cost')
            keys.remove('id')
            keys.append('client_id')
            keys.append('ratelimit')
            for attr in keys:
                attrs_kwarg[attr] = getattr(form, attr).data
            attrs_kwarg['measurements'] = slugs_to_string(attrs_kwarg[
                'measurements'])
            return attrs_kwarg
        else:
            return None

This put a certain thumb on the scale since I searched for .append() which will automatically only find lists. Or maybe it's a deque. Or something else. I don't actually know for sure (but I kinda think the programmers who did this wouldn't have used something more "exotic").

So who knows what `get_column_names()` gives me. I can see on the next line that we remove one of them. But that could be either a set or a list (or some other things). Then there are the .append()s that give it away... but what if those hadn't been there and I just discovered the need to "just_add_it()" to address the current issue?

So then I loop through this 'keys' thing, whatever it is. That could be any iterable at that point, but in particular any collection. Indeed, there *is* a determinate answer to what kind of thing keys is. But while I'm looking at just this function, I might well not know which type that is... and usually I don't want to think about it. And maybe the implementation of `get_column_names()` will change later to use a different collection. I shouldn't need to worry about that.
In this example, moreover, it sure looks like all the items inside the collection are immutable (strings), and it sure does suggest that they each occur once but that I don't care about the order (since I'm using those items as keys to a new dictionary, and duplicates would likely cause bugs). Btw, yes I also see that the docstring doesn't match the behavior :-). -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Mon, Oct 14, 2019 at 1:34 AM David Mertz <mertz@gnosis.cx> wrote:
I guess the (Python) language is pretty consistent in this particular case that it calls the same things same and the different things different. E.g. `remove` and `pop` have the same meaning for a list or a set, but `append` does not. What would "generic" `add` defined for a list-or-set do exactly on the list? Append, insert, and if the latter, where exactly? In your code, if you do not care about the type and just need one common 'add' then you either are using sets (even if they are not Python sets, but lists, queues, etc. you are using them as sets), or you need to specify, whether this 'add' actually appends, or inserts and then it is no longer just a generic 'add'. Richard

On Oct 13, 2019, at 15:29, Chris Angelico <rosuav@gmail.com> wrote:
I sometimes run into that too, but usually it turns out to be a sign that what I really wanted was a generator function, and then passing the result of that function to the list or set constructor or to a list or set comprehension. Especially since it’s also usually around the time I realize that the 5-line loop I thought I had to write is 20 lines long and not even complete yet, so it’s calling out for refactoring into a function anyway. When for whatever reason it can’t be written that way, that’s often a sign that I need a collector function that takes an adder function, as the OP suggested later in the thread, rather than a function that just builds and returns a collection. The cases where it really is just a matter of wanting to change some too-small-to-refactor loop to build a set rather than a list, there’s probably only one append to change anyway. I’m sure that doesn’t cover 100% of all the times this comes up, but I think it covers so many of them that the need for a unified push function is very rare. If it had come up more than once or twice, I would have written that singledispatch push implementation long ago, but I’ve never thought of a need for it until this thread.
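A small sketch of that refactoring: the loop body becomes a generator function, and the choice between list and set moves to a single call site. The function `long_words` and its input are invented for illustration.

```python
def long_words(text, min_length=5):
    """Hypothetical example: yield items instead of adding them to a collection."""
    for word in text.split():
        if len(word) >= min_length:
            yield word.lower()


text = "Sets and lists differ but generators feed either Lists"

as_list = list(long_words(text))   # keeps order and duplicates
as_set = set(long_words(text))     # drops duplicates, has no order
```

Switching container types later means changing one constructor call rather than every append or add inside the loop.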

On Sun, Oct 13, 2019, 6:12 PM Steven D'Aprano:
Really?! I've seen it several times this week. Maybe we mean something slightly different though. My real world is looking at some function in an existing codebase that handles a nested data structure. At some leaf I can tell I'm handling a bunch of scalars by looking at the code (i.e. a loop or a reduction). Somewhere far upstream that data was added to the structure. And somewhere downstream we might do something sensitive to the data type. But right here, it's just "a bunch of stuff." I rarely see the "bunch" being genuinely polymorphic, but I also just don't want or need to think about which collection it is to edit this code.... Except I do if I want to "just add item". Moreover, it's not uncommon to want to optimize the upstream choice of container and need as little refactoring downstream as possible. But yes, set and list are not generically interchangeable, of course. Maybe the stuff really needs to have mutables. Maybe it needs to allow repeats. Refactoring isn't just drop in replacement, and one needs to think of such issues.

On Sun, Oct 13, 2019 at 06:41:40PM -0400, David Mertz wrote:
My real world is looking at some function in an existing codebase that handles a nested data structure.
That rules out sets, since sets aren't hashable and cannot be inserted into a set.
Let me see if I understand your workflow:

1. upstream: you build up a nested data structure, adding elements as needed
2. midstream: you treat the nested data structure as "a bunch of stuff"
3. downstream: you pull the nested data structure apart

In the midstream, what are you doing to this bunch of stuff? That's the critical point. It seems to me that there's not very much you can do (apart from printing it) without knowing something about the structure. The upstream and the downstream will care about the structure, and will care about the distinction between sets and lists. Upstream needs to know that adding a set will fail, but appending a set will be fine:

    list.append(myset)  # fine, sets can go into lists
    set.add(myset)  # fails, because sets aren't hashable

Giving list-or-set a common "push/add/append" method won't help you here; you still need to distinguish between pushing to the end of a list and pushing into an unordered set. If midstream is adding stuff, then it's not really midstream, it's part of upstream. If midstream *isn't* adding stuff, then this proposed push/add/append common method is irrelevant to midstream.
Can you add a set? If you append an element, will you introduce a duplicate? You surely care about this, which means you're not really indifferent to whether "bunch-of-stuff" is a list or a set. I might have misunderstood your use-case, but so far I don't see that this is a convincing case for a common method. I will admit that there's a tiny bit of convenience: you can write "bunch.push()" without thinking about what you are pushing into (is it a set? a list? nested or flat? a custom class?), and whether it will succeed and whether it will mess up your data structure. So "convenience" is a benefit of this common method. But I think that argument from convenience is a weak argument. And a convenience function in your own module is probably sufficient, rather than burdening every Python user with having to learn another (redundant) method.
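The hashability point can be checked directly. The variable names below are arbitrary, and the `frozenset` line is an addition showing the usual workaround rather than something proposed in the thread.

```python
myset = {1, 2}

items = []
items.append(myset)              # fine: lists accept unhashable items

elements = set()
try:
    elements.add(myset)          # a set is unhashable, so this raises
except TypeError as exc:
    error = str(exc)

elements.add(frozenset(myset))   # a frozenset is hashable and can be nested
```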
Moreover, it's not uncommon to want to optimize the upstream choice of container and need as little refactoring downstream as possible.
There is that, but you have to balance the convenience of not having to mechanically change method names (don't you have an IDE with a refactor command for that?^1 ) versus the benefit of having two different operations (add to a set versus append to a list) having different, self-explanatory names. ^1 I don't, so I'm not entirely unsympathetic to this argument. -- Steven

I think the real code sample I located is pretty typical. Look at that other post. On Sun, Oct 13, 2019, 9:31 PM Steven D'Aprano <steve@pearwood.info> wrote:
Let me see if I understand your workflow:
Well... "My workflow" is "Get a bunch of code that was written years before I was at my company, and enhance it or fix it without breaking everything else." I don't think my workflow is uncommon. :-) But in terms of dataflow, your characterization is accurate. 1. upstream: you build up a nested data structure, adding elements as needed
Yes. Obviously there are constraints and differences between different collections. But the example I found is both commonplace and allows us not to worry about more of that. If you have a collection of distinct strings, a bunch of different collections can hold them. `for thing in collection` works for various data types. Collections of numbers are similar. `sum(the_numbers)` is happy with anything iterable, for example. (don't you have an IDE with a refactor command for that?^1 )
^1 I don't, so I'm not entirely unsympathetic to this argument.
I really don't! It would be cool to have some keystrokes that identified everything that might be affected if I changed 'foo' from a list to a set... but it seems subtle. Everywhere foo might occur in call chains through every renaming as a formal parameter or otherwise. My hunch is this is simpler than solving the halting problem, but it doesn't feel easy. Maybe tools do it. Even getting CTAGS integrated with vim would help somewhat.

On Sun, Oct 13, 2019 at 09:53:41PM -0400, David Mertz wrote: [in response to my comment]
I'm not an expert, but I understand that "Change variable name" is a well-known, and solved, problem for refactoring tools. Certainly Martin Fowler's books act as if it's just a menu command away. If (generic) you can change variable names, you can change attribute names, and if you can do that, you can change method names too, so I expect this is a solved problem for people who have sufficiently advanced IDEs and/or refactoring tools. Bicycle Repair Man hasn't had an update for 16 years, but it claims to be able to rename methods. https://pypi.org/project/bicyclerepair/ So all you need do is downgrade your work projects to Python 2.3, and your problems are all solved! *wink* -- Steven

On 13/10/2019 20:20, Steve Jorgensen wrote:
Failing fast is good, but it is also a very common case to want to add an item to a collection regardless of whether it is set-like or sequence-like.
I have to say that for "a very common case" it's something that I've never needed. Never. Seriously. -- Rhodri James *-* Kynesim Ltd
participants (12)
- Anders Hovmöller
- Andrew Barnert
- Chris Angelico
- David Mertz
- Eric V. Smith
- Greg Ewing
- Rhodri James
- Richard Musil
- Ricky Teachey
- Serhiy Storchaka
- Steve Jorgensen
- Steven D'Aprano