Enhance list.index with a comparison function

Have a look at this short thread on discuss: https://discuss.python.org/t/searching-array-of-data-objects/4251/1 After I answered that question, it dawned on me that I have probably written something like that loop, or variations of it, a thousand times: for obj in somelist: if comparison(obj, needle): do_something(obj) break and now it's a thousand and one times *wink* So here's a thought... suppose we gave list.index an optional keyword- only comparison function? The typical name for those is "pred", as in predicate. Then the above becomes: try: i = somelist.index(needle, pred=comparison) except ValueError: pass else: do_something(somelist[i]) If you know for sure the search item exists, then you can drop the try...except...else: do_something(somelist[somelist.index(needle, pred=comparison)]) If `pred` is missing or None, a standard equality search is performed. Together with the operator module, this would allow us to perform common searches at C speed: # find the first item less than needle somelist.index(needle, pred=operator.lt) We can search a sublist efficiently: # copies the sublist :-( for obj in somelist[25:95354]: ... # save memory by inefficiently walking the entire list :-( for i, obj in enumerate(somelist): if i < 25: continue if i >= 95354: break ... # avoids both wasting memory and wasting time :-) i = somelist.index(needle, 25, 95354, pred=comparison) What do you think? -- Steven

After I answered that question, it dawned on me that I have probably
written something like that loop, or variations of it, a thousand times:
Why not just this (by object, not by its index, but that seems simpler):
My style works on general iterables, including infinite ones (that hopefully eventually have something fulfilling pred()). -- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.

On Fri, May 22, 2020 at 08:11:16PM -0400, David Mertz wrote:
Sure, that works too. But have you ever written it for real? I haven't. And having seen it, I'll probably forget all about it within an hour. People who think in functional programming terms will probably love the `next(filter(...))` idiom, but not everyone thinks or likes functional programming idioms, and there are many people who would reject a patch containing that idiom. Here are some difficulties with the functional version: 1. It requires teaching people about iterators and `next` first. 2. If you want to operate on a sublist, you have to teach them about slicing, so you can introduce itertools.islice, rather than just say "give the start and end positions as arguments". 3. If you need the index instead of the value, using filter becomes downwrite obfuscated: next(filter(lambda t: pred(t[1]), enumerate(iterable)))[0] so it becomes a question of having horses for courses. We have a simple answer to a simple question: "How do I search a list?" "Use list.index" With this proposal, we get: "How do I search a list by some key?" "Use list.index with a key function" -- Steven

On Fri, May 22, 2020 at 8:59 PM Steven D'Aprano <steve@pearwood.info> wrote:
Yes, I've written that, or something very similar for real. More than once even. But I do think in functional terms fairly easily. However, even if you don't do it that way, changing the signature on method on `list` specifically feels really clunky. What if I want to do something to the first item in a tuple that meets a condition? Or the first from a deque. Or from a set. Or even the first matching key from a dictionary. Or the first match from some third-party collection? Even if next(filter(...)) isn't how you think, an imperatively written function can be much more generic. For example (untested): def process_first(it, pred=lambda _: True, func=lambda _: _, sentinel=None): for item in it: if pred(item): return func(item) return sentinel Just stick that somewhere, and it does everything you want and more. I use a sentinel rather than raise an exception if nothing matches. That feels much more natural to me, but you can change that if such makes more sense to you. Likewise, if you want to return the "index", it's a simple enough change (but what is the index of an unsized iterable? I mean, of course you can enumerate() or i+=1... but it might not really be an index. -- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.

for obj in somelist:
I wouldn't reject it, but I don't love the functional idioms (well, I don't love the functional functions (map, filter, vs comprehensions) -- I'd be inclined to go the comprehension way -- which woukld only require understanding next() if you only wanted the first one, but I suspect most of the time you don't only want the first anyway. 1. It requires teaching people about iterators and `next` first.
Interesting -- in other recent threads, Ive felt that those of us that thought "iterators and `next" were relatively advanced concepts that newbies didn't need to learn were dismissed ... But in this case: next(i for i in a_list if pred(i)) doesn't really require the full concept of iterators and iterables -- it requires comprehension syntax, and a quick "next() gives you the next item in an 'iterator' and an 'iterator' is something you can put in a for loop." -- yes, that leaves out all kinds of important details, but gets the job done for now. In fact, in my intro class, I've used "how to get the first item from a thing" as my first introduction to next() :-) 2. If you want to operate on a sublist, you have to teach them about
slicing, so you can introduce itertools.islice, rather than just say "give the start and end positions as arguments".
hmm -- a great reason to provide an easily accessible "lazy" slice -- like the sequence view proposal recently discussed (and waiting for me to flesh out ...)
indeed. but how often DO people need the index? I suspect many uses of .index() are used to then get the object anyway. And I'm not sure it's THAT obfuscated -- we teach folks to use enumerate() pretty early as a way to avoid explicitly looping through indexes. We have a simple answer to a simple question:
Interestingly, this seems to contradict API design-wise, what's been pushed on recent threads: * Rather than adding a keyword argument that changes behavior, we should add another function to itertools (or the like) and * rather than add functionality to a built-in (or ABC), we should add utility functions that can act on many related types Granted, this is not a flag (neither boolean, ternary or mode-switching), and it's not a new method, but the API principles may still apply. In particular, a function in itertools would have a lot more utility for various iterables, rather than just lists (or the /sequence ABC?) -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Christopher Barker writes:
I for one don't *dismiss* that idea iterators and next are advanced, but I wonder if there aren't halfway explanations for *many* of the issues that come up that
doesn't really require the full concept of iterators and iterables
I also believe that *some* of the halfway explanations are useful to new Pythonistas, and don't (necesssarily) leave them with misconceptions, such David's implementation of "first_one_like_this" and Andrew's explanation of files as streams ("they know where they're going to read next"). (Note: AFAICS the only definition of iterable you need until you've introduced ABCs and/or protocols is "you can usefully put iterables, and nothing that isn't iterable, in the 'in' clause of a 'for' statement". This is much simpler than explaining reasonably completely what an iterator is and why you might care. It's obvious why you care about iterables, of course.)
This is a matter of taste. I subscribe to it; I thought Guido did too, but he says that in this case at least he likes the flag.
This is a principle some subscribe to, yes, but it doesn't apply in the zip case, since zip already acts on an tuple of arbitrary iterables. It does apply to ABCs, but it's most persuasive when a highly abstract ABC already has all the prerequisites for the function to work. For example, iter() has an obvious implementation for any sequence, it's basically just def iterseq(seq): def wrapper(): for i in range(len(seq)): yield seq[i] return wrapper() and it works because all sequences already have __len__ and __getitem__. Then for *unordered* containers like sets and dicts, you can add a protocol (__iter__).

On Fri, May 22, 2020 at 06:37:11PM -0700, Ethan Furman wrote:
Good question! Thank you for pushing me to think about this a little harder. I read David's example as "first item that is divisible by 5" so the predicate would be rubbish: # I can never remember if the needle comes first or second... somelist.index(0, pred=lambda a, b: a%5 == b) somelist.index(0, pred=lambda a, b: b%5 == a) Yuck. So I think I have the wrong mental model here. Going back to the thread on discuss that inspired the question, I think a *key function* is probably better: for index, blwr in enumerate(blowers): if blwr.id_ == value: print(index) break # becomes blowers.index(value, key=operator.attrgetter('id_')) # or if you prefer blowers.index(value, key=lambda obj: obj.id_) although I expect the attrgetter version may be faster, for those who care about such things. With a key function, David's example becomes: somelist.index(0, key=lambda n: n%5) Here are a few more possibilities. # number not divisible by 5 somelist.index(True, key=lambda x: bool(x%5)) # case-insensitive search somelist.index('spam', key=str.casefold) # number less than 0 somelist.index(True, key=lambda x: x<0) # item of length exactly 2 somelist.index(2, key=len) # identity search somelist.index(True, key=lambda obj: obj is someobj) # find a non-NAN somelist.index(False, key=math.isnan) # find a particular user somelist.index('guido', key=lambda user: user.name) # instance of a class somelist.index(MyClass, key=type) # doesn't match subclasses # does match subclasses somelist.index(True, key=lambda obj: isinstance(obj, MyClass)) # z-coordinate of a 3D point equals -7 somelist.index(-7, key=lambda point: point[2]) # sum of values in the item equals 99 somelist.index(99, key=sum) # sum of sales for the month is greater than 1000 somelist.index(True, key=lambda x: sum(x.sales() > 1000)) Obviously not all such key functions are that simple and you may need to write a helper function, but the same applies to filter. Some of these examples are a little artificial, in that you might want *all* of the items that match, not just the first. In that case, a filter function is probably better. Or you could loop, adjusting the start index each time. But if you do want only the first matching value, or if you know that there is only one such value, then a key function is more obvious than calling next and filter. -- Steven

On Sat, May 23, 2020, 12:26 AM Steven D'Aprano
Obviously not all such key functions are that simple and you may need to write a helper function, but the same applies to filter.
I like the key function much better than the predicate. In large part that's because as soon as you say predicate, I think filter. Particularly if I care about laziness in not looking through extra items. If you wanted everything matching a predicate, using a comprehension just seems more obvious. It could be lazy with a generator comprehension, not them you need next() to get the first. Key= has the obvious parallel with sorted() and friends. Even then, a function similar to sorted() feels more general than a method on lists only. I.e. index_of(needle, haystack, key=func) I'm not sure where that would live exactly, maybe itertools. I think you were on my side in believing an import from standard library is not a huge burden. I think if such a function existed, I'd want another argument to match n != 1 results. But then it gets ugly because one index is an integer, and multiple indices are some sort of collection. So maybe one is best...

On Sat, May 23, 2020, 10:54 AM Rob Cliffe via Python-ideas
The times it doesn't sound like a list comprehension is when you have a million items in the list, 100k of which match the predicate, but you only care about the first one... Or even just the first ten. Still, generator comprehension are great. And next() is an excellent function.

On Saturday, May 23, 2020, at 11:02 -0400, David Mertz wrote:
Still, generator comprehension are great. And next() is an excellent function.
Agreed, on both counts. I often end up needing an arbitrary element of a set (or the only element of a single-element set), and next(iter(set)) scratches that itch well.

After I answered that question, it dawned on me that I have probably
written something like that loop, or variations of it, a thousand times:
Why not just this (by object, not by its index, but that seems simpler):
My style works on general iterables, including infinite ones (that hopefully eventually have something fulfilling pred()). -- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.

On Fri, May 22, 2020 at 08:11:16PM -0400, David Mertz wrote:
Sure, that works too. But have you ever written it for real? I haven't. And having seen it, I'll probably forget all about it within an hour. People who think in functional programming terms will probably love the `next(filter(...))` idiom, but not everyone thinks or likes functional programming idioms, and there are many people who would reject a patch containing that idiom. Here are some difficulties with the functional version: 1. It requires teaching people about iterators and `next` first. 2. If you want to operate on a sublist, you have to teach them about slicing, so you can introduce itertools.islice, rather than just say "give the start and end positions as arguments". 3. If you need the index instead of the value, using filter becomes downwrite obfuscated: next(filter(lambda t: pred(t[1]), enumerate(iterable)))[0] so it becomes a question of having horses for courses. We have a simple answer to a simple question: "How do I search a list?" "Use list.index" With this proposal, we get: "How do I search a list by some key?" "Use list.index with a key function" -- Steven

On Fri, May 22, 2020 at 8:59 PM Steven D'Aprano <steve@pearwood.info> wrote:
Yes, I've written that, or something very similar for real. More than once even. But I do think in functional terms fairly easily. However, even if you don't do it that way, changing the signature on method on `list` specifically feels really clunky. What if I want to do something to the first item in a tuple that meets a condition? Or the first from a deque. Or from a set. Or even the first matching key from a dictionary. Or the first match from some third-party collection? Even if next(filter(...)) isn't how you think, an imperatively written function can be much more generic. For example (untested): def process_first(it, pred=lambda _: True, func=lambda _: _, sentinel=None): for item in it: if pred(item): return func(item) return sentinel Just stick that somewhere, and it does everything you want and more. I use a sentinel rather than raise an exception if nothing matches. That feels much more natural to me, but you can change that if such makes more sense to you. Likewise, if you want to return the "index", it's a simple enough change (but what is the index of an unsized iterable? I mean, of course you can enumerate() or i+=1... but it might not really be an index. -- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.

for obj in somelist:
I wouldn't reject it, but I don't love the functional idioms (well, I don't love the functional functions (map, filter, vs comprehensions) -- I'd be inclined to go the comprehension way -- which woukld only require understanding next() if you only wanted the first one, but I suspect most of the time you don't only want the first anyway. 1. It requires teaching people about iterators and `next` first.
Interesting -- in other recent threads, Ive felt that those of us that thought "iterators and `next" were relatively advanced concepts that newbies didn't need to learn were dismissed ... But in this case: next(i for i in a_list if pred(i)) doesn't really require the full concept of iterators and iterables -- it requires comprehension syntax, and a quick "next() gives you the next item in an 'iterator' and an 'iterator' is something you can put in a for loop." -- yes, that leaves out all kinds of important details, but gets the job done for now. In fact, in my intro class, I've used "how to get the first item from a thing" as my first introduction to next() :-) 2. If you want to operate on a sublist, you have to teach them about
slicing, so you can introduce itertools.islice, rather than just say "give the start and end positions as arguments".
hmm -- a great reason to provide an easily accessible "lazy" slice -- like the sequence view proposal recently discussed (and waiting for me to flesh out ...)
indeed. but how often DO people need the index? I suspect many uses of .index() are used to then get the object anyway. And I'm not sure it's THAT obfuscated -- we teach folks to use enumerate() pretty early as a way to avoid explicitly looping through indexes. We have a simple answer to a simple question:
Interestingly, this seems to contradict API design-wise, what's been pushed on recent threads: * Rather than adding a keyword argument that changes behavior, we should add another function to itertools (or the like) and * rather than add functionality to a built-in (or ABC), we should add utility functions that can act on many related types Granted, this is not a flag (neither boolean, ternary or mode-switching), and it's not a new method, but the API principles may still apply. In particular, a function in itertools would have a lot more utility for various iterables, rather than just lists (or the /sequence ABC?) -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Christopher Barker writes:
I for one don't *dismiss* that idea iterators and next are advanced, but I wonder if there aren't halfway explanations for *many* of the issues that come up that
doesn't really require the full concept of iterators and iterables
I also believe that *some* of the halfway explanations are useful to new Pythonistas, and don't (necesssarily) leave them with misconceptions, such David's implementation of "first_one_like_this" and Andrew's explanation of files as streams ("they know where they're going to read next"). (Note: AFAICS the only definition of iterable you need until you've introduced ABCs and/or protocols is "you can usefully put iterables, and nothing that isn't iterable, in the 'in' clause of a 'for' statement". This is much simpler than explaining reasonably completely what an iterator is and why you might care. It's obvious why you care about iterables, of course.)
This is a matter of taste. I subscribe to it; I thought Guido did too, but he says that in this case at least he likes the flag.
This is a principle some subscribe to, yes, but it doesn't apply in the zip case, since zip already acts on an tuple of arbitrary iterables. It does apply to ABCs, but it's most persuasive when a highly abstract ABC already has all the prerequisites for the function to work. For example, iter() has an obvious implementation for any sequence, it's basically just def iterseq(seq): def wrapper(): for i in range(len(seq)): yield seq[i] return wrapper() and it works because all sequences already have __len__ and __getitem__. Then for *unordered* containers like sets and dicts, you can add a protocol (__iter__).

On Fri, May 22, 2020 at 06:37:11PM -0700, Ethan Furman wrote:
Good question! Thank you for pushing me to think about this a little harder. I read David's example as "first item that is divisible by 5" so the predicate would be rubbish: # I can never remember if the needle comes first or second... somelist.index(0, pred=lambda a, b: a%5 == b) somelist.index(0, pred=lambda a, b: b%5 == a) Yuck. So I think I have the wrong mental model here. Going back to the thread on discuss that inspired the question, I think a *key function* is probably better: for index, blwr in enumerate(blowers): if blwr.id_ == value: print(index) break # becomes blowers.index(value, key=operator.attrgetter('id_')) # or if you prefer blowers.index(value, key=lambda obj: obj.id_) although I expect the attrgetter version may be faster, for those who care about such things. With a key function, David's example becomes: somelist.index(0, key=lambda n: n%5) Here are a few more possibilities. # number not divisible by 5 somelist.index(True, key=lambda x: bool(x%5)) # case-insensitive search somelist.index('spam', key=str.casefold) # number less than 0 somelist.index(True, key=lambda x: x<0) # item of length exactly 2 somelist.index(2, key=len) # identity search somelist.index(True, key=lambda obj: obj is someobj) # find a non-NAN somelist.index(False, key=math.isnan) # find a particular user somelist.index('guido', key=lambda user: user.name) # instance of a class somelist.index(MyClass, key=type) # doesn't match subclasses # does match subclasses somelist.index(True, key=lambda obj: isinstance(obj, MyClass)) # z-coordinate of a 3D point equals -7 somelist.index(-7, key=lambda point: point[2]) # sum of values in the item equals 99 somelist.index(99, key=sum) # sum of sales for the month is greater than 1000 somelist.index(True, key=lambda x: sum(x.sales() > 1000)) Obviously not all such key functions are that simple and you may need to write a helper function, but the same applies to filter. Some of these examples are a little artificial, in that you might want *all* of the items that match, not just the first. In that case, a filter function is probably better. Or you could loop, adjusting the start index each time. But if you do want only the first matching value, or if you know that there is only one such value, then a key function is more obvious than calling next and filter. -- Steven

On Sat, May 23, 2020, 12:26 AM Steven D'Aprano
Obviously not all such key functions are that simple and you may need to write a helper function, but the same applies to filter.
I like the key function much better than the predicate. In large part that's because as soon as you say predicate, I think filter. Particularly if I care about laziness in not looking through extra items. If you wanted everything matching a predicate, using a comprehension just seems more obvious. It could be lazy with a generator comprehension, not them you need next() to get the first. Key= has the obvious parallel with sorted() and friends. Even then, a function similar to sorted() feels more general than a method on lists only. I.e. index_of(needle, haystack, key=func) I'm not sure where that would live exactly, maybe itertools. I think you were on my side in believing an import from standard library is not a huge burden. I think if such a function existed, I'd want another argument to match n != 1 results. But then it gets ugly because one index is an integer, and multiple indices are some sort of collection. So maybe one is best...

On Sat, May 23, 2020, 10:54 AM Rob Cliffe via Python-ideas
The times it doesn't sound like a list comprehension is when you have a million items in the list, 100k of which match the predicate, but you only care about the first one... Or even just the first ten. Still, generator comprehension are great. And next() is an excellent function.

On Saturday, May 23, 2020, at 11:02 -0400, David Mertz wrote:
Still, generator comprehension are great. And next() is an excellent function.
Agreed, on both counts. I often end up needing an arbitrary element of a set (or the only element of a single-element set), and next(iter(set)) scratches that itch well.
participants (7)
-
Christopher Barker
-
Dan Sommers
-
David Mertz
-
Ethan Furman
-
Rob Cliffe
-
Stephen J. Turnbull
-
Steven D'Aprano