Argumenting in favor of first()

I just found this code: def get_product_item(jsonld_items): for item in jsonld_items: if item['@type'] == 'Product': return item else: return {} My argument is that the intent is clearer in: def get_product_item(jsonld_items): return first((item for item in jsonld_items if item['@type'] == 'Product'), {}) As a reminder, first()'s definition in Python is: def first(seq, default=None): return next(iter(seq), default=default) It could be optimized (implemented in C) if it makes it into the stdlib. -- Juancarlo *Añez*

On 12/05/2019 01:40 PM, Juancarlo Añez wrote:
I haven't followed the main thread, but I can unequivocally state that for-loop version is easier to read. The only thing missing to make the intent crystal clear is one more word in the function name: def get_first_product_item(...): Also, the for-loop version quits the moment it finds a Product type, while the `first` version has to first process the entire jsonld_items structure. -- ~Ethan~

"Also, the for-loop version quits the moment it finds a Product type, while the `first` version has to first process the entire jsonld_items structure." The first version doesn't have to process the whole structure; it's written with a generator expression, so it only tests and produces values on demand, and next stops demanding them as soon as it gets a single result. On Thu, Dec 5, 2019 at 10:13 PM Ethan Furman <ethan@stoneleaf.us> wrote:

On 12/05/2019 03:11 PM, Josh Rosenberg wrote:
"Also, the for-loop version quits the moment it finds a Product type, while the `first` version has to first process the entire jsonld_items structure."
The first version doesn't have to process the whole structure; it's written with a generator expression, so it only tests and produces values on demand, and next stops demanding them as soon as it gets a single result.
Ah, thanks. My genexp foo is weak. :( -- ~Ethan~

06.12.19 00:09, Ethan Furman пише:
I haven't followed the main thread, but I can unequivocally state that for-loop version is easier to read.
I completely agree.
It is likely there is only one dict with the key '@type' equal to 'Product' in the jsonld_items list. The structure of the data can be validated before or we just trust the source. So the name get_product_item() is correct. The rest is an implementation detail.

On Dec 5, 2019, at 13:43, Juancarlo Añez <apalala@gmail.com> wrote:
There’s already a first_true in the recipes that does the filtering and firsting together. Of course there’s the usual issue about filter vs. genexpr when your predicate is naturally an in-line expression rather than a function, but otherwise: from more_itertools import first_true def get_product_item(jsonId_items): return first_true(jsonId_items, lambda item: item['@type'] == 'Product', {}) Or even: get_product_item = partial(first_true, pred=lambda item: item['@type'] == 'Product', default={}) I think in this case, because of the expression-vs.-lambda issue, I’d write it with first and a genexpr if we had both. But a lot of itertoolsy code is full of lambdas and partials like this (to call filter and map, use as groupby and unique keys, etc.), and often the condition is something you’ve already wrapped up as a function in the first place, so I’m not sure how generally that applies without more examples.
I don’t think it needs to be optimized. Most itertoolsy things that can be built by composing existing functions (like most of the recipes) are fast enough in Python; it’s the stuff that needs to loop and yield (like most of the stuff actually in the module) that gets a major improvement in C. It’s worth measuring rather than guessing, but my guess would be that the same applies here. The only problem is that right now, the entire module is in C, so anything that’s not optimized has to be a recipe and vice-versa. I’m pretty sure the idea of splitting itertools into a Python module with a C accelerator (like a lot of other modules in the stdlib) has come up before, but either there’s never a good enough use case, or just nobody volunteers to do it. And first might well be the candidate that’s worth it. Then again, maybe it is worth optimizing. Or maybe it’s fine to just list it as a recipe (the recipes docs already link to more_itertools, which already has it).

I have encountered plenty of uses for first(), usually the argument is a list or even a dict or set that’s already computed — for sets, it’s common that it’s known there’s only one element, the question is how to get that element. On Thu, Dec 5, 2019 at 14:28 Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:
-- --Guido (mobile)

On Thu, Dec 05, 2019 at 05:40:05PM -0400, Juancarlo Añez wrote:
I'm sorry, I can't tell what that is supposed to do. Is the "return {}" supposed to be inside the loop? If so, it has been accidentally dedented. Is it meant to be outside the loop? The "for-else" is redundent, since there is no break. for item in jsonld_items: if item['@type'] == 'Product': return item else: # Even this "else" is redundent too. return {} for item in jsonld_items: if item['@type'] == 'Product': return item # else is unnecessary here return {} The name says it returns an item, but the default is to return an empty dict, which seems like an unusual choice for the "default product where no product is specified". I would have guessed None if the product is missing.
That's certainly an improvement, but the helper function "first" is redundant. You can just write: return next((item for item in jsonld_items if item['@type'] == 'Product'), default={}) What's the benefit of adding a new builtin which is essentially just a thin do-almost-nothing wrapper around `next`? Have I missed something? -- Steven

On Fri, Dec 6, 2019, 12:47 AM Steven D'Aprano <steve@pearwood.info> wrote:
"return", like "break", causes the "else" suite to be skipped. https://docs.python.org/3/reference/compound_stmts.html#the-for-statement does not clearly specify this; it only says that the else suite is executed when the iterator is exhausted or empty, and that "break" skips it. Perhaps a sentence should be added to clearly and unambiguously state that "return" skips it also?

On Thu, Dec 5, 2019 at 22:08 Jonathan Goble <jcgoble3@gmail.com> wrote:
No, that follows from the semantics of return. (Same for raise.) Break is mentioned specifically because it transfers control to the code after the loop, just like exhausting the iterable. The difference between exhaustion and break is the reason the else clause (on loops) exists -- so we can do something only when no break is taken. (Only a finally clause can intercept return.)
-- --Guido (mobile)

06.12.19 07:38, Steven D'Aprano пише:
In real code this can be a fragment of the larger function and look like: for item in jsonld_items: if item['@type'] == 'Product': break else: item = {}
I would return None, but I do not know the context. It is obvious that jsonld_items is a list of dicts, so returning an empty dict can be more reasonable in the particular code. At least the result always has the same type.

On Fri, Dec 06, 2019 at 10:24:30AM +0200, Serhiy Storchaka wrote:
Sure, but in this case, it isn't a fragment of a larger function, and that's not what it looks like. If it looked like what you wrote, I would understand it. But it doesn't, so I didn't really understand what it was supposed to do, until I read the equivalent version using first/next. One unfortunate side-effect of the choice of "else" as keyword in `for-else` is that if the last statement in the `for` block is an `if`, it looks like the `else` is wrongly indented. -- Steven

On 12/05/2019 01:40 PM, Juancarlo Añez wrote:
I haven't followed the main thread, but I can unequivocally state that for-loop version is easier to read. The only thing missing to make the intent crystal clear is one more word in the function name: def get_first_product_item(...): Also, the for-loop version quits the moment it finds a Product type, while the `first` version has to first process the entire jsonld_items structure. -- ~Ethan~

"Also, the for-loop version quits the moment it finds a Product type, while the `first` version has to first process the entire jsonld_items structure." The first version doesn't have to process the whole structure; it's written with a generator expression, so it only tests and produces values on demand, and next stops demanding them as soon as it gets a single result. On Thu, Dec 5, 2019 at 10:13 PM Ethan Furman <ethan@stoneleaf.us> wrote:

On 12/05/2019 03:11 PM, Josh Rosenberg wrote:
"Also, the for-loop version quits the moment it finds a Product type, while the `first` version has to first process the entire jsonld_items structure."
The first version doesn't have to process the whole structure; it's written with a generator expression, so it only tests and produces values on demand, and next stops demanding them as soon as it gets a single result.
Ah, thanks. My genexp foo is weak. :( -- ~Ethan~

06.12.19 00:09, Ethan Furman пише:
I haven't followed the main thread, but I can unequivocally state that for-loop version is easier to read.
I completely agree.
It is likely there is only one dict with the key '@type' equal to 'Product' in the jsonld_items list. The structure of the data can be validated before or we just trust the source. So the name get_product_item() is correct. The rest is an implementation detail.

On Dec 5, 2019, at 13:43, Juancarlo Añez <apalala@gmail.com> wrote:
There’s already a first_true in the recipes that does the filtering and firsting together. Of course there’s the usual issue about filter vs. genexpr when your predicate is naturally an in-line expression rather than a function, but otherwise: from more_itertools import first_true def get_product_item(jsonId_items): return first_true(jsonId_items, lambda item: item['@type'] == 'Product', {}) Or even: get_product_item = partial(first_true, pred=lambda item: item['@type'] == 'Product', default={}) I think in this case, because of the expression-vs.-lambda issue, I’d write it with first and a genexpr if we had both. But a lot of itertoolsy code is full of lambdas and partials like this (to call filter and map, use as groupby and unique keys, etc.), and often the condition is something you’ve already wrapped up as a function in the first place, so I’m not sure how generally that applies without more examples.
I don’t think it needs to be optimized. Most itertoolsy things that can be built by composing existing functions (like most of the recipes) are fast enough in Python; it’s the stuff that needs to loop and yield (like most of the stuff actually in the module) that gets a major improvement in C. It’s worth measuring rather than guessing, but my guess would be that the same applies here. The only problem is that right now, the entire module is in C, so anything that’s not optimized has to be a recipe and vice-versa. I’m pretty sure the idea of splitting itertools into a Python module with a C accelerator (like a lot of other modules in the stdlib) has come up before, but either there’s never a good enough use case, or just nobody volunteers to do it. And first might well be the candidate that’s worth it. Then again, maybe it is worth optimizing. Or maybe it’s fine to just list it as a recipe (the recipes docs already link to more_itertools, which already has it).

I have encountered plenty of uses for first(), usually the argument is a list or even a dict or set that’s already computed — for sets, it’s common that it’s known there’s only one element, the question is how to get that element. On Thu, Dec 5, 2019 at 14:28 Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:
-- --Guido (mobile)

On Thu, Dec 05, 2019 at 05:40:05PM -0400, Juancarlo Añez wrote:
I'm sorry, I can't tell what that is supposed to do. Is the "return {}" supposed to be inside the loop? If so, it has been accidentally dedented. Is it meant to be outside the loop? The "for-else" is redundent, since there is no break. for item in jsonld_items: if item['@type'] == 'Product': return item else: # Even this "else" is redundent too. return {} for item in jsonld_items: if item['@type'] == 'Product': return item # else is unnecessary here return {} The name says it returns an item, but the default is to return an empty dict, which seems like an unusual choice for the "default product where no product is specified". I would have guessed None if the product is missing.
That's certainly an improvement, but the helper function "first" is redundant. You can just write: return next((item for item in jsonld_items if item['@type'] == 'Product'), default={}) What's the benefit of adding a new builtin which is essentially just a thin do-almost-nothing wrapper around `next`? Have I missed something? -- Steven

On Fri, Dec 6, 2019, 12:47 AM Steven D'Aprano <steve@pearwood.info> wrote:
"return", like "break", causes the "else" suite to be skipped. https://docs.python.org/3/reference/compound_stmts.html#the-for-statement does not clearly specify this; it only says that the else suite is executed when the iterator is exhausted or empty, and that "break" skips it. Perhaps a sentence should be added to clearly and unambiguously state that "return" skips it also?

On Thu, Dec 5, 2019 at 22:08 Jonathan Goble <jcgoble3@gmail.com> wrote:
No, that follows from the semantics of return. (Same for raise.) Break is mentioned specifically because it transfers control to the code after the loop, just like exhausting the iterable. The difference between exhaustion and break is the reason the else clause (on loops) exists -- so we can do something only when no break is taken. (Only a finally clause can intercept return.)
-- --Guido (mobile)

06.12.19 07:38, Steven D'Aprano пише:
In real code this can be a fragment of the larger function and look like: for item in jsonld_items: if item['@type'] == 'Product': break else: item = {}
I would return None, but I do not know the context. It is obvious that jsonld_items is a list of dicts, so returning an empty dict can be more reasonable in the particular code. At least the result always has the same type.

On Fri, Dec 06, 2019 at 10:24:30AM +0200, Serhiy Storchaka wrote:
Sure, but in this case, it isn't a fragment of a larger function, and that's not what it looks like. If it looked like what you wrote, I would understand it. But it doesn't, so I didn't really understand what it was supposed to do, until I read the equivalent version using first/next. One unfortunate side-effect of the choice of "else" as keyword in `for-else` is that if the last statement in the `for` block is an `if`, it looks like the `else` is wrongly indented. -- Steven
participants (8)
-
Andrew Barnert
-
Ethan Furman
-
Guido van Rossum
-
Jonathan Goble
-
Josh Rosenberg
-
Juancarlo Añez
-
Serhiy Storchaka
-
Steven D'Aprano