
[Peter O'Connor <peter.ed.oconnor@gmail.com>]
Ok, so it seems everyone's happy with adding an initial_value argument.
Heh - that's not clear to me ;-)
Now, I claim that while it should be an option, the initial value should NOT be returned by default. (i.e. the returned generator should by default yield N elements, not N+1).
-1 on that.

- It goes against prior art. Haskell's scanl does return the initial value, and nobody on the planet has devoted more quality thought to how streams "should work" than those folks. The Python itertoolz package's accumulate already supports an optional `initial=` argument, and also returns it when specified. It requires truly compelling arguments to go against prior art.

- It's "obvious" that the first value should be returned if specified. The best evidence: in this thread's first message, it was "so obvious" to Raymond that the implementation he suggested did exactly that. I doubt it even occurred to him to question whether it should. It didn't occur to me either, but my mind is arguably "polluted" by significant prior exposure to functional languages.

- In all but one "real life" example so far (including the slice-summer class I stumbled into today), the code _wanted_ the initial value to be returned. The sole exception was one of the three instances in Will Ness's wheel sieve code, where he discarded the unwanted (in that specific case) initial value via a plain

      next(wheel)

  Which is telling: it's easy to discard a value you don't want, but to inject a value you _do_ want but don't get requires something like reintroducing the

      chain([value_i_want], the_iterable_that_didn't_give_the_value_i_want)

  trick the new optional argument is trying to get _away_ from. Talk about ironic ;-)

I would like to see a simple thing added to itertools to make dropping unwanted values easier, though:

    """
    drop(iterable, n=None)
    Return an iterator whose next() method returns all but the first `n` values from the iterable. If specified, `n` must be an integer >= 0. By default (`n`=None), the iterator is run to exhaustion.
    """

Then, e.g.,

- drop(it, 0) would effectively be a synonym for iter(it).

- drop(it, 1) would skip over the first value from the iterable.

- drop(it) would give "the one obvious way" to consume an iterator completely (for some reason that's a semi-FAQ, and is usually answered by suggesting the excruciatingly obscure trick of feeding the iterable to a 0-size collections.deque constructor).

Of course Haskell has had `drop` all along, although not the "run to exhaustion" part.
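For concreteness, here is a minimal sketch of how the proposed drop() might behave, built only from existing pieces (itertools.islice and the 0-size collections.deque trick). It is not part of itertools. The choice to exhaust the iterator eagerly when `n` is None, but to skip lazily otherwise, is an assumption of this sketch and not something the docstring above pins down.

```python
from collections import deque
from itertools import islice


def drop(iterable, n=None):
    """Sketch of the proposed drop(); hypothetical, not an itertools function."""
    it = iter(iterable)
    if n is None:
        # Run the iterator to exhaustion using a 0-size deque.
        deque(it, maxlen=0)
        return it
    if not isinstance(n, int) or n < 0:
        raise ValueError("n must be an integer >= 0")
    # islice skips lazily: the first n values are consumed on the first next().
    return islice(it, n, None)


wheel = iter([1, 7, 11, 13])      # made-up values for illustration
rest = list(drop(wheel, 1))       # everything but the first value

leftover = iter(range(5))
drop(leftover)                    # consumes `leftover` completely
```

With this sketch, drop(it, 0) yields the same values as iter(it), and drop(it) leaves nothing behind in the underlying iterator.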
Example: suppose we're doing the toll booth thing, and we want to yield a cumulative sum of tolls so far. Suppose someone already made a reasonable-looking generator yielding the cumulative sum of tolls for today:
    def iter_cumsum_tolls_from_day(day, toll_amount_so_far):
        # initial= belongs to accumulate(), not to get_tolls_from_day()
        return accumulate(get_tolls_from_day(day), initial=toll_amount_so_far)
And now we want to make a way to get all tolls from the month. One might reasonably expect this to work:
    def iter_cumsum_tolls_from_month(month, toll_amount_so_far):
        for day in month:
            for cumsum_tolls in iter_cumsum_tolls_from_day(day, toll_amount_so_far=toll_amount_so_far):
                yield cumsum_tolls
            toll_amount_so_far = cumsum_tolls
But this would actually DUPLICATE the last toll of every day - it appears both as the last element of the day's generator and as the first element of the next day's generator.
I didn't really follow the details there, but the suggestion would be the same regardless: drop the duplicates you don't want. Note that making up an example in your head isn't nearly as persuasive as "real life" code. Code can be _contrived_ to "prove" anything.
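As a rough illustration of "drop the duplicates you don't want", the toll example can be made runnable as below. It assumes an accumulate() that accepts `initial=` and returns it first (this is how itertools.accumulate behaves in Python 3.8 and later). The toll data and the body of get_tolls_from_day() are invented for the sketch.

```python
from itertools import accumulate

# Hypothetical toll data: a list of toll amounts per day.
TOLLS_BY_DAY = {"mon": [2, 3], "tue": [4, 1]}


def get_tolls_from_day(day):
    return TOLLS_BY_DAY[day]


def iter_cumsum_tolls_from_day(day, toll_amount_so_far):
    # Yields toll_amount_so_far first, then the running sums for the day.
    return accumulate(get_tolls_from_day(day), initial=toll_amount_so_far)


def iter_cumsum_tolls_from_month(month, toll_amount_so_far):
    for day in month:
        sums = iter_cumsum_tolls_from_day(day, toll_amount_so_far)
        next(sums)  # discard the initial value; it repeats the previous sum
        for toll_amount_so_far in sums:
            yield toll_amount_so_far


month_sums = list(iter_cumsum_tolls_from_month(["mon", "tue"], 0))
```

Here month_sums comes out as [2, 5, 9, 10], with no repeated value at the day boundary. Note that the starting amount itself is also discarded in this version; a caller who wants it can yield it once up front.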
This is why I think that there should be an additional "include_initial_in_return=False" argument. I do agree that it should be an option to include the initial value (your "find tolls over time-span" example shows why), but if you want that, you should have to show that you thought about it by specifying "include_initial_in_return=True".
It's generally poor human design to have a second optional argument modify the behavior of yet another optional argument. If the presence of the latter can have two distinct modes of operation, then people _will_ forget which one the default mode is, making code harder to write and harder to read. Since "return the value" is supported by all known prior art, and by the bulk of "real life" Python codes known so far, "return the value" should be the default. But far better to make it the only mode rather than _just_ the default mode. Then there's nothing to be forgotten :-)
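To make the single-mode argument concrete: if "return the value" is the only mode, the N-element variant is still one short step away. The sketch below assumes Python 3.8 or later, where itertools.accumulate accepts `initial=`; the numbers are arbitrary.

```python
from itertools import accumulate, chain, islice

values = [1, 2, 3]
start = 10

# The one mode: the initial value comes back first, so N+1 results.
with_initial = list(accumulate(values, initial=start))

# Callers who want only N results skip the first one.
without_initial = list(islice(accumulate(values, initial=start), 1, None))

# The chain() trick that initial= is meant to replace gives the same N+1 results.
chained = list(accumulate(chain([start], values)))
```

Here with_initial and chained are both [10, 11, 13, 16], and without_initial is [11, 13, 16].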