[Python-ideas] Pre-PEP: adding a statistics module to Python

Wed Aug 7 18:28:17 CEST 2013

What if it were an option? I.e.:

statistics.sum(myiter, listconv=True)
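
A rough sketch of how such an opt-in flag might behave, shown for variance since that is the case discussed below. The function name, the `listconv` keyword, and the error messages are all hypothetical; this is not the API of any existing statistics module, just one way the idea could look:

```python
import math
from collections.abc import Iterator


def variance(data, listconv=False):
    """Sample variance using a two-pass algorithm (hypothetical sketch)."""
    if isinstance(data, Iterator):
        # One-shot iterators cannot be traversed twice, so either refuse
        # them or copy them into a list when the caller explicitly opts in.
        if not listconv:
            raise TypeError(
                "variance() needs a collection it can iterate twice; "
                "pass listconv=True to copy the iterator into a list"
            )
        data = list(data)
    n = len(data)
    if n < 2:
        raise ValueError("variance requires at least two data points")
    mean = math.fsum(data) / n                                   # first pass
    return math.fsum((x - mean) ** 2 for x in data) / (n - 1)    # second pass
```

With this sketch, variance([1, 2, 3, 4]) and variance(iter([1, 2, 3, 4]), listconv=True) give the same result, while variance(iter([1, 2, 3, 4])) raises TypeError.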

Andrew Barnert <abarnert at yahoo.com> wrote:

>On Aug 7, 2013, at 4:10, Oscar Benjamin <oscar.j.benjamin at gmail.com>
>wrote:
>
>> On Aug 6, 2013 11:19 PM, "Andrew Barnert" <abarnert at yahoo.com> wrote:
>> >
>> > On Aug 6, 2013, at 12:44, Michele Lacchia
>> > <michelelacchia at gmail.com> wrote:
>> >>
>> >> Yes but then you lose all the advantages of iterators. What's the
>> >> point in that?
>> >> Furthermore it's not guaranteed that you can always convert an
>> >> iterator into a list. As has already been said, you could run out
>> >> of memory, for instance.
>> >
>> > And the places where the stdlib/builtins do that automatic
>> > conversion--even when it's well motivated and almost always
>> > harmless once you think about it, like str.join--are surprising to
>> > most people. (Following up on str.join as an example, just about
>> > every question whose answer is str.join([...]) ends up with someone
>> > suggesting a genexpr instead of a listcomp, someone else explaining
>> > that it doesn't actually save any memory in that case, just wastes
>> > a bit of time, then some back and forth until everyone finally gets
>> > it.)
>> >
>> > The question is whether it would be even _more_ surprising to
>> > return an error, or a less accurate result. I don't know the answer
>> > to that.
>> 
>> I'm going to make the claim (with no supporting data) that more than
>> 95% of the time, when a user calls variance(iterator) they will be
>> guilty of premature optimisation.
>> 
>I think you're probably right. In the similar cases that come up with,
>e.g., str.join(iterator), there is usually no reason whatsoever to
>believe that any memory or speed cost will make any difference. Often
>people get into arguments over a half dozen strings (where, even if it
>_did_ matter, which it doesn't, N is so low that algorithmic complexity
>isn't even relevant).
>
>> Really the cases where you can't build a collection are rare. People
>> will still do it though just because it's satisfying to do everything
>> with iterators in constant memory (I'm often guilty of this kind of
>> thing).
>> 
>Or so that a sequence of operations can be pipelined, possibly leading
>to better cache behavior. Or just because iterators are the pythonic
>(or python3-ic?) way to do it.
>
>> However unlike str.join there's no one pass algorithm that can be as
>> accurate so it's not purely a performance question.
>> 
>But the point is that str.join doesn't use a one-pass algorithm, it
>just constructs a list so it can do it in two passes. And it's been
>suggested on this thread that variance could easily do the same thing.
>
>So there are three choices. Using a one-pass algorithm would be
>surprising because it's less accurate. Automatic listification would be
>surprising because you went out of your way to pass lazy iterators
>around and variance broke the benefits. An exception would be
>surprising because almost every other function in the stdlib that takes
>lists also takes iterators, even when there are good reasons not to.
>
>I think you still may be right that the error is the way to go. You'll
>learn the problem quickly, and the workaround will be obvious, and the
>reason for it will be available in the docs. The other two potential
>surprises may not be as discoverable.
>

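To make the accuracy concern in the quoted discussion concrete, here is a small comparison of a two-pass variance against a one-pass one. The one-pass version uses the naive sum-of-squares formula, picked only because its loss of accuracy is easy to see; the thread does not say which one-pass algorithm anyone had in mind, and more careful one-pass methods do better than this. Function names and test data are made up for illustration:

```python
import math


def variance_two_pass(data):
    """Two passes: compute the mean first, then the squared deviations."""
    data = list(data)  # needs to traverse the data twice
    n = len(data)
    mean = math.fsum(data) / n
    return math.fsum((x - mean) ** 2 for x in data) / (n - 1)


def variance_one_pass_naive(iterable):
    """One pass over any iterable using running sums of x and x*x.

    Constant memory, but the final subtraction cancels almost all
    significant digits when the mean is large compared to the spread.
    """
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in iterable:
        n += 1
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


# Values with a large offset and a small spread; the exact sample
# variance of this data is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
two_pass = variance_two_pass(data)
one_pass = variance_one_pass_naive(iter(data))
```

Here two_pass comes out as exactly 30.0, while one_pass does not, because the running sum of squares is around 4e18 and can no longer resolve a difference of 90.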