Re: [Python-ideas] statistics module in Python3.4

Oscar Benjamin <oscar.j.benjamin@...> writes:
I would accept this as a bug-fix, but I do not agree with you and Greg about an API trying to be too clever here. A Mapping is a clearly defined term, and if the module doc stated that for Mappings passed to functions in statistics their values will be interpreted as weights/frequencies of the corresponding keys, that should be clear enough. If what you really want is to do statistics just on the keys, you can easily pass just these (e.g., mean(mydict.keys())).

On the other hand, separate functions would complicate the API because you would end up with a weighted counterpart for almost every function in the module. The alternative of passing weights in the form of a second sequence sounds more attractive, but I guess both specifications could coexist peacefully (just like there are several ways of constructing a dictionary). Then if you have data and weights in a Mapping already, you can just go ahead and use it, and likewise if you have them in two separate sequences.

Best,
Wolfgang
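To make the two calling conventions discussed above concrete, here is a minimal sketch. The function `weighted_mean` and its `weights` parameter are hypothetical names chosen for illustration; nothing like this exists in the statistics module, and the thread does not settle on any signature.

```python
from collections.abc import Mapping


def weighted_mean(data, weights=None):
    """Hypothetical helper (NOT part of the statistics module).

    Accepts either a Mapping of value -> weight, or a sequence of values
    with an optional second sequence of weights.
    """
    if isinstance(data, Mapping):
        if weights is not None:
            raise TypeError("give weights in the Mapping or as a second "
                            "sequence, not both")
        pairs = list(data.items())
    elif weights is None:
        # Plain iterable: every value counts once.
        pairs = [(x, 1) for x in data]
    else:
        values = list(data)
        weights = list(weights)
        if len(values) != len(weights):
            raise ValueError("data and weights must have the same length")
        pairs = list(zip(values, weights))

    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("total weight must be non-zero")
    return sum(x * w for x, w in pairs) / total_weight


freq = {1: 3, 2: 1, 4: 2}

from_mapping = weighted_mean(freq)                    # values used as weights
from_sequences = weighted_mean([1, 2, 4], [3, 1, 2])  # same data, two sequences
keys_only = weighted_mean(freq.keys())                # statistics on the keys alone
```

Note that `freq.keys()` is a view rather than a Mapping, so it takes the plain-iterable path, which is the behaviour the message relies on when it suggests `mean(mydict.keys())`.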

Wolfgang Maier <wolfgang.maier@...> writes:
It may also help to address this from the users' perspective. What possible other use-cases could there be to pass a Mapping (let alone a Counter) to one of the functions in statistics? Best, Wolfgang

On 1 February 2014 22:00, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
Why not just pass counter.elements() to the functions if you want to use a counter as a frequency table? Maybe you're arguing that Mappings should be rejected with an exception? But that seems like an unnecessary restriction just to catch the mistaken usage of forgetting to call elements() on a Counter. Paul
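As a small illustration of the suggestion above, this sketch expands a Counter with `elements()` before handing it to the statistics functions. The sample data is made up for the example.

```python
from collections import Counter
from statistics import mean, mode

# Made-up frequency table: the value 1 occurs three times, and so on.
counts = Counter({1: 3, 2: 1, 4: 2})

# elements() yields each key repeated according to its count.
expanded = sorted(counts.elements())

mean_value = mean(counts.elements())
mode_value = mode(counts.elements())
```

Both calls see six individual data points, so no special handling of Counter is needed inside the module.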

The difference is that by accepting a Counter directly, functions like statistics._sum and mode can do their job faster and more space-efficiently. Using counter.elements() means exploding the data structure and then summing up (for _sum) all the values, while you could just sum key*value over the items of the Counter.
I don't know if they should be rejected; an explicit warning in the docs that their treatment may be subject to change in Python 3.5 may also be an option. Best, Wolfgang
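The efficiency argument can be sketched as follows. This is only an illustration of the two approaches with made-up data, not the code used inside the statistics module.

```python
from collections import Counter

# Made-up frequency table with 6000 data points but only 3 distinct values.
counts = Counter({10: 1000, 20: 2000, 30: 3000})

# Approach 1: expand the table, then sum every individual data point.
n_points = sum(counts.values())
expanded_total = sum(counts.elements())

# Approach 2: work on the table directly, one multiplication per distinct value.
direct_total = sum(value * count for value, count in counts.items())

# The mode can likewise be read off the table without expanding it.
direct_mode = counts.most_common(1)[0][0]
```

The first approach iterates over all 6000 points, while the second touches only the 3 distinct items and produces the same total.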

On 2 February 2014 09:24, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
First make it correct, *then* make it fast.

I'm inclined to favour the approach of forcing the "iter()" call here, so Counter is always treated the same as any other iterable. There's a plausible case to be made for offering a more efficient alternative API for working directly with collections.Counter objects in the statistics module, but that should be considered specifically for 3.5, rather than relying on an implicit detail of the current implementation.

For a historical precedent, compare the printf-style formatting API, which changes behaviour based on whether the RHS is a tuple, dict or other type, and the newer format()/format_map() APIs where the special case to handle passing in an existing dict without going through **kwargs expansion was moved out to a suitably named dedicated method.

I have now filed two related issues for this:

3.4 release blocker to avoid inadvertently special casing Counter: http://bugs.python.org/issue20478
3.5 RFE to provide tools to work directly with weight/frequency mappings: http://bugs.python.org/issue20479

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
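The two points in the message above can be illustrated with a short sketch: what forcing iter() does to a Counter, and how the printf-style operator changes behaviour with the type of its right-hand side while format_map() is an explicitly named entry point. The sample values are made up.

```python
from collections import Counter

counts = Counter({1: 3, 2: 1, 4: 2})

# Forcing iter() treats the Counter like any other iterable of its keys;
# the counts are ignored.
keys_seen = sorted(iter(counts))

# printf-style formatting: behaviour depends on the type of the RHS.
from_tuple = "%s and %s" % ("spam", "eggs")     # tuple: one item per placeholder
from_dict = "%(name)s" % {"name": "spam"}       # dict: looked up by key
dict_as_value = "%s" % {"name": "spam"}         # dict, but formatted as one value

# format_map(): the "use this existing mapping" case has its own method.
from_format_map = "{name}".format_map({"name": "spam"})
```

With iter() forced, a caller who wants the counts honoured has to expand the Counter explicitly, for example with counts.elements().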
participants (3)
- Nick Coghlan
- Paul Moore
- Wolfgang Maier