[Python-ideas] Pre-PEP 2nd draft: adding a statistics module to Python

Fri Aug 9 04:02:07 CEST 2013

On 09/08/13 08:09, Greg Ewing wrote:
>> On Aug 8, 2013 7:29 AM, "Steven D'Aprano" <steve at pearwood.info <mailto:steve at pearwood.info>> wrote:
>>  > - one-pass algorithm for variance is deferred until Python 3.5, until then, lists are the preferred data structure (iterators will continue to work, provided they are small enough to be converted to lists).
>
> Does this multi-pass algorithm being talked about use
> a predetermined number of passes? If so, then surely
> any reiterable object will do, not necessarily a list.

Correct. Any re-iterable sequence is expected to work, e.g. range objects, tuples, array.array. If you find a sequence type that doesn't work, that's a bug to be fixed.

For the record, mean and mode uses a single pass; median currently sorts the data, which I guess counts as a single pass[1]; variance uses two passes. If some future version includes skew and kurtosis[2], they will require three passes.

[1] Anyone want to contribute a version of median that uses Quickselect to avoid sorting? To make it faster than the built-in sort, it will probably need to be written in C.

[2] But which ones? I know of at least four different definitions of each, three of which are in common use. Annoyingly, the differences are not documented.

-- 
Steven