Mailman 3 Heterogeneous numeric data in statistics library - Python-ideas

May 12, 2022

Users of the statistics module, how often do you use it with
heterogeneous data (mixed numeric types)?

Currently most of the functions try hard to honour homogeneous data, 
e.g. if your data is Decimal or Fraction, you will (usually) get Decimal 
or Fraction results:

>>> import statistics
>>> from decimal import Decimal
>>> from fractions import Fraction
>>> statistics.variance([Decimal('0.5'), Decimal(2)/3, Decimal(5)/2])
Decimal('1.231481481481481481481481481')
>>> statistics.variance([Fraction(1, 2), Fraction(2, 3), Fraction(5, 2)])
Fraction(133, 108)

With mixed types, the functions usually try to coerce the values into a 
sensible common type, honouring subclasses:

>>> class MyFloat(float):
...     def __repr__(self):
...         return "MyFloat(%s)" % super().__repr__()
...
>>> statistics.mean([1.5, 2.25, MyFloat(1.0), 3.125, 1.75])
MyFloat(1.925)

but that's harder than you might expect, and the extra complexity has a
significant performance cost. Not all combinations are supported,
either (Decimal is particularly difficult).
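
To see which combinations are accepted on a given Python version, a
small probe along these lines may help. It is only a sketch: the helper
name mean_result_type and the sample data are made up for illustration,
and the results for mixed types are implementation details that may
differ between releases, so no expected output is shown.

```python
import statistics
from decimal import Decimal
from fractions import Fraction


def mean_result_type(data):
    """Return the type name of statistics.mean(data).

    If the call raises TypeError (for example because the mix of types
    cannot be coerced), return a string describing the error instead.
    Hypothetical helper, not part of the statistics module.
    """
    try:
        return type(statistics.mean(data)).__name__
    except TypeError as exc:
        return "TypeError: %s" % exc


# Illustrative mixed-type data sets; the labels are just for printing.
samples = {
    "int + Fraction": [1, Fraction(1, 2)],
    "int + Decimal": [1, Decimal("0.5")],
    "Fraction + float": [Fraction(1, 2), 1.5],
    "Decimal + float": [Decimal("0.5"), 1.5],
}

for label, data in samples.items():
    print(label, "->", mean_result_type(data))
```

Running this on the versions you care about shows quickly whether a
combination you rely on is coerced to a common type or rejected.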

If you are a user of statistics, how important to you is the ability to
**mix** numeric types in the same data set?

Which combinations do you care about?

Would you be satisfied with a rule that said that the statistics 
functions expect homogeneous data and that the result of calling the 
functions on mixed types is not guaranteed?
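
For anyone wondering what such a rule might mean in practice, one
possible workaround is to convert the data to a single type before
calling the functions. A minimal sketch, assuming exact conversion to
Fraction is acceptable for your data (the helper name to_homogeneous is
hypothetical, not part of the statistics module):

```python
import statistics
from decimal import Decimal
from fractions import Fraction


def to_homogeneous(data, target=Fraction):
    """Convert every value to `target` so the data set has one type.

    Hypothetical helper. Fraction accepts int, float, Decimal and
    Fraction values; other targets may not accept every input type.
    """
    return [target(x) for x in data]


mixed = [1, 0.5, Decimal("0.25"), Fraction(1, 4)]
exact = to_homogeneous(mixed)      # every element is now a Fraction
result = statistics.mean(exact)    # homogeneous Fraction data
print(repr(result))
```

Note that converting a float to Fraction is exact, so 0.1 becomes the
binary value it actually holds rather than 1/10; pick a different target
type if that matters for your data.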

-- 
Steve

Thread participants (6): Steven D'Aprano, Jonathan Fine,
Danilo J. S. Bellini, Chris Angelico, Cameron Simpson,
Stephen J. Turnbull