[New-bugs-announce] [issue20575] Type handling policy for the statistics module

Oscar Benjamin report at bugs.python.org
Sun Feb 9 14:29:43 CET 2014


New submission from Oscar Benjamin:

As of issue20481, the statistics module for Python 3.4 disallows mixing numeric types, with one exception: int can mix with any other type (but only one other type at a time). My understanding is that this change was not necessarily intended as a permanent policy but rather as a quick fix for Python 3.4, to explicitly prevent certain confusing situations that arise from mixing Decimal with other stdlib numeric types.

issue20499 has a lot of discussion about different ways to improve accuracy and speed for the mean, variance etc. functions in the statistics module. It's tricky though to come up with a concrete implementation without having a clear specification for how the module should handle different numeric types.

There are several related issues to do with type handling. Should the statistics module
1) Use the same coercion rules as the numeric tower (PEP 3141)?
2) Allow Decimal to mix with any types from the numeric tower?
3) Allow non-stdlib types that don't use the numeric tower?
4) Allow any mixing of types at all?
5) Strive to achieve the maximum possible accuracy for every type that it accepts?

I don't personally see much of a use-case for mixing e.g. Decimal and Fraction. I don't think it's unreasonable to require users to choose a numeric type and stick to it. The common cases will almost certainly be either all int or all float so those should be the main targets of any speed optimisation.

If a user is using Fraction/Decimal then they must have gone out of their way to do so, and they may as well do so consistently for all of their data. You choose Fraction because you want perfect accuracy, so mixing Fractions with floating point types such as float and Decimal doesn't make any sense. There is a sense in which Decimals are also exact, since their constructor is always exact. However, I don't think there's any case where the Decimal constructor can be used but the Fraction constructor cannot, so this mixing of types is unnecessary.
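To illustrate the point about constructors (a minimal sketch, not part of any proposed patch): Fraction accepts a finite Decimal instance, or the same decimal string, and converts it without rounding, whereas a float literal has already been rounded before Fraction sees it.

```python
from decimal import Decimal
from fractions import Fraction

# A finite Decimal converts to a Fraction exactly.
f_from_decimal = Fraction(Decimal("0.1"))

# The same decimal string can be passed to Fraction directly.
f_from_string = Fraction("0.1")

# The float literal 0.1 is rounded to binary first, so the result differs.
f_from_float = Fraction(0.1)

print(f_from_decimal, f_from_string)
print(f_from_float == f_from_decimal)
```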

As with Fraction, a user who chooses Decimal is going out of their way to do so because of the kind of accuracy guarantees that the type provides. It doesn't make any sense to mix these with floats, which are inherently tainted with the wrong kind of rounding error. So mixing Decimal and float doesn't make any sense either.

Note that ordinary arithmetic prohibits the mixing of Decimal with Fraction/float so that on this point the statistics module is essentially maintaining a consistent position with respect to the policy of the Decimal type.
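For reference, the behaviour of ordinary arithmetic described here can be checked directly. This is only a small sketch; try_add is a hypothetical helper introduced for the illustration.

```python
from decimal import Decimal
from fractions import Fraction

def try_add(a, b):
    """Return the type name of a + b, or 'TypeError' if the addition is refused."""
    try:
        return type(a + b).__name__
    except TypeError:
        return "TypeError"

results = {
    "Decimal + int": try_add(Decimal("1.5"), 2),
    "Decimal + float": try_add(Decimal("1.5"), 2.0),
    "Decimal + Fraction": try_add(Decimal("1.5"), Fraction(1, 2)),
}

for label, outcome in results.items():
    print(label, "->", outcome)
```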

On the other hand ordinary arithmetic allows all of int, float, Fraction and complex and indeed any other type subscribing to the ABCs in the numeric tower to be mixed. As of issue20481 the statistics module does not allow any type mixing except for int:
http://hg.python.org/cpython/rev/5db74cd953ab
Note also that it uses type identity rather than subclass relationships or ABCs so that it is not even possible to mix e.g. float with a float subclass.
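A short sketch of both points: ordinary arithmetic coerces along the numeric tower, and a type-identity check rejects a subclass that an isinstance check would accept. MyFloat is a hypothetical subclass used only for illustration; this is not the statistics module's code.

```python
from fractions import Fraction

# Ordinary arithmetic coerces upwards along the numeric tower.
tower_results = [
    type(1 + Fraction(1, 2)).__name__,    # Fraction
    type(Fraction(1, 2) + 0.5).__name__,  # float
    type(0.5 + 1j).__name__,              # complex
]
print(tower_results)

class MyFloat(float):
    """Hypothetical float subclass, for illustration only."""

x = MyFloat(1.5)

# A type-identity check treats MyFloat as a different type from float...
identity_match = type(x) is float
# ...whereas a subclass check accepts it.
subclass_match = isinstance(x, float)
print(identity_match, subclass_match)
```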

The most common case of mixing will almost certainly be int and float which will work. However I doubt that the current policy would be considered to be in keeping with Python's general policy on numeric types and anticipate that there will be a desire to change it in the future. The obvious candidate for a policy is the numeric tower and ABCs of PEP-3141. In that case the statistics module has a partial precedent on which to base its policy. The only tricky part is that Decimal is not part of the numeric tower. So there needs to be a special rule for Decimal such as "it only mixes with int/Integral".
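One way such a policy could be expressed is sketched below. This is only an assumption about how the rule might look, not a proposed implementation: tower_rank and check_mixing are hypothetical names, and the special rule for Decimal is the "only mixes with int/Integral" option mentioned above.

```python
import numbers
from decimal import Decimal
from fractions import Fraction

# ABCs of the PEP 3141 tower, narrowest first.
_TOWER = (numbers.Integral, numbers.Rational, numbers.Real, numbers.Complex)

def tower_rank(x):
    """Index of the narrowest tower ABC that x satisfies, or None if outside the tower."""
    for rank, abc in enumerate(_TOWER):
        if isinstance(x, abc):
            return rank
    return None

def check_mixing(data):
    """Hypothetical policy check: return 'decimal' or 'tower', or raise TypeError."""
    data = list(data)
    if any(isinstance(x, Decimal) for x in data):
        # Special rule: Decimal only mixes with Integral types.
        for x in data:
            if not isinstance(x, (Decimal, numbers.Integral)):
                raise TypeError("Decimal only mixes with int/Integral")
        return "decimal"
    for x in data:
        if tower_rank(x) is None:
            raise TypeError("unsupported type: %s" % type(x).__name__)
    return "tower"

print(check_mixing([1, Decimal("2.5")]))
print(check_mixing([1, 2.0, Fraction(1, 2)]))
```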

Basing the policy on the numeric tower is attractive, but it is worth noting that the stdlib types int, float, Fraction and Decimal are the only types that actually implement and register with these ABCs. So it's not much different from saying that those particular types (and their subclasses) are accepted, but I think that is better than the current policy.

Third-party numeric types don't implement the interfaces described in PEP-3141. However, one thing that is implemented by every third-party numeric type that I know of is __float__. So if there were a desire to support those in the statistics module, the simplest extension of the policy on types would be to say that any non-numeric-tower types will simply be coerced to float. This still leaves the question of how type mixing works there but, again, perhaps the safest option before the need arises is just to say that no type mixing is allowed if any input object is not from the numeric tower.
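A rough sketch of what that extension might look like, under the assumptions stated above. Celsius and coerce_foreign are hypothetical names invented for the illustration; "from the numeric tower" is approximated here by isinstance(x, numbers.Number), which also covers Decimal since it registers with numbers.Number.

```python
import numbers

class Celsius:
    """Hypothetical third-party type: outside the tower, but defines __float__."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __float__(self):
        return float(self.degrees)

def coerce_foreign(data):
    """Coerce non-tower inputs to float, refusing any type mixing in that case."""
    data = list(data)
    foreign = [x for x in data if not isinstance(x, numbers.Number)]
    if not foreign:
        # Tower types (and Decimal) are left to whatever rules apply to them.
        return data
    if len({type(x) for x in data}) != 1:
        raise TypeError("no type mixing allowed with non-numeric-tower inputs")
    return [float(x) for x in data]

print(coerce_foreign([Celsius(1), Celsius(2.5)]))
```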

What do you think?

----------
components: Library (Lib)
messages: 210762
nosy: ncoghlan, oscarbenjamin, skrah, stevenjd, wolma
priority: normal
severity: normal
status: open
title: Type handling policy for the statistics module
type: enhancement
versions: Python 3.5

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue20575>
_______________________________________

