How about adding rational fraction to Python?
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Wed Feb 27 21:32:19 EST 2008
On Wed, 27 Feb 2008 17:07:37 -0800, Paul Rubin wrote:
> Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au> writes:
>> Oh come on. With a function named "mean" that calculates the sum of a
>> list of numbers and then divides by the number of items, what else
>> could it be?
>
> You have a bunch of marbles you want to put into bins. The division
> tells you how many marbles to put into each bin. That would be an
> integer since you cannot cut up individual marbles.
(Actually you can. As a small child, one of my most precious possessions
was a marble which had cracked into two halves.)
No, that doesn't follow, because you don't get the result you want if the
number of marbles is entered as Decimals or floats. Maybe the data came
from a marble-counting device that always returns floats.
You're expecting the function to magically know what you want to do with
the result and return the right kind of answer, which is the wrong way to
go about it. For example, there are situations where your data is given
in integers, but the number you want is a float.
# number of 20kg bags of flour per order
>>> data = [5, 7, 20, 2, 7, 6, 1, 37, 3]
>>> weights = [20*n for n in data]
>>> mean(weights)
195.55555555555554
If I were using a library that arbitrarily decided to round the mean
weight per order to 195kg, I'd report that as a bug. Maybe I want the
next highest integer, not the lowest. Maybe I do care about that extra 5/9
of a kilo. It simply isn't acceptable for the function to try to guess
what I'm going to do with the result.
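A rough sketch of that point, assuming Python 3 (where `/` is true division). The post never shows the body of `mean`, so the definition here is only a guess at its intent. The function returns the plain arithmetic mean and the caller chooses the rounding:

```python
import math

def mean(data):
    """Arithmetic mean. Assumed definition, not shown in the original post."""
    return sum(data) / len(data)

# number of 20kg bags of flour per order
data = [5, 7, 20, 2, 7, 6, 1, 37, 3]
weights = [20 * n for n in data]

m = mean(weights)             # 1760 / 9, a float
rounded_down = math.floor(m)  # 195
rounded_up = math.ceil(m)     # 196
```

The choice between `math.floor` and `math.ceil` stays with the caller, who knows what the number is for.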
>> You can always imagine corner cases where some programmer, somewhere,
>> has some bizarre need for a mean() function that truncates when given a
>> list of integers but not when given a list of floats. Making that the
>> default makes life easy for the 0.1% corner cases and life harder for
>> the 99.9% of regular cases, which is far from the Python philosophy.
>
> I think it's more important that a program never give a wrong answer,
> than save a few keystrokes. So, that polymorphic mean function is a bit
> scary. It might be best to throw an error if the args are all integers.
> There is no definitely correct way to handle it so it's better to
> require explicit directions.
Of course there's a correct way to handle it. You write a function that
returns the mathematical mean. And then, if you need special processing
of that mean (say, truncating if the numbers are all ints, or on
Tuesdays), you do so afterwards:
import datetime

x = mean(data)
is_tuesday = datetime.date.today().weekday() == 1  # Monday is 0
if all(isinstance(n, int) for n in data) or is_tuesday:
    x = int(x)
I suppose that if your application is always going to truncate the mean
you might be justified in writing an optimized function that does that.
But don't call it "truncated_mean", because that has a specific meaning
to statisticians that is not the same as what you're talking about.
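To illustrate the naming clash, here is a minimal sketch of what statisticians usually mean by a truncated (or trimmed) mean: discard a fraction of the smallest and largest values, then average the rest. The helper name `trimmed_mean`, its `proportion` parameter, and the `mean` definition are all hypothetical, and Python 3 division is assumed:

```python
def mean(data):
    """Arithmetic mean. Assumed definition, not shown in the original post."""
    return sum(data) / len(data)

def trimmed_mean(data, proportion=0.1):
    """Hypothetical helper: drop `proportion` of the values from each
    end of the sorted data, then average what is left."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    values = sorted(data)
    k = int(len(values) * proportion)   # number of values cut per end
    return mean(values[k:len(values) - k])

data = [5, 7, 20, 2, 7, 6, 1, 37, 3]

truncated = int(mean(data))          # truncating the mean: 9
trimmed = trimmed_mean(data, 0.2)    # drops 1 and 37, giving 50/7 (about 7.14)
```

The two results differ because one discards outliers before averaging, while the other discards the fractional part after averaging.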
Paul, I'm pretty sure you've publicly defended duck typing before. Now
you're all scared of some imagined type non-safety that results from
numeric coercions. I can't imagine why you think that this should be
allowed:
class Float(float): pass
x = Float(1.0)
mean([x, 2.0, 3.0, 5.0])
but this gives you the heebie-jeebies:
mean([1, 2.0, 3.0, 5.0])
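A small sketch comparing the two calls, again assuming Python 3 and a guessed definition of `mean` (the post does not show one). Both inputs go through the same numeric coercion and produce the same plain float:

```python
def mean(data):
    """Arithmetic mean. Assumed definition, not shown in the original post."""
    return sum(data) / len(data)

class Float(float):
    pass

a = mean([Float(1.0), 2.0, 3.0, 5.0])  # float subclass mixed with floats
b = mean([1, 2.0, 3.0, 5.0])           # int mixed with floats

# Both are 2.75, and both results are ordinary floats.
same_value = (a == b)
same_type = (type(a) is type(b))
```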
As a general principle, I'd agree that arbitrarily coercing any old type
into any other type is a bad idea. But in the specific case of numeric
coercions, 99% of the time the Right Way is to treat all numbers
identically, and then restrict the result if you want a restricted
result, so the language should make that the easy case, and leave the 1%
to the developer to write special code:
def pmean(data):  # Paul Rubin's mean
    """Returns the arithmetic mean of data, unless data is all
    ints, in which case returns the mean rounded to the nearest
    integer less than the arithmetic mean."""
    s = sum(data)
    if isinstance(s, int):
        return s // len(data)
    return s / len(data)
--
Steven