How about adding rational fraction to Python?

Wed Feb 27 21:32:19 EST 2008

On Wed, 27 Feb 2008 17:07:37 -0800, Paul Rubin wrote:

> Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au> writes:
>> Oh come on. With a function named "mean" that calculates the sum of a
>> list of numbers and then divides by the number of items, what else
>> could it be?
> 
> You have a bunch of marbles you want to put into bins.  The division
> tells you how many marbles to put into each bin.  That would be an
> integer since you cannot cut up individual marbles.

(Actually you can. As a small child, one of my most precious possessions 
was a marble which had cracked into two halves.)

No, that doesn't follow, because you don't get the result you want if the 
number of marbles is entered as Decimals or floats. Maybe the data came 
from a marble-counting device that always returns floats.

You're expecting the function to magically know what you want to do with 
the result and return the right kind of answer, which is the wrong way to 
go about it. For example, there are situations where your data is given 
in integers, but the number you want is a float.

# number of 20kg bags of flour per order
>>> data = [5, 7, 20, 2, 7, 6, 1, 37, 3]
>>> weights = [20*n for n in data]
>>> mean(weights)
195.55555555555554

If I were using a library that arbitrarily decided to round the mean 
weight per order down to 195kg, I'd report that as a bug. Maybe I want 
the next higher integer, not the next lower. Maybe I do care about that 
extra 5/9ths of a kilo. It simply isn't acceptable for the function to 
try to guess what I'm going to do with the result.
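
To make the point concrete, here is a minimal sketch in which the caller, 
not the function, chooses the rounding. The definition of mean() is an 
assumption (the thread only describes it as sum divided by count), and it 
relies on Python 3 true division; under Python 2 you would need 
"from __future__ import division". The Fraction line uses the standard 
library's fractions module to keep the exact 5/9.

```python
import math
from fractions import Fraction

def mean(data):
    """Arithmetic mean: sum of the items divided by their count.
    (Assumed definition; the thread never shows the real one.)"""
    return sum(data) / len(data)

data = [5, 7, 20, 2, 7, 6, 1, 37, 3]
weights = [20 * n for n in data]

m = mean(weights)               # a float a little over 195.5

# The caller picks the rounding that suits the job.
rounded_down = math.floor(m)    # 195
rounded_up = math.ceil(m)       # 196

# Or keep the exact value as a rational number.
exact = Fraction(sum(weights), len(weights))
leftover = exact - 195          # Fraction(5, 9)
```

None of these choices can be recovered once the function has thrown the 
fractional part away.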

>> You can always imagine corner cases where some programmer, somewhere,
>> has some bizarre need for a mean() function that truncates when given a
>> list of integers but not when given a list of floats.  Making that the
>> default makes life easy for the 0.1% corner cases and life harder for
>> the 99.9% of regular cases, which is far from the Python philosophy.
> 
> I think it's more important that a program never give a wrong answer,
> than save a few keystrokes.  So, that polymorphic mean function is a bit
> scary. It might be best to throw an error if the args are all integers.
> There is no definitely correct way to handle it so it's better to
> require explicit directions.

Of course there's a correct way to handle it. You write a function that 
returns the mathematical mean. And then, if you need special processing 
of that mean (say, truncating if the numbers are all ints, or on 
Tuesdays), you do so afterwards:

import datetime

x = mean(data)
# date.weekday() returns 0 for Monday, so 1 is Tuesday
if (all(isinstance(n, int) for n in data)
        or datetime.date.today().weekday() == 1):
    x = int(x)

I suppose that if your application is always going to truncate the mean 
you might be justified in writing an optimized function that does that. 
But don't call it "truncated_mean", because that has a specific meaning 
to statisticians that is not the same as what you're talking about.
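
For anyone unsure of the distinction: to a statistician a truncated (or 
trimmed) mean discards the extreme values from both ends of the sorted 
data before averaging. A rough sketch follows; trimmed_mean() and its 
count parameter k are illustrative names only (the amount to trim is 
usually stated as a percentage), and mean() is the assumed sum-over-count 
definition.

```python
def mean(data):
    """Arithmetic mean (assumed definition: sum divided by count)."""
    return sum(data) / len(data)

def trimmed_mean(data, k):
    """Mean after discarding the k smallest and k largest values.
    Hypothetical helper, for illustration only."""
    if k < 0 or 2 * k >= len(data):
        raise ValueError("k must satisfy 0 <= 2*k < len(data)")
    ordered = sorted(data)
    return mean(ordered[k:len(ordered) - k])

data = [5, 7, 20, 2, 7, 6, 1, 37, 3]

# Statistician's truncated mean: drops the 1 and the 37, then averages
# the remaining seven values [2, 3, 5, 6, 7, 7, 20].
t = trimmed_mean(data, 1)

# Truncating the ordinary mean: averages all nine values (88/9), then
# discards the fractional part.
u = int(mean(data))
```

The two answers differ (about 7.14 against 9), which is why reusing the 
name would mislead.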

Paul, I'm pretty sure you've publicly defended duck typing before. Now 
you're all scared of some imagined type non-safety that results from 
numeric coercions. I can't imagine why you think that this should be 
allowed:

class Float(float): pass
x = Float(1.0)
mean([x, 2.0, 3.0, 5.0])

but this gives you the heebie-jeebies:

mean([1, 2.0, 3.0, 5.0])

As a general principle, I'd agree that arbitrarily coercing any old type 
into any other type is a bad idea. But in the specific case of numeric 
coercions, 99% of the time the Right Way is to treat all numbers 
identically, and then restrict the result if you want a restricted 
result, so the language should make that the easy case, and leave the 1% 
to the developer to write special code:

def pmean(data):  # Paul Rubin's mean
    """Returns the arithmetic mean of data, unless data is all 
    ints, in which case returns the mean rounded down to the 
    nearest integer (floor division)."""
    s = sum(data)
    if isinstance(s, int):
        return s // len(data)
    else:
        return s / len(data)
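
A side-by-side comparison may help. This is only a sketch: mean() is again 
the assumed sum-over-count definition, Python 3 division is assumed, and 
pmean() is repeated so the snippet stands alone.

```python
def mean(data):
    """Arithmetic mean (assumed definition: sum divided by count)."""
    return sum(data) / len(data)

def pmean(data):
    """Floor-divides when the sum is an int, otherwise true-divides."""
    s = sum(data)
    if isinstance(s, int):
        return s // len(data)
    return s / len(data)

class Float(float):
    pass

# Numeric coercion: a float subclass and a stray int give the same mean.
a = mean([Float(1.0), 2.0, 3.0, 5.0])   # 2.75
b = mean([1, 2.0, 3.0, 5.0])            # 2.75

# The special case only kicks in when everything is an int.
c = pmean([1, 2, 3, 5])                 # 2, i.e. 11 // 4
d = mean([1, 2, 3, 5])                  # 2.75

# Floor division rounds toward negative infinity, so for negative data
# it is not the same as truncating with int().
e = pmean([-1, -2, -3, -5])             # -3
f = int(mean([-1, -2, -3, -5]))         # -2
```

The last pair shows that even "round the mean to an integer" hides a 
choice the caller has to make.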

-- 
Steven