Yet another sum function (fractions.sum)
The discussion around having a sum() function in the statistics module reminded me that despite the fact that there are two functions in the stdlib (and more in the third-party libraries I use) I have previously rolled my own sum() function. The reason is that the stdlib does not currently contain a function that can sum Fractions in linear time for many inputs, even though it is possible to implement a function that achieves closer to linear performance in more cases. I propose that there could be a sum() function or class-method in the fractions module for this purpose. I would just raise this on the tracker, but seeing how many feathers were ruffled by statistics.sum I thought I'd suggest it here first.

To demonstrate the problem I'll show how a quick and dirty mergesum() can out-perform sum():

$ cat tmpsum.py
# tmpsum.py

# Generate data
from random import randint, seed
from fractions import Fraction as F

seed(123456789)  # Use the same numbers each time
nums = [F(randint(-1000, 1000), randint(1, 1000)) for _ in range(100000)]

# 1) mergesum() is more efficient than sum() with Fractions.
# 2) It assumes associativity of addition since it reorders the sum.
# 3) It performs the same number of __add__ operations as sum()
# 4) A more complicated iterator version is possible.
def mergesum(seq):
    while len(seq) > 1:
        new = [a + b for a, b in zip(seq[:-1:2], seq[1::2])]
        if len(seq) % 2:
            new.append(seq[-1])
        seq = new
    return seq[0]

# Just a quick test
assert mergesum(nums[:101]) == sum(nums[:101])

So now let's time sum() with these numbers:

$ python -m timeit -s 'from tmpsum import mergesum, nums; nums=nums[:10]' 'sum(nums)'
1000 loops, best of 3: 206 usec per loop
$ python -m timeit -s 'from tmpsum import mergesum, nums; nums=nums[:100]' 'sum(nums)'
100 loops, best of 3: 6.24 msec per loop
$ python -m timeit -s 'from tmpsum import mergesum, nums; nums=nums[:1000]' 'sum(nums)'
10 loops, best of 3: 320 msec per loop
$ python -m timeit -s 'from tmpsum import mergesum, nums; nums=nums[:10000]' 'sum(nums)'
10 loops, best of 3: 6.43 sec per loop

Each time we increase the size of the input by a factor of 10 the computation time increases by a factor of about 30. This is caused by computing the gcd() to normalise the result between each increment to the total. As the numerator and denominator of the subtotal get larger, the time taken to compute the gcd() after each increment increases. The result is that the algorithm overall costs somewhere between linear and quadratic time [from the above maybe it's O(n**(3/2))].

Now let's see how mergesum() performs:

$ python -m timeit -s 'from tmpsum import mergesum, nums; nums=nums[:10]' 'mergesum(nums)'
10000 loops, best of 3: 186 usec per loop
$ python -m timeit -s 'from tmpsum import mergesum, nums; nums=nums[:100]' 'mergesum(nums)'
100 loops, best of 3: 2.16 msec per loop
$ python -m timeit -s 'from tmpsum import mergesum, nums; nums=nums[:1000]' 'mergesum(nums)'
10 loops, best of 3: 24.6 msec per loop
$ python -m timeit -s 'from tmpsum import mergesum, nums; nums=nums[:10000]' 'mergesum(nums)'
10 loops, best of 3: 256 msec per loop
$ python -m timeit -s 'from tmpsum import mergesum, nums; nums=nums[:100000]' 'mergesum(nums)'
10 loops, best of 3: 2.59 sec per loop

(I didn't time sum() with a 100000 input to compare with that last run.)

Notice that mergesum() out-performs sum() for all input sizes and that the time scaling is much closer to linear, i.e. 10x the input takes 10x the time. It works by summing adjacent pairs of numbers, halving the size of the list each time. This has the benefit that the numerator and denominator in most of the additions are a lot smaller, so their gcd() is computed faster. Only the final additions need to use the really big numerator and denominator.

The performance of both sum() functions is sensitive to the distribution of inputs, and in particular the distribution of denominators, but in my own usage a merge-sum algorithm has always been faster. I have found this useful for myself (using even larger lists of numbers) when using the fractions module and I propose that something similar could be added to the fractions module.

Oscar
My intuition was that one might do better than mergesum() by binning
numbers. I.e. this implementation:
def binsum(seq):
    bins = dict()
    for f in seq:
        bins[f.denominator] = bins.get(f.denominator, 0) + f
    return mergesum(list(bins.values()))
Indeed I am right, but the effect doesn't show up until one looks at a
fairly large collection of fractions:
538-code % python3 -m timeit -s 'from tmpsum import mergesum, binsum, nums; nums=nums[:50000]' 'mergesum(nums)'
10 loops, best of 3: 806 msec per loop
539-code % python3 -m timeit -s 'from tmpsum import mergesum, binsum, nums; nums=nums[:50000]' 'binsum(nums)'
10 loops, best of 3: 627 msec per loop
binsum() beats sum() at much smaller sizes as well, but it doesn't beat
simple mergesum() at the small sizes. This is true, BTW, even if binsum()
only uses sum() on the last line; but there's an extra boost in speed to
use mergesum() there.
I'm not sure whether one might get a better binsum() by binning not on
denominator itself, but binning together everything with a denominator that
is a multiple of a stored denominator. In principle, that could be many
fewer bins with negligible gcd() calculation needed; however, the extra
testing needed to check for multiples might outweigh that gain. There are
several variations that come to mind, but I haven't tested them.
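A rough sketch of that untested "multiples" variant might look something like the following (multibinsum and its bin-merging details are only an illustration of the idea, not a measured implementation):

from fractions import Fraction as F

def multibinsum(seq):
    bins = {}   # stored denominator -> numerator accumulated over that denominator
    for f in seq:
        d, n = f.denominator, f.numerator
        for stored in bins:
            if stored % d == 0:
                # an existing bin's denominator already covers d
                bins[stored] += n * (stored // d)
                break
        else:
            # no stored denominator is a multiple of d: open a bin for d and
            # fold in any existing bins whose denominator divides d
            total = n
            for stored in [s for s in bins if d % s == 0]:
                total += bins.pop(stored) * (d // stored)
            bins[d] = total
    return sum(F(n, d) for d, n in bins.items())

Whether the linear scan over the stored denominators pays for itself would need benchmarking against the plain binsum().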
David Mertz, 16.08.2013 22:06:
My intuition was that one might do better than mergesum() by binning numbers. I.e. this implementation:
def binsum(seq):
    bins = dict()
    for f in seq:
        bins[f.denominator] = bins.get(f.denominator, 0) + f
    return mergesum(list(bins.values()))

def mergesum(seq):
    while len(seq) > 1:
        new = [a + b for a, b in zip(seq[:-1:2], seq[1::2])]
        if len(seq) % 2:
            new.append(seq[-1])
        seq = new
    return seq[0]
Indeed I am right, but the effect doesn't show up until one looks at a fairly large collection of fractions:
538-code % python3 -m timeit -s 'from tmpsum import mergesum, binsum, nums; nums=nums[:50000]' 'mergesum(nums)'
10 loops, best of 3: 806 msec per loop
539-code % python3 -m timeit -s 'from tmpsum import mergesum, binsum, nums; nums=nums[:50000]' 'binsum(nums)'
10 loops, best of 3: 627 msec per loop
Simply sorting by denominator gives me a visible advantage over the above:

def sortsum(seq):
    def key(f):
        return f.denominator if isinstance(f, F) else 1
    seq = sorted(seq, key=key)
    if len(seq) < 3:
        return sum(seq)
    return mergesum(seq)

$ python3.4 -m timeit -s '...; c=nums[:10000]' 'sortsum(c)'
10 loops, best of 3: 76.9 msec per loop
$ python3.4 -m timeit -s '...; c=nums[:10000]' 'binsum(c)'
10 loops, best of 3: 83.2 msec per loop
$ python3.4 -m timeit -s '...; c=nums[:10000]' 'mergesum(c)'
10 loops, best of 3: 106 msec per loop

$ python3.4 -m timeit -s '...; c=nums[:1000]' 'sortsum(c)'
100 loops, best of 3: 9.49 msec per loop
$ python3.4 -m timeit -s '...; c=nums[:1000]' 'binsum(c)'
100 loops, best of 3: 12.9 msec per loop
$ python3.4 -m timeit -s '...; c=nums[:1000]' 'mergesum(c)'
100 loops, best of 3: 9.66 msec per loop

$ python3.4 -m timeit -s '...; c=nums[:100]' 'sortsum(c)'
1000 loops, best of 3: 951 usec per loop
$ python3.4 -m timeit -s '...; c=nums[:100]' 'mergesum(c)'
1000 loops, best of 3: 937 usec per loop

$ python3.4 -m timeit -s '...; c=nums[:10]' 'sortsum(c)'
10000 loops, best of 3: 88.8 usec per loop
$ python3.4 -m timeit -s '...; c=nums[:10]' 'mergesum(c)'
10000 loops, best of 3: 80.2 usec per loop

So, it's a bit slower for small sequences (15 microseconds for <100 items sounds acceptable to me), but it's quite a bit faster for long sequences. It seems to be slowing down a bit for really long sequences, though:

$ python3.4 -m timeit -s '...; c=nums[:100000]' 'mergesum(c)'
10 loops, best of 3: 1 sec per loop
$ python3.4 -m timeit -s '...; c=nums[:100000]' 'sortsum(c)'
10 loops, best of 3: 748 msec per loop
$ python3.4 -m timeit -s '...; c=nums[:100000]' 'binsum(c)'
10 loops, best of 3: 743 msec per loop

However, unpacking the fractions for the bin summing makes this way faster for larger sequences:

def binsum2(seq):
    bins = dict()
    get_bin = bins.get
    _isinstance = isinstance
    for f in seq:
        d, n = (f.denominator, f.numerator) if _isinstance(f, F) else (1, f)
        bins[d] = get_bin(d, 0) + n
    return mergesum([F(n, d) for d, n in sorted(bins.items())])

$ python3.4 -m timeit -s '...; c=nums[:10000]' 'sortsum(c)'
10 loops, best of 3: 76.9 msec per loop
$ python3.4 -m timeit -s '...; c=nums[:10000]' 'binsum2(c)'
10 loops, best of 3: 21 msec per loop

$ python3.4 -m timeit -s '...; c=nums[:1000]' 'sortsum(c)'
100 loops, best of 3: 9.49 msec per loop
$ python3.4 -m timeit -s '...; c=nums[:1000]' 'binsum2(c)'
100 loops, best of 3: 8.7 msec per loop

But again, slower for short ones:

$ python3.4 -m timeit -s '...; c=nums[:100]' 'mergesum(c)'
1000 loops, best of 3: 937 usec per loop
$ python3.4 -m timeit -s '...; c=nums[:100]' 'binsum2(c)'
1000 loops, best of 3: 1.34 msec per loop

$ python3.4 -m timeit -s '...; c=nums[:10]' 'mergesum(c)'
10000 loops, best of 3: 80.2 usec per loop
$ python3.4 -m timeit -s '...; c=nums[:10]' 'binsum2(c)'
10000 loops, best of 3: 137 usec per loop

Which is expected, because short sequences make it less likely to actually find common denominators. Assuming that the set of distinct denominators is usually small compared to the number of values, this would be the right tradeoff. Maybe inlining the denominator normalisation instead of creating Fraction instances at the end would give another boost here.

In any case, this huge difference in performance speaks for providing some kind of specialised sum() function in the fractions module.

Stefan
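A rough sketch of that "inline the normalisation" idea (binsum3 is a made-up name, and this uses the modern math.gcd rather than the fractions.gcd of the time; it is not code from the thread):

from math import gcd
from fractions import Fraction as F

def binsum3(seq):
    bins = {}
    for f in seq:
        d = f.denominator
        bins[d] = bins.get(d, 0) + f.numerator
    # combine the per-denominator totals with plain integer arithmetic and
    # normalise only once, when the final Fraction is constructed
    num, den = 0, 1
    for d, n in sorted(bins.items()):
        g = gcd(den, d)
        num = num * (d // g) + n * (den // g)
        den *= d // g
    return F(num, den)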
On 8/19/2013 5:35 AM, Stefan Behnel wrote:
In any case, this huge difference in performance speaks for providing some kind of specialised sum() function in the fractions module.
At first glance, I agree. It would make using Fractions in a real application more practical. -- Terry Jan Reedy
On 19 August 2013 10:35, Stefan Behnel wrote:
David Mertz, 16.08.2013 22:06:
My intuition was that one might do better than mergesum() by binning numbers. I.e. this implementation:
def binsum(seq):
    bins = dict()
    for f in seq:
        bins[f.denominator] = bins.get(f.denominator, 0) + f
    return mergesum(list(bins.values()))
Good thinking David. Actually for some distributions of inputs this massively outperforms the other implementations. One common use case I have for Fractions is to check the accuracy of floating point computation by repeating the computation with Fractions. In this case I would be working with Fractions whose denominators are always powers of 2 (and usually only a handful of different values). And as Stefan says if we're binning on the denominator then we can make it really fast by adding the numerators with int.__add__.
Simply sorting by denominator gives me a visible advantage over the above:
def sortsum(seq):
    def key(f):
        return f.denominator if isinstance(f, F) else 1
    seq = sorted(seq, key=key)
    if len(seq) < 3:
        return sum(seq)
    return mergesum(seq)
Interesting.

I've developed an iterator version of mergesum:

def imergesum(iterable):
    stack = []
    it = iter(iterable)
    nextpow2 = 1
    while True:
        for n2, val in enumerate(it, 1):
            if n2 == nextpow2:
                stack.append(val)
                nextpow2 *= 2
                break
            elif n2 % 2:
                stack[-1] += val
            else:
                ishift = -1
                while not n2 % 2:
                    val, stack[ishift] = stack[ishift], val
                    ishift -= 1
                    n2 >>= 1
                stack[ishift] += val
        else:
            return mergesum(stack)  # Could just use sum here

It uses log(N) storage so you'd run out of space on an iterator of say 2 ** sys.maxint Fractions but that's unlikely to bother anyone.

I combined the different approaches to make a rationalsum() function that is scalable and tries to take advantage of the binsum() and sortsum() improvements where possible:

def rationalsum(iterable, tablesize=1000):
    def binsumgen(iterable):
        iterator = iter(iterable)
        bins = defaultdict(int)
        finished = False
        denom = itemgetter(0)
        while not finished:
            finished = True
            for n, num in zip(range(tablesize), iterator):
                bins[num.denominator] += num.numerator
                finished = False
            if len(bins) >= tablesize or finished:
                dn = sorted(bins.items(), key=denom)
                yield mergesum([F(n, d) for d, n in dn])
                bins.clear()
    return imergesum(binsumgen(iterable))

I tested this on a suite of different rational sequences (full script at the bottom):

1) float - Fractions made from floats with power of 2 denominators.
2) d1000 - the original numbers with denominators uniform in [1, 1000].
3) d10 - like above but [1, 10].
4) bigint - big ints rather than Fractions (just for comparison).
5) e - the sequence 1 + 1 + 1/2 + 1/3! + 1/4! + ... (this is slow).

In different situations the binsum(), sortsum() and mergesum() functions perform differently (in some cases wildly so). The rationalsum() function is never the best but always has the same order of magnitude as the best. Apart from the bigint case builtins.sum() is always among the worst performing.

Here's the benchmark output (Python 3.3 on Windows XP 32-bit - the script takes a while):

Running benchmarks (times in microseconds)...

--------------
float
---------------
     n | sum    | merges | imerge | binsum | sortsu | ration
------------------------------------------------------------
    10 | 282    | 269    | 298    | 238    | 284    | 249
   100 | 2804   | 2717   | 3045   | 766    | 2689   | 780
  1000 | 29597  | 27468  | 31275  | 2093   | 27151  | 2224
 10000 | 2.9e+5 | 2.7e+5 | 3.1e+5 | 12908  | 2.7e+5 | 13122
100000 | 3.1e+6 | 2.8e+6 | 3.1e+6 | 1.2e+5 | 2.8e+6 | 1.2e+5

--------------
d1000
---------------
     n | sum    | merges | imerge | binsum | sortsu | ration
------------------------------------------------------------
    10 | 214    | 190    | 203    | 318    | 206    | 343
   100 | 5879   | 2233   | 2578   | 3178   | 2338   | 3197
  1000 | 2.6e+5 | 24866  | 29015  | 21929  | 23344  | 21901
 10000 | 6.1e+6 | 2.7e+5 | 3e+5   | 48111  | 1.8e+5 | 48814
100000 | 6.6e+7 | 2.7e+6 | 3e+6   | 1.5e+5 | 1.8e+6 | 3.5e+5

--------------
bigint
---------------
     n | sum    | merges | imerge | binsum | sortsu | ration
------------------------------------------------------------
    10 | 1      | 14     | 19     | 19     | 19     | 36
   100 | 15     | 54     | 140    | 59     | 72     | 81
  1000 | 158    | 329    | 1294   | 461    | 499    | 550
 10000 | 1655   | 2957   | 13102  | 4477   | 4497   | 5242
100000 | 16539  | 35929  | 1.3e+5 | 44905  | 55101  | 52732

--------------
e
---------------
     n | sum    | merges | imerge | binsum | sortsu | ration
------------------------------------------------------------
    10 | 155    | 156    | 164    | 262    | 166    | 274
   100 | 9464   | 2777   | 3190   | 9282   | 2893   | 4079
  1000 | 1.1e+7 | 8.7e+5 | 9.4e+5 | 1.1e+7 | 8.8e+5 | 9e+5

--------------
d10
---------------
     n | sum    | merges | imerge | binsum | sortsu | ration
------------------------------------------------------------
    10 | 142    | 144    | 156    | 139    | 157    | 153
   100 | 1456   | 1470   | 1822   | 343    | 1541   | 347
  1000 | 15080  | 14978  | 16955  | 1278   | 15557  | 1241
 10000 | 1.5e+5 | 1.5e+5 | 1.6e+5 | 9845   | 1.5e+5 | 10343
100000 | 1.5e+6 | 1.5e+6 | 1.7e+6 | 96015  | 1.5e+6 | 99856

And here's the script:

# tmpsum.py
from __future__ import print_function

def mergesum(seq):
    while len(seq) > 1:
        new = [a + b for a, b in zip(seq[:-1:2], seq[1::2])]
        if len(seq) % 2:
            new.append(seq[-1])
        seq = new
    return seq[0]

def imergesum(iterable):
    stack = []
    it = iter(iterable)
    nextpow2 = 1
    while True:
        for n2, val in enumerate(it, 1):
            if n2 == nextpow2:
                stack.append(val)
                nextpow2 *= 2
                break
            elif n2 % 2:
                stack[-1] += val
            else:
                ishift = -1
                while not n2 % 2:
                    val, stack[ishift] = stack[ishift], val
                    ishift -= 1
                    n2 >>= 1
                stack[ishift] += val
        else:
            return mergesum(stack)

from collections import defaultdict

def binsum(iterable):
    bins = defaultdict(int)
    for num in iterable:
        bins[num.denominator] += num.numerator
    return mergesum([F(n, d) for d, n in bins.items()])

from operator import attrgetter

def sortsum(seq):
    seq = sorted(seq, key=attrgetter('denominator'))
    if len(seq) < 3:
        return sum(seq)
    return mergesum(seq)

from operator import itemgetter

def rationalsum(iterable, tablesize=1000):
    def binsumgen(iterable):
        iterator = iter(iterable)
        bins = defaultdict(int)
        finished = False
        denom = itemgetter(0)
        while not finished:
            finished = True
            for n, num in zip(range(tablesize), iterator):
                bins[num.denominator] += num.numerator
                finished = False
            if len(bins) >= tablesize or finished:
                dn = sorted(bins.items(), key=denom)
                yield mergesum([F(n, d) for d, n in dn])
                bins.clear()
    return imergesum(binsumgen(iterable))

sumfuncs = sum, mergesum, imergesum, binsum, sortsum, rationalsum

# Just a quick test
if True:
    print('testing', end=' ')
    from fractions import Fraction as F
    nums = [F(n, 2*n + 1) for n in range(3000)]
    for n in 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 100, 2005:
        ns = nums[:n]
        result = sum(ns)
        for func in sumfuncs:
            assert func(ns) == result, func.__name__
        print('.', end='')
    print(' passed!')

if True:
    print('generating data...')
    from random import randint, gauss, seed
    from fractions import Fraction as F
    seed(123456789)
    nmax = 10 ** 5
    nums_d1000 = [F(randint(-1000, 1000), randint(1, 1000)) for _ in range(nmax)]
    nums_d10 = [F(randint(-10, 10), randint(1, 10)) for _ in range(nmax)]
    nums_float = [F(*gauss(0, 1).as_integer_ratio()) for _ in range(nmax)]
    nums_e = [F(1)]
    for n in range(1, 1000):
        nums_e.append(nums_e[-1] / n)
    nums_bigint = [10**65 + randint(1, 100) for n in range(nmax)]
    nums_all = {
        'd1000': nums_d1000,
        'd10': nums_d10,
        'float': nums_float,
        'e': nums_e,
        'bigint': nums_bigint,
    }

if True:
    print('\nRunning benchmarks (times in microseconds)...')

import timeit

def mytimeit(stmt, setup):
    n = 10
    time = lambda: timeit.timeit(stmt, setup, number=n) / n
    t = time()
    if t * n < 1e-1:
        while t * n < 1e-2:
            n *= 10
    if n == 10:
        ts = [t, time(), time()]
    else:
        ts = [time(), time(), time()]
    t = min(ts)
    return 1e6 * t

def fmtnum(n):
    s = str(int(n))
    if len(s) > 5:
        s = '%1.2g' % n
        if s[-4:-1] == 'e+0':
            s = s[:-2] + s[-1]
    return '%-6s' % s

gapline = '-' * 60
fnames = [f.__name__ for f in sumfuncs]
header = '     n | ' + ' | '.join('%-6s' % name[:6] for name in fnames)
sum = sum
setup = 'from __main__ import ' + ', '.join(['nums'] + fnames)
setup += '; nums=nums[:%s]'

for name, nums in nums_all.items():
    print('\n\n--------------\n%s\n---------------' % name)
    print(header)
    print(gapline)
    for n in 10, 100, 1000, 10000, 100000:
        if n > len(nums):
            break
        times = []
        for func in sumfuncs:
            stmt = '%s(nums)' % (func.__name__,)
            times.append(mytimeit(stmt, setup % n))
        print(('%6s | ' % n), end='')
        print(' | '.join(fmtnum(t) for t in times))

Oscar
Oscar's improvement to my binsum() idea, using Stefan's excellent point
that we should just use int.__add__ on the numerators, is quite elegant.
I.e.:
def binsum(iterable):
    bins = defaultdict(int)
    for num in iterable:
        bins[num.denominator] += num.numerator
    return mergesum([F(n, d) for d, n in bins.items()])
Moreover, looking at Oscar's data, it looks like the improved binsum()
ALWAYS beats rationalsum()[*]
[*] OK, technically there are three cases where this isn't true:
d1000/n=100 and d10/n=1000 where it is about one percent slower (although I
think that is in the noise of timing it); and the "pathological" case of
calculating 'e', where no denominators repeat and where rationalsum() beats
everything else by a large margin at the asymptote (or maybe imergesum()
does, but it is because they behave the same here).
So two thoughts:
(1) Use the much simpler binsum(), which *does* accept an iterator, and in
non-pathological cases will also use moderate memory (i.e. N numbers will
fall into approximately log(N) different denominator classes).
(2) Call this specialized sum something like Fraction.sum(). While the
other functions work on, for example, bigint, the builtin sum() is vastly
better there. I presume a similar result would be true when concrete
optimizations of "Decimal.sum()" are discussed.
The generic case in Python allows for sum()'ing heterogeneous collections
of numbers. Putting specialized dunder methods into e.g. Fraction.__sum__
is still going to fail in the general case (i.e. fall back to
__builtins__.sum()). The special case where you KNOW you are summing a
homogenous collection of a certain style of number is special enough that
spelling it Fraction.sum() or math.fsum() or Decimal.sum() is more explicit
and no harder.
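A rough sketch of what that spelling could wrap internally (fraction_sum is just a placeholder for a Fraction.sum() classmethod and reuses binsum/mergesum from earlier in the thread; it is not a concrete implementation proposal):

from collections import defaultdict
from fractions import Fraction as F
from tmpsum import mergesum   # mergesum() as defined earlier in the thread

def fraction_sum(iterable, start=F(0)):
    # what a Fraction.sum() classmethod might do internally
    bins = defaultdict(int)
    bins[start.denominator] += start.numerator
    for num in iterable:
        bins[num.denominator] += num.numerator
    return mergesum([F(n, d) for d, n in sorted(bins.items())])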
I guess on a small point of symmetry, if the spellings above are chosen,
I'd like it also to be true that:
math.fsum == float.sum
On 20 August 2013 19:14, David Mertz wrote:
Oscar's improvement to my binsum() idea, using Stefan's excellent point that we should just use int.__add__ on the numerators, is quite elegant. I.e.:
def binsum(iterable):
    bins = defaultdict(int)
    for num in iterable:
        bins[num.denominator] += num.numerator
    return mergesum([F(n, d) for d, n in bins.items()])
Moreover, looking at Oscar's data, it looks like the improved binsum() ALWAYS beats rationalsum()[*]
[*] OK, technically there are three cases where this isn't true: d1000/n=100 and d10/n=1000 where it is about one percent slower (although I think that is in the noise of timing it); and the "pathological" case of calculating 'e', where no denominators repeat and where rationalsum() beats everything else by a large margin at the asymptote (or maybe imergesum() does, but it is because they behave the same here).
There's nothing pathological about that case. It's very common to have series where the denominators all differ.

What you said got me thinking, though, about why it underperformed in that case and I realised that it's because binsum reorders the numbers before passing them to mergesum and therefore cannot take advantage of the natural ordering in the series. So with all our powers combined I created binsortmergesum (bsms) and it looks like this:

def bsms(iterable):
    bins = defaultdict(int)
    for num in iterable:
        bins[num.denominator] += num.numerator
    return mergesum([F(n, d) for d, n in sorted(bins.items())])

The new timing results show that bsms is slightly poorer than binsum in the cases where it does well but does much better in the case where it did badly. So here are the new timing results:

--------------
bigint
---------------
     n | sum    | merges | imerge | binsum | sortsu | ration | bsms
------------------------------------------------------------
    10 | 2      | 20     | 27     | 28     | 28     | 50     | 31
   100 | 21     | 73     | 208    | 93     | 102    | 126    | 96
  1000 | 212    | 506    | 2162   | 749    | 762    | 943    | 746
 10000 | 2283   | 5644   | 21975  | 7407   | 8503   | 9204   | 7463
100000 | 23940  | 63465  | 2.3e+5 | 75810  | 93666  | 91938  | 74516

--------------
float
---------------
     n | sum    | merges | imerge | binsum | sortsu | ration | bsms
------------------------------------------------------------
    10 | 435    | 417    | 430    | 377    | 426    | 393    | 373
   100 | 4437   | 4298   | 4549   | 1220   | 4351   | 1252   | 1231
  1000 | 45889  | 43373  | 45424  | 3520   | 44004  | 3743   | 3578
 10000 | 4.6e+5 | 4.7e+5 | 4.5e+5 | 23759  | 4.4e+5 | 25135  | 23601
100000 | 4.8e+6 | 4.4e+6 | 4.5e+6 | 2.2e+5 | 4.4e+6 | 2.3e+5 | 2.2e+5

--------------
d1000
---------------
     n | sum    | merges | imerge | binsum | sortsu | ration | bsms
------------------------------------------------------------
    10 | 339    | 296    | 304    | 497    | 322    | 528    | 505
   100 | 9747   | 3554   | 3872   | 5065   | 3655   | 5197   | 5155
  1000 | 4.2e+5 | 39259  | 42661  | 35199  | 37243  | 36470  | 35317
 10000 | 9.3e+6 | 4.1e+5 | 4.3e+5 | 80244  | 2.9e+5 | 81798  | 80658
100000 | 1e+8   | 4.2e+6 | 4.4e+6 | 2.6e+5 | 2.7e+6 | 6e+5   | 2.6e+5

--------------
d10
---------------
     n | sum    | merges | imerge | binsum | sortsu | ration | bsms
------------------------------------------------------------
    10 | 214    | 219    | 226    | 204    | 236    | 232    | 206
   100 | 2286   | 2245   | 2404   | 514    | 2353   | 563    | 519
  1000 | 23485  | 22776  | 24744  | 2130   | 23284  | 2347   | 2119
 10000 | 2.4e+5 | 2.3e+5 | 2.5e+5 | 18265  | 2.3e+5 | 20109  | 18098
100000 | 2.4e+6 | 2.3e+6 | 2.5e+6 | 1.8e+5 | 2.3e+6 | 2e+5   | 1.8e+5

--------------
e
---------------
     n | sum    | merges | imerge | binsum | sortsu | ration | bsms
------------------------------------------------------------
    10 | 239    | 230    | 239    | 387    | 247    | 413    | 386
   100 | 15221  | 4330   | 4794   | 14786  | 4483   | 6431   | 6361
  1000 | 1.6e+7 | 1.4e+6 | 1.5e+6 | 1.5e+7 | 1.5e+6 | 1.6e+6 | 1.8e+6

Oscar
On Tue, Aug 20, 2013 at 3:26 PM, Oscar Benjamin wrote:
So with all our powers combined I created binsortmergesum (bsms) and it looks like this:
def bsms(iterable):
    bins = defaultdict(int)
    for num in iterable:
        bins[num.denominator] += num.numerator
    return mergesum([F(n, d) for d, n in sorted(bins.items())])
Wow. I noticed that you didn't do a sort in your binsum(), but I didn't think it would make much difference. Actually, I thought the cost of the sort wouldn't be worth it at all. It's actually a little unclear to me exactly why it makes such a big difference as it does in the 'e' case.

Still, I love that function. It's simple, elegant, and fast (when used on Fraction, of course; not everywhere).
On 21 August 2013 01:16, David Mertz wrote:
On Tue, Aug 20, 2013 at 3:26 PM, Oscar Benjamin wrote:
So with all our powers combined I created binsortmergesum (bsms) and it looks like this:
def bsms(iterable):
    bins = defaultdict(int)
    for num in iterable:
        bins[num.denominator] += num.numerator
    return mergesum([F(n, d) for d, n in sorted(bins.items())])
Wow. I noticed that you didn't do a sort in your binsum(), but I didn't think it would make much difference.
And now I see that Stefan already proposed exactly this function as binsum2. Sorry Stefan, I missed that from your initial post!
Actually, I thought the cost of the sort wouldn't be worth it at all. It's actually a little unclear to me exactly why it makes such a big difference as it does in the 'e' case.
It's because of the natural ordering in the series. Summing in a random order means that you mix up giant denominators with tiny ones for pretty much every addition step. This means massive gcd computations for every addition. Mergesum sums the small denominators with small ones avoiding the extra large denominators. This is particularly important for this series since the denominators are growing super-exponentially. Oscar
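A quick illustration of that effect (not code from the thread): summing the 1/k! terms in their natural order keeps every intermediate denominator as small as possible, whereas summing the same terms in reverse works with the largest denominator from the very first addition.

from fractions import Fraction as F
from math import factorial

terms = [F(1, factorial(k)) for k in range(1, 21)]

def intermediate_denominator_bits(seq):
    # bit lengths of the running total's denominator after each addition
    total, bits = F(0), []
    for t in seq:
        total += t
        bits.append(total.denominator.bit_length())
    return bits

print(intermediate_denominator_bits(terms))        # grows gradually
print(intermediate_denominator_bits(terms[::-1]))  # large from the first step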
Oscar Benjamin wrote:
The discussion around having a sum() function in the statistics module reminded me that despite the fact that there are two functions in the stdlib (and more in the third-party libraries I use) I have previously rolled my own sum() function. The reason is that the stdlib does not currently contain a function that can sum Fractions in linear time for many inputs even though it is possible to implement a function that achieves closer to linear performance in more cases. I propose that there could be a sum() function or class-method in the fractions module for this purpose. I would just raise this on the tracker but seeing how many feathers were ruffled by statistics.sum I thought I'd suggest it here first.
If that takes on, and the number of sum implementations grows, maybe there should be a __sum__() special (class) method, and the sum built-in be changed roughly to

def sum(items, start=0):
    try:
        specialized_sum = start.__sum__
    except AttributeError:
        return ...  # current behaviour
    return specialized_sum(items, start)

sum(items, 0.0) would then automatically profit from the clever optimizations of math.fsum() etc.
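A toy version of that hook, just to make the shape concrete (the __sum__ protocol is hypothetical, and mergesum() is the function posted earlier in the thread):

from fractions import Fraction
from tmpsum import mergesum   # mergesum() as posted earlier in the thread

def _fraction_sum(self, items, start):
    # what "Fraction provides __sum__" might look like
    items = list(items)
    if not items:
        return start
    return start + mergesum(items)

Fraction.__sum__ = _fraction_sum

def newsum(items, start=0):
    try:
        specialized_sum = start.__sum__
    except AttributeError:
        return sum(items, start)   # current behaviour
    return specialized_sum(items, start)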
On 19 August 2013 11:09, Peter Otten <__peter__@web.de> wrote:
If that takes on, and the number of sum implementations grows, maybe there should be a __sum__() special (class) method, and the sum built-in be changed roughly to
def sum(items, start=0):
    try:
        specialized_sum = start.__sum__
    except AttributeError:
        return ...  # current behaviour
    return specialized_sum(items, start)
sum(items, 0.0) would then automatically profit from the clever optimizations of math.fsum() etc.
Two points:

1. Specialising based on the type of the start parameter probably isn't ideal - what you *really* want is to specialise on the type of elements in the list (which is problematic, as lists can contain objects of differing types, so you have to consider those cases - maybe dispatch based on the first element of the list, who knows?)

2. If you do specialise based on start, this can easily be implemented using the new single-dispatch generic functions (you'd have to make start the first argument, but if you're dispatching on it, you'd likely need it to be mandatory anyway so that's not such a big deal).

I'm not sure this is a good idea in any case, though - why is sum(items, 0.0) (with a "magic" start parameter which is a float) better than an explicit fsum(items) (where the function name says it's a float sum)?

Paul
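A minimal sketch of point 2 (typed_sum and its registrations are made-up names, not an existing API; mergesum() is the function posted earlier in the thread):

import math
from functools import singledispatch
from fractions import Fraction
from tmpsum import mergesum   # mergesum() as posted earlier in the thread

@singledispatch
def typed_sum(start, items):
    return sum(items, start)              # generic fallback: builtins.sum

@typed_sum.register(float)
def _(start, items):
    return start + math.fsum(items)       # floats get the numerically careful fsum

@typed_sum.register(Fraction)
def _(start, items):
    return start + mergesum(list(items))  # Fractions get the specialised algorithm

# typed_sum(0.0, values) and typed_sum(Fraction(0), values) then pick the
# float and Fraction algorithms respectively.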
Paul Moore wrote:
On 19 August 2013 11:09, Peter Otten <__peter__@web.de> wrote:
If that takes on, and the number of sum implementations grows, maybe there should be a __sum__() special (class) method, and the sum built-in be changed roughly to
def sum(items, start=0):
    try:
        specialized_sum = start.__sum__
    except AttributeError:
        return ...  # current behaviour
    return specialized_sum(items, start)
sum(items, 0.0) would then automatically profit from the clever optimizations of math.fsum() etc.
Two points:
1. Specialising based on the type of the start parameter probably isn't ideal - what you *really* want is to specialise on the type of elements in the list
You'd need another language for that -- or pypyesque magic ;)
(which is problematic, as lists can contain objects of differing types, so you have to consider those cases - maybe dispatch based on the first element of the list, who knows?) 2. If you do specialise based on start, this can easily be implemented using the new single-dispatch generic functions (you'd have to make start the first argument, but if you're dispatching on it, you'd likely need it to be mandatory anyway so that's not such a big deal).
I'm not sure this is a good idea in any case, though - why is sum(items, 0.0) (with a "magic" start parameter which is a float) better than an explicit fsum(items) (where the function name says it's a float sum)?
If there are multiple sum functions, typically one per type of summand, how can you make them easily discoverable? I doubt that fsum is used in many places where it would be appropriate. If you make the optimizations available via the built-in sum() you can add a sentence like

"If you provide an explicit start value sum() may pick an algorithm optimized for that type."  # needs work

to its documentation and be done.
On 19 August 2013 14:58, Peter Otten <__peter__@web.de> wrote:
You'd need another language for that -- or pypyesque magic ;)
Yes, that was sort of my point :-)
If there are multiple sum functions, typically one per type of summand, how can you make them easily discoverable? I doubt that fsum is used in many places where it would be appropriate. If you make the optimizations available via the built-in sum() you can add a sentence like
"If you provide an explicit start value sum() may pick an algorithm optimized for that type." # needs work
to its documentation and be done.
Having

l = <a list of floats>

and then

res = sum(l)
res = sum(l, 0.0)
res = sum(l, 0)
res = sum(l, Decimal('0.0'))
res = sum(l, Fraction(0, 1))

all do subtly (or maybe not so subtly!) different things seems to me to be a recipe for confusion, if not disaster.

Certainly sum/math.fsum/etc. are not particularly discoverable, but that can *also* be fixed with documentation. In the documentation of sum(), add a paragraph:

"""If your values are all of one particular type, specialised functions for that type may be available - these may be more numerically stable, or faster, or otherwise more appropriate. The standard library includes math.fsum for floats, X.sum for X, ..."""

That's just as good if you're starting from "I want to sum some values". And it's a lot *more* straightforward to find/google the documentation of math.fsum from code that calls it, than to know where you'd find details of the specialised algorithm used for sum(l, 0.0) if that was all you had to go on...

Paul
On Aug 19, 2013 6:09 AM, "Peter Otten" <__peter__@web.de> wrote:
If that takes on, and the number of sum implementations grows, maybe there should be a __sum__() special (class) method, and the sum built-in be changed roughly to
def sum(items, start=0):
    try:
        specialized_sum = start.__sum__
    except AttributeError:
        return ...  # current behaviour
    return specialized_sum(items, start)
sum(items, 0.0) would then automatically profit from the clever optimizations of math.fsum() etc.
Another possibility (and I'm not suggesting that it's better than the proposed solution above) is to make a module just for the numerous sum implementations people need.
Clay Sweetser, 19.08.2013 15:47:
On Aug 19, 2013 6:09 AM, "Peter Otten" wrote:
If that takes on, and the number of sum implementations grows, maybe there should be a __sum__() special (class) method, and the sum built-in be changed roughly to
def sum(items, start=0):
    try:
        specialized_sum = start.__sum__
    except AttributeError:
        return ...  # current behaviour
    return specialized_sum(items, start)
sum(items, 0.0) would then automatically profit from the clever optimizations of math.fsum() etc.
Another possibility (and I'm not suggesting that it's better than the proposed solution above) is to make a module just for the numerous sum implementations people need.
-1 For summing up fractions, the fractions module is The One Obvious Place to look. For Decimal arithmetic, my first look would always go to the decimal module. Given that the math module deals with floating point arithmetic, it's The One Obvious Place to look for summing up floats. I agree with Paul Moore that the rest can be handled by documentation. Stefan
On 19 August 2013 11:09, Peter Otten <__peter__@web.de> wrote:
Oscar Benjamin wrote:
If that takes on, and the number of sum implementations grows, maybe there should be a __sum__() special (class) method, and the sum built-in be changed roughly to
def sum(items, start=0):
    try:
        specialized_sum = start.__sum__
    except AttributeError:
        return ...  # current behaviour
    return specialized_sum(items, start)
sum(items, 0.0) would then automatically profit from the clever optimizations of math.fsum() etc.
fsum() is about controlling accumulated rounding errors rather than optimisation (although it may be faster I've never needed to check). I'd rather write sum(items, Fraction) than sum(items, Fraction(0)), and either way it's so close to Fraction.sum(items).

I understand what you mean about having a single function that does a good job for all types, and it goes into a much broader set of issues around how Python handles different numeric types. It would be possible in a backward compatible way provided Fraction.__sum__ falls back on sum when it finds a non-Rational. There are lots of other areas in the stdlib where a similar sort of thinking could apply, e.g.:
>>> import math
>>> from decimal import Decimal as D
>>> from fractions import Fraction as F
There's currently no fsum equivalent for Decimals even though a pure Python implementation could have reasonable performance:
>>> math.fsum([0.1, 0.2, 0.3]) == 0.6
True
>>> math.fsum([D('0.1'), D('0.2'), D('0.3')]) == D('0.6')
False
>>> D(math.fsum([D('0.1'), D('0.2'), D('0.3')])) == D('0.6')
False
sum() would do better in the above but then it fails in other situations:
>>> sum([D('1e50'), D('1'), D('-1e50')]) == 1
False
>>> math.fsum([D('1e50'), D('1'), D('-1e50')]) == 1
True
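One possible pure-Python shape for such a function (decimal_fsum is a made-up name, not an existing stdlib routine): accumulate exactly and round only once at the end.

from decimal import Decimal
from fractions import Fraction

def decimal_fsum(iterable):
    total = Fraction(0)
    for d in iterable:
        total += Fraction(d)   # Decimal -> Fraction conversion is exact
    # a single, context-rounded conversion back to Decimal
    return Decimal(total.numerator) / Decimal(total.denominator)

print(decimal_fsum([Decimal('1e50'), Decimal('1'), Decimal('-1e50')]) == 1)   # True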
The math module rounds everything to float losing precision even if better routines are available:
>>> math.sqrt(D('0.02'))
0.1414213562373095
>>> D('0.02').sqrt()
Decimal('0.1414213562373095048801688724')
>>> math.exp(D(1))
2.718281828459045
>>> D(1).exp()
Decimal('2.718281828459045235360287471')
There's also no support for computing the other transcendental functions with Decimals e.g. sin, cos etc. without rounding to float. The math module also fails to find exact results when possible:
>>> a = 10 ** 20 + 1
>>> a
100000000000000000001
>>> math.sqrt(a**2) == a
False
>>> b = F(2, 3)
>>> math.sqrt(b ** 2) == b
False
These ones could just get fixed in the Fractions module:
>>> F(4, 9) ** F(1, 2)
0.6666666666666666
>>> F(2, 3) ** 2
Fraction(4, 9)
>>> a
100000000000000000001
>>> (a ** 2) ** F(1, 2) == a
False
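One way the exact cases could be handled, purely as an illustration (exact_sqrt is a made-up helper, and math.isqrt arrived in Python long after this thread):

from fractions import Fraction
from math import isqrt

def exact_sqrt(f):
    """Return an exact Fraction square root when one exists, else fall back to float."""
    f = Fraction(f)
    rn, rd = isqrt(f.numerator), isqrt(f.denominator)
    if rn * rn == f.numerator and rd * rd == f.denominator:
        return Fraction(rn, rd)
    return float(f) ** 0.5

print(exact_sqrt(Fraction(4, 9)))                          # Fraction(2, 3), exactly
print(exact_sqrt((10 ** 20 + 1) ** 2) == 10 ** 20 + 1)     # True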
Do you think that any of these things should also be changed so that there can be e.g. one sqrt() function that does the right thing for all types?

One way to unify these things is with a load of new dunder methods so that each type can implement its own response to the standard function. Another is to have special modules that deal with different things, e.g. a function for summing Rationals and another for summing inexact types and so on. Another possibility is that you have decimal functions in the decimal module and fraction functions in the fractions module and so on and don't use any duck-typing (except in coercion).

That may seem strange for Python but it's worth remembering that most of the best numerical code that has ever been written was written in languages that *require* you to write separate code for each numeric type and don't give any syntactic support for working with non-native types.

Oscar
On Aug 20, 2013, at 10:30, Oscar Benjamin wrote:
Do you think that any of these things should also be changed so that there can be e.g. one sqrt() function that does the right thing for all types?
So instead of math.sqrt and cmath.sqrt, we just have one function that decides whether sqrt(-1) is 0+1j or an exception based on guessing whether you wanted complex numbers? :) I think in this case "explicit is better" beats "simple is better". I think having multiple sqrt functions--and, yes, maybe multiple sum functions--and making the user ask for the right one is appropriate.
On Tue, Aug 20, 2013, at 18:47, Andrew Barnert wrote:
So instead of math.sqrt and cmath.sqrt, we just have one function that decides whether sqrt(-1) is 0+1j or an exception based on guessing whether you wanted complex numbers? :)
Why exactly is an exception reasonable? If you don't want complex numbers, don't take square roots of negative numbers. If you can't handle complex numbers, you'll get an exception down the line anyway.
random832@fastmail.us writes:
Why exactly is an exception reasonable? If you don't want complex numbers, don't take square roots of negative numbers. If you can't handle complex numbers, you'll get an exception down the line anyway.
That's precisely why you want an exception: to terminate the computation as soon as the unexpected condition can be detected.
On 8/21/2013 10:45 AM, Stephen J. Turnbull wrote:
random832@fastmail.us writes:
Why exactly is an exception reasonable? If you don't want complex numbers, don't take square roots of negative numbers. If you can't handle complex numbers, you'll get an exception down the line anyway.
That's precisely why you want an exception: to terminate the computation as soon as the unexpected condition can be detected.
Exactly. Here's an extreme case: Say I do an operation, unexpectedly get a negative number, take the square root, pickle it, and store it in a file. 6 months from now I read the pickle and perform some other operation, and boom, I get an exception. I'd much prefer getting the exception today than at a later date. -- Eric.
On Wed, Aug 21, 2013, at 11:22, Eric V. Smith wrote:
Exactly. Here's an extreme case: Say I do an operation, unexpectedly get a negative number,
Why can that operation return a negative number instead of raising an exception in the situation where it would return a negative number? I mean, if you want to get your exceptions as early as possible, however unpythonic that may be, that's the next logical step. Why do we need an implicit* wall between real and complex numbers, but not between positive and negative numbers, or integers and floats? *as in "explicit is better than", as in check your results yourself.
On 21 August 2013 17:21,
On Wed, Aug 21, 2013, at 11:22, Eric V. Smith wrote:
Exactly. Here's an extreme case: Say I do an operation, unexpectedly get a negative number,
Why can that operation return a negative number instead of raising an exception in the situation where it would return a negative number? I mean, if you want to get your exceptions as early as possible, however unpythonic that may be, that's the next logical step. Why do we need an implicit* wall between real and complex numbers, but not between positive and negative numbers, or integers and floats?
Because people want to use positive and negative integers and floats much more often than they want to use complex numbers.
*as in "explicit is better than", as in check your results yourself.
I would say that using cmath.sqrt() is explicitly stating that you're interested in using complex numbers. Oscar
On Wed, Aug 21, 2013, at 10:45, Stephen J. Turnbull wrote:
random832@fastmail.us writes:
Why exactly is an exception reasonable? If you don't want complex numbers, don't take square roots of negative numbers. If you can't handle complex numbers, you'll get an exception down the line anyway.
That's precisely why you want an exception: to terminate the computation as soon as the unexpected condition can be detected.
Isn't that unpythonic? I mean, it's like doing type checking to make sure the object you're passed doesn't implement only half of the duck type you want (or none of a duck type the consumer of a list you're adding it to wants, etc), and we just had a discussion about that (might have been on python-list).

Also, not wanting complex numbers seems to me like not wanting negative numbers, but we don't have a positive-subtract function that raises an exception if a < b.
On Aug 21, 2013, at 9:19, random832@fastmail.us wrote:
On Wed, Aug 21, 2013, at 10:45, Stephen J. Turnbull wrote:
random832@fastmail.us writes:
Why exactly is an exception reasonable? If you don't want complex numbers, don't take square roots of negative numbers. If you can't handle complex numbers, you'll get an exception down the line anyway.
That's precisely why you want an exception: to terminate the computation as soon as the unexpected condition can be detected.
Isn't that unpythonic? I mean, it's like doing type checking
No it isn't like type checking. That's the whole point of EAFP--and really, the whole point of exceptions. You write the natural code you would write without checking any preconditions, and then you wrap it in a try/except to deal with any exceptions that arise when the preconditions have turned out to be invalid.

You could make a parallel argument about dict.__getitem__, which I think is more obviously wrong. Why should it raise an exception instead of returning some "undefined" value like JavaScript? For that matter, why should any function raise an exception; even open could be changed to return a file object that can be checked for validity instead of raising on file not found, like C returning a -1 fd (or NULL FILE*).

There's no simple rule that says which things should be pre-checked, which should be handled with exceptions, and which should allow you to propagate errors as far as possible with out-of-range values. That's why language design is an art, with tradeoffs requiring judgment calls.
Also, not wanting complex numbers seems to me like not wanting negative numbers, but we don't have a positive-subtract function that raises an exception if a < b.
Sure, there's a judgment call there. There's a cost in having two functions (a more complex language), but a cost in not doing so (not being able to express positive-only computations as easily). But the fact that a judgment call is required doesn't mean a language designer throws up his hands and says "I guess I'll just have one number type to avoid all these problems," it means that he uses his judgment for each case. It turns out that people very often want to do real-only computation--much more often than they want to do positive-only--so the parallel question can have a different answer in the two cases. (And of course integer-only computation falls somewhere between those two extremes.)
random832@fastmail.us writes:
On Wed, Aug 21, 2013, at 10:45, Stephen J. Turnbull wrote:
random832@fastmail.us writes:
Why exactly is an exception reasonable? If you don't want complex numbers, don't take square roots of negative numbers. If you can't handle complex numbers, you'll get an exception down the line anyway.
That's precisely why you want an exception: to terminate the computation as soon as the unexpected condition can be detected.
Isn't that unpythonic? I mean, it's like doing type checking
No, it's not, and it's not. In Python, turning 1 into 1 + 0j is a type conversion[1]:
>>> 1 is 1 + 0j
False
>>> 1 is 1
True
Adding 1 + 0j to 1 requires type checking of the arguments, and a decision about the type of the result. The same is true here; behavior on taking square root is part of the *implementation* of a type, and a negativity check needs to be made before taking the square root. This is *not* the same thing as EAFP vs. LBYL in application programming. Rather, this is the low-level implementation that makes EAFP safe in Python.
Also, not wanting complex numbers seems to me like not wanting negative numbers, but we don't have a positive-subtract function that raises an exception if a < b.
That's a good analogy, and it would be a killer analogy if this were FORTRAN-dev. But it's not good enough on python-dev.

On the pragmatic side, I've been using math, including some pretty advanced analysis, in my research for 30 years ... but I've yet to meet an honest complex number in the wild, only in math textbooks and in programs that use the "computational formula" to compute standard deviation. I doubt that the applications where it is natural to expect a complex number as the value of square root come anywhere near a majority.

On the theory side, in fact Python doesn't have an unsigned numerical type. So there is no type change when subtracting a larger number from a smaller number. However, to take the square root of a negative float you need a type change to complex. So your analogy actually is nonsense in the context of Python: they're not alike at all!

Note that *none* of the above shows that an exception is TOOWTDI. It just shows that it's plausible for Python to raise an exception when a program tries to take the square root of a negative number. What *would* be unpythonic is if the implementation of the float type tried to guess whether the user wants an exception or a complex number in this case.

Steve

Footnotes:
[1] Connoisseurs of the CPython implementation will recognize that this example is perhaps a bit specious, since the implementation of 1 as a singleton is an optimization, not part of the defined semantics. But I think it serves to make my point.
On 22/08/13 03:32, Stephen J. Turnbull wrote:
random832@fastmail.us writes:
Isn't that unpythonic? I mean, it's like doing type checking
No, it's not, and it's not. In Python, turning 1 into 1 + 0j is a type conversion[1]:
>>> 1 is 1 + 0j
False
>>> 1 is 1
True
Steve, not only is that a dubious, implementation-dependent test, as you recognise in the footnote, but it doesn't even demonstrate the fact you are trying to demonstrate. To wit:

py> [] is []
False

and therefore [] and [] are different types... not. The test I think you want is:

py> type(1) is type(1+0j)
False

As far as whether some hypothetical sqrt method should return a complex number or raise an exception, I'd like to point out that as of Python 3.x, math.sqrt is now the odd-man-out:

py> (-25)**0.5
(3.061515884555943e-16+5j)
py> cmath.sqrt(-25)
5j
py> math.sqrt(-25)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: math domain error

-- Steven
On Thu, Aug 22, 2013 at 7:18 AM, Steven D'Aprano
As far as whether some hypothetical sqrt method should return a complex number or raise an exception, I'd like to point out that as of Python 3.x, math.sqrt is now the odd-man-out:
If you restrict your attention to square root operations, then yes. But math.sqrt is behaving just like math.log, math.acos, math.asin, math.atanh, etc. in raising an exception when it could have been returning a complex number. In that sense, I'd say that it's rather the ** operation that's the odd man out, in that there are very few other ways to get complex numbers with no obvious explicit request for complex numbers (either in the form of a cmath import or use of imaginary literals, or use of the 'complex' constructor, etc.). I've never been particularly comfortable with this aspect of PEP 3141.

While I'm ranting: the other part of PEP 3141 that was wrong IMO is the decision to return ints from math.floor and math.ceil (e.g., for floats and Decimal instances). On floats, floor and ceil are simple, fundamental and inexpensive operations. But the conversion to int adds unnecessary overhead to those simple operations.

iwasawa:~ mdickinson$ /opt/local/bin/python2.7 -m timeit -s "x = 1e300; import math" "y = math.floor(x)"
10000000 loops, best of 3: 0.107 usec per loop
iwasawa:~ mdickinson$ /opt/local/bin/python3.3 -m timeit -s "x = 1e300; import math" "y = math.floor(x)"
1000000 loops, best of 3: 0.574 usec per loop

</rant>

-- Mark
On Thu, Aug 22, 2013 at 1:02 PM, Mark Dickinson
In that sense, I'd say that it's rather the ** operation that's the odd man out, in that there are very few other ways to get complex numbers with no obvious explicit request for complex numbers
1/2 == 0.5  # int/int --> float
(-4.0)**0.5 == 1.2246467991473532e-16+2j  # float**float --> complex

(though why it doesn't equal 2j exactly, I don't know - surely there's enough precision in these floats to calculate that?)

Personally, I don't like the automated casting of int to float, since int covers arbitrary range and float will quietly lose precision; if you flick to float early in a calculation, you may be suddenly surprised by the inaccuracy at the end. But that's the decision Python's made, so it makes good sense for the upcasting of float to complex to work the same way - especially since you can't lose precision by going from float to complex (AFAIK).

ChrisA
On Aug 22, 2013 8:12 AM, "Chris Angelico"
On Thu, Aug 22, 2013 at 1:02 PM, Mark Dickinson
wrote: In that sense, I'd say that it's rather the ** operation that's the odd man out, in that there are very few other ways to get complex numbers with no obvious explicit request for complex numbers
1/2 == 0.5  # int/int --> float
(-4.0)**0.5 == 1.2246467991473532e-16+2j  # float**float --> complex
(though why it doesn't equal 2j exactly, I don't know - surely there's enough precision in these floats to calculate that?)
__pow__ is less accurate than sqrt:

$ py -3.3
Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> 4 ** .5
2.0
>>> (-4) ** .5
(1.2246467991473532e-16+2j)
>>> 1j * (4 ** .5)
2j
>>> import cmath
>>> cmath.sqrt(-4)
2j
>>> cmath.sqrt(-4.)
2j
sqrt and __pow__ for negative numbers use very different algorithms so they give different results. IIUC (4.).__pow__(.5) is handled directly by the FPU and (-4.).__pow__(.5) is handled in the CPython code base. Clearly, though, it would be better if (-4)**.5 would be modified to return 1j*(4**.5).
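One plausible way to see where that stray real part comes from (this reproduces the number, though it is not necessarily the literal CPython code path): if the power is computed in polar form, then -4 has modulus 4 and argument pi, so its square root comes out as 2*(cos(pi/2) + 1j*sin(pi/2)), and cos(pi/2) is not exactly zero in floating point:

import math

print((-4.0) ** 0.5)              # (1.2246467991473532e-16+2j)
print(2 * math.cos(math.pi / 2))  # 1.2246467991473532e-16, the stray real part
print(complex(2 * math.cos(math.pi / 2), 2 * math.sin(math.pi / 2)))  # same complex value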
Personally, I don't like the automated casting of int to float, since int covers arbitrary range and float will quietly lose precision; if you flick to float early in a calculation, you may be suddenly surprised by the inaccuracy at the end. But that's the decision Python's made, so it makes good sense for the upcasting of float to complex to work the same way - especially since you can't lose precision by going from float to complex (AFAIK)
If the int is out of range you'll get an error:
>>> 1.0 * (10 ** 1000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: long int too large to convert to float
An implicit float conversion does not quietly lose precision. All subsequent values are usually infected and get turned into floats. The implicit float conversion only happens in two cases:

1) You mixed a float into your otherwise exact integer computation.
2) Division.

For case 1) the answer is just don't do this. It's usually possible to spot when it's happened because your end result is a float. The exceptions to this are if you're doing something like:

    while m < n:
        # do stuff
        m = ...  # Somehow a float can occasionally infect m
    return n  # But we don't return m

I think it would be good to be able to know that implicit conversions wouldn't occur but I also dislike putting spurious decimal points everywhere to ensure floating point computation e.g.:

    x = x / 2.  # Unnecessary in Python 3

Note that for loss of accuracy the implicit float conversion from Python 3's true division gives a relative error of ~1e-16 which usually corresponds to a similarly small absolute error. Python 2's floor division has an absolute error of order 1 which is usually a massive error for people who care about accuracy.

For me personally though I think that a conversion context would be useful e.g.:

    with everything_is_exact_or_I_get_an_error():
        # compute stuff

That would save me *a lot* of testing/debugging.

Oscar
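For Decimal arithmetic specifically something close to that already exists: the decimal module can be told to trap any inexact result, e.g.:

from decimal import Decimal, localcontext, Inexact

with localcontext() as ctx:
    ctx.traps[Inexact] = True          # any result that needs rounding raises
    print(Decimal(1) / Decimal(4))     # 0.25, exact, so this is fine
    try:
        Decimal(1) / Decimal(3)        # not exactly representable
    except Inexact:
        print('inexact result trapped')

Nothing equivalent exists for int/float mixing, which is what the hypothetical everything_is_exact_or_I_get_an_error() context above would have to cover.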
On Thu, Aug 22, 2013 at 8:58 PM, Oscar Benjamin
If the int is out of range you'll get an error:
>>> 1.0 * (10 ** 1000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: long int too large to convert to float
An implicit float conversion does not quietly lose precision. All subsequent values are usually infected and get turned into floats. The implicit float conversion only happens in two cases: 1) You mixed a float into your otherwise exact integer computation. 2) Division.
>>> (1<<64)*3//2+10
27670116110564327434
>>> (1<<64)*3/2+10
2.7670116110564327e+19
It's not so large that it cannot be converted to floating point, but it's above the point at which floats are accurate to the integer. Therefore precision has been lost. Is it obvious from the second line of code that this will be the case? Obviously if you "mix in" a float, then it'll infect the calculations. But the effect of the / operator is less obvious. Fortunately it's consistent. It will ALWAYS return a float. However, I do see this as "implicit" conversion. Anyway, this is somewhat off-topic for this thread. ChrisA
On 22 August 2013 12:11, Chris Angelico
On Thu, Aug 22, 2013 at 8:58 PM, Oscar Benjamin
wrote: If the int is out of range you'll get an error:
>>> 1.0 * (10 ** 1000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: long int too large to convert to float
An implicit float conversion does not quietly lose precision. All subsequent values are usually infected and get turned into floats. The implicit float conversion only happens in two cases: 1) You mixed a float into your otherwise exact integer computation. 2) Division.
>>> (1<<64)*3//2+10
27670116110564327434
>>> (1<<64)*3/2+10
2.7670116110564327e+19
It's not so large that it cannot be converted to floating point, but it's above the point at which floats are accurate to the integer. Therefore precision has been lost. Is it obvious from the second line of code that this will be the case? Obviously if you "mix in" a float, then it'll infect the calculations. But the effect of the / operator is less obvious. Fortunately it's consistent. It will ALWAYS return a float. However, I do see this as "implicit" conversion.
As I said you need to be careful around division. There's no right answer for integer division (for computers). I'd rather have an implicit conversion than an implicit massively incorrect answer. The result above has a relative error of ~1e-16. The result below has a relative error of order 1:

$ py -2.7
>>> 3 / 2
1
If you use that in subsequent calculations your subsequent results could be *way* off. In many cases where you expected to compute an integer but end up with a float you'll subsequently get an error:

$ py -3.3
>>> a = [1, 2, 3, 4, 5]
>>> b = 3
>>> a[b / 2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not float
If you just get an incorrect integer there's no way to know if it's exact or not without checking after every division e.g.:

    a = b / 2
    assert 2 * a == b

which is tedious.

Oscar
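The check can at least be written once; a small helper along these lines (not in the stdlib, just a sketch) raises instead of quietly returning an inexact result:

def exact_div(a, b):
    """Return a / b as an exact integer, raising if b does not divide a evenly."""
    q, r = divmod(a, b)
    if r:
        raise ValueError('%r is not exactly divisible by %r' % (a, b))
    return q

# exact_div(4, 2) == 2; exact_div(3, 2) raises ValueError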
From: Python-ideas [mailto:python-ideas-bounces+andy.henshaw=gtri.gatech.edu@python.org] On Behalf Of Chris Angelico Sent: Thursday, August 22, 2013 7:11 AM
>>> (1<<64)*3//2+10
27670116110564327434
>>> (1<<64)*3/2+10
2.7670116110564327e+19
It's not so large that it cannot be converted to floating point, but it's above the point at which floats are accurate to the integer. Therefore precision has been lost. Is it obvious from the second line of code that this will be the case? Obviously if you "mix in" a float, then it'll infect the calculations. But the effect of the / operator is less obvious. Fortunately it's consistent. It will ALWAYS return a float. However, I do see this as "implicit" conversion.
Anyway, this is somewhat off-topic for this thread.
One of the things that I always admired about the Occam programming language was that you had to explicitly state whether to ROUND or TRUNC when converting from integers to floats, for exactly this reason. A 32-bit integer potentially has more precision than a 32-bit float, so you had to tell the compiler how to handle the dropped bits.
Steven D'Aprano writes:
The test I think you want is:
py> type(1) is type(1+0j)
False
Thank you for the correction. For some reason, I took 'random's emphasis on duck-typing more seriously than I should have.
As far as whether some hypothetical sqrt method should return a complex number or raise an exception, I'd like to point out that as of Python 3.x, math.sqrt is now the odd-man-out:
Sure, but from a pure mathematics standpoint, the transcendental functions are inherently complex functions. So it doesn't surprise *me* that (-25) ** 0.5 returns a complex number. And since the reals can be naturally embedded in the complex domain, *I'm* not surprised that cmath.sqrt(25) returns a complex result, rather than raising a domain error. Speaking only for myself about "what's surprising".

Nevertheless, clearly it makes mathematical sense to distinguish real numbers from complex numbers. Complex analysis goes back before Riemann, yet (smart) mathematicians have continued doing real analysis to this day. So having real and complex math separated in Python is not nonsense; it's a design choice, and it's not clear to me that "odd-man-out" indicates that the choice has really been made -- there are reasons why it might be so that are compatible with both ways of thinking about the issues.
On 21/08/2013 15:19, random832@fastmail.us wrote:
On Tue, Aug 20, 2013, at 18:47, Andrew Barnert wrote:
So instead of math.sqrt and cmath.sqrt, we just have one function that decides whether sqrt(-1) is 0+1j or an exception based on guessing whether you wanted complex numbers? :)
Why exactly is an exception reasonable? If you don't want complex numbers, don't take square roots of negative numbers. If you can't handle complex numbers, you'll get an exception down the line anyway.
I think a simpler rule might be: if the argument is a float then the result is a float; if the argument is complex then the result is complex. If that's the case, then do we really need to keep them separate, having math and cmath?
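A minimal sketch of that rule (purely illustrative; the bare sqrt name here is not a proposal for an actual stdlib function):

import math
import cmath

def sqrt(x):
    # complex in -> complex out; everything else stays real, so
    # sqrt(-1.0) still raises ValueError rather than returning 1j.
    if isinstance(x, complex):
        return cmath.sqrt(x)
    return math.sqrt(x)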
On 21 August 2013 18:06, MRAB
On 21/08/2013 15:19, random832@fastmail.us wrote:
On Tue, Aug 20, 2013, at 18:47, Andrew Barnert wrote:
So instead of math.sqrt and cmath.sqrt, we just have one function that decides whether sqrt(-1) is 0+1j or an exception based on guessing whether you wanted complex numbers? :)
Why exactly is an exception reasonable? If you don't want complex numbers, don't take square roots of negative numbers. If you can't handle complex numbers, you'll get an exception down the line anyway.
I think a simpler rule might be: if the argument is a float then the result is a float; if the argument is complex then the result is complex.
I like the fact that math.sqrt() raises an error for negative numbers. That error message has been far more useful to me than cmath.sqrt() ever has.
If that's the case, then do we really need to keep them separate, having math and cmath?
And what if the argument's an int? Does the int duck-type as a float or a complex? Or should it raise an error if the root is not an integer?

I feel like I'd end up writing: math.sqrt(a + 0j) when I want to get complex roots. Definitely cmath.sqrt(a) is better.

The problem I have with sqrt(int) is that it doesn't raise an error and just produces an inaccurate result. Fixing that for the cases where exact results are available could be a big performance regression for some people. Anyway I'd rather use an alternative sqrt_exact() that wasn't expected to be fast but was guaranteed to be accurate or to raise an error.

Oscar
Oscar Benjamin wrote:
On 19 August 2013 11:09, Peter Otten <__peter__@web.de> wrote:
sum(items, 0.0) would then automatically profit from the clever optimizations of math.fsum() etc.
fsum() is about controlling accumulated rounding errors rather than optimisation (although it may be faster I've never needed to check).
As I understand it, you can "optimize" for precision, memory usage, code simplicity -- not just speed.
I'd rather write sum(items, Fraction) than sum(items, Fraction(0)) and either way it's so close to Fraction.sum(items)
I understand what you mean about having a single function that does a good job for all types and it goes into a much broader set of issues around how Python handles different numeric types. It would be possible in a backward compatible way provided Fraction.__sum__ falls back on sum when it finds a non-Rational.
There are lots of other areas in the stdlib where a similar sort of thinking could apply e.g.:
>>> import math
>>> from decimal import Decimal as D
>>> from fractions import Fraction as F
There's currently no fsum equivalent for Decimals even though a pure Python implementation could have reasonable performance:
>>> math.fsum([0.1, 0.2, 0.3]) == 0.6
True
>>> math.fsum([D('0.1'), D('0.2'), D('0.3')]) == D('0.6')
False
>>> D(math.fsum([D('0.1'), D('0.2'), D('0.3')])) == D('0.6')
False
sum() would do better in the above but then it fails in other situations:
>>> sum([D('1e50'), D('1'), D('-1e50')]) == 1
False
>>> math.fsum([D('1e50'), D('1'), D('-1e50')]) == 1
True
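In the meantime a slow but exact Decimal sum can be had by going through Fraction, since a Decimal converts to a Fraction without rounding (a sketch; the helper name is made up):

from decimal import Decimal as D
from fractions import Fraction

def exact_decimal_sum(xs):
    # The running total is an exact Fraction; only the final
    # conversion back to Decimal rounds (to the current context).
    total = sum(map(Fraction, xs), Fraction(0))
    return D(total.numerator) / D(total.denominator)

print(exact_decimal_sum([D('1e50'), D('1'), D('-1e50')]))   # 1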
The math module rounds everything to float losing precision even if better routines are available:
>>> math.sqrt(D('0.02'))
0.1414213562373095
>>> D('0.02').sqrt()
Decimal('0.1414213562373095048801688724')
>>> math.exp(D(1))
2.718281828459045
>>> D(1).exp()
Decimal('2.718281828459045235360287471')
There's also no support for computing the other transcendental functions with Decimals e.g. sin, cos etc. without rounding to float.
The math module also fails to find exact results when possible:
>>> a = 10 ** 20 + 1
>>> a
100000000000000000001
>>> math.sqrt(a**2) == a
False
>>> b = F(2, 3)
>>> math.sqrt(b ** 2) == b
False
These ones could just get fixed in the Fractions module:
>>> F(4, 9) ** F(1, 2)
0.6666666666666666
>>> F(2, 3) ** 2
Fraction(4, 9)
>>> a
100000000000000000001
>>> (a ** 2) ** F(1, 2) == a
False
Do you think that any of these things should also be changed so that there can be e.g. one sqrt() function that does the right thing for all types?
At the time I only thought about sum(), but yes, for every operation that has one "best" implementation per class there should be a uniform way to make these implementations available.
One way to unify these things is with a load of new dunder methods so that each type can implement its own response to the standard function. Another is to have special modules that deal with different things and e.g. a function for summing Rationals and another for summing inexact types and so on.
Another possibility is that you have decimal functions in the decimal module and fraction functions in the fractions module and so on and don't use any duck-typing (except in coercion). That may seem strange for Python but it's worth remembering that most of the best numerical code that has ever been written was written in languages that *require* you to write separate code for each numeric type and don't give any syntactic support for working with non-native types.
Classmethods would be yet another option

    float.sum(numbers)

or

    complex.sqrt(a)

don't look bad, though I'm not sure what the right approach for int.sqrt() would be...
On Wed, Aug 21, 2013 at 9:25 AM, Peter Otten <__peter__@web.de> wrote:
At the time I only thought about sum(), but yes, for every operation that has one "best" implementation per class there should be a uniform way to make these implementations available.
Oscar Benjamin wrote:
One way to unify these things is with a load of new dunder methods so that each type can implement its own response to the standard function. Another is to have special modules that deal with different things and e.g. a function for summing Rationals and another for summing inexact types and so on.
Another possibility is that you have decimal functions in the decimal module and fraction functions in the fractions module and so on and don't use any duck-typing (except in coercion). That may seem strange for Python but it's worth remembering that most of the best numerical code that has ever been written was written in languages that *require* you to write separate code for each numeric type and don't give any syntactic support for working with non-native types.
Classmethods would be yet another option
float.sum(numbers)
or
complex.sqrt(a)
don't look bad, though I'm not sure what the right approach for int.sqrt() would be...
I must say, this sounds like a classic case for a cookbook recipe, especially considering that the implementations are simple and elegant Python code. - Tal
On 21 August 2013 07:25, Peter Otten <__peter__@web.de> wrote:
Oscar Benjamin wrote:
On 19 August 2013 11:09, Peter Otten <__peter__@web.de> wrote:
sum(items, 0.0) would then automatically profit from the clever optimizations of math.fsum() etc.
fsum() is about controlling accumulated rounding errors rather than optimisation (although it may be faster I've never needed to check).
As I understand it, you can "optimize" for precision, memory usage, code simplicity -- not just speed.
Sorry, you're right.
Do you think that any of these things should also be changed so that there can be e.g. one sqrt() function that does the right thing for all types?
At the time I only thought about sum(), but yes, for every operation that has one "best" implementation per class there should be a uniform way to make these implementations available. [snip]
Classmethods would be yet another option
float.sum(numbers)
or
complex.sqrt(a)
don't look bad, though I'm not sure what the right approach for int.sqrt() would be...
It's not necessarily per-class. For sqrt we can write functions targeted at the Rational ABC that will work for anything that fits with that part of the numeric tower (including int) e.g.:

import math

def sqrt_ceil(y):
    xguess = int(math.floor(math.sqrt(y)))
    while xguess ** 2 < y:  # This can be improved
        xguess += 1
    return xguess

def sqrt_floor(y):
    x = sqrt_ceil(y)
    if x ** 2 != y:
        x -= 1
    return x

def sqrt_exact(y):
    if y.denominator != 1:
        return type(y)(sqrt_exact(y.numerator), sqrt_exact(y.denominator))
    x = sqrt_ceil(y)
    if x ** 2 == y:
        return x
    else:
        raise ValueError('No exact rational root')

I think it's reasonable to have things like that in the fractions module since that's where the stdlib implements its concrete Rational type.

A similar thing is possible with fsum. Alternative algorithms can achieve the same effect for arbitrary radix (e.g. decimal) numeric types under the appropriate rounding modes so it would be possible to make it do the right thing for decimals while keeping a fast-path for sum. This would lead to a significant performance regression for anyone who was actually hoping that their decimals would get coerced to floats though. So maybe a function in the decimal module could provide the fully general algorithm that works well for sensibly rounded instances of the Real ABC.

Oscar
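For concreteness, a couple of example calls assuming the definitions above (expected output shown in the comments):

from fractions import Fraction as F

print(sqrt_floor(10))        # 3
print(sqrt_ceil(10))         # 4
print(sqrt_exact(F(4, 9)))   # 2/3
try:
    sqrt_exact(F(2, 3))
except ValueError as e:
    print(e)                 # No exact rational root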
participants (17)
- Andrew Barnert
- Chris Angelico
- Clay Sweetser
- David Mertz
- Eric V. Smith
- Henshaw, Andy
- Mark Dickinson
- MRAB
- Oscar Benjamin
- Paul Moore
- Peter Otten
- random832@fastmail.us
- Stefan Behnel
- Stephen J. Turnbull
- Steven D'Aprano
- Tal Einat
- Terry Reedy