[New-bugs-announce] [issue25928] Improve performance of statistics._decimal_to_ratio and fractions.from_decimal
John Walker
report at bugs.python.org
Tue Dec 22 17:18:25 EST 2015
New submission from John Walker:
In statistics, there is a FIXME on Line 250 above _decimal_to_ratio that says:
# FIXME This is faster than Fraction.from_decimal, but still too slow.
Half of the time is spent in a conversion in d.as_tuple(). Decimal internally stores the digits as a string, but in d.as_tuple(), the digits are individually cast to integers and returned as a tuple of integers.
This is OK, but _decimal_to_ratio undoes the work that was done in d.as_tuple() by adding them all back into an integer. A similar, but slightly different approach is taken in Fractions.from_decimal, where the tuple is cast into a string and then parsed into an integer. We can be a lot faster if we use the _int instance variable directly.
In the case of _decimal_to_ratio, the new code seems to be twice as fast with usage _decimal_to_ratio(Decimal(str(random.random()))):
def _decimal_to_ratio(d):
sign, exp = d._sign, d._exp
if exp in ('F', 'n', 'N'): # INF, NAN, sNAN
assert not d.is_finite()
return (d, None)
num = int(d._int)
if exp < 0:
den = 10**-exp
else:
num *= 10**exp
den = 1
if sign:
num = -num
return (num, den)
If the performance improvement is considered worthwhile, here are a few solutions I see.
1) Use _int directly in fractions and statistics.
2) Add a digits_as_str method to Decimal. This prevents exposing _int as an implementation detail, and makes sense to me since I suspect there is a lot of code casting the tuple of int to a string anyway.
3) Add a parameter to as_tuple for determining whether digits should be returned as a string or a tuple.
4) Deprecate _int in Decimal and add a new reference str_digits.
There are probably more solutions. I lean towards 4, because it makes usage easier and avoids cluttering Decimal with methods.
Here is what I used for benchmarks:
========
import timeit
old_setup = """
import random
from decimal import Decimal
def _decimal_to_ratio(d):
sign, digits, exp = d.as_tuple()
if exp in ('F', 'n', 'N'): # INF, NAN, sNAN
assert not d.is_finite()
return (d, None)
num = 0
for digit in digits:
num = num*10 + digit
if exp < 0:
den = 10**-exp
else:
num *= 10**exp
den = 1
if sign:
num = -num
return (num, den)
def run_it():
dec = Decimal(str(random.random()))
_decimal_to_ratio(dec)
"""
new_setup = """
import random
from decimal import Decimal
def _decimal_to_ratio(d):
sign, exp = d._sign, d._exp
if exp in ('F', 'n', 'N'): # INF, NAN, sNAN
assert not d.is_finite()
return (d, None)
num = int(d._int)
if exp < 0:
den = 10**-exp
else:
num *= 10**exp
den = 1
if sign:
num = -num
return (num, den)
def run_it():
dec = Decimal(str(random.random()))
_decimal_to_ratio(dec)
"""
if __name__ == '__main__':
print("Testing proposed implementation")
print("number = 10000")
print(timeit.Timer(stmt='run_it()', setup=new_setup).timeit(number=10000))
print("number = 100000")
print(timeit.Timer(stmt='run_it()', setup=new_setup).timeit(number=100000))
print("number = 1000000")
print(timeit.Timer(stmt='run_it()', setup=new_setup).timeit(number=1000000))
print("Testing old implementation")
print("number = 10000")
print(timeit.Timer(stmt='run_it()', setup=old_setup).timeit(number=10000))
print("number = 100000")
print(timeit.Timer(stmt='run_it()', setup=old_setup).timeit(number=100000))
print("number = 1000000")
print(timeit.Timer(stmt='run_it()', setup=old_setup).timeit(number=1000000))
----------
components: Library (Lib)
messages: 256873
nosy: johnwalker
priority: normal
severity: normal
status: open
title: Improve performance of statistics._decimal_to_ratio and fractions.from_decimal
versions: Python 3.5
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue25928>
_______________________________________
More information about the New-bugs-announce
mailing list