Making cdecimal.Decimal a native numpy type
Hi,
I am a seasoned numpy/pandas user mainly interested in financial
applications. These and other applications would greatly benefit from a
decimal data type with flexible rounding rules, precision etc.
Yes, there is cdecimal, the traditional decimal module from the Python
stdlib rewritten in C (http://www.bytereef.org/mpdecimal/index.html),
which has been part of the stdlib since Python 3.3.
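To make "flexible rounding rules, precision etc." concrete, here is a minimal sketch using the stdlib decimal module (the same API cdecimal implements). The variable names and sample values are illustrative only and not taken from any real application:

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP, localcontext

# Binary floats cannot represent most decimal fractions exactly.
float_sum_ok = (0.1 + 0.2 == 0.3)
decimal_sum_ok = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

# The rounding rule can be chosen per operation (or per context).
cent = Decimal("0.01")
price = Decimal("2.665")  # hypothetical amount
half_up = price.quantize(cent, rounding=ROUND_HALF_UP)      # Decimal('2.67')
half_even = price.quantize(cent, rounding=ROUND_HALF_EVEN)  # Decimal('2.66')

# Precision (number of significant digits) is set on the arithmetic context.
with localcontext() as ctx:
    ctx.prec = 6
    third = Decimal(1) / Decimal(3)  # Decimal('0.333333')
```

The float comparison is False while the Decimal comparison is True, which is the kind of behaviour financial code typically relies on.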
However, it appears that cdecimal cannot be meaningfully used with numpy
(see the benchmark below). Squaring an n=10000 ndarray is roughly 1400
times faster with float64 (14.7 us) than with a dtype=object ndarray of
cdecimal.Decimal (21.2 ms), and even a simple operation such as var()
fails outright.
I am not familiar enough with ufuncs etc. to judge whether some of these
problems can be avoided with a few lines of Python code. However, my
impression is that ultimately we would all benefit from cdecimal.Decimal
becoming a native numpy type. Put bluntly, cdecimal is a great tool, but
it is not yet where we most need it.
The author of cdecimal, Stefan Krah, would probably have a great deal of
the skillset needed to successfully take such a project forward. He
happens to have also written the new memoryview implementation of Python
3.3. And from recent correspondence I understand he might be willing to
get involved in an effort to marry numpy and cdecimal.
The main question is whether such a project would fit into what core
developers see as the future of numpy.
Regards
Leo
And here is the benchmark:
In [1]: from numpy import *
In [2]: from cdecimal import Decimal
In [3]: r=random.rand(10000)
In [4]: d=ndarray(10000, dtype=Decimal)
In [5]: d.dtype
Out[5]: dtype('object')
In [6]: r.dtype
Out[6]: dtype('float64')
In [7]: for i in range(10000): d[i] = Decimal(r[i])
In [8]: %timeit r**2
100000 loops, best of 3: 14.7 us per loop
In [9]: %timeit d**2
10 loops, best of 3: 21.2 ms per loop
In [10]: r.var()
Out[10]: 0.082478142261349557
In [11]: d.var()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\ in <module>()
----> 1 d.var()
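As a possible "few lines of Python" workaround for the d.var() failure reported above, the variance can be spelled out with element-wise arithmetic and sum(), which object arrays delegate to the Decimal objects themselves. This is only a sketch: decimal_var is a hypothetical helper, not a numpy function, it uses the stdlib decimal module in place of cdecimal, and it does nothing for speed, since every element still goes through a Python-level operation.

```python
from decimal import Decimal

import numpy as np


def decimal_var(values):
    """Population variance of a 1-D object array of Decimal values.

    Hypothetical helper for illustration; not part of numpy.
    """
    n = len(values)
    mean = values.sum() / n      # Decimal additions, then Decimal / int
    deviations = values - mean   # element-wise Decimal subtraction
    return (deviations * deviations).sum() / n


# Build the object array from strings so the decimal values are exact.
d = np.array([Decimal(s) for s in ("0.10", "0.20", "0.30", "0.40")], dtype=object)
result = decimal_var(d)
```

The result stays a Decimal throughout, so no binary floating-point rounding enters the calculation.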
On Sun, Jul 22, 2012 at 8:54 AM, Dr.Leo wrote:
> [...]
The numpy-dtypes repository (https://github.com/numpy/numpy-dtypes) was created recently as a home for extension dtypes for numpy. This would be the natural place for a decimal dtype. Currently it contains rational and quaternion types, along with documentation on how to implement a new dtype. The project is at an early stage and moving somewhat slowly, so contributions and input would be quite welcome.
- Tom
Participants (2): Dr.Leo, Tom Aldcroft