Adding keyword to asarray and asanyarray.

Hi All,

This is apropos gh-5634 <https://github.com/numpy/numpy/pull/5634>, a PR adding a precision keyword to asarray and asanyarray. The PR description is:

The precision keyword differs from the current dtype keyword in the following ways.

- It specifies a minimum precision. If the precision of the input is greater than the specified precision, the input precision is preserved.
- Complex types are preserved. A specified floating precision applies to the dtype of the real and imaginary parts separately.
For example, both complex128 and float64 dtypes have the same precision and an array of dtype float64 will be unchanged if the specified precision is float32.
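The "same precision" claim can be checked directly: a complex dtype stores two floats of a component type, and np.finfo on a complex dtype describes that component. A minimal sketch (not part of the PR) checking the two common pairs:

```python
import numpy as np

# (component float type, complex type made of two such floats)
PAIRS = [(np.float32, np.complex64), (np.float64, np.complex128)]

for real_t, cplx_t in PAIRS:
    z = np.zeros(3, dtype=cplx_t)
    # The real and imaginary parts of a complex array have the component dtype.
    assert z.real.dtype == np.dtype(real_t)
    assert z.imag.dtype == np.dtype(real_t)
    # finfo of a complex type reports on its float component, so precision matches.
    assert np.finfo(cplx_t).eps == np.finfo(real_t).eps
    assert np.finfo(cplx_t).precision == np.finfo(real_t).precision
```

So a minimum precision of float64 is satisfied equally by float64 and complex128 input.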
Ideally the precision keyword would be pushed down into the array constructor so that the resulting dtype could be determined before the array is constructed, but that would require adding new functions as the current constructors are part of the API and cannot have their signatures changed.
The name of the keyword is open to discussion, as well as its acceptable values. And of course, anything else that might come to mind ;)

Thoughts?

Chuck

dare I say... datetime64/timedelta64 support? ::ducks::

Ben Root

On Thu, Mar 5, 2015 at 11:40 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

On Thu, Mar 5, 2015 at 8:42 AM, Benjamin Root <ben.root@ou.edu> wrote:
dare I say... datetime64/timedelta64 support?
well, the precision of those is 64 bits, yes? So if you asked for less than that, you'd still get a dt64. If you asked for 64 bits, you'd get it. If you asked for datetime128, what would you get??? A 128-bit integer? Or an exception, because there is no 128-bit datetime dtype?

But I think this is the same problem with any dtype: if you ask for a precision that doesn't exist, you're going to get an error.

Is there a more detailed description of the proposed feature anywhere? Do you specify a dtype as a precision? Or just the precision, and let the dtype figure it out for itself, i.e.:

precision=64

would give you a float64 if the passed-in array was a float type, but an int64 if the passed-in array was an int type, or a uint64 if the passed-in array was an unsigned int type, etc.

But in the end, I wonder about the use case. I generally use asarray one of two ways:

Without a dtype -- to simply make sure I've got an ndarray of SOME dtype.

or

With a dtype -- because I really care about the dtype, usually because I need to pass it on to C code or something.

I don't think I'd ever need at least some precision, but not care if I got more than that...

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
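To make the second reading in the question concrete, here is a minimal sketch of a precision given as a bit count, with the kind (int, uint, float, complex) taken from the input. The function name widen_to_bits is hypothetical and this is not what the PR does (the PR takes a dtype and uses result_type); counting complex bits per component is an assumption borrowed from the PR description's rule for complex types.

```python
import numpy as np


def widen_to_bits(a, bits):
    """Hypothetical: return `a` as an ndarray of the same kind with at least
    `bits` bits per (real) component, copying only when the input is narrower."""
    arr = np.asarray(a)
    kind = arr.dtype.kind  # 'i' int, 'u' unsigned int, 'f' float, 'c' complex
    if kind not in "iufc":
        raise TypeError(f"no {bits}-bit variant for dtype {arr.dtype}")
    nbytes = bits // 8
    if kind == "c":
        nbytes *= 2  # a complex value holds two floats of the requested width
    target = np.dtype(f"{kind}{nbytes}")  # e.g. 'f8', 'i8', 'u8', 'c16'
    if arr.dtype.itemsize >= target.itemsize:
        return arr  # already at least as wide: returned as-is, no copy
    return arr.astype(target)
```

Asking for a width with no matching dtype, such as 128 bits for integers, makes np.dtype raise a TypeError, which is the "precision that doesn't exist" case.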

On Thu, Mar 5, 2015 at 12:04 PM, Chris Barker <chris.barker@noaa.gov> wrote:
well, the precision of those is 64 bits, yes? so if you asked for less than that, you'd still get a dt64. If you asked for 64 bits, you'd get it, if you asked for datetime128 -- what would you get???
a 128 bit integer? or an Exception, because there is no 128bit datetime dtype.
I was more thinking of datetime64/timedelta64's ability to specify the time units.

Ben Root

On Thu, Mar 5, 2015 at 10:04 AM, Chris Barker <chris.barker@noaa.gov> wrote:
The main use that I want to cover is that float64 and complex128 have the same precision, and it would be good if either were acceptable. Also, one might want to accept either float32 or float64 rather than insist on just one of the two. Another intent is to make the fewest possible copies. The determination of the resulting type is made using the result_type function.

Chuck
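A minimal sketch of that behaviour outside NumPy, assuming it is emulated by a helper; the name asarray_min_precision is hypothetical and not NumPy API. It uses np.result_type to pick the dtype and astype(copy=False) so that an input already meeting the minimum is returned without a copy.

```python
import numpy as np


def asarray_min_precision(a, precision):
    """Hypothetical helper emulating a `precision` keyword for np.asarray.

    The result dtype is np.result_type(input, precision): a wider input keeps
    its own precision, and complex input stays complex."""
    arr = np.asarray(a)
    target = np.result_type(arr, precision)
    # copy=False: hand back `arr` itself when it already has the target dtype.
    return arr.astype(target, copy=False)


x64 = np.ones(3, dtype=np.float64)
same = asarray_min_precision(x64, np.float32)   # float64 kept, same object
z = asarray_min_precision(np.ones(3, dtype=np.complex128), np.float32)  # complex128 kept
h = asarray_min_precision(np.ones(3, dtype=np.float16), np.float32)     # upcast to float32
```

Two caveats of this sketch: object input promotes to object, so it passes through unchanged, and NumPy releases before 2.0 apply value-based casting to 0-d arrays in result_type, so scalar inputs can come out narrower there.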

On Thu, Mar 5, 2015 at 12:33 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
How does this work for object arrays, or datetime?

Can I specify at least float32 or float64, and have it raise an exception if the input cannot be converted?

The problem we have in statsmodels is that pandas frequently uses object arrays, and it messes up patsy or statsmodels if they are not explicitly converted.

Josef

On Thu, Mar 5, 2015 at 10:02 PM, <josef.pktd@gmail.com> wrote:
Object arrays go to object arrays, datetime64 depends.

In [10]: result_type(ones(1, dtype=object_), float32)
Out[10]: dtype('O')

Datetime64 seems to use the highest precision:

In [12]: result_type(ones(1, dtype='datetime64[D]'), 'datetime64[us]')
Out[12]: dtype('<M8[us]')

In [13]: result_type(ones(1, dtype='datetime64[D]'), 'datetime64[Y]')
Out[13]: dtype('<M8[D]')

but doesn't convert to float:

In [11]: result_type(ones(1, dtype='datetime64[D]'), float32)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-e1a09e933dc7> in <module>()
----> 1 result_type(ones(1, dtype='datetime64[D]'), float32)

TypeError: invalid type promotion

What would you like it to do?

Chuck
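For readers on a current NumPy, the session above can be re-run with explicit np. names. This sketch builds the datetime input with np.array rather than ones, and catches TypeError generally, since newer releases may raise a TypeError subclass with a different message than "invalid type promotion".

```python
import numpy as np

obj = np.ones(1, dtype=object)
days = np.array(["2015-03-05"], dtype="datetime64[D]")

# Object input promotes to object, so nothing is converted or rejected.
obj_result = np.result_type(obj, np.float32)

# Between datetime64 units, the finer unit wins.
us_result = np.result_type(days, "datetime64[us]")

# datetime64 and float have no common type, so promotion fails.
try:
    np.result_type(days, np.float32)
    float_promotion_failed = False
except TypeError:
    float_promotion_failed = True

print(obj_result, us_result, float_promotion_failed)
```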

On Fri, Mar 6, 2015 at 7:59 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
Note: the dtype handling in statsmodels is still a mess, and we just plugged some of the worst cases.

What we would need is asarray with at least a minimum precision (e.g. float32) that raises an exception if the input is not numeric, like string, object, custom dtypes, ...

However, we need custom dtype handling in statsmodels anyway, so the enhancement to asarray with exceptions would mainly be a convenience for getting something to work with, because pandas and numpy are now "object array friendly".

I assume scipy also has insufficient checks for non-numeric dtypes, AFAIR.

Josef
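The requirement above (a minimum precision plus an exception for non-numeric input) can be sketched on top of np.result_type. The helper name asarray_numeric and the set of accepted kinds (bool, int, uint, float, complex) are assumptions for illustration; object arrays are rejected rather than converted, matching the request to fail instead of passing object data through.

```python
import numpy as np

# dtype.kind codes treated as numeric: bool, signed int, unsigned int, float, complex.
NUMERIC_KINDS = frozenset("biufc")


def asarray_numeric(a, min_dtype=np.float32):
    """Hypothetical helper: ndarray with at least `min_dtype` precision, or TypeError."""
    arr = np.asarray(a)
    if arr.dtype.kind not in NUMERIC_KINDS:
        # Strings ('U', 'S'), object ('O'), datetime ('M'), structured ('V'), ...
        raise TypeError(f"expected numeric data, got dtype {arr.dtype!r}")
    target = np.result_type(arr, min_dtype)
    return arr.astype(target, copy=False)
```

Object arrays that happen to hold only numbers are rejected as well; converting those is left to the caller, which is where the custom dtype handling mentioned above would come in.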

On Fri, Mar 6, 2015 at 7:59 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
Datetime64 seems to use the highest precision
In [12]: result_type(ones(1, dtype='datetime64[D]'), 'datetime64[us]') Out[12]: dtype('<M8[us]')
In [13]: result_type(ones(1, dtype='datetime64[D]'), 'datetime64[Y]') Out[13]: dtype('<M8[D]')
Ah, yes, that's what I'm looking for. +1 from me to have this in asarray/asanyarray. Of course, there are always the usual caveats about converting your datetime data in this manner, but this would be helpful in many situations when writing functions that expect to deal with temporal data at the resolution of minutes or some such.

Cheers!
Ben Root
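The minute-resolution use case can be sketched with the datetime64 result_type rule shown above. The helper as_datetime_min_unit and its unit parameter are hypothetical; input that is not already datetime64 is parsed with dtype="datetime64", which lets NumPy infer the unit from the strings.

```python
import numpy as np


def as_datetime_min_unit(a, unit="m"):
    """Hypothetical helper: datetime64 array whose unit is at least as fine as `unit`.

    Finer input (e.g. seconds when unit="m") keeps its own unit."""
    arr = np.asarray(a)
    if arr.dtype.kind != "M":
        # Parse e.g. ISO date strings; NumPy infers the unit from the input.
        arr = np.asarray(a, dtype="datetime64")
    target = np.result_type(arr, np.dtype(f"datetime64[{unit}]"))
    return arr.astype(target, copy=False)


days = as_datetime_min_unit(["2015-03-05", "2015-03-06"])  # day strings -> minutes
secs = np.array(["2015-03-05T12:34:56"], dtype="datetime64[s]")
kept = as_datetime_min_unit(secs)  # seconds are finer than minutes: kept as-is
```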
participants (4)
- Benjamin Root
- Charles R Harris
- Chris Barker
- josef.pktd@gmail.com