[Numpy-discussion] Adding keyword to asarray and asanyarray.

josef.pktd at gmail.com
Fri Mar 6 08:48:01 EST 2015


On Fri, Mar 6, 2015 at 7:59 AM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>
>
> On Thu, Mar 5, 2015 at 10:02 PM, <josef.pktd at gmail.com> wrote:
>>
>> On Thu, Mar 5, 2015 at 12:33 PM, Charles R Harris
>> <charlesr.harris at gmail.com> wrote:
>> >
>> >
>> > On Thu, Mar 5, 2015 at 10:04 AM, Chris Barker <chris.barker at noaa.gov>
>> > wrote:
>> >>
>> >> On Thu, Mar 5, 2015 at 8:42 AM, Benjamin Root <ben.root at ou.edu> wrote:
>> >>>
>> >>> dare I say... datetime64/timedelta64 support?
>> >>
>> >>
>> >> Well, the precision of those is 64 bits, yes? So if you asked for less
>> >> than that, you'd still get a dt64. If you asked for 64 bits, you'd get
>> >> it. If you asked for datetime128, what would you get?
>> >>
>> >> A 128-bit integer? Or an exception, because there is no 128-bit
>> >> datetime dtype?
>> >>
>> >> But I think this is the same problem with any dtype -- if you ask for a
>> >> precision that doesn't exist, you're going to get an error.
>> >>
>> >> Is there a more detailed description of the proposed feature anywhere?
>> >> Do you specify a dtype as a precision, or just the precision, and let
>> >> the dtype figure itself out, i.e.:
>> >>
>> >> precision=64
>> >>
>> >> would give you a float64 if the passed-in array was a float type, an
>> >> int64 if it was an int type, a uint64 if it was an unsigned int type,
>> >> etc.
>> >>
>> >> But in the end, I wonder about the use case. I generally use asarray in
>> >> one of two ways:
>> >>
>> >> Without a dtype, simply to make sure I've got an ndarray of SOME dtype.
>> >>
>> >> or
>> >>
>> >> With a dtype, because I really care about the dtype, usually because I
>> >> need to pass it on to C code or something.
>> >>
>> >> I don't think I'd ever need at least some precision but not care if I
>> >> got more than that...
>> >
>> >
>> > The main use that I want to cover is that float64 and complex128 have
>> > the same precision and it would be good if either is acceptable. Also,
>> > one might just want either float32 or float64, not just one of the two.
>> > Another intent is to make the fewest possible copies. The determination
>> > of the resulting type is made using the result_type function.
>>
>>
>> How does this work for object arrays, or datetime?
>>
>> Can I specify at least float32 or float64, and it raises an exception
>> if it cannot be converted?
>>
>> The problem we have in statsmodels is that pandas frequently uses
>> object arrays and it messes up patsy or statsmodels if it's not
>> explicitly converted.
>
>
> Object arrays go to object arrays, datetime64 depends.
>
> In [10]: result_type(ones(1, dtype=object_), float32)
> Out[10]: dtype('O')
>
>
> Datetime64 seems to use the highest precision
>
> In [12]: result_type(ones(1, dtype='datetime64[D]'), 'datetime64[us]')
> Out[12]: dtype('<M8[us]')
>
> In [13]: result_type(ones(1, dtype='datetime64[D]'), 'datetime64[Y]')
> Out[13]: dtype('<M8[D]')
>
> but doesn't convert to float
>
> In [11]: result_type(ones(1, dtype='datetime64[D]'), float32)
> ---------------------------------------------------------------------------
> TypeError                                 Traceback (most recent call last)
> <ipython-input-11-e1a09e933dc7> in <module>()
> ----> 1 result_type(ones(1, dtype='datetime64[D]'), float32)
>
> TypeError: invalid type promotion
>
> What would you like it to do?

Note: the dtype handling in statsmodels is still a mess, and we just
plugged some of the worst cases.


What we would need is an asarray that guarantees at least a minimum
precision (e.g. float32) and raises an exception if the input is not
numeric, e.g. string, object, or custom dtypes.
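
A minimal sketch of what such a helper could look like, built only on
the existing result_type and astype. The name asarray_min_float and its
min_dtype parameter are hypothetical (this is not a NumPy API and not
the keyword proposed in this thread), and treating only the
bool/int/uint/float/complex kinds as "numeric" is an assumption:

```python
import numpy as np


def asarray_min_float(a, min_dtype=np.float32):
    """Hypothetical helper: return `a` as an array with at least
    `min_dtype` precision, raising TypeError for non-numeric input."""
    arr = np.asanyarray(a)
    # Assumption: "numeric" means bool, signed/unsigned int, float, complex.
    # Object, string, datetime64 and structured dtypes are rejected.
    if arr.dtype.kind not in "biufc":
        raise TypeError("non-numeric dtype: %r" % (arr.dtype,))
    # Promote using dtypes only, so the result does not depend on values.
    target = np.result_type(arr.dtype, min_dtype)
    # copy=False hands back the input unchanged when no cast is needed.
    return arr.astype(target, copy=False)
```

With this sketch an int64 input is promoted to float64, float32 and
complex128 arrays are passed through without a copy, and object,
string, or datetime64 input raises TypeError instead of silently
propagating.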

However, we need custom dtype handling in statsmodels anyway, so the
enhancement to asarray with exceptions would mainly be a convenient way
to get something to work with, because pandas and numpy are now "object
array friendly".

I assume scipy also has insufficient checks for non-numeric dtypes, AFAIR.


Josef


>
> Chuck
>


