[Numpy-discussion] Datetime branch

Charles R Harris charlesr.harris at gmail.com
Thu Jun 11 15:24:40 EDT 2009


On Thu, Jun 11, 2009 at 1:07 PM, Travis Oliphant <oliphant at enthought.com> wrote:

>
> On Jun 11, 2009, at 1:44 PM, Charles R Harris wrote:
>
> >
> > The implementation of PyArray_CanCastSafely illustrates two other
> > points that bother me.
> >
> > 1) The rules are encoded in the program logic. This makes them
> > difficult to find or to see what they are and requires editing the
> > code to make changes.
>
> I agree that this is all sub-optimal.     I didn't do much to fix what
> was there with Numeric except add a semi-orthogonal user-defined
> approach.
>
> I like the generic function concept that was added to the ufuncs quite
> a bit.   I'm wondering if most of the functions currently in the *f
> member of the data-type structure couldn't be implemented under that
> notion instead.
>
> Also, should we attach coercion information to each data-type directly
> and an API to extend the coercion information?   I agree that the
> "implicit" ordering of the data-types for coercion is wonky, but it
> allowed the code from Numeric to be used to dispatch in the ufunc
> instead of designing a new approach.   Do you have other ideas about
> how this might work?
>

It was a fairly decent system when there were just a few numeric types, but
there are more data types than datetime that might be useful, so it would be
nice if there were a more general way to add them without wading through all
the stuff Robert had to do. The descriptors still need to be identified, and
a number is as good as anything; it is the reliance on ordering that is the
limitation.
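To make the ordering problem concrete, here is a minimal sketch comparing "the higher-numbered type wins" with explicit result-type rules that can also be keyed by operation. The type numbers in `TYPE_NUMS` are illustrative only and are not NumPy's actual values. The names `result_by_ordering` and `CoercionRules` are hypothetical and are not part of the NumPy C API.

```python
from typing import Dict, Optional, Tuple

# Illustrative type numbers only: they identify types, nothing more.
TYPE_NUMS = {"int32": 5, "float64": 12, "datetime": 21, "timedelta": 22}


def result_by_ordering(a: str, b: str) -> str:
    """Ordering-based coercion: the higher-numbered type wins."""
    return a if TYPE_NUMS[a] >= TYPE_NUMS[b] else b


class CoercionRules:
    """Explicit result types, looked up per operation with a global fallback."""

    def __init__(self) -> None:
        self.global_rules: Dict[Tuple[str, str], str] = {}
        self.op_rules: Dict[str, Dict[Tuple[str, str], str]] = {}

    def add_global(self, a: str, b: str, result: str) -> None:
        # Global rules are symmetric: int32 op float64 == float64 op int32.
        self.global_rules[(a, b)] = result
        self.global_rules[(b, a)] = result

    def add_for_op(self, op: str, a: str, b: str, result: str) -> None:
        # Operation-specific rules keep argument order (a - b != b - a).
        self.op_rules.setdefault(op, {})[(a, b)] = result

    def result_type(self, op: str, a: str, b: str) -> Optional[str]:
        specific = self.op_rules.get(op, {})
        if (a, b) in specific:
            return specific[(a, b)]
        return self.global_rules.get((a, b))  # None means "no loop"


rules = CoercionRules()
rules.add_global("int32", "float64", "float64")
rules.add_for_op("subtract", "datetime", "datetime", "timedelta")
rules.add_for_op("add", "datetime", "timedelta", "datetime")
rules.add_for_op("add", "timedelta", "datetime", "datetime")
```

Subtracting two datetimes should give a timedelta. No rule based only on the order of type numbers can produce that, because the result is neither input type. An explicit table can.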

For a general solution, my thoughts have been running along the lines of a
table or linked list, but not one written directly in C. Who wants to edit a
19x19 array, maybe even several of them? ;) So I'm trying to think how the
rules could be encoded so that a Python program could generate the tables or
lists. The rules could then all be collected in one spot.
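A minimal sketch of that idea follows: the safe-cast rules are written once as a Python function, and a script expands them into a C array initializer. The type list is deliberately reduced. The rules are simplified and are not a transcription of PyArray_CanCastSafely. The names `TypeInfo`, `can_cast_safely`, `build_table`, `emit_c_table`, and `can_cast_safely_table` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TypeInfo:
    name: str
    kind: str  # 'b' bool, 'i' signed int, 'u' unsigned int, 'f' float
    size: int  # item size in bytes


# A reduced, hypothetical type list; a real table would cover every type.
TYPES = [
    TypeInfo("bool", "b", 1),
    TypeInfo("int8", "i", 1),
    TypeInfo("int16", "i", 2),
    TypeInfo("int32", "i", 4),
    TypeInfo("uint8", "u", 1),
    TypeInfo("uint16", "u", 2),
    TypeInfo("float32", "f", 4),
    TypeInfo("float64", "f", 8),
]


def can_cast_safely(src: TypeInfo, dst: TypeInfo) -> bool:
    """All safe-cast rules, in one place (simplified)."""
    if src.kind == "b":
        return True  # bool fits in every type
    if src.kind == dst.kind:
        return dst.size >= src.size  # same kind: never shrink
    if src.kind == "u" and dst.kind == "i":
        return dst.size > src.size  # need one extra bit for the sign
    if src.kind in ("i", "u") and dst.kind == "f":
        # float32/float64 mantissas (24/53 bits) hold every integer of
        # strictly smaller item size among the types listed above.
        return dst.size > src.size
    return False  # float->int, int->uint, anything->bool, ...


def build_table(types: List[TypeInfo]) -> List[List[bool]]:
    return [[can_cast_safely(s, d) for d in types] for s in types]


def emit_c_table(types: List[TypeInfo], table: List[List[bool]]) -> str:
    """Render the table as a C array initializer to paste into a .c file."""
    n = len(types)
    lines = [f"static const unsigned char can_cast_safely_table[{n}][{n}] = {{"]
    for src, row in zip(types, table):
        cells = ", ".join("1" if ok else "0" for ok in row)
        lines.append(f"    {{{cells}}},  /* from {src.name} */")
    lines.append("};")
    return "\n".join(lines)


if __name__ == "__main__":
    print(emit_c_table(TYPES, build_table(TYPES)))
```

To add a type, you append one entry to `TYPES` and, if needed, one branch to `can_cast_safely`. You then regenerate the table instead of hand-editing a large C array.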

Actual code would still be needed for the conversions and loops, and there
needs to be a way to associate each conversion with the corresponding
function. So probably a name as well as a number is needed when a new type
is added.
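Here is one way to pair a name with a number when a type is added and to look up conversion functions by name. It is a sketch assuming a simple in-process registry. `TypeRegistry` and its methods are hypothetical, and the number is treated purely as an identifier, with no ordering semantics.

```python
from typing import Any, Callable, Dict, Optional, Tuple

CastFunc = Callable[[Any], Any]


class TypeRegistry:
    """Hypothetical registry: each new type gets a name and a number.

    The number is only an identifier (e.g. an index into generated tables);
    nothing relies on its ordering. Casts are looked up by name.
    """

    def __init__(self) -> None:
        self._numbers: Dict[str, int] = {}
        self._casts: Dict[Tuple[str, str], CastFunc] = {}

    def add_type(self, name: str) -> int:
        if name in self._numbers:
            raise ValueError(f"type {name!r} already registered")
        number = len(self._numbers)  # next free identifier
        self._numbers[name] = number
        return number

    def number(self, name: str) -> int:
        return self._numbers[name]

    def register_cast(self, src: str, dst: str, func: CastFunc) -> None:
        for name in (src, dst):
            if name not in self._numbers:
                raise KeyError(f"unknown type {name!r}")
        self._casts[(src, dst)] = func

    def get_cast(self, src: str, dst: str) -> Optional[CastFunc]:
        return self._casts.get((src, dst))


registry = TypeRegistry()
registry.add_type("int64")
registry.add_type("float64")
registry.add_type("timedelta")
registry.register_cast("int64", "float64", float)
```

Because casts are keyed by name, a type added later never has to slot into an existing numeric order to become usable.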


> >
> > 2) Some of the rules are maintained by the types. That is even more
> > obscure and reminiscent of the "friend" functions in C++ that encode
> > the same sort of thing when the operators are overloaded. I never
> > did like that as a general system ;)
>
> Are you referring to the user-defined data-types?    I agree it's
> pretty kludgy.    Are you envisioning a "global" coercion table?  It
> seems like this may need to be operation specific and extensible to
> allow new data-types to be added fairly easily.
>
> >
> > BTW, what is the metadata that is going to be added to the types?
> > What purpose does it serve?
>
> In the date-time case, it holds what frequency the integer in the data-
> type represents.    There will only be 2 new static data-types.
> "Datetime" and "Timedelta" that use 8 bytes each.
>
> What those 8 bytes represent will be determined by the metadata
> (years, months, seconds, etc...).
>
> But, generally, it will be an extra dictionary that can store anything
> you want (does anybody want to define a "float" data-type that uses IBM
> format bits?). The ufunc machinery needs to change to handle passing
> that information in somehow.   The approaches we take to doing that
> will also hopefully allow us to define ufuncs for string, unicode, and
> void * arrays as well.
>
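The datetime case, where the meaning of the same 8 bytes is fixed by metadata, can be sketched with the standard library. The unit codes, the 1970-01-01 epoch, and the `{"unit": ...}` metadata layout are assumptions for illustration. They are not the datetime branch's actual format. `encode` and `decode` are hypothetical helpers.

```python
import struct
from datetime import datetime, timedelta

# Assumed epoch and unit codes; only the idea (same 8 bytes, meaning set by
# metadata) comes from the discussion above.
EPOCH = datetime(1970, 1, 1)
FIXED_UNITS = {
    "D": timedelta(days=1),
    "h": timedelta(hours=1),
    "m": timedelta(minutes=1),
    "s": timedelta(seconds=1),
    "ms": timedelta(milliseconds=1),
    "us": timedelta(microseconds=1),
}


def encode(count: int) -> bytes:
    """Store a count as 8 bytes (little-endian signed 64-bit integer)."""
    return struct.pack("<q", count)


def decode(raw: bytes, metadata: dict) -> datetime:
    """Interpret 8 raw bytes using the unit recorded in the metadata."""
    (count,) = struct.unpack("<q", raw)
    unit = metadata["unit"]
    if unit == "Y":  # years have no fixed length: use calendar arithmetic
        return datetime(1970 + count, 1, 1)
    if unit == "M":  # months likewise
        return datetime(1970 + count // 12, count % 12 + 1, 1)
    return EPOCH + count * FIXED_UNITS[unit]
```

For example, `encode(1)` decodes to 1970-01-02 with `{"unit": "D"}` and to 1970-02-01 with `{"unit": "M"}`. The bytes do not change; only the metadata does.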

Might be useful for units also.
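For units, the metadata dictionary would have to reach the inner loop, which is the ufunc change described above. A minimal sketch, assuming a descriptor that carries a free-form `metadata` dict and a loop that receives both operands' descriptors. `Descr`, `add_with_units`, and the type number 12 are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Descr:
    """Hypothetical descriptor: a type number plus a free-form metadata dict."""
    type_num: int
    metadata: dict = field(default_factory=dict)


def add_with_units(
    a: List[float], a_descr: Descr, b: List[float], b_descr: Descr
) -> Tuple[List[float], Descr]:
    """An 'add' inner loop that receives the operands' metadata.

    Here the metadata carries a physical unit; mismatched units are
    rejected rather than silently added.
    """
    a_unit = a_descr.metadata.get("unit")
    b_unit = b_descr.metadata.get("unit")
    if a_unit != b_unit:
        raise TypeError(f"cannot add values in {a_unit!r} and {b_unit!r}")
    out = [x + y for x, y in zip(a, b)]
    # The result carries the same unit as its inputs.
    return out, Descr(a_descr.type_num, {"unit": a_unit})


FLOAT64 = 12  # arbitrary type number for this sketch
meters = Descr(FLOAT64, {"unit": "m"})
seconds = Descr(FLOAT64, {"unit": "s"})
```

Both operands share one static type number. Only the metadata distinguishes meters from seconds, which mirrors how Datetime and Timedelta would rely on metadata for their unit.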

Chuck