[Numpy-discussion] Datetime branch

Pierre GM pgmdevlist at gmail.com
Thu Jun 11 16:33:10 EDT 2009


On Jun 11, 2009, at 3:47 PM, Robert Kern wrote:

> On Thu, Jun 11, 2009 at 14:37, Pierre GM<pgmdevlist at gmail.com> wrote:
>>
>> On Jun 11, 2009, at 3:07 PM, Travis Oliphant wrote:
>>
>>>> BTW, what is the metadata that is going to be added to the types?
>>>> What purpose does it serve?
>>>
>>> In the date-time case, it holds what frequency the integer in the
>>> data-type represents. There will only be 2 new static data-types,
>>> "Datetime" and "Timedelta", that use 8 bytes each.
>>>
>>> What those 8 bytes represent will be determined by the metadata
>>> (years, months, seconds, etc...).
>>
>> As Charles pointed out, it'd be quite useful for units as well. Or to
>> store some extra information like the fill_value of a
>> MaskedArray...
>>
>> So, this metadata would be attached to an array, right?
>
> No. The metadata is on the dtype.

Ah, OK. It could still be used for units, then. And it'll probably make
it easier to define custom dtypes. (I was thinking about a standard
problem where all the fields of a structured array have the same
dtype: a flag could be attached to the main dtype saying that it's OK
to perform some functions on fields, for example. Thinking aloud
here.)
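
To make the array/dtype distinction concrete, here is a minimal
pure-Python sketch. It is not NumPy code: ToyDType and ToyArray are
hypothetical names invented for illustration, and the "unit" key is an
assumption. It only shows metadata living on a dtype object that
several arrays share, rather than on each array.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass(frozen=True)
class ToyDType:
    """Hypothetical stand-in for a dtype: storage kind plus read-only metadata."""
    kind: str        # label for the underlying storage, e.g. "int64"
    itemsize: int    # bytes per element
    metadata: MappingProxyType = field(
        default_factory=lambda: MappingProxyType({})
    )


class ToyArray:
    """Hypothetical stand-in for an array: values plus a reference to a dtype."""

    def __init__(self, values, dtype):
        self.values = list(values)
        self.dtype = dtype  # the array carries no metadata of its own


# One dtype describing "8-byte integers that count days" (assumed unit key).
days = ToyDType("int64", 8, MappingProxyType({"unit": "D"}))

a = ToyArray([0, 1, 2], days)
b = ToyArray([10, 20], days)

# Both arrays point at the same dtype object, so they see the same metadata.
shared = a.dtype is b.dtype
unit_of_b = b.dtype.metadata["unit"]
```

In this sketch, changing what the integers mean is a matter of building
a different dtype, not of editing an attribute on one array.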


>> Scalars would
>> be considered as 0d arrays for that purpose, right? E.g., given a 1d
>> array of dates w/ a given frequency, accessing a single element would
>> give me a scalar w/ the same frequency?
>
> It should. The details still need to be worked out.

OK.


>
>>> The ufunc machinery needs to change to handle passing
>>> that information in somehow. The approaches we take to doing that
>>> will also hopefully allow us to define ufuncs for string, unicode,
>>> and void * arrays as well.
>>
>> In that case, could we also think about what Darren was suggesting
>> for his units package, viz., a pre-processing function
>> (__array_unwrap__?) that complements the current __array_wrap__ one?
>> The idea being that any operation would be performed on an ndarray,
>> the corresponding metadata would just be passed along during the
>> operation, and modifications to the metadata would be taken care of
>> in the pre- and/or post-processing steps?
>
> Neither here nor there, I think.
>
>> Oh, just another question: why try to put datetime and timedelta
>> in the type ordering? My understanding is that underneath, they're
>> just long/longlong. It's only because they have particular metadata
>> that they should be processed differently, right?
>
> No. They need to be different types such that the ufunc mechanism can
> find the right loop implementations.

Meh. I'm not familiar enough with the details of C ufuncs, so bear  
with me for a minute.

A datetime is basically a long + a frequency attribute. All the
operations recognized as valid for a datetime object will deal w/ the
long part, and the frequency is just patched back at the end, right? So
a ufunc could first check the underlying type (here, long or
longlong), then check whether there's a value for the 'unit': if
there is one, choose the corresponding loop; if it is None, use the
default (the one we currently have).

I really fail to see why we need to see datetime/timedelta as
intrinsically different from the other types (apart from the fact
that they carry some extra info), and why the mechanism should be
different for datetime/timedelta than for units, say.
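
For readers following the argument, here is a toy sketch of loop
lookup keyed on type. It is an assumption-laden illustration in pure
Python, not the actual C ufunc machinery: the LOOPS table, the type
labels, and the find_loop/apply helpers are all hypothetical. It shows
one way to read the "different types" requirement: if the lookup key
is the type alone, datetime and timedelta need their own type codes
for the lookup to tell them apart from plain integers.

```python
def add_ints(x, y):
    """Inner loop body: the arithmetic is on the integer payload only."""
    return x + y


def sub_ints(x, y):
    return x - y


# Hypothetical loop table: (operation, input type labels) -> (output label, loop).
# The labels are illustrative names, not NumPy's real type numbers.
LOOPS = {
    ("add", "int64", "int64"): ("int64", add_ints),
    ("add", "datetime", "timedelta"): ("datetime", add_ints),
    ("add", "timedelta", "timedelta"): ("timedelta", add_ints),
    ("subtract", "datetime", "datetime"): ("timedelta", sub_ints),
}


def find_loop(op, left_type, right_type):
    """Return (output label, loop) or raise TypeError if none is registered."""
    try:
        return LOOPS[(op, left_type, right_type)]
    except KeyError:
        raise TypeError(
            f"no {op} loop for ({left_type}, {right_type})"
        ) from None


def apply(op, left, right):
    """left and right are (type label, integer value) pairs."""
    out_type, loop = find_loop(op, left[0], right[0])
    return (out_type, loop(left[1], right[1]))


shifted = apply("add", ("datetime", 100), ("timedelta", 5))
elapsed = apply("subtract", ("datetime", 100), ("datetime", 40))
```

With these labels, adding two datetimes finds no registered loop and
raises TypeError, while the same two integers labelled "int64" add
without complaint. Unit or frequency handling is deliberately left out
of the sketch; a scheme that checks the underlying type first and the
unit second would need a further lookup step that is not shown here.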


>> So, if soon we add units
>> to floats, the underlying object would still be considered a float,
>> and dealing w/ the unit has to be left to the ufuncs?
>
> This is why I don't think this mechanism can be used for units.

Robert, would you mind pointing me offlist to the relevant part of the
code so that I can try to figure it out by myself? Or just explain it in
plain English (which would then be the basis for documentation of
these new features)...


