[Numpy-discussion] Datetime branch
Pierre GM
pgmdevlist at gmail.com
Thu Jun 11 16:33:10 EDT 2009
On Jun 11, 2009, at 3:47 PM, Robert Kern wrote:
> On Thu, Jun 11, 2009 at 14:37, Pierre GM <pgmdevlist at gmail.com> wrote:
>>
>> On Jun 11, 2009, at 3:07 PM, Travis Oliphant wrote:
>>
>>>> BTW, what is the metadata that is going to be added to the types?
>>>> What purpose does it serve?
>>>
>>> In the date-time case, it holds what frequency the integer in the
>>> data-type represents. There will only be 2 new static data-types.
>>> "Datetime" and "Timedelta" that use 8 bytes each.
>>>
>>> What those 8 bytes represent will be determined by the metadata
>>> (years, months, seconds, etc...).
>>
>> As Charles pointed out, it'd be quite useful for units as well. Or to
>> store some extra information like the filling_value of a
>> MaskedArray...
>>
>> So, this metadata would be attached to an array, right?
>
> No. The metadata is on the dtype.
Ah, OK. It could still be used for units, then. And it'll probably make
it easier to define custom dtypes (I was thinking about a standard
problem where all the fields of a structured array have the same
dtype. A flag could be attached to the main dtype indicating that it's
OK to perform some functions on the fields, for example... Thinking
aloud here).
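For reference, NumPy later shipped roughly this design as `datetime64`/`timedelta64`, and `np.dtype` gained a `metadata` keyword for attaching an arbitrary dict to a dtype. The sketch below assumes a present-day NumPy; the `"unit"` key is purely illustrative (NumPy gives it no meaning), and NumPy documents dtype metadata as not well supported, so it may not survive every operation.

```python
import numpy as np

# Datetime: the unit lives in the dtype; the storage is always 8 bytes.
day = np.dtype("datetime64[D]")
sec = np.dtype("datetime64[s]")
print(day.itemsize, sec.itemsize)   # 8 8
print(np.datetime_data(day))        # ('D', 1)
print(np.datetime_data(sec))        # ('s', 1)

# The same raw 64-bit integer means different things depending on the unit.
raw = np.array([1], dtype=np.int64)
print(raw.view(day))                # ['1970-01-02']
print(raw.view(sec))                # ['1970-01-01T00:00:01']

# Generic metadata: a dict attached to the dtype, not to the array.
# The "unit" key is an illustrative assumption; NumPy ignores it.
metres = np.dtype(np.float64, metadata={"unit": "m"})
lengths = np.zeros(3, dtype=metres)
print(lengths.dtype.metadata["unit"])       # m
print(lengths[1:].dtype.metadata["unit"])   # m (a slice shares the dtype)
```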
>> Scalars would
>> be considered 0d arrays for that purpose, right? E.g., given a 1d
>> array of dates with a given frequency, accessing a single element would
>> give me a scalar with the same frequency?
>
> It should. The details still need to be worked out.
OK.
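In NumPy as later released, indexing a `datetime64` array does return a scalar that keeps the unit of its parent array. A quick check, assuming a present-day NumPy:

```python
import numpy as np

dates = np.array(["2009-06-11", "2009-06-12"], dtype="datetime64[D]")

first = dates[0]                        # indexing returns a NumPy scalar
print(type(first).__name__)             # datetime64
print(np.datetime_data(first.dtype))    # ('D', 1)

# A 0-d array holding the same element carries the same unit.
zero_d = dates[0:1].reshape(())
print(zero_d.ndim, np.datetime_data(zero_d.dtype))   # 0 ('D', 1)
```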
>
>>> The ufunc machinery needs to change to handle passing
>>> that information in somehow. The approaches we take to doing that
>>> will also hopefully allow us to define ufuncs for string, unicode,
>>> and void * arrays as well.
>>
>> In that case, could we also think about what Darren was suggesting
>> for his units package, viz., a pre-processing function
>> (__array_unwrap__?) that complements the current __array_wrap__ one?
>> The idea being that any operation would be performed on an ndarray,
>> the corresponding metadata would just be passed along during the
>> operation, and modifications to the metadata would be taken care of
>> in the pre- and/or post-processing steps?
>
> Neither here nor there, I think.
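To make the unwrap/compute/wrap idea concrete, here is a pure-Python sketch. Everything in it is hypothetical: `Tagged`, `unwrap`, `wrap` and `apply_ufunc` are illustrative names, not NumPy's actual `__array_wrap__` protocol and not Darren's package. The ufunc only ever sees bare ndarrays; metadata is set aside before the call and reconciled afterwards (here with the simplest possible rule: all tagged inputs must agree).

```python
import numpy as np

class Tagged:
    """Hypothetical container: a bare ndarray plus a metadata dict."""
    def __init__(self, data, meta):
        self.data = np.asarray(data)
        self.meta = dict(meta)

def unwrap(obj):
    # Pre-processing (the role of the proposed __array_unwrap__):
    # split the object into a bare ndarray and its metadata.
    if isinstance(obj, Tagged):
        return obj.data, obj.meta
    return np.asarray(obj), None

def wrap(result, metas):
    # Post-processing (the role __array_wrap__ plays): decide what
    # metadata the result carries. Here, all tagged inputs must agree.
    known = [m for m in metas if m is not None]
    if not known:
        return result
    if any(m != known[0] for m in known):
        raise ValueError(f"incompatible metadata: {known}")
    return Tagged(result, known[0])

def apply_ufunc(ufunc, *inputs):
    arrays, metas = zip(*(unwrap(x) for x in inputs))
    # The ufunc only sees plain ndarrays; metadata is carried alongside.
    return wrap(ufunc(*arrays), metas)

a = Tagged([1.0, 2.0], {"unit": "m"})
b = Tagged([3.0, 4.0], {"unit": "m"})
c = apply_ufunc(np.add, a, b)
print(c.data, c.meta)   # [4. 6.] {'unit': 'm'}
```

A real units package would need per-ufunc rules in `wrap` (e.g. multiplying metres by metres), which is exactly the part this sketch leaves out.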
>
>> Oh, just another question: why try to put datetime and timedelta in
>> the type ordering? My understanding is that underneath, they're just
>> long/longlong. It's only because they have particular metadata that
>> they should be processed differently, right?
>
> No. They need to be different types such that the ufunc mechanism can
> find the right loop implementations.
Meh. I'm not familiar enough with the details of C ufuncs, so bear
with me for a minute.
A datetime is basically a long plus a frequency attribute. All the
operations recognized as valid for a datetime object deal with the
long part; the frequency is just patched back at the end, right? So a
ufunc could first check the underlying type (here, long or longlong),
then check whether there's a value for the 'unit': if there is one,
choose the corresponding loop; if it's None, use the default (the one
we currently have).
I really fail to see why we need to treat datetime/timedelta as
intrinsically different from the other types (apart from the fact
that they carry some extra info), and why the mechanism should be
different for datetime/timedelta than for, say, units.
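The dispatch proposed above (storage type first, then unit, with a unit-less default) can be sketched in plain Python. `SUBTRACT_LOOPS` and `resolve_loop` are hypothetical names, and the `"datetime[D]"`/`"timedelta[D]"` unit strings are illustrative; this lookup runs outside NumPy's ufunc machinery, which, per Robert's reply, selects loops by type.

```python
import numpy as np

# Hypothetical loop registry keyed on (storage type, unit).
# Each loop returns (values, unit of the result).
SUBTRACT_LOOPS = {
    (np.dtype(np.int64), None): lambda a, b: (a - b, None),
    # Subtracting two day-resolution dates yields a day-resolution duration.
    (np.dtype(np.int64), "datetime[D]"): lambda a, b: (a - b, "timedelta[D]"),
}

def resolve_loop(registry, storage, unit):
    """Check the storage type, then the unit; fall back to the unit-less loop."""
    key = (np.dtype(storage), unit)
    if key in registry:
        return registry[key]
    return registry[(np.dtype(storage), None)]

a = np.array([10, 20], dtype=np.int64)
b = np.array([3, 5], dtype=np.int64)

values, unit = resolve_loop(SUBTRACT_LOOPS, np.int64, "datetime[D]")(a, b)
print(values, unit)   # [ 7 15] timedelta[D]

values, unit = resolve_loop(SUBTRACT_LOOPS, np.int64, None)(a, b)
print(values, unit)   # [ 7 15] None
```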
>> So, if we soon add units
>> to floats, the underlying object would still be considered a float, and
>> dealing with the unit would be left to the ufuncs?
>
> This is why I don't think this mechanism can be used for units.
Robert, would you mind pointing me offlist to the relevant part of the
code so that I can try to figure it out myself? Or just explain it in
plain English (which could then be the basis for documentation of
these new features)...
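For what it's worth, in NumPy as later released `datetime64` and `timedelta64` did become distinct types with their own type characters (`'M'` and `'m'`) in each ufunc's loop table, separate from the integer codes. That is what lets loop selection tell them apart from a plain int64 with the same 8-byte storage. A quick check, assuming a present-day NumPy:

```python
import numpy as np

# Each ufunc lists its loops as type-signature strings; 'M' is datetime64
# and 'm' is timedelta64, separate from the integer codes.
print([sig for sig in np.subtract.types if "M" in sig or "m" in sig])

d1 = np.datetime64("2009-06-11")
d2 = np.datetime64("2009-06-12")
print((d2 - d1).dtype)   # timedelta64[D]

# Same 8-byte storage, different loops: int64 + int64 is fine...
print(np.int64(1) + np.int64(2))   # 3
# ...but there is no datetime + datetime loop, so NumPy refuses.
try:
    d1 + d2
except TypeError as exc:
    print("TypeError:", exc)
```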