[Numpy-discussion] NumPy date/time types and the resolution concept

Francesc Alted faltet at pytables.org
Mon Jul 14 12:50:21 EDT 2008


On Monday 14 July 2008, Pierre GM wrote:
> On Monday 14 July 2008 09:07:47 Francesc Alted wrote:
> > The advantage of this abstraction is that the user can easily
> > choose the resolution scale that best fits his needs.  I'm
> > thinking of providing the following resolutions:
> >
> > ["femtosec", "picosec", "nanosec", "microsec", "millisec", "sec",
> > "min", "hour", "month", "year"]
>
> In TimeSeries, we don't have anything less than a second, but we
> have 'daily', 'business daily', 'weekly' and 'quarterly' resolutions.

Yes, I forgot the "day" resolution.  I suppose that "weekly" 
and "quarterly" could be added too.  However, if we adopt a new way to 
specify the resolution (see below), these can be stated as '7d' 
and '3m' respectively.  Mmh, I'm not sure about "business daily"; it 
may be useful in time series, but I can't find a reasonable meaning 
for it as a 'time resolution' (which is a different concept from 'time 
frequency').  So I'd leave it out.

> A very useful point that Matt Knox had coded is the possibility to
> specify starting points for switching from one resolution to another.
> For example, you can have a series with a 'ANN_MAR' frequency, that
> corresponds to 1 point a year, the year starting in April. When
> switching back to a monthly resolution, the points from January to
> March of the first year will be masked.
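To make the masking behaviour concrete, here is a rough NumPy sketch (not TimeSeries' actual implementation) of spreading one value per fiscal year onto a calendar-monthly grid.  The helper name `annual_to_monthly` is hypothetical, and the sketch assumes the monthly grid starts in January of the first fiscal year's calendar year, so the months before the fiscal start carry no data and are masked.

```python
import numpy as np
import numpy.ma as ma


def annual_to_monthly(values, start_month):
    """Hypothetical helper: repeat each fiscal-year value over its 12 months.

    start_month is the calendar month (1-12) in which each fiscal year
    begins, e.g. 4 for an 'ANN_MAR'-style year running April to March.
    The monthly grid starts in January, so the months preceding the
    first fiscal year have no data and are masked.
    """
    if not 1 <= start_month <= 12:
        raise ValueError("start_month must be in 1..12")
    values = np.asarray(values)
    lead = start_month - 1                    # months January..(start_month - 1)
    monthly = np.zeros(lead + 12 * values.size, dtype=values.dtype)
    monthly[lead:] = np.repeat(values, 12)    # 12 monthly points per year
    mask = np.zeros(monthly.shape, dtype=bool)
    mask[:lead] = True                        # January..March when start_month=4
    return ma.masked_array(monthly, mask=mask)


series = annual_to_monthly([10, 20], start_month=4)
print(series[:5])   # the first three entries are masked, then the 10s begin
```

Note that this keeps the masking outside the date/time type itself, which matches the view that masking belongs to the time-series layer rather than to NumPy's date/time dtypes.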

Ok.  Ann was also suggesting that the origin of time should be 
configurable, but then, you are talking about *masking* values.  Mmm, I 
don't think we should try to incorporate masking capabilities in the 
NumPy date/time types.

At any rate, I hadn't thought about the possibility of having an origin 
defined by the user, but if we can add the 'resolution' metainfo, I 
don't see why we couldn't do the same with the 'origin' metainfo too.
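As a sketch of how 'origin' and 'resolution' metainfo could work together: a stored integer k would denote origin + k * step.  The `TimeScale` class below is hypothetical and only handles fixed-length steps (e.g. every 15 minutes or every 12 hours); calendar units such as months and years would need extra calendar logic.

```python
from datetime import datetime, timedelta


class TimeScale:
    """Hypothetical (origin, step) metainfo: tick k means origin + k * step."""

    def __init__(self, origin, step):
        if step <= timedelta(0):
            raise ValueError("step must be positive")
        self.origin = origin
        self.step = step

    def to_ticks(self, when):
        # timedelta // timedelta yields an int; floor division maps
        # instants before the origin onto negative ticks.
        return (when - self.origin) // self.step

    def from_ticks(self, ticks):
        return self.origin + ticks * self.step


quarter_hours = TimeScale(datetime(2008, 7, 14), timedelta(minutes=15))
tick = quarter_hours.to_ticks(datetime(2008, 7, 14, 12, 50))
print(tick, quarter_hours.from_ticks(tick))   # 51 2008-07-14 12:45:00
```

An instant falling between ticks is floored to the preceding tick here; that is one possible convention, not a settled design.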

> Another useful point would be to allow the user to define his/her own
> resolution (every 15min, every 12h...). Right now it's a bit clunky
> in TimeSeries, we have to use the lowest resolution of the series
> (min, hour) and leave a lot of blanks (TimeSeries don't have to be
> regularly spaced, but it helps...)

Ok.  I see the use case for this, but for implementation purposes, we 
should come up with a more complete way to specify the resolution than I 
had envisioned before.  Hmm, what about the following:

[N]timeunit

where ``timeunit`` can take one of the following values:

['y', 'm', 'd', 'h', 'm', 's', 'ms', 'us', 'ns', 'fs']

so, for example, '14d' means a resolution of 14 days, and '10ms' means a 
resolution of one hundredth of a second.  Sounds good to me.  What do other 
people think?
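A minimal sketch of parsing the proposed `[N]timeunit` syntax.  The names `parse_resolution` and `resolution_seconds` are hypothetical.  The unit list above uses 'm' twice (months and minutes), so this sketch assumes uppercase 'M' for months and lowercase 'm' for minutes, the distinction NumPy's `datetime64` eventually adopted.  Months and years have no fixed length in seconds, so the conversion returns None for them.

```python
import re
from fractions import Fraction

# Units from the proposal; 'M' (month) vs 'm' (minute) is an assumption.
UNITS = ('y', 'M', 'd', 'h', 'm', 's', 'ms', 'us', 'ns', 'fs')

# Length in seconds of the fixed-length units; 'y' and 'M' vary.
SECONDS_PER_UNIT = {
    'd': 86400, 'h': 3600, 'm': 60, 's': 1,
    'ms': Fraction(1, 10**3), 'us': Fraction(1, 10**6),
    'ns': Fraction(1, 10**9), 'fs': Fraction(1, 10**15),
}

_SPEC = re.compile(r'(\d*)([A-Za-z]+)')


def parse_resolution(spec):
    """Split a spec such as '14d' into (N, unit); N defaults to 1."""
    match = _SPEC.fullmatch(spec)
    if match is None or match.group(2) not in UNITS:
        raise ValueError(f"invalid resolution: {spec!r}")
    count = int(match.group(1)) if match.group(1) else 1
    if count == 0:
        raise ValueError("the multiplier N must be positive")
    return count, match.group(2)


def resolution_seconds(spec):
    """Exact resolution in seconds, or None for calendar units."""
    count, unit = parse_resolution(spec)
    if unit not in SECONDS_PER_UNIT:
        return None
    return count * SECONDS_PER_UNIT[unit]


print(parse_resolution('14d'))       # (14, 'd')
print(resolution_seconds('10ms'))    # 1/100
```

Using `Fraction` keeps sub-second resolutions such as '10ms' exact instead of rounding them to floats.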

>
> > Now comes the tricky part: how to integrate the notion
> > of 'resolution' with the 'dtype' data type factory of NumPy?
>
> In TimeSeries, the frequency is stored as an integer. For example, a
> daily frequency is stored as 6000, an annual frequency as 1000, a
> 'ANN_MAR' frequency as 1003...

Well, I initially planned to keep the resolution as an enumerated type (an int8 
would be enough), but if the new way to specify resolutions goes ahead, 
I'm afraid that we may need a full int64 to store it.  But apart from 
that, this should not be a problem (in general, the metainfo is a tiny 
fraction of the space taken by a dataset).

Cheers,

-- 
Francesc Alted


