On Monday 14 July 2008, Pierre GM wrote:
On Monday 14 July 2008 09:07:47 Francesc Alted wrote:
The advantage of this abstraction is that the user can easily choose the resolution that best fits their needs. I'm thinking of providing the following resolutions:
["femtosec", "picosec", "nanosec", "microsec", "millisec", "sec", "min", "hour", "month", "year"]
In TimeSeries, we don't have anything less than a second, but we have 'daily', 'business daily', 'weekly' and 'quarterly' resolutions.
Yes, I forgot the "day" resolution. I suppose that "weekly" and "quarterly" could be added too. However, if we adopt a new way to specify the resolution (see later), these can be stated as '7d' and '3m' respectively. Mmh, I'm not sure about "business daily"; it may be useful in time series, but I can't find a reasonable meaning for it as a 'time resolution' (which is a different concept from 'time frequency'). So I'd leave it out.
A very useful feature that Matt Knox coded is the possibility to specify starting points when switching from one resolution to another. For example, you can have a series with an 'ANN_MAR' frequency, which corresponds to 1 point a year, the year starting in April. When switching back to a monthly resolution, the points from January to March of the first year will be masked.
Ok. Ann was also suggesting that the origin of time would be configurable, but then, you are talking about *masking* values. Mmm, I don't think we should try to incorporate masking capabilities in the NumPy date/time types. At any rate, I've not thought about the possibility of having an origin defined by the user, but if we could add the 'resolution' metainfo, I don't see why we couldn't do the same with the 'origin' metainfo too.
Another useful feature would be to allow the user to define his/her own resolution (every 15min, every 12h...). Right now it's a bit clunky in TimeSeries: we have to use the lowest resolution of the series (min, hour) and leave a lot of blanks (TimeSeries don't have to be regularly spaced, but it helps...)
Ok. I see the use case for this, but for implementation purposes we should come up with a more complete way to specify the resolution than I had first envisioned. Hmm, what about the following: [N]timeunit, where ``timeunit`` can take one of the values in ['y', 'm', 'd', 'h', 'm', 's', 'ms', 'us', 'ns', 'fs']. So, for example, '14d' means a resolution of 14 days, and '10ms' means a resolution of one hundredth of a second. Sounds good to me. What do other people think?
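To make the [N]timeunit idea concrete, here is a minimal sketch of how such a string might be split into a count and a unit. The function name ``parse_resolution`` and the ``TIME_UNITS`` tuple are hypothetical names used only for illustration, not part of NumPy or TimeSeries. The sketch also assumes that a missing N means 1, and it only validates the unit code; it does not try to decide whether 'm' means month or minute.

```python
import re

# Unit codes taken from the list proposed above. 'm' appears twice in
# that list (month and minute); this sketch keeps a single 'm' entry and
# does not attempt to resolve the ambiguity.
TIME_UNITS = ('y', 'm', 'd', 'h', 's', 'ms', 'us', 'ns', 'fs')

# Optional run of digits (the N) followed by a lowercase unit code.
_SPEC_RE = re.compile(r'(\d*)([a-z]+)')


def parse_resolution(spec):
    """Split an '[N]timeunit' string into (count, unit).

    Assumption: when N is omitted, the count defaults to 1.
    """
    match = _SPEC_RE.fullmatch(spec)
    if match is None or match.group(2) not in TIME_UNITS:
        raise ValueError("invalid resolution spec: %r" % (spec,))
    digits, unit = match.groups()
    count = int(digits) if digits else 1
    if count < 1:
        raise ValueError("resolution count must be positive: %r" % (spec,))
    return count, unit
```

With this, ``parse_resolution('14d')`` gives ``(14, 'd')`` and ``parse_resolution('10ms')`` gives ``(10, 'ms')``, while something like ``'5x'`` raises ``ValueError``.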
Now comes the tricky part: how to integrate the notion of 'resolution' with the 'dtype' data type factory of NumPy?
In TimeSeries, the frequency is stored as an integer. For example, a daily frequency is stored as 6000, an annual frequency as 1000, an 'ANN_MAR' frequency as 1003...
Well, I initially planned to keep the resolution as an enumerated value (an int8 would be enough), but if the new way to specify resolutions goes ahead, I'm afraid that we may need a full int64 to store it. Apart from that, this should not be a problem (in general, the metainfo is a very tiny part of the space taken by a dataset).

Cheers,

--
Francesc Alted