[Numpy-discussion] fixing up datetime

Wed Jun 8 20:47:59 EDT 2011

On Wed, Jun 8, 2011 at 5:59 AM, Dave Hirschfeld
<dave.hirschfeld at gmail.com> wrote:

> Mark Wiebe <mwwiebe <at> gmail.com> writes:
> >
> >
> > It appears to me that a structured dtype with some further NumPy extensions
> > could entirely replace the 'events' metadata fairly cleanly. If the ufuncs
> > are extended to operate on structured arrays, and integers modulo n are
> > added as a new dtype, a dtype like
> > [('date', 'M8[D]'), ('event', 'i8[mod 100]')] could replace the current
> > 'M8[D]//100'.
>
> Sounds like a cleaner API.
>
> >
> >>
> >> As Dave H. summarized, we used a basic keyword to do the same thing in
> >> scikits.timeseries, with the addition of some subfrequencies like A-SEP
> >> to represent a year starting in September, for example. It works, but
> >> it's really not an elegant solution.
> >>
> >
> > This kind of thing definitely belongs in a layer above datetime.
> >
>
> That's fair enough - my perspective as a timeseries user is probably a lot
> higher level. My main aim is to point out some of the higher level uses so
> that the numpy dtype can be made compatible with them - I'd hate to have a
> situation where we have multiple different datetime representations
> in the different libraries and have to continually convert at the
> boundaries.
>
> That said, I think the starting point for a series at a weekly, quarterly
> or annual frequency/unit/type is something which may need to be sorted out
> at the lowest level...
>

Yeah, that's what we're trying to figure out. I'm trying to get the
functionality to work in such a way that implementing a system on top doing
all the things you need can be done pretty smoothly. After the next few
merge requests, it should be possible to start experimenting with it to try
things out and get feedback based on that.

> >
> > One overall impression I have about timeseries in general is the use of the
> > term "frequency" synonymously with the time unit. To me, a frequency is a
> > numerical quantity with a unit of 1/(time unit), so while it's related to
> > the time unit, naming it the same is something the specific timeseries
> > domain has chosen to do. I think the numpy datetime class shouldn't have
> > anything called "frequency" in it, and I would like to remove the current
> > usage of that terminology from the codebase.
> >
>
> It seems that it's just a naming convention (possibly not the best) and
> can be used synonymously with the "time unit"/resolution/dtype
>
> > I don't envision 'asfreq' being a datetime function; this is the kind
> > of thing that would layer on top in a specialized timeseries library. The
> > behavior of timedelta follows a more physics-like idea with regard to the
> > time unit, and I don't think something more complicated belongs at the
> > bottom layer that is shared among all datetime uses.
>
> I think since freq <==> dtype then asfreq <==> astype. From your examples it
> seems to do the same thing - i.e. if you go to a lower resolution (freq)
> representation the higher resolution information is truncated - e.g. a
> monthly resolution date has no information about days/hours/minutes/seconds
> etc. It's converting in the other direction, low --> high resolution,
> where the difference lies - numpy always converts to the start of the
> interval whereas the timeseries Date class gives you the option of the start
> or the end.
>

The fact that it's a NumPy dtype probably is the biggest limiting factor
preventing parameters like 'start' and 'end' during conversion. Having a
datetime represent an instant in time neatly removes any ambiguity, so
converting between days and seconds as a unit is analogous to converting
between int32 and float32.
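
To make that concrete, here is a minimal sketch of the unit conversions being
discussed. It assumes a NumPy where the datetime64 dtype and its 'M8[...]'
unit strings are available as they later shipped; the variable names are only
for illustration.

```python
import numpy as np

# A datetime stored with month precision (dtype datetime64[M]).
month = np.datetime64('2011-06')

# Low -> high resolution: the value lands on the start of the interval.
as_day = month.astype('M8[D]')
as_sec = month.astype('M8[s]')

# High -> low resolution: the finer-grained information is dropped.
day = np.datetime64('2011-06-08')
back_to_month = day.astype('M8[M]')

print(as_day)          # 2011-06-01
print(as_sec)          # 2011-06-01T00:00:00
print(back_to_month)   # 2011-06
```

There is no 'start' or 'end' parameter anywhere in this; a library that wants
end-of-interval semantics would have to add the offset itself after the cast.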

> > I'm thinking of a datetime as an infinitesimal moment in time, with the
> > unit representing the precision that can be stored. Thus, '2011',
> > '2011-01', and '2011-01-01T00:00:00.00Z' all represent the same moment in
> > time, but with a different unit of precision. Computationally, this
> > perspective is turning out to provide a pretty rich mechanism to do
> > operations on dates.
>
> I think this is where the main point of difference is. I use the timeseries
> as a container for data which arrives in a certain interval of time, e.g.
> monthly temperatures are the average temperature over the interval defined
> by the particular month. It's not the temperature at the instant of time
> that the month began, or ended, or of any particular instant in that month.
>
> Thus the conversion from a monthly resolution to say a daily resolution
> isn't well defined and the right thing to do is likely to be application
> specific.
>
> For the average temperature example you may want to choose a value in the
> middle of the month so that you don't introduce a phase delay if you
> interpolate between the datapoints.
>
> If for example you measured total rainfall in a month you might want to
> choose the last day of the month to represent the total rainfall for that
> month as that was the only date in the interval where the total rainfall
> did in fact equal the monthly value.
>
> It may be as you say though that all this functionality does belong at a
> higher level...
>

Would it be possible to mock up a set of examples, or find an existing set
of examples, which exhibit the kinds of computations that are needed? That
could be used to see how easy it is to express the ideas with the numpy
datetime machinery.
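
As one possible starting point, here is a rough sketch of the start, middle,
and end-of-month choices from the temperature and rainfall examples, layered
on top of plain datetime64 arithmetic. It is only an illustration under the
assumption that datetime64/timedelta64 behave as they do in later NumPy
releases; the names start, middle, and end are hypothetical and not part of
any existing API.

```python
import numpy as np

# Monthly-resolution timestamps, e.g. labels for monthly observations.
months = np.array(['2011-01', '2011-02', '2011-03'], dtype='M8[M]')

# Start of each month: what a plain cast to days gives.
start = months.astype('M8[D]')

# End of each month: first day of the next month, minus one day.
end = (months + np.timedelta64(1, 'M')).astype('M8[D]') - np.timedelta64(1, 'D')

# Middle of each month: halfway between start and end, rounded down to a day.
middle = start + (end - start) // 2

print(start)
print(middle)
print(end)
```

The point of the sketch is that the choice of representative day lives in
the application code, while the dtype itself only ever converts to the start
of the interval.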

Cheers,
Mark

>
> Regards,
> Dave
>