[Numpy-discussion] fixing up datetime
Wes McKinney
wesmckinn at gmail.com
Wed Jun 8 05:57:07 EDT 2011
On Wed, Jun 8, 2011 at 7:36 AM, Chris Barker <Chris.Barker at noaa.gov> wrote:
> On 6/7/11 4:53 PM, Pierre GM wrote:
>> Anyhow, each time you
>> read 'frequency' in scikits.timeseries, think 'unit'.
> or maybe "precision" -- when I think of unit, I think of something that
> can be represented as a floating point value -- but here, with integers,
> it's the precision that can be represented. Just a thought.
>> Well, it can be argued that the epoch is 0...
> yes, but that really should be transparent to the user -- what epoch is
> chosen should influence as little as possible (e.g. only the range of
> values representable)
>> Mmh. How would you define a quarter unit ? [3M] ? But then, what if
>> you want your year to start in December, say (we often use
>> DJF/MAM/JJA/SON as a way to decompose a year in four 'hydrological'
>> seasons, for example)
> And the federal fiscal year is Oct - Sept, so the first quarter is (Oct,
> Nov, Dec) -- clearly that needs to be flexible.
> -Chris
> --
> Christopher Barker, Ph.D.
> Oceanographer
> Emergency Response Division
> NOAA/NOS/OR&R (206) 526-6959 voice
> 7600 Sand Point Way NE (206) 526-6329 fax
> Seattle, WA 98115 (206) 526-6317 main reception
> Chris.Barker at noaa.gov
You guys' discussion is a bit overwhelming for me in my currently
jet-lagged state ( =) ) but I thought I would comment on a couple of
things, especially now with the input of another financial Python user.
Note that I use scikits.timeseries very little for a few reasons (a
bit OT, but...):
- Fundamental need to be able to work with multiple time series,
especially performing operations involving cross-sectional data
- I think it's a bit hard for lay people to use (read: ex-MATLAB/R
users). This is just my opinion, but a few years ago I thought about
using it and concluded that teaching people how to properly use it (a
precision tool, indeed!) was going to cause me grief.
- The data alignment problem, best explained in code:
In [8]: ts
2000-01-05 00:00:00 0.0503706684002
2000-01-12 00:00:00 -1.7660004939
2000-01-19 00:00:00 1.11716758554
2000-01-26 00:00:00 -0.171029995265
2000-02-02 00:00:00 -0.99876580126
2000-02-09 00:00:00 -0.262729046405
In [9]: ts.index
<class 'pandas.core.daterange.DateRange'>
offset: <1 Week: kwds={'weekday': 2}, weekday=2>, tzinfo: None
[2000-01-05 00:00:00, ..., 2000-02-09 00:00:00]
length: 6
In [10]: ts2 = ts[:4]
In [11]: ts2.index
<class 'pandas.core.daterange.DateRange'>
offset: <1 Week: kwds={'weekday': 2}, weekday=2>, tzinfo: None
[2000-01-05 00:00:00, ..., 2000-01-26 00:00:00]
length: 4
In [12]: ts + ts2
2000-01-05 00:00:00 0.1007413368
2000-01-12 00:00:00 -3.5320009878
2000-01-19 00:00:00 2.23433517109
2000-01-26 00:00:00 -0.34205999053
2000-02-02 00:00:00 NaN
2000-02-09 00:00:00 NaN
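For readers on a current pandas release, here is a minimal sketch of the same label-based alignment. It assumes a modern pandas install, where pd.date_range and DatetimeIndex stand in for the DateRange class shown above. The values are invented for illustration, and the shorter series is deliberately shuffled to show that label order does not matter.

```python
import numpy as np
import pandas as pd

# Six weekly (Wednesday) dates, mirroring the dates in the session above.
# The values are made up; only the alignment behavior matters here.
index = pd.date_range("2000-01-05", periods=6, freq="W-WED")
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=index)

# A shorter series whose labels are shuffled out of order.
ts2 = ts.iloc[:4].iloc[[2, 0, 3, 1]]

# Addition matches values by date label, not by position.
# Dates missing from ts2 come out as NaN.
total = ts + ts2
print(total)
```

The two dates that appear only in ts end up as NaN, as in the session output above.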
Either ts or ts2 (or both) could be completely DateRange-naive (i.e. they
have no way of knowing that they are fixed-frequency), or even out of order,
and stuff like this will work with no problem. I view the "fixed frequency"
issue as sort of an afterthought: if you need it, it's there for you
(the DateRange class is a valid Index--"label vector"--for pandas
objects, and provides an API for defining custom time deltas). Which
leads me to:
- Inability to derive custom offsets:
I can do:
In [14]: ts.shift(2, offset=2 * datetools.BDay())
2000-01-11 00:00:00 0.0503706684002
2000-01-18 00:00:00 -1.7660004939
2000-01-25 00:00:00 1.11716758554
2000-02-01 00:00:00 -0.171029995265
2000-02-08 00:00:00 -0.99876580126
2000-02-15 00:00:00 -0.262729046405
or even generate, say, 5-minutely or 10-minutely date ranges thusly:
In [16]: DateRange('6/8/2011 5:00', '6/8/2011 12:00',
<class 'pandas.core.daterange.DateRange'>
offset: <5 Minutes>, tzinfo: None
[2011-06-08 05:00:00, ..., 2011-06-08 12:00:00]
length: 85
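A rough equivalent of the two sessions above in current pandas, as a sketch only. It assumes pd.offsets.BDay and the freq= keyword of Series.shift as the modern counterparts of datetools.BDay and offset=. The "5min" frequency string is a present-day choice and is not a reconstruction of the truncated DateRange call above. The series values are invented.

```python
import pandas as pd

# Weekly (Wednesday) series on the same dates as the session above;
# the values are placeholders.
index = pd.date_range("2000-01-05", periods=6, freq="W-WED")
ts = pd.Series(range(6), index=index, dtype="float64")

# A custom offset: two business days.
two_bdays = 2 * pd.offsets.BDay()

# Shift the date labels by 2 periods of that offset (4 business days);
# the values themselves stay where they are.
shifted = ts.shift(2, freq=two_bdays)
print(shifted.index[0], shifted.index[-1])

# A 5-minute date range over the same window as In [16] above.
five_min = pd.date_range("2011-06-08 05:00", "2011-06-08 12:00", freq="5min")
print(len(five_min))
```

The shifted labels run from 2000-01-11 to 2000-02-15 and the range has 85 entries, which agrees with the output shown above.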
I'm currently working on high-performance reduceat-based resampling methods
(e.g. converting secondly data to 5-minutely data).
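To make the reduceat idea concrete, here is a rough sketch using plain NumPy. It is not the implementation being worked on. The input data and the names bin_width, bin_labels and starts are made up for illustration, and the timestamps are assumed to be sorted integer seconds.

```python
import numpy as np

# Hypothetical input: one hour of per-second samples.
seconds = np.arange(3600)        # timestamps, in seconds, sorted
values = seconds.astype(float)   # placeholder data

bin_width = 300  # 5 minutes, in seconds

# Label every sample with the start of its 5-minute bin.
bin_labels = seconds // bin_width * bin_width

# Positions where a new bin begins (relies on sorted timestamps).
starts = np.flatnonzero(np.r_[True, bin_labels[1:] != bin_labels[:-1]])

# reduceat sums each contiguous run values[starts[i]:starts[i+1]].
sums = np.add.reduceat(values, starts)

# Bin sizes, so the sums can be turned into means.
counts = np.diff(np.r_[starts, len(values)])
means = sums / counts
print(len(sums), sums[0], means[0])
```

Swapping np.add for np.maximum or np.minimum in the reduceat call would give per-bin highs or lows instead of sums.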
So in summary, w.r.t. time series data and datetime, the only things I
care about from a datetime / pandas point of view are:
- Ability to easily define custom timedeltas
- Generate datetime objects, or some equivalent, which can be used to
back pandas data structures
- (possible now??) Ability to have a set of frequency-naive dates
(possibly not in order).
This last point actually matters. Suppose you wanted to get the 5
worst-performing days in the S&P 500 index:
In [7]: spx.index
<class 'pandas.core.daterange.DateRange'>
offset: <1 BusinessDay>, tzinfo: None
[1999-12-31 00:00:00, ..., 2011-05-10 00:00:00]
length: 2963
# but this is OK
In [8]: spx.order()[:5]
2008-10-15 00:00:00 -0.0903497960942
2008-12-01 00:00:00 -0.0892952780505
2008-09-29 00:00:00 -0.0878970494885
2008-10-09 00:00:00 -0.0761670761671
2008-11-20 00:00:00 -0.0671229140321
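In a current pandas the same query might be sketched as follows, assuming Series.sort_values as the present-day replacement for the order() method used above. The returns here are invented and are not S&P 500 data.

```python
import pandas as pd

# Eight business days of made-up daily returns.
index = pd.date_range("2011-01-03", periods=8, freq="B")
returns = pd.Series(
    [0.01, -0.03, 0.02, -0.09, 0.005, -0.01, -0.07, 0.0],
    index=index,
)

# Sort by value and keep the five lowest returns.
# The dates travel with the values, so the resulting index is
# out of order and no longer a regular business-day range.
worst = returns.sort_values().iloc[:5]
print(worst)
```

The result is a perfectly usable series even though its dates are neither sorted nor evenly spaced, which is the frequency-naive case described above.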
- W