![](https://secure.gravatar.com/avatar/512cd9d8ff9ea17006ee87fa144f47fe.jpg?s=120&d=mm&r=g)
Timeseries is an awesome package. Great contribution. I have 2 questions about it, though.

1. Is scipy-user the right place for questions?

2. I've noticed that 'business frequency' includes holidays, and that can create holes in what are actually complete data sets. For instance, Sep 01, 2008 was a holiday in the US (Labor Day). However, it is included in a DateArray spanning that date. For instance:

    In [640]: ts.date_array(ts.Date('B','2008-08-25'), length=12)
    Out[640]:
    DateArray([25-Aug-2008, 26-Aug-2008, 27-Aug-2008, 28-Aug-2008, 29-Aug-2008,
           01-Sep-2008, 02-Sep-2008, 03-Sep-2008, 04-Sep-2008, 05-Sep-2008,
           08-Sep-2008, 09-Sep-2008],
              freq='B')

This makes stock ticker data look like it's incomplete - no data for Sep 01, since the markets were closed. For instance, if I use matplotlib.finance.quotes_historical_yahoo to download Intel data, and put that into the date array above, I get the series:

    masked_array(data = [22.77 22.95 23.21 23.39 22.67 -- 22.39 21.35 20.34 20.43 20.79],
          mask = [False False False False False True False False False False False],
          fill_value=1e+20)

That has a hole on Sep 1. This matters for things like moving average calculation. Sep 1 should be treated like a Saturday or Sunday, but instead causes a 5-day mov_average calculation to not compute anything from Sep 2 through Sep 7:

    timeseries([-- -- -- -- 22.998 -- -- -- -- -- 21.06],
               dates = [25-Aug-2008 ... 08-Sep-2008],
               freq = B)

My question: What is a good way to handle (get rid of?) the holes in the series?

thanks,
-robert
![](https://secure.gravatar.com/avatar/56b215661867f3b4f4a3b28077de66b3.jpg?s=120&d=mm&r=g)
On Nov 27, 2008, at 11:23 AM, Robert Ferrell wrote:
Timeseries is an awesome package. Great contribution. I have 2 questions about it, though.
1. Is scipy-user the right place for questions?
It is
2. I've noticed that 'business frequency' includes holidays, and that can create holes in what are actually complete data sets. For instance, Sep 01, 2008 was a holiday in the US (Labor Day).
Yes, the moniker "business days" is a bit deceptive, as it refers only to days that are not Saturday or Sunday. It'd be too tricky for us to implement holidays, as they'd vary from one place to another (no such thing as Thanksgiving in Europe, for example...).
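To make the definition above concrete, here is a minimal sketch in plain Python (standard library only, not the scikits.timeseries API). The helper name `is_weekday` is hypothetical; it only encodes the "not Saturday or Sunday" rule described above, which is why Labor Day 2008 still counts as a business day.

```python
from datetime import date


def is_weekday(day):
    """True for Monday through Friday, ignoring any holiday calendar."""
    # date.weekday() returns 0 for Monday ... 6 for Sunday.
    return day.weekday() < 5


labor_day_2008 = date(2008, 9, 1)   # US holiday that fell on a Monday
saturday = date(2008, 8, 30)

print(is_weekday(labor_day_2008))   # a holiday, but still a weekday
print(is_weekday(saturday))
```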
That has a hole on Sep 1. This matters for things like moving average calculation. Sep 1 should be treated like a Saturday or Sunday, but instead causes a 5-day mov_average calculation to not compute anything from Sep 2 through Sep 7:
timeseries([-- -- -- -- 22.998 -- -- -- -- -- 21.06], dates = [25-Aug-2008 ... 08-Sep-2008], freq = B)
My question: What is a good way to handle (get rid of?) the holes in the series?
Mmh. Off the top of my head, I'd do something like this:

* create a new series by using .compressed on your initial series. You'll get rid of the masked data and will have incomplete dates, but it shouldn't matter.
* use your moving average function on the new series.
* if needed, reset the missing dates by using fill_missing_dates on the filtered series.

Let me know how it goes.

P.
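The three steps above can be sketched without the timeseries package at all. This is only an illustration in plain Python, assuming `None` stands in for a masked value; `business_days` and `moving_average` are hypothetical helpers, not the package's `.compressed`, `mov_average` or `fill_missing_dates`. The prices are the Intel values quoted earlier in the thread.

```python
from datetime import date, timedelta


def business_days(start, length):
    """Return `length` consecutive Monday-Friday dates starting at `start`."""
    days, current = [], start
    while len(days) < length:
        if current.weekday() < 5:
            days.append(current)
        current += timedelta(days=1)
    return days


def moving_average(values, span):
    """Trailing mean over `span` points; None until the window is full."""
    out = [None] * len(values)
    for i in range(span - 1, len(values)):
        out[i] = sum(values[i - span + 1:i + 1]) / span
    return out


dates = business_days(date(2008, 8, 25), 11)
prices = [22.77, 22.95, 23.21, 23.39, 22.67, None,
          22.39, 21.35, 20.34, 20.43, 20.79]

# Step 1: drop the masked points (the analogue of .compressed).
kept = [(d, p) for d, p in zip(dates, prices) if p is not None]
kept_dates = [d for d, _ in kept]
kept_prices = [p for _, p in kept]

# Step 2: run the moving average on the gap-free series.
averaged = moving_average(kept_prices, 5)

# Step 3: put the results back on the full calendar, leaving the
# holiday empty (the analogue of fill_missing_dates).
by_date = dict(zip(kept_dates, averaged))
refilled = [by_date.get(d) for d in dates]
```

With the holiday removed before averaging, every date from 29-Aug onward except 01-Sep gets a value, instead of only the two windows that happened to contain no masked point.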
![](https://secure.gravatar.com/avatar/512cd9d8ff9ea17006ee87fa144f47fe.jpg?s=120&d=mm&r=g)
On Nov 27, 2008, at 11:40 AM, Pierre GM wrote:
On Nov 27, 2008, at 11:23 AM, Robert Ferrell wrote:
That has a hole on Sep 1. This matters for things like moving average calculation. Sep 1 should be treated like a Saturday or Sunday, but instead causes a 5-day mov_average calculation to not compute anything from Sep 2 through Sep 7:
timeseries([-- -- -- -- 22.998 -- -- -- -- -- 21.06], dates = [25-Aug-2008 ... 08-Sep-2008], freq = B)
My question: What is a good way to handle (get rid of?) the holes in the series?
Mmh. Off the top of my head, I'd do something like this:

* create a new series by using .compressed on your initial series. You'll get rid of the masked data and will have incomplete dates, but it shouldn't matter.
* use your moving average function on the new series.
* if needed, reset the missing dates by using fill_missing_dates on the filtered series.
Let me know how it goes. P.
Since the date array has holes, I can't use timeseries date range calculations. So, for instance, to get the previous 5 days of data I can't just use series[d-5:d]. Instead I need to (I think) convert to an index, series.date_to_index(d), and then use that index. I'm going to try that, along with using .compressed(), and see how I do.

Is there any possibility of allowing user defined frequencies?

thanks,
-robert
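The date-to-index idea can be illustrated in plain Python. This is a sketch only, assuming the dates are sorted and the holiday has already been dropped; `previous_points` is a hypothetical helper that plays the role of `series.date_to_index(d)` followed by a positional slice, not a function from the package.

```python
from bisect import bisect_left
from datetime import date


def previous_points(dates, values, when, count):
    """Return up to `count` values strictly before `when` in a sorted series.

    Positions are used instead of date arithmetic, so gaps in `dates`
    (weekends, holidays) do not shorten the window.
    """
    stop = bisect_left(dates, when)      # position of `when` in the series
    start = max(0, stop - count)
    return values[start:stop]


# Gap-free series: 01-Sep-2008 (Labor Day) is simply absent.
dates = [date(2008, 8, 25), date(2008, 8, 26), date(2008, 8, 27),
         date(2008, 8, 28), date(2008, 8, 29), date(2008, 9, 2),
         date(2008, 9, 3), date(2008, 9, 4), date(2008, 9, 5),
         date(2008, 9, 8)]
prices = [22.77, 22.95, 23.21, 23.39, 22.67,
          22.39, 21.35, 20.34, 20.43, 20.79]

window = previous_points(dates, prices, date(2008, 9, 3), 5)
```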
![](https://secure.gravatar.com/avatar/56b215661867f3b4f4a3b28077de66b3.jpg?s=120&d=mm&r=g)
Robert,

It's always easier to manipulate series without missing data. The trick I gave you earlier about computing a moving average after having removed the missing dates was just that, a trick. However, I'm confident it should work.

Unfortunately, there's no easy way to define new frequencies, and it's not on our todo list either. Frequencies are defined in the C part of the code...
![](https://secure.gravatar.com/avatar/512cd9d8ff9ea17006ee87fa144f47fe.jpg?s=120&d=mm&r=g)
On Nov 28, 2008, at 12:03 PM, Pierre GM wrote:
Robert: It's always easier to manipulate series without missing data. The trick I gave you earlier about computing a moving average after having removed the missing dates was just that, a trick. However, I'm confident it should work.
It does work quite well. When I plot I have a few holes in the data (at holidays), but that's about the only issue I haven't resolved.
Unfortunately, there's no easy way to define new frequencies, and it's not on our todo list either. Frequencies are defined in the C part of the code...
How do you (or other users) use the Business frequency?

Also, I get this error when I use tsplot:

    ---------------------------------------------------------------------------
    <type 'exceptions.AttributeError'>        Traceback (most recent call last)

    /Users/Shared/Develop/Financial/<ipython console> in <module>()

    /Library/Frameworks/Python.framework/Versions/2.5.2001/lib/python2.5/site-packages/scikits/timeseries/lib/plotlib.py in tsplot(self, *args, **kwargs)
       1021         # when adding a right axis (using add_yaxis), for some reason the
       1022         # x axis limits don't get properly set. This gets around the problem
    -> 1023         if self.get_xlim().tolist() == [0., 1.]:
       1024             # if xlim still at default values, autoscale the axis
       1025             self.autoscale_view()

    <type 'exceptions.AttributeError'>: 'tuple' object has no attribute 'tolist'

That comes up no matter what kind of data or frequency I'm using (full, valid, etc...). Is that possibly why the cursor won't give me x axis position when I mouse around?

thanks again,
-robert
![](https://secure.gravatar.com/avatar/56b215661867f3b4f4a3b28077de66b3.jpg?s=120&d=mm&r=g)
On Dec 1, 2008, at 1:44 PM, Robert Ferrell wrote:
Unfortunately, there's no easy way to define new frequencies, and it's not on our todo list either. Frequencies are defined in the C part of the code...
How do you (or other users) use the Business frequency?
I'll let other users answer that. I never used that frequency myself.
Also, I get this error when I use tsplot:
Looks familiar... What version of matplotlib and scikits.timeseries are you using?
That comes up no matter what kind of data or frequency I'm using (full, valid, etc...). Is that possibly why the cursor won't give me x axis position when I mouse around?
No. I never took the time to find out why I can't get the x axis position under the cursor either, but the two issues are unrelated: the error you see comes from an update of matplotlib that hasn't been ported yet to scikits.timeseries.
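The traceback shows `get_xlim()` returning a tuple, which has no `.tolist()` method. A minimal sketch of a default-limits check that does not depend on the return type follows. `is_default_xlim` is a hypothetical helper, not part of matplotlib or scikits.timeseries, and it is not necessarily how the port was eventually done.

```python
def is_default_xlim(xlim):
    """Return True if `xlim` still holds the untouched (0.0, 1.0) default.

    Only unpacking is used, so a tuple, a list, or a two-element array
    are all accepted; no `.tolist()` call is needed.
    """
    low, high = xlim
    return float(low) == 0.0 and float(high) == 1.0


print(is_default_xlim((0.0, 1.0)))            # limits never set
print(is_default_xlim((733279.0, 733293.0)))  # limits already set
```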
![](https://secure.gravatar.com/avatar/512cd9d8ff9ea17006ee87fa144f47fe.jpg?s=120&d=mm&r=g)
On Dec 1, 2008, at 11:54 AM, Pierre GM wrote:
On Dec 1, 2008, at 1:44 PM, Robert Ferrell wrote:
Unfortunately, there's no easy way to define new frequencies, and it's not on our todo list either. Frequencies are defined in the C part of the code...
How do you (or other users) use the Business frequency?
I'll let other users answer that. I never used that frequency myself.
Also, I get this error when I use tsplot:
Looks familiar... What version of matplotlib and scikits.timeseries are you using?
    In [741]: matplotlib.__version__
    Out[741]: '0.98.3'

    In [742]: ts.__version__
    Out[742]: '0.67.0.dev-r1570'
That comes up no matter what kind of data or frequency I'm using (full, valid, etc...). Is that possibly why the cursor won't give me x axis position when I mouse around?
No. I never took the time to find out why I can't get the x axis position under the cursor either, but the two issues are unrelated: the error you see comes from an update of matplotlib that hasn't been ported yet to scikits.timeseries.
The error seems benign enough that I can ignore it. -robert
![](https://secure.gravatar.com/avatar/56b215661867f3b4f4a3b28077de66b3.jpg?s=120&d=mm&r=g)
Robert, Thx a lot for reporting, I'll take a better look ASAP.
_______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user
![](https://secure.gravatar.com/avatar/b01bc4d31940adf91800e2f1d8a4abc1.jpg?s=120&d=mm&r=g)
Pierre GM <pgmdevlist <at> gmail.com> writes:
On Nov 27, 2008, at 11:23 AM, Robert Ferrell wrote:
2. I've noticed that 'business frequency' includes holidays, and that can create holes in what are actually complete data sets. For instance, Sep 01, 2008 was a holiday in the US (Labor Day).
Yes, the moniker "business days" is a bit deceptive, as it refers only to days that are not Saturday or Sunday. It'd be too tricky for us to implement holidays, as they'd vary from one place to another (no such thing as Thanksgiving in Europe, for example...).
Hi Pierre & Matt,

I'm finding the timeseries package very useful but I've also run into the same holidays issue as Robert. I was wondering if a solution of allowing the user to specify the holidays (cf Excel networkdays function) would be feasible?

In the following example the user is able to change the function in the descriptor which would allow him/her to specify the holidays in their particular part of the world. I don't claim that this is the best way to do it, but I was wondering if such a scheme could be made to work in the wider context of the timeseries package?

Cheers,
Dave

    from scikits.timeseries import Date, DateArray, date_array

    class _isbusinessday(object):
        # Descriptor wrapping a user-replaceable predicate.
        def __init__(self, func):
            assert callable(func)
            self.func = func
        def __get__(self, obj, objtype):
            # Evaluate the predicate on the array that owns the attribute.
            return self.func(obj)
        def __set__(self, obj, func):
            # Assigning a new callable swaps in a different holiday rule.
            assert callable(func)
            self.func = func
    #
    class BusinessDateArray(DateArray):
        # Default rule: Monday to Friday are business days.
        isbusinessday = _isbusinessday(lambda x: x.weekday < 5)
        def __init__(self, *args, **kwargs):
            super(BusinessDateArray, self).__init__(*args, **kwargs)
    #
    dates = date_array(start_date=Date('D','01-Jan-2008'), length=100)
    dates = BusinessDateArray(dates=dates)
    print dates.isbusinessday
![](https://secure.gravatar.com/avatar/56b215661867f3b4f4a3b28077de66b3.jpg?s=120&d=mm&r=g)
Dave,

On Dec 10, 2008, at 9:23 AM, Dave Hirschfeld wrote:
Pierre GM <pgmdevlist <at> gmail.com> writes:
On Nov 27, 2008, at 11:23 AM, Robert Ferrell wrote:
2. I've noticed that 'business frequency' includes holidays, and that can create holes in what are actually complete data sets. For instance, Sep 01, 2008 was a holiday in the US (Labor Day).
Yes, the moniker "business days" is a bit deceptive, as it refers only to days that are not Saturday or Sunday. It'd be too tricky for us to implement holidays, as they'd vary from one place to another (no such thing as Thanksgiving in Europe, for example...).
Hi Pierre & Matt, I'm finding the timeseries package very useful but I've also run into the same holidays issue as Robert. I was wondering if a solution of allowing the user to specify the holidays (cf Excel networkdays function) would be feasible?
Yes and no.

No: there's no plan for any user-defined frequency yet, if ever. The whole machinery is in C, and it would be *very* tricky for us to implement such a feature. Besides, this 'OpenBusinessDate' frequency is far too local to be developed on a large scale.

Yes: that said, there should be a way to take holidays into account, at a small scale. I'm thinking out loud here: say we come up with a list of holidays for a given period of time. We could use that to mask specific dates on a series with Business/WeekDay frequency. That way, conversion and statistics would still work seamlessly; we'd just be working with masked data.

However, we'd still have some problems. A basic one would be to find the value in the series that falls 3 business days after some date: we could start by adding 3 to the initial date (in WeekDay frequency), but then we would have to check whether there were some missing data during these 3 days (a vacation), and adjust the result accordingly. Doable, but not straightforward.
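As a rough illustration of the "3 business days after some date" problem described above, here is a sketch in plain Python rather than in terms of DateArray. The names `is_open` and `add_business_days` and the user-supplied `holidays` set are assumptions made for the example; nothing like them exists in scikits.timeseries.

```python
from datetime import date, timedelta


def is_open(day, holidays):
    """True if `day` is Monday-Friday and not in the user's holiday set."""
    return day.weekday() < 5 and day not in holidays


def add_business_days(start, count, holidays=frozenset()):
    """Return the date `count` open days after `start`.

    Walks forward one calendar day at a time, counting only open days,
    so weekends and listed holidays are skipped automatically.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    current, remaining = start, count
    while remaining > 0:
        current += timedelta(days=1)
        if is_open(current, holidays):
            remaining -= 1
    return current


us_holidays = {date(2008, 9, 1)}      # Labor Day, supplied by the user
thursday = date(2008, 8, 28)

plain = add_business_days(thursday, 3)                  # weekday-only rule
adjusted = add_business_days(thursday, 3, us_holidays)  # holiday-aware
```

Stepping day by day is slow for long spans, but it keeps the adjustment for holidays that fall inside the window explicit, which is the complication pointed out above.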
In the following example the user is able to change the function in the descriptor which would allow him/her to specify the holidays in their particular part of the world. I don't claim that this is the best way to do it, but I was wondering if such a scheme could be made to work in the wider context of the timeseries package?
We'd be more than happy to incorporate a good subclass of DateArray that takes holidays into account, whether through your scheme or the one I just suggested, and addresses some of the issues I listed above (find the business day that falls 3 days from now). I don't have time to do it myself, and I don't think Matt has either, so we'll rely on users to come up with a solution.
![](https://secure.gravatar.com/avatar/b01bc4d31940adf91800e2f1d8a4abc1.jpg?s=120&d=mm&r=g)
Pierre GM <pgmdevlist <at> gmail.com> writes:
Dave,
On Dec 10, 2008, at 9:23 AM, Dave Hirschfeld wrote:
I was wondering if a solution of allowing the user to specify the holidays (cf Excel networkdays function) would be feasible?
Yes and no. No: there's no plan for any user-defined frequency yet, if ever. The whole machinery is in C, and it would be *very* tricky for us to implement such a feature. Yes: that said, there should be a way to take holidays into account, at a small scale.
I was afraid it could be difficult to incorporate in a more general setting. Thanks for the quick reply. -Dave
participants (3)

- Dave Hirschfeld
- Pierre GM
- Robert Ferrell