[Numpy-discussion] Question on timeseries, for financial application

Sun Dec 13 09:27:19 EST 2009

On Dec 13, 2009, at 7:07 AM, josef.pktd at gmail.com wrote:

> On Sun, Dec 13, 2009 at 3:31 AM, Pierre GM <pgmdevlist at gmail.com>  
> wrote:
>> On Dec 13, 2009, at 12:11 AM, Robert Ferrell wrote:
>>> Have you considered creating a TimeSeries for each data series, and
>>> then putting them all together in a dict, keyed by symbol?
>>
>> That's an idea
>
> As far as I understand, that's what pandas.DataFrame does.
> pandas.DataMatrix used 2d array to store data
>
>>
>>> One disadvantage of one big monster numpy array for all the series  
>>> is
>>> that not all series may have a full set of 1800 data points.  So the
>>> array isn't really nicely rectangular.
>>
>> Bah, there's adjust_endpoints to take care of that.
>>
>>>
>>> Not sure exactly what kind of analysis you want to do, but  
>>> grabbing a
>>> series from a dict is quite fast.
>>
>> Thomas, as robert F. pointed out, everything depends on the kind of  
>> analysis you want. If you want to normalize your series, having all  
>> of them in a big array is the best (plain array, not structured, so  
>> that you can apply .mean and .std directly without having to loop  
>> on the series). If you need to apply the same function over all the  
>> series, here again having a big ndarray is easiest. Give us an  
>> example of what you wanna do.
>
> Or a structured array with homogeneous type that allows fast creation
> of views for data analysis.
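
To make the two layouts just mentioned concrete, here is a rough sketch,
not anything taken from the OP's code.  The field names, the float32
type, and the random data are all assumptions for illustration.  It
shows that a structured array whose fields all share one type can be
viewed as a plain 2-D array without a copy, after which .mean and .std
apply directly:

```python
import numpy as np

# Hypothetical field names: 5 items per observation, 4 bytes each (assumed).
fields = ["open", "high", "low", "close", "volume"]
dtype = np.dtype([(name, np.float32) for name in fields])

n_obs = 1800
rng = np.random.default_rng(0)

# One series stored as a structured array (made-up data).
rec = np.zeros(n_obs, dtype=dtype)
for name in fields:
    rec[name] = rng.random(n_obs, dtype=np.float32)

# Every field has the same type, so the same buffer can be
# reinterpreted as a plain (n_obs, 5) float32 array, no copy made.
plain = rec.view(np.float32).reshape(n_obs, len(fields))

# Normalize each column without looping over the fields.
normalized = (plain - plain.mean(axis=0)) / plain.std(axis=0)
```

The view only works because the dtype is homogeneous and unpadded; with
mixed field types you would need a copy instead.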

These kinds of financial series don't have that much data (speaking
from an early-21st-century point of view).  The OP says 1000 series,
1800 observations per series.  Maybe 5 data items per observation, 4
bytes each.  That's about 36MB.  I've found it satisfactory to
keep the data someplace that's handy to get at and easy to use.  When
I want to do analysis, I pull it into whatever format is best for that
analysis.  Depending on the needs, it may not be necessary to
arrange the data so you can get a view for analysis: the time for a
copy can be negligible if the analysis takes a while.
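
Something along these lines is what I have in mind.  It is only a
sketch under assumptions: the symbols, the random data, the column
index used for "close", and the helper name closes_matrix are all made
up, and plain NumPy arrays stand in for whatever series type you
actually use.  It checks the size estimate, keeps each series in a
dict keyed by symbol, and copies one column into a rectangular array
only when an analysis needs it:

```python
import numpy as np

# Size estimate: series x observations x items x bytes per item.
n_series, n_obs, n_items, item_bytes = 1000, 1800, 5, 4
total_bytes = n_series * n_obs * n_items * item_bytes
total_mb = total_bytes / 1e6

rng = np.random.default_rng(0)

# Storage that is handy to get at: one array per (hypothetical) symbol.
# Every tenth series is 50 observations short, to mimic ragged data.
store = {}
for i in range(n_series):
    length = n_obs if i % 10 else n_obs - 50
    store[f"SYM{i:04d}"] = rng.random((length, n_items)).astype(np.float32)

def closes_matrix(store, symbols, n_obs, col=3):
    """Copy one column of each series into a NaN-padded 2-D array.

    Assumes all series end on the same date, so short ones are
    right-aligned and padded with NaN at the start.
    """
    out = np.full((len(symbols), n_obs), np.nan, dtype=np.float32)
    for row, sym in enumerate(symbols):
        data = store[sym][:, col]
        out[row, n_obs - len(data):] = data
    return out

symbols = sorted(store)
closes = closes_matrix(store, symbols, n_obs)

# Normalize each series, ignoring the NaN padding.
mean = np.nanmean(closes, axis=1, keepdims=True)
std = np.nanstd(closes, axis=1, keepdims=True)
zscores = (closes - mean) / std
```

Grabbing one series is just store["SYM0001"], and the copy into closes
is cheap next to any analysis that takes more than a moment.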

-r