interest in Time series functionality?
I hope this is the correct mailing list for this kind of question; if not, my apologies. I work in the quantitative finance division of a financial services company in Canada, and over the last year or so we have been doing a lot more Python-based work. Most of the data we work with is time series data (stock prices, etc.), and we have traditionally used FAME (a SunGard product) to store and manipulate it. We have developed a Python API on top of the C API included with FAME to access FAME data from Python, but the problem now is that, to my knowledge, there aren't any Python libraries for manipulating time series data in a way that comes close to the power of FAME. Our motivation for manipulating this data in Python is primarily web-based applications, although bringing the data into the Python world opens up many other possibilities for us as well (FAME has no matrix capabilities whatsoever, which is pretty sad really given its target market).

We have done some preliminary work on a time series class built on top of the numpy array class, and it has reached the point where it works reasonably well for what we use it for, although I'm certain it could be optimized a great deal. The key features of this module are:

- works with different frequencies of data (currently monthly, daily, business-day, and secondly, i.e. per-second, frequencies)
- can index the time series directly by date objects (from a custom date class we have created)
- handles missing values (along the lines of masked arrays)
- uses global module settings to dictate how certain scenarios are handled
- performs operations (+, -, *, /) on time series that do not necessarily have the same start/end dates, handling missing values according to certain global option settings; this involves an implicit resizing of the arrays
- performs operations on time series that do not have the same frequency, with implicit frequency conversion according to certain global option settings; again, this involves implicitly resizing the arrays

We have basically modeled the time series functionality on how FAME handles it, since that works reasonably well. I'm wondering if there is any interest in this. Our group consists mostly of financial practitioners and engineers, not really pure software developers, so if somebody is interested in taking this to the next level, I would be willing to release the code (both the FAME API and the time series module) for others to improve upon and share their improvements in the future. The code is definitely not a polished product right now, but it is functional. If you have any thoughts on this (positive or negative), I would love to hear them.

Thanks,
- Matt Knox
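The following minimal sketch illustrates the alignment and conversion behaviour described in the feature list. It rests on several assumptions. It handles calendar-daily data only. It uses Python's `datetime.date` in place of the custom date class. Missing values are represented as NaN and then turned into a `numpy.ma` mask. `DailySeries` is a hypothetical name, not Matt's code. The rule used here ("widen to the union of date ranges, and mark a day missing if either side is missing") stands in for just one possible setting of the global options he mentions. Business-day and secondly frequencies are not handled.

```python
import datetime as dt

import numpy as np
import numpy.ma as ma


class DailySeries:
    """Hypothetical calendar-daily series: a start date plus values, NaN meaning missing."""

    def __init__(self, start, values):
        self.start = start
        # Missing observations (NaN) become masked entries, in the spirit of masked arrays.
        self.data = ma.masked_invalid(np.array(values, dtype=float))

    @property
    def end(self):
        return self.start + dt.timedelta(days=len(self.data) - 1)

    def __getitem__(self, day):
        # Index directly by date: translate the date into an offset from the start.
        i = (day - self.start).days
        if not 0 <= i < len(self.data):
            raise KeyError(day)
        return self.data[i]

    def _expand(self, start, end):
        # Lay this series onto the [start, end] grid; uncovered days are NaN (missing).
        out = np.full((end - start).days + 1, np.nan)
        offset = (self.start - start).days
        out[offset:offset + len(self.data)] = self.data.filled(np.nan)
        return out

    def __add__(self, other):
        # Implicit resizing: both operands are widened to the union of their date ranges,
        # and a day is missing in the result if it is missing in either operand.
        start = min(self.start, other.start)
        end = max(self.end, other.end)
        return DailySeries(start, self._expand(start, end) + other._expand(start, end))

    def to_monthly_mean(self):
        # Frequency conversion (daily -> monthly): average the non-missing days in each month.
        months = {}
        for i, value in enumerate(self.data.filled(np.nan)):
            if not np.isnan(value):
                day = self.start + dt.timedelta(days=i)
                months.setdefault((day.year, day.month), []).append(float(value))
        return {key: sum(vals) / len(vals) for key, vals in sorted(months.items())}
```

Under this rule, adding two series that start on different days produces a series spanning both date ranges. The result is masked wherever either input lacked a value. Other policies, such as treating missing values as zero, would only change `_expand` and `__add__`.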
On Sat, 22 Apr 2006, Matt Knox apparently wrote:
Our group consists mostly of financial practitioners and engineers, not really pure software developers, so if somebody is interested in taking this to the next level I would be willing to release the code (both the FAME api, and the time series module) if someone wanted to improve upon this and share their improvements in the future. The code is definitely not a polished product right now, but it is functional.
If you have any thoughts on this (positive or negative) I would love to hear them.
I was hoping someone else would respond first, but since they have not, I will provide a smidgen of feedback. I hope you will release the code in advance of the reassurances you seek. It sounds useful, and it sounds likely to attract development effort over time. I am interested in looking at it, if it is released under a liberal license, but I am more a user than a developer. Still, Python is great in that for many applications users can readily contribute to development. The time series module seems to be an obvious candidate for such contributions. The real question, I propose, is where to house the code and how to manage patches. As for the former, it seems to be an obvious candidate for the scipy sandbox. Cheers, Alan Isaac
Alan G Isaac <aisaac <at> american.edu> writes:
Thanks for the reply, Alan. The code needs some additional spit and polish, reorganization, and documentation before it is ready for public release, but I hope to find the time to do that within the next couple of months. I have no experience writing or managing any kind of open source project (nor do any of my colleagues), so I'm not sure I would be able to offer much in the way of managing patches, etc. Currently the FAME database functionality is sort of mangled in with the time series module, but that can definitely be separated better. Would the FAME API be suitable for the sandbox as well, or just the time series capabilities? I suspect database APIs aren't really something people would look for in scipy, but who knows. At any rate, I am fairly certain there is no freely available Python API for FAME, so some people might be interested in that. - Matt Knox
Hey Matt,
For the record, Enthought is also very interested in seeing time-series functionality within scipy. I'd love to see the code in the sandbox--coupled to the FAME api or not. It is a sandbox, after all, and the refactoring can be done in the open (if you're comfortable with that). To state the obvious: If it's useful, folks will pile on and it can get integrated into the core of scipy. If not, it won't. Regarding the posting of the code, we at Enthought have been involved in open source for a while now and my best advice is "don't do as poor a job as we do" in making the code available at an early stage to the wider community. We're trying to do a better job of this ourselves. The code we've developed and open-sourced has really suffered as a result of our not disseminating and promoting it earlier in the process. Best, Travis
Travis N. Vaught wrote:
For the record, Enthought is also very interested in seeing time-series functionality within scipy.
So am I. I do a lot of time series analysis (solar activity time series in my case) and am forever reinventing the wheel. From my reading of Matt's original post, his module sounds like it would be extremely useful to a very large community, namely the solar astronomers who study the solar cycle on daily/weekly/annual/decadal time scales. Steve Walton
For the record, I deal with fMRI time series. I would like to see a proper time series module in scipy, too. Although there are lots of hacks in fMRI to speed up computations, a proper time series module would be a good addition. Jonathan Taylor
_______________________________________________ Scipy-dev mailing list Scipy-dev@scipy.net http://www.scipy.net/mailman/listinfo/scipy-dev
-- Jonathan Taylor
Dept. of Statistics, Sequoia Hall 137, 390 Serra Mall, Stanford, CA 94305
Tel: 650.723.9230, Fax: 650.725.8977, www-stat.stanford.edu/~jtaylo
Matt Knox wrote:
Currently the FAME database functionality is sort of mangled in with the time series module, but that can definitely be separated better. Would the FAME API be suitable for the sandbox as well, or just the time series capabilities? I suspect database APIs aren't really something people would look for in scipy, but who knows. At any rate, I am fairly certain there is no freely available Python API for FAME, so some people might be interested in that.
FAME is a proprietary package, right? The website (fame.com) looks expensive. I would really like to see some good tools for handling time series (specifically calendrical time series) in scipy. I hope that as much functionality as possible can be decoupled from FAME. I don't think that wrappers to an expensive database package really belong in scipy although they might be just right for a separate projects.scipy.org project. Of course, the API and conventions that you've established by how the non-FAME bits interact with the FAME bits will probably serve as a useful standard for talking to other databases with time series information. -- Robert Kern robert.kern@gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
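Robert's suggestion is to keep the time series core independent of FAME, so that a FAME interface becomes just one way of talking to a database. One way to picture this is as a small storage interface. The sketch below is purely illustrative. `SeriesStore`, `InMemoryStore`, and `copy_series` are hypothetical names and are not part of Matt's code, FAME, or scipy. A FAME wrapper, or an adapter for some other database, could implement the same two methods.

```python
import datetime as dt
from typing import Dict, List, Protocol, Tuple


class SeriesStore(Protocol):
    """Hypothetical interface that the database-agnostic time series core would rely on."""

    def read(self, name: str) -> Tuple[dt.date, List[float]]:
        ...

    def write(self, name: str, start: dt.date, values: List[float]) -> None:
        ...


class InMemoryStore:
    """Toy backend standing in for FAME or any other time series database."""

    def __init__(self) -> None:
        self._series: Dict[str, Tuple[dt.date, List[float]]] = {}

    def read(self, name: str) -> Tuple[dt.date, List[float]]:
        start, values = self._series[name]
        return start, list(values)

    def write(self, name: str, start: dt.date, values: List[float]) -> None:
        self._series[name] = (start, list(values))


def copy_series(src: SeriesStore, dst: SeriesStore, name: str) -> None:
    # Core code depends only on the interface, never on a particular database package.
    start, values = src.read(name)
    dst.write(name, start, values)
```

Because `Protocol` is structural, a backend satisfies the interface simply by providing `read` and `write`; no subclassing is required. This keeps a proprietary wrapper outside the core package.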
As far as databases go, I'd love to see more people jumping on the HDF5 bandwagon: http://hdf.ncsa.uiuc.edu/HDF5/ They are about to release version 1.8 (1.8alpha was already released on April 20). There are some very nice features (a new higher-level interface, packet tables for socket streams, dimension scales, imaging, etc.), not to mention that it's totally portable. PyTables provides an excellent Python/Numpy/Numarray/Numeric API to HDF5, so database usage between Python and the other HDF5 APIs (C, FORTRAN, C++, Java) is totally seamless. PyTables also has a netCDF conversion facility and very nice compression features. See: http://www.pytables.org/moin/PyTables By the way, Matt, I'm a financial engineer myself. If I were to find the time, I'd also like to be involved in such a project. Dieter

On 4/25/06, Robert Kern <robert.kern@gmail.com> wrote:
Of course, the API and conventions that you've established by how the non-FAME bits interact with the FAME bits will probably serve as a useful standard for talking to other databases with time series information.
I'm going to try to do some work this week on extricating the FAME bits from the core time series module, flesh out the documentation, and create some simple examples; then hopefully I will have some code to release towards the end of next week. I am making no claims in regards to the quality/design of this module; I'm sure it is not up to the standards of some of the more experienced developers involved with numpy/scipy. But if we can begin to form a framework for how the core classes in a time series module should behave, that would probably be a good start. Coming from a FAME background, the functionality mirrors (or attempts to mirror) FAME's approach to time series, but if somebody has a better approach, I'm certainly open to reshaping my view of the world :) If someone could give me some instructions on where/how to post the code, that would be great. I work strictly in the Microsoft Windows world, so any *nix jargon will be lost on me. And I have only tried compiling the C portion of the code under Windows with numpy 0.9.4, so no promises for other platforms. - Matt Knox
On Wed, 26 Apr 2006, (UTC) Matt Knox apparently wrote:
hopefully I will have some code to release towards the end of next week.
Looking forward to it.
I am making no claims in regards to the quality/design of this module
That's part of the MIT license: THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND. Go for it! Cheers, Alan Isaac
Oh, one thing I should mention is that the code currently uses the mx.DateTime module (http://www.egenix.com/files/python/mxDateTime.html) because I have found it to offer a lot more flexibility than Python's built-in date/time module. It could *probably* be rewritten to use only the built-in date/time module, but I am not going to attempt that at this time. - Matt Knox
On Wed, 26 Apr 2006, (UTC) Matt Knox apparently wrote:
Oh, one thing I should mention is that the code currently uses the mx.DateTime module (http://www.egenix.com/files/python/mxDateTime.html) because I have found it to offer a lot more flexibility than python's built-in date/time module. It could probably be rewritten to use only the built in date/time module, but I am not going to attempt to do that at this time
IMO, this is a reasonable dependency, even if I might better like reliance on python-dateutil, which is wonderful. (I think Matplotlib uses python-dateutil for time series handling; you might want to take a look.) The only thing I suggest is gracefully handling the import error if the module is not present. (E.g., say where to get it.) Cheers, Alan Isaac
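Alan suggests handling the missing-module case gracefully. One way to do that is to attempt the optional import once, when the module loads, and raise an informative error only when the dependency is actually needed. The sketch below follows that approach. `require_mx_datetime` and `MX_URL` are hypothetical names. `from mx import DateTime` is the usual way to import eGenix's mx.DateTime package.

```python
MX_URL = "http://www.egenix.com/files/python/mxDateTime.html"

try:
    from mx import DateTime as mxDateTime  # optional third-party dependency
except ImportError:
    mxDateTime = None


def require_mx_datetime():
    """Return the mx.DateTime module, or fail with a message saying where to get it."""
    if mxDateTime is None:
        raise ImportError(
            "this time series module needs mx.DateTime; download it from " + MX_URL
        )
    return mxDateTime
```

Deferring the error to first use means the parts of the package that do not touch dates still import cleanly on machines without mx.DateTime.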
Alan G Isaac wrote:
IMO, this is a reasonable dependency, even if I might better like reliance on python-dateutil, which is wonderful. (I think Matplotlib uses python-dateutil for time series handling; you might want to take a look.)
mxDateTime does have (at least) one relevant advantage over python-dateutil: it deals with the various "Julian Day Number" systems that astronomers and some other science fields use. It is also frequently *the* date-time type supported by various database adapters. -- Robert Kern
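To show what the Julian Day Number systems Robert mentions involve, the sketch below computes them with plain integer arithmetic and the standard library. It does not use mxDateTime's own API. It assumes proleptic Gregorian dates. It uses the standard conventions that the integer JDN labels the day beginning at noon and that MJD = JD - 2400000.5. The function names are hypothetical.

```python
import datetime as dt


def julian_day_number(year, month, day):
    """Integer Julian Day Number of a proleptic Gregorian date (standard integer formula)."""
    a = (14 - month) // 12  # 1 for January/February, else 0
    y = year + 4800 - a     # years counted from March 1, 4801 BC
    m = month + 12 * a - 3  # months counted from March
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045


def jdn_from_ordinal(d):
    """Same quantity via date.toordinal(), which counts 0001-01-01 as day 1."""
    return d.toordinal() + 1721425


def modified_julian_day(d):
    """MJD at 00:00 on date d (MJD = JD - 2400000.5, and JD at midnight = JDN - 0.5)."""
    return jdn_from_ordinal(d) - 2400001
```

For example, 2000-01-01 has JDN 2451545 and MJD 51544. The ordinal-based version shows that Python's built-in `datetime` already supplies the day counts, so supporting these systems only requires a fixed offset.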
participants (7)

- Alan G Isaac
- dHering
- Jonathan Taylor
- Matt Knox
- Robert Kern
- Stephen Walton
- Travis N. Vaught