I just started working on a time-series module/class in scipy/numpy and it seemed useful to have some of the R data-frame functionality (i.e., select columns of data based on variable names). I tried rec-arrays but couldn't get them to work the way I wanted. I also looked at the Dataframe class by Andrew Straw but at over 400 lines of code that seemed pretty complicated, to me at least. I searched the mailing-list archives and found a discussion on 'Table like array' (see excerpt below).

To get the minimal functionality discussed, I wrote a simple class (see attached) to try and implement X.get('a','c') where 'a' and 'c' are variable names linked to columns of data in X. I added some test code so that if you run the code in the attachment you will see that it seems to work. However, since this is my first class I'd appreciate your input on the approach I used and any suggestions on how to improve the class (or use something else).

I'd like to read the data and variable names directly from a single csv file. I tried this through the python csv module but it would read all data as strings and I couldn't figure out how to easily separate the variable names and the data.

Thanks,
Vincent
[Numpy-discussion] Re: [SciPy-user] Table like array
Paul Barrett pebarrett at gmail.com
Wed Mar 1 06:45:02 CST 2006
On 3/1/06, Travis Oliphant <oliphant.travis at ieee.org> wrote:
How many people would like to see x['f1','f2','f5'] return a new array with a new data-type descriptor constructed from the provided fields?
I'm surprised that it's not already available.
-- Paul
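For readers coming to this excerpt today, here is a minimal sketch of the kind of field selection Travis describes. It assumes a current NumPy release, where a structured array can be indexed with a list of field names; the field names and values are made up for illustration and are not taken from the thread.

```python
import numpy as np

# Structured array with three named fields (names and values are illustrative).
x = np.array(
    [(1, 2.0, 3.0), (4, 5.0, 6.0)],
    dtype=[("f1", "i4"), ("f2", "f8"), ("f5", "f8")],
)

# Indexing with a list of field names selects just those fields.
sub = x[["f1", "f5"]]

print(sub.dtype.names)
print(sub["f5"])
```

Note the extra pair of brackets: the selection is a list of names passed as a single index.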
If your main concern is to store scientific data on disk you might try: http://www.pytables.org/moin

However, it uses numarray internally and a C library, which you have to build from source. (You use a Mac, right?)

Concerning your code:
- Your two-file solution seems impractical to me. I think you should just pickle your whole dbase object.
- Maybe you should write 'load' and 'store' methods that create the temporary file, Pickler and Unpickler objects.
- The __init__ method should then construct the object from a list of variable names and an array.
- Of course you need a set method.

More ideas:
- A special variable name 'time'. Then you can implement a getAtTime(varNameList, timePoint) method with interpolation.
- A 'plot' method that works like matplotlib's plot function.
- An extract(varNameList) method that returns a new dbase object with only the selected variables.
- A companion class that can hold several time series at once to compare different experiments.

Finally, post the code to the mailing list. At least I would like to use such a class :-).

Yours,
Eike.
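The suggestions in this message (build the object from variable names plus data, select by name, extract a subset, store and load through pickle) could be sketched roughly as below. This is not the attached dbase class, which is not reproduced in the thread: the class name `DBase`, the method signatures, and the choice of plain lists instead of an array are all assumptions made for illustration.

```python
import os
import pickle
import tempfile


class DBase:
    """Columns of data addressed by variable name (illustrative sketch)."""

    def __init__(self, varnames, columns):
        if len(varnames) != len(columns):
            raise ValueError("need exactly one column per variable name")
        # Map each variable name to its column; dicts keep insertion order.
        self.data = {name: list(col) for name, col in zip(varnames, columns)}

    def get(self, *names):
        """Return the requested columns as a list of lists."""
        return [self.data[name] for name in names]

    def extract(self, names):
        """Return a new DBase holding only the selected variables."""
        return DBase(list(names), self.get(*names))

    def store(self, path):
        """Pickle the name -> column mapping into a single file."""
        with open(path, "wb") as fh:
            pickle.dump(self.data, fh)

    @classmethod
    def load(cls, path):
        """Rebuild an object from a file previously written by store()."""
        with open(path, "rb") as fh:
            data = pickle.load(fh)
        return cls(list(data), list(data.values()))


X = DBase(["a", "b", "c"], [[1, 2], [3, 4], [5, 6]])
print(X.get("a", "c"))  # [[1, 2], [5, 6]]

sub = X.extract(["c"])
path = os.path.join(tempfile.mkdtemp(), "dbase.pickle")
sub.store(path)
print(DBase.load(path).get("c"))  # [[5, 6]]
```

One design choice here differs slightly from the advice above: the sketch pickles the object's state (the dict) rather than the instance itself, so the file can be read back regardless of where the class is defined.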
Thanks for the input Eike.

I will add load and store methods to Pickle/UnPickle the object. However, I first have to get the data into the class from an ascii file (txt or csv).

I'd like to read the data and variable names directly from a single csv file. I tried this through the python csv module but it would read all data as strings and I couldn't figure out how to easily separate the variable names and the data. If you have any suggestion on how I might do this please let me know.

Unfortunately I don't know what a 'set' method is or would do :) Could you point to an example perhaps?

I like your ideas for extending the class. I'll look into that when I get the basic class working.

Best,
Vincent

On 12/28/06 12:54 PM, "Eike Welk" <eike.welk@gmx.net> wrote:
[...]
_______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
-- Vincent R. Nijs Assistant Professor of Marketing Kellogg School of Management, Northwestern University 2001 Sheridan Road, Evanston, IL 60208-2001 Phone: +1-847-491-4574 Fax: +1-847-491-2498 E-mail: v-nijs@kellogg.northwestern.edu Skype: vincentnijs
Based on Eike's input the dbase class can now also load and dump (simple) csv and pickle files. See the tests at the bottom of the file and the doc-strings.

If there is an easy way to read array data + variable names using the csv module it would be great if that could be added to cookbook/InputOutput. I couldn't figure out how to do it.

Eike: I think I can figure out how to add a plot method. However, if you have some more suggestions on how to implement the getAtTime, extract, and set methods you mentioned that would be great.

Vincent

On 12/28/06 1:40 PM, "Vincent Nijs" <v-nijs@kellogg.northwestern.edu> wrote:
[...]
Sorry for the extra post. There were a few errors in the previous attachment.

Vincent

On 12/28/06 5:39 PM, "Vincent Nijs" <v-nijs@kellogg.northwestern.edu> wrote:
[...]
On Friday 29 December 2006 00:39, Vincent Nijs wrote:
Eike: I think I can figure out how to add a plot method. However, if you have some more suggestions on how to implement the getAtTime, extract, and set methods you mentioned that would be great.

Set method: I thought of a method to change the data. Something like:
    myDb.set(varNameList, dataArray)

Extract method: A way to get another dbase object with a subset of variables. Tomorrow I'll propose an implementation. Because your __init__ method wants a file name it needs to be changed too.

GetAtTime: Maybe your data are samples from some continuous process or function. Then you might want to have values from between the stored time points. You could compute them through interpolation. The following class from scipy will do the job: http://www.scipy.org/doc/api_docs/scipy.interpolate.interpolate.interp1d.htm...

Yours,
Eike.
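To make the set and getAtTime ideas concrete, here is a small sketch. It is an assumption-laden stand-in, not the implementation Eike promises: the function names `set_columns` and `get_at_time` are hypothetical, the data is held in a plain dict of lists, and the interpolation is hand-written linear interpolation rather than the scipy interp1d class linked above.

```python
from bisect import bisect_right


def set_columns(data, names, columns):
    """Replace (or add) the named columns in a name -> column dict."""
    for name, col in zip(names, columns):
        data[name] = list(col)


def get_at_time(data, names, t):
    """Linearly interpolate the named columns at time t.

    Assumes data["time"] holds at least two values, sorted in increasing
    order with no repeats, and that t lies within the stored range.
    """
    times = data["time"]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the stored time range")
    # Index of the sample at or just before t, clamped so hi stays valid.
    lo = min(bisect_right(times, t) - 1, len(times) - 2)
    hi = lo + 1
    w = (t - times[lo]) / (times[hi] - times[lo])
    return [data[n][lo] + w * (data[n][hi] - data[n][lo]) for n in names]


db = {"time": [0.0, 1.0, 2.0], "a": [0.0, 10.0, 20.0], "b": [1.0, 1.0, 3.0]}
set_columns(db, ["b"], [[2.0, 4.0, 8.0]])
print(get_at_time(db, ["a", "b"], 1.5))  # [15.0, 6.0]
```

Halfway between the samples at time 1.0 and 2.0, each value is the midpoint of its two neighbours, which is what the last line prints.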
Vincent Nijs wrote:
If there is an easy way to read array data + variable names using the csv module it would be great if that could be added to cookbook/InputOutput. I couldn't figure out how to do it.
Hi Vincent, of course it depends a little on what exactly your csv file looks like, but if you just have column headers and the actual data, you might try something like the following:

    import csv
    from numpy import mat # or array if you like
    file_to_read = file(filename, 'r')
    read_from = csv.reader(file_to_read, skipinitialspace = True)
    obslist = []
    datalist = []
    for line in read_from:
        obslist.append(line[0])
        datalist.append(line[1:])
    file_to_read.close()
    # (datalist should now be a nested list, first index rows, second
    # columns)
    # (still contains the headers)
    varnames = datalist.pop(0)
    # now the real data
    data = mat(datalist, dtype = float)

-sven
Sven Schreiber wrote:
Hi Vincent, of course it depends a little on what exactly your csv file looks like, but if you just have column headers and the actual data, you might try something like the following:
Ok sorry the previous thing doesn't work, I also stumbled over the strings. Here's the next attempt, also shorter. (this time even tested ;-)

    import csv
    from numpy import mat
    read_from = csv.reader(file(filename, 'r'), skipinitialspace = True)
    stringlist = [ line for line in read_from ]
    varnames = stringlist.pop(0)[1:]
    datalist = [ map(float, line[1:]) for line in stringlist ]
    # now the real data
    data = mat(datalist, dtype = float)

I actually quite like it... python lists are very nice. This discards the observation labels, but it's not difficult to add that, of course.

-sven
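Sven's snippet dates from Python 2, where `file()` was a builtin and `map()` returned a list. A rough Python 3 rendering of the same idea, which also keeps the observation labels, might look like the sketch below. The sample data and the assumed layout (one header row, one leading label column) are illustrative, and the result is left as a nested list so that only the standard library is needed.

```python
import csv
import io

# Stand-in for an open file; the layout (header row, label column) is assumed.
sample = io.StringIO(
    "obs, a, b, c\n"
    "r1, 1.0, 2.0, 3.0\n"
    "r2, 4.0, 5.0, 6.0\n"
)

reader = csv.reader(sample, skipinitialspace=True)
header = next(reader)      # first row holds the variable names
varnames = header[1:]      # drop the heading of the label column

obslabels = []
datalist = []
for row in reader:
    if not row:            # skip blank lines
        continue
    obslabels.append(row[0])
    datalist.append([float(value) for value in row[1:]])

print(varnames)   # ['a', 'b', 'c']
print(obslabels)  # ['r1', 'r2']
print(datalist)   # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
# datalist can then be handed to numpy.array(datalist) if an array is wanted.
```

With a real file, `sample` would be replaced by something like `open(filename, newline="")`, ideally inside a `with` block so the file is closed afterwards.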
participants (3)
- Eike Welk
- Sven Schreiber
- Vincent Nijs