NumPy-Discussion


numpy-discussion@python.org

March 2006

106 participants
171 discussions

Re: [Numpy-discussion] Re: [SciPy-user] Table like array
by Niklas Volbers 01 Mar '06


[[Oops, I accidentally sent my post to scipy-users, but it was intended for numpy-discussion, so here's another attempt]]

Hey Michael,

take a look at my attempt at such a Table implementation! The newest release 0.5.2 of my plotting project http://developer.berlios.de/projects/sloppyplot contains a Table class (in Sloppy.Base.dataset) which wraps a heterogeneous numpy array. The class should be fairly self-documenting, at least I hope so. Don't get confused by the 'undolist' stuff; this is my private undo implementation, which could easily be removed from the code.

If you want a similar implementation using a list of 1-dimensional arrays, then download the previous release 0.5.1 (which uses Numeric).

The reason I switched over to the heterogeneous approach was that it is easier to provide similar wrappers for 2-d table-like data (using a 2-d heterogeneous array) and for 2-d matrix-like data (using a 2-d homogeneous array). Using a list of arrays causes problems when you want to access the rows, because then you are in charge of creating a new 1-d array that represents the row, while with the heterogeneous array you can access both the columns and the rows quite naturally.

By the way, I had first planned to subclass ndarray, but I did not know how to resize the array and still keep the same array object around. This is why I wrapped the array in a class called 'Dataset', whose instance you can consider constant.

If you need some more help on this, feel free to ask,

Niklas Volbers.

Travis Oliphant wrote on 01.03.06 08:16:15:
>
> Michael Sorich wrote:
>
> > Hi,
> >
> > I am looking for a table like array. Something like a 'data frame'
> > object to those familiar with the statistical languages R and Splus.
> > This is mainly to hold and manipulate 2D spreadsheet like data, which
> > tends to be of relatively small size (compared to what many people
> > seem to use numpy for), heterogeneous, have column and row names, and
> > often contains missing data.
>
> You could subclass the ndarray to produce one of these fairly easily, I
> think. The missing data item could be handled by a mask stored along
> with the array (or even in the array itself). Or you could use a masked
> array as your core object (though I'm not sure how it handles the
> arbitrary (i.e. record-like) data-types yet).
>
> Alternatively, and probably the easiest way to get started, you could
> just create your own table-like class and use simple 1-d arrays or 1-d
> masked arrays for each of the columns. This has always been a way to
> store record-like tables.
>
> It really depends what you want the data-frames to be able to do and
> what you want them to "look like."
>
> > A RecArray seems potentially useful, as it allows different fields to
> > have different data types and holds the name of the field. However it
> > doesn't seem easy to manipulate the data. Or perhaps I am simply
> > having difficulty finding documentation on their features.
>
> Adding a new column/field means basically creating a new array with a
> new data-type and copying data over into the already-defined fields.
> Data-types always have a fixed number of bytes per item. What those
> bytes represent can be quite arbitrary but it's always fixed. So, it
> is always "more work" to insert a new column. You could make that
> seamless in your table class so the user doesn't see it though.
>
> You'll want to thoroughly understand the dtype object including its
> attributes and methods. Particularly the fields attribute of the dtype
> object.
>
> > eg
> > adding a new column/field (and to a lesser extent a new row/record) to
> > the recarray
>
> Adding a new row or record is actually similar because once an array is
> created it is usually resized by creating another array and copying the
> old array into it in the right places.
>
> > Changing the field/column names
> > make a new table by selecting a subset of fields/columns. (you can
> > select a single field/column, but not multiple).
>
> Right. So far you can't select multiple columns. It would be possible
> to add this feature with a little bit of effort if there were a strong
> demand for it, but it would be much easier to do it in your subclass
> and/or container class.
>
> How many people would like to see x['f1','f2','f5'] return a new array
> with a new data-type descriptor constructed from the provided fields?
>
> > It would also be nice for the table to be able to deal easily with
> > masked data (I have not tried this with recarray yet) and perhaps also
> > to be able to give the rows/records unique ids that could be used to
> > select the rows/records (in addition to the row/record index), in the
> > same way that the fieldnames can select the fields.
>
> Adding fieldnames to the "rows" is definitely something that a subclass
> would be needed for. I'm not sure how you would even propose to select
> using row names. Would you also use getitem semantics?
>
> > Can anyone comment on this issue? Particularly whether code exists for
> > this purpose, and if not ideas about how best to go about developing
> > such a Table like array (this would need to be limited to python
> > programming as my ability to program in c is very limited).
>
> I don't know of code that already exists for this, but I don't think it
> would be too hard to construct your own data-frame object.
>
> I would probably start with an implementation that just used standard
> arrays of a particular type to represent the internal columns and then
> handle the indexing using your own overriding of the __getitem__ and
> __setitem__ special methods. This would be the easiest to get working,
> I think.
>
> -Travis
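
For readers following the thread, here is a minimal sketch of the wrapper idea discussed above: a container class that holds a 1-d array with named fields, adds a column by building a wider data-type and copying the old fields over, and returns a new array when several field names are passed to __getitem__. This is not the SloppyPlot code and not a NumPy API; the class name Table and its members columns and add_column are hypothetical names chosen only for illustration, and missing data and row names are left out.

```python
import numpy as np


class Table:
    """Hypothetical wrapper around a 1-d array with named fields."""

    def __init__(self, data):
        self.data = np.asarray(data)
        if self.data.dtype.names is None:
            raise TypeError("expected an array with named fields")

    @property
    def columns(self):
        # Field names, in storage order.
        return self.data.dtype.names

    def add_column(self, name, values):
        # Item size is fixed per dtype, so a new field means a new array:
        # build a wider dtype, allocate, then copy every old field over.
        values = np.asarray(values)
        if values.shape != self.data.shape:
            raise ValueError("new column must have one value per row")
        old = [(n, self.data.dtype[n]) for n in self.data.dtype.names]
        new_dtype = np.dtype(old + [(name, values.dtype)])
        new = np.empty(self.data.shape, dtype=new_dtype)
        for n in self.data.dtype.names:
            new[n] = self.data[n]
        new[name] = values
        # The wrapper object stays the same; only the wrapped array changes.
        self.data = new

    def __getitem__(self, key):
        if isinstance(key, str):
            return self.data[key]  # a single column
        if (isinstance(key, (list, tuple)) and key
                and all(isinstance(k, str) for k in key)):
            # Several columns: copy them into a new array whose dtype is
            # constructed from the requested fields.
            sub_dtype = np.dtype([(k, self.data.dtype[k]) for k in key])
            out = np.empty(self.data.shape, dtype=sub_dtype)
            for k in key:
                out[k] = self.data[k]
            return out
        return self.data[key]  # row index, slice, or boolean mask

    def __setitem__(self, key, values):
        if isinstance(key, str) and key not in self.data.dtype.names:
            self.add_column(key, values)  # unknown name: append a column
        else:
            self.data[key] = values


t = Table(np.array([(1, 2.5), (2, 3.5)], dtype=[("id", "i4"), ("x", "f8")]))
t["label"] = ["a", "b"]   # triggers the copy into a wider dtype
sub = t["id", "label"]    # new array with only the two requested fields
first_row = t[0]          # rows are reachable as naturally as columns
```

Because the array is replaced inside the wrapper rather than resized in place, code holding a reference to the Table instance keeps working after a column is added, which is the reason given above for wrapping instead of subclassing ndarray.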
