[Numpy-discussion] Re: [SciPy-user] Table like array

Wed Mar 1 10:12:06 EST 2006

Hi,
Thanks for the link, Niklas; it looks interesting.

The ability to create some sort of data array object would be very
useful for me. When I looked, the record array was not sufficiently
flexible, especially for missing values. The approach I used was a
class that defined, among other things, a masked array to handle
missing values and a dictionary to recode alphanumeric values into
numeric values. The downside is the overhead of converting this for
use in other functions and, in some cases, of writing new functions.
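
A minimal sketch of that kind of class, assuming a single column and
using None to mark a missing value. The names RecodedColumn, codes,
labels and decode are hypothetical and are not taken from the code
described above; only numpy and numpy.ma are real.

```python
import numpy as np
import numpy.ma as ma


class RecodedColumn:
    """Hypothetical column: alphanumeric labels stored as integer codes,
    with missing entries (None) hidden behind a mask."""

    def __init__(self, values):
        self.codes = {}          # label -> integer code
        raw, mask = [], []
        for v in values:
            if v is None:
                raw.append(-1)   # placeholder; hidden by the mask
                mask.append(True)
            else:
                # Assign the next free code on first appearance.
                raw.append(self.codes.setdefault(v, len(self.codes)))
                mask.append(False)
        self.data = ma.masked_array(np.array(raw, dtype=int), mask=mask)
        self.labels = {code: label for label, code in self.codes.items()}

    def decode(self):
        """Return the original labels, with None for masked entries."""
        return [None if m else self.labels[int(c)]
                for c, m in zip(self.data.data, ma.getmaskarray(self.data))]


col = RecodedColumn(["low", "high", None, "low"])
print(col.codes)         # {'low': 0, 'high': 1}
print(col.data.count())  # 3 non-missing values
print(col.decode())      # ['low', 'high', None, 'low']
```

Functions that expect a plain array would have to be given col.data
(or the result of decode()), which is the conversion overhead
mentioned above.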

Bruce

On 3/1/06, Niklas Volbers <Mithrandir42 at web.de> wrote:
> [[Oops, I accidentally sent my post to scipy-users, but it was intended for numpy-discussion, so here's another attempt]]
>
>
> Hey Michael,
>
> take a look at my attempt at such a Table implementation!
>
> The newest release 0.5.2 of my plotting project
>
> http://developer.berlios.de/projects/sloppyplot
>
> contains a Table class (in Sloppy.Base.dataset) which wraps a heterogeneous numpy array. The class should be fairly self-documenting, at least I hope so. Don't get confused by the 'undolist' stuff, this is my private undo implementation which could be easily removed from the code.
>
> If you want a similar implementation using a list of 1-dimensional arrays, then download the previous release 0.5.1 (which uses Numeric).
>
> The reason I switched over to the heterogeneous approach was that it is easier to provide similar wrappers for 2-d table-like data (using a 2d heterogeneous array) and for 2-d matrix-like data (using a 2d homogeneous array). Using a list of arrays causes problems when you want to access the rows, because then you are in charge of creating a new 1-d array that represents the row, while with the heterogeneous array you can access both the columns and the rows quite naturally.
>
> By the way, I had first planned to subclass ndarray, but I did not know how to resize the array and still keep the same array object persistent. This is why I wrapped the array in a class called 'Dataset', which you can consider constant.
>
> If you need some more help on this, feel free to ask,
>
> Niklas Volbers.
>
>
>
>
>
> Travis Oliphant wrote on 01.03.06 08:16:15:
> >
> > Michael Sorich wrote:
> >
> > > Hi,
> > >
> > > I am looking for a table-like array. Something like a 'data frame'
> > > object to those familiar with the statistical languages R and Splus.
> > > This is mainly to hold and manipulate 2D spreadsheet-like data, which
> > > tends to be of relatively small size (compared to what many people
> > > seem to use numpy for), heterogeneous, have column and row names, and
> > > often contains missing data.
> >
> > You could subclass the ndarray to produce one of these fairly easily, I
> > think. The missing data item could be handled by a mask stored along
> > with the array (or even in the array itself). Or you could use a masked
> > array as your core object (though I'm not sure how it handles the
> > arbitrary (i.e. record-like) data-types yet).
> >
> > Alternatively, and probably the easiest way to get started, you could
> > just create your own table-like class and use simple 1-d arrays or 1-d
> > masked arrays for each of the columns --- This has always been a way to
> > store record-like tables.
> >
> > It really depends what you want the data-frames to be able to do and
> > what you want them to "look like."
> >
> > > A RecArray seems potentially useful, as it allows different fields to
> > > have different data types and holds the name of the field. However it
> > > doesn't seem easy to manipulate the data. Or perhaps I am simply
> > > having difficulty finding documentation on their features.
> >
> > Adding a new column/field means basically creating a new array with a
> > new data-type and copying data over into the already-defined fields.
> > Data-types always have a fixed number of bytes per item. What those
> > bytes represent can be quite arbitrary but it's always fixed. So, it
> > is always "more work" to insert a new column. You could make that
> > seamless in your table class so the user doesn't see it though.
> >
> > You'll want to thoroughly understand the dtype object including its
> > attributes and methods. Particularly the fields attribute of the dtype
> > object.
> >
> > > eg
> > > adding a new column/field (and to a lesser extent a new row/record) to
> > > the recarray
> >
> > Adding a new row or record is actually similar because once an array is
> > created it is usually resized by creating another array and copying the
> > old array into it in the right places.
> >
> > > Changing the field/column names
> > > make a new table by selecting a subset of fields/columns. (you can
> > > select a single field/column, but not multiple).
> >
> > Right. So far you can't select multiple columns. It would be possible
> > to add this feature with a little bit of effort if there were a strong
> > demand for it, but it would be much easier to do it in your subclass
> > and/or container class.
> >
> > How many people would like to see x['f1','f2','f5'] return a new array
> > with a new data-type descriptor constructed from the provided fields?
> >
> > > It would also be nice for the table to be able to deal easily with
> > > masked data (I have not tried this with recarray yet) and perhaps also
> > > to be able to give the rows/records unique ids that could be used to
> > > select the rows/records (in addition to the row/record index), in the
> > > same way that the fieldnames can select the fields.
> >
> > Adding fieldnames to the "rows" is definitely something that a subclass
> > would be needed for. I'm not sure how you would even propose to select
> > using row names. Would you also use getitem semantics?
> >
> > > Can anyone comment on this issue? Particularly whether code exists for
> > > this purpose, and if not ideas about how best to go about developing
> > > such a Table-like array (this would need to be limited to Python
> > > programming as my ability to program in C is very limited).
> >
> > I don't know of code that already exists for this, but I don't think it
> > would be too hard to construct your own data-frame object.
> >
> > I would probably start with an implementation that just used standard
> > arrays of a particular type to represent the internal columns and then
> > handle the indexing by overriding the __getitem__ and
> > __setitem__ special methods. This would be the easiest to get working,
> > I think.
> >
> > -Travis
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion
>