[Numpy-discussion] Tools / data structures for statistical analysis and related applications

Yaroslav Halchenko lists at onerussian.com
Fri Jun 11 23:10:46 EDT 2010


On Fri, 11 Jun 2010, Keith Goodman wrote:
> > For this purpose, I like Fernando's data array:
> > http://github.com/fperez/datarray A very simple subclass of ndarrays
> > that answers my most-wanted feature in terms of richer data
> > structures.
> > >...<
> It looks like datarray labels the axes. The data objects in pandas and
> la, on the other hand, label the elements along each axis.
And, just to expose what we needed for our project: in PyMVPA we
have a Dataset (or its very base class, AttrDataset)
http://github.com/hanke/PyMVPA/blob/master/mvpa/base/dataset.py#L31
which does not have a unique label for each row/column, but rather
collections of possibly non-unique values assigned to each row/column.
Often attribute values repeat across different rows or columns, but they
may become unique "identifiers" if multiple attributes are considered
at the same time (e.g. if you had attributes 'day', 'month', 'year',
only taking all 3 of them together would uniquely identify an entry).
Most of the time we are interested in groups of rows/columns sharing the
same attribute value (e.g. a label for a sample in classification
problems).
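
To make the idea concrete, here is a minimal sketch in plain Python.  It
is NOT PyMVPA's actual Dataset API: the names samples, sample_attrs,
is_unique and rows_where, as well as all the values, are made up purely
for illustration, and plain lists stand in for arrays.

```python
# Illustrative only: none of these names come from PyMVPA.
# Six samples (rows) with two features (columns) each.
samples = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11]]

# Each attribute assigns one (possibly repeated) value to every row.
sample_attrs = {
    "day":   [1, 1, 2, 1, 2, 2],
    "month": [6, 7, 6, 6, 7, 6],
    "year":  [2009, 2009, 2009, 2010, 2010, 2010],
    "label": ["face", "house", "face", "house", "face", "house"],
}


def is_unique(*names):
    """True if the named attributes, taken together, identify each row."""
    combos = list(zip(*(sample_attrs[n] for n in names)))
    return len(set(combos)) == len(combos)


def rows_where(name, value):
    """Select all rows whose attribute `name` equals `value`."""
    return [row for row, v in zip(samples, sample_attrs[name]) if v == value]


print(is_unique("day"))                   # False: days repeat across rows
print(is_unique("day", "month", "year"))  # True: the triple is an identifier
print(rows_where("label", "face"))        # [[0, 1], [4, 5], [8, 9]]
```

The last call is the common case: picking out the group of rows that
share one attribute value, e.g. all samples of one class.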

-- 
                                  .-.
=------------------------------   /v\  ----------------------------=
Keep in touch                    // \\     (yoh@|www.)onerussian.com
Yaroslav Halchenko              /(   )\               ICQ#: 60653192
                   Linux User    ^^-^^    [175555]




