[Numpy-discussion] Tabular data package

Mon Oct 5 18:16:42 EDT 2009

hey pierre -- good question. this is something we debated a while ago (we
actually sent a couple of emails over the numpy list about this very topic)
when coming up with our design.  at the time, there did not seem to be
strong opinions either way about using ndarray vs. recarray.

the main reason we went with the recarray over the ndarray is that the
recarray has a couple of useful construction functions (e.g.
np.rec.fromrecords and np.rec.fromarrays).  not only are these functions
convenient to use, they have nice data type inference properties which we'd
have to rebuild ourselves if we wanted to avoid recarrays entirely.
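
to make the inference point concrete, here is a minimal sketch of the two
constructors. the sample values and field names are made up for illustration
and are not taken from Tabular; the exact integer and string widths in the
inferred dtype depend on the platform and the data.

```python
import numpy as np

# Hypothetical sample data, for illustration only (not from Tabular).
rows = [("alice", 30, 1.5), ("bob", 25, 2.25)]

# Row-oriented input: each column's type is inferred from the values.
by_rows = np.rec.fromrecords(rows, names=["name", "age", "score"])

# Column-oriented input: one array per field.
cols = [np.array(["alice", "bob"]), np.array([30, 25]), np.array([1.5, 2.25])]
by_cols = np.rec.fromarrays(cols, names=["name", "age", "score"])

# Both routes end up with a string, an integer, and a float field.
print(by_rows.dtype)
print(by_cols.dtype)
```

no format string is passed in either call; the field types come from the data
itself, which is the part we would otherwise have to reimplement.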

It would be fairly straightforward to switch from recarray to ndarray if
this were really an important thing to do (e.g. if recarray were being
deprecated or if most NumPy people have strong feelings about this), and
doing so wouldn't modify anything about the tabarray API.
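
as a rough sketch of what such a switch involves (this is plain NumPy, not
tabarray code, and the field names and values are hypothetical): a recarray
and a structured ndarray share item-style field access, so code that only
indexes by field name would not notice the difference.

```python
import numpy as np

# Hypothetical sample rows, for illustration only.
rows = [(3, 1.25), (7, 0.5)]

# recarray: fields are reachable both as attributes and as items.
rec = np.rec.fromrecords(rows, names=["qty", "price"])
assert rec.qty.tolist() == rec["qty"].tolist()

# Plain structured ndarray: same item access, dtype spelled out by hand.
plain = np.array(rows, dtype=[("qty", "i8"), ("price", "f8")])

# An existing recarray can also be viewed as a base-class ndarray.
as_ndarray = rec.view(np.ndarray)

print(plain["qty"], as_ndarray["price"])
```

the only thing lost on the ndarray side is attribute access (rec.qty), and
the dtype has to be given explicitly instead of being inferred.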

elaine

On Mon, Oct 5, 2009 at 5:47 PM, Pierre GM <pgmdevlist at gmail.com> wrote:

> Ciao Elaine,
> I just quickly browsed through your code. Say, what's the reason
> behind using np.recarrays instead of just standard ndarrays (with
> flexible dtype)? Do you really need the overhead of accessing fields
> as attributes ? It looks like you're always accessing fields as items...
> Cheers
> P.
>
>
>
> On Oct 5, 2009, at 5:22 PM, Elaine Angelino wrote:
>
> > Hi there,
> >
> > We are writing to announce the release of "Tabular", a package of
> > Python modules for working with tabular data.
> >
> > Tabular's main object is the tabarray class, a data structure for
> > holding and manipulating tabular data. By putting data into a
> > tabarray object, you’ll get a representation of the data that is
> > more flexible and powerful than a native Python representation. More
> > specifically, tabarray provides:
> >
> > -- ultra-fast filtering, selection, and numerical analysis methods,
> > using convenient Matlab-style matrix operation syntax
> > -- spreadsheet-style operations, including row & column operations,
> > 'sort', 'replace', 'aggregate', 'pivot', and 'join'
> > -- flexible load and save methods for a variety of file formats,
> > including delimited text (CSV), binary, and HTML
> > -- helpful inference algorithms for determining formatting
> > parameters and data types of input files
> > -- support for hierarchical groupings of columns, both as data
> > structures and file formats
> >
> > You can download Tabular from PyPI (http://pypi.python.org/pypi/tabular/)
> > or alternatively clone our hg repository from bitbucket
> > (http://bitbucket.org/elaine/tabular/).  We also have posted
> > tutorial-style Sphinx documentation (http://www.parsemydata.com/tabular/).
> >
> > The tabarray object is based on the record array object from the
> > Numerical Python package (NumPy), and Tabular is built to interface
> > well with NumPy in general.  Our intended audience is two-fold: (1)
> > Python users who, though they may not be familiar with NumPy, are in
> > need of a way to work with tabular data, and (2) NumPy users who
> > would like to do spreadsheet-style operations on top of their more
> > "numerical" work.
> >
> > We hope that some of you find Tabular useful!
> >
> > Best,
> >
> > Elaine and Dan
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion