[Numpy-discussion] [ANN] New open source project for labeled arrays
kwgoodman at gmail.com
Wed Jan 27 22:24:24 EST 2010
On Wed, Jan 27, 2010 at 7:13 PM, Pierre GM <pgmdevlist at gmail.com> wrote:
> On Jan 27, 2010, at 9:10 PM, Keith Goodman wrote:
>> I recently open-sourced one of my packages. It is a labeled array
>> that I call larry.
>> A two-dimensional larry, for example, contains a 2d NumPy array with
>> labels on each row and column. A larry can have any dimension.
>> Alignment by label is automatic when you add (or subtract, multiply,
>> divide) two larrys.
>> larry has built-in methods such as movingsum, ranking, merge, shuffle,
>> zscore, demean, lag as well as typical NumPy methods like sum, max,
>> std, sign, clip. NaNs are treated as missing data.
> So you can't have an integer larry with missing data?
>> You can archive larrys in HDF5 format using save and load or using a
>> dictionary-like interface.
>> I'm working towards a 0.1 release. In the meantime, comments,
>> suggestions, critiques are all appreciated.
> I'll have to check it out (hopefully I'll have a bit more time in the next
> couple of weeks), but what are the main differences/advantages of your
> approach compared to pandas or tabular?
I've tried to make larry behave as a NumPy array user would expect.
If, for example, you have a function, myfunc, that works on NumPy
arrays and doesn't change the shape or ordering of the array, then you
can use it on a larry, y, like this: y.x = myfunc(y.x).
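
As a rough illustration of that pattern (a sketch, not larry's actual
code): LabeledArray below is a hypothetical stand-in for a larry, and its
label attribute and the clipping myfunc are assumptions made only for
this example. Only the y.x attribute comes from the description above.

```python
import numpy as np


class LabeledArray:
    """Hypothetical stand-in for a larry: a NumPy array plus labels."""

    def __init__(self, x, label):
        self.x = np.asarray(x)  # the underlying NumPy array
        self.label = label      # assumed layout: one list of labels per axis


def myfunc(a):
    # Any function on plain arrays that keeps shape and ordering intact.
    return np.clip(a, 0.0, 1.0)


y = LabeledArray([[-0.5, 0.2], [0.7, 1.5]],
                 [["r0", "r1"], ["c0", "c1"]])

# Apply the function to the data only; the labels stay valid because
# neither the shape nor the ordering of the array changed.
y.x = myfunc(y.x)
```

The labels are never touched, which is why the function must not reorder
or reshape the array.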
The main use case for a larry is when you want to work on the entire
array, or a subset of it, all at once. It is less suited to cases where
you only want to grab, for example, one row at a time.
The internal structure of a larry (NumPy array + list) is easy to
understand, so it is easy to get going and to extend.
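
To give a feel for how little machinery an array-plus-list structure
needs, here is a minimal sketch of adding two 1d labeled arrays by label.
The helper add_aligned is hypothetical, and keeping only the shared
labels (in sorted order) is an assumption of this sketch, not a statement
about how larry itself aligns.

```python
import numpy as np


def add_aligned(x1, labels1, x2, labels2):
    """Add two 1d labeled arrays, matching elements by label.

    Hypothetical helper for illustration only.
    """
    # Assumption: keep only labels present in both inputs, sorted.
    shared = sorted(set(labels1) & set(labels2))
    # Find where each shared label sits in each input.
    idx1 = [labels1.index(lab) for lab in shared]
    idx2 = [labels2.index(lab) for lab in shared]
    # Fancy indexing reorders both arrays to the shared label order.
    return np.asarray(x1)[idx1] + np.asarray(x2)[idx2], shared


total, labels = add_aligned([1.0, 2.0, 3.0], ["a", "b", "c"],
                            [10.0, 20.0, 30.0], ["c", "a", "d"])
```

Here the elements labeled "a" and "c" are paired up regardless of their
position in each input, and the unmatched labels "b" and "d" are dropped.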