[Numpy-discussion] Fortran order in recarray.
shoyer at gmail.com
Wed Feb 22 12:55:07 EST 2017
On Wed, Feb 22, 2017 at 8:57 AM, Alex Rogozhnikov <
alex.rogozhnikov at yandex.ru> wrote:
> Pandas may be nice if you need a report and you need to get it done
> tomorrow. Then you'll throw away the code. When we initially used pandas as
> the main data storage in yandex/rep, it looked like a good idea, but a year
> later it was obvious this was the wrong decision. When you build a data
> pipeline / research that should still be working several years later (using
> some other installation by someone else), usage of pandas should be *minimal*.
The pandas development team (myself included) is well aware of these
issues. There are long term plans/hopes to fix this, but there's a lot of
work to be done and some hard choices to make:
> That's why I am looking for a reliable pandas substitute, which should be:
> - completely consistent with numpy, and it should fail when something is
> not implemented / impossible
> - fewer new abstractions, nobody wants to learn one-more-way-to-manipulate-the-data,
> specifically other researchers
> - it may be less convenient for interactive data munging
> - in particular, fewer methods is OK
> - written code should be interpretable, and can hardly be misinterpreted.
> - not super slow, 1-10 gigabyte datasets are a normal situation
This has some overlap with our motivations for writing Xarray (
http://xarray.pydata.org), so I encourage you to take a look. It still
might be more complex than you're looking for, but we did try to clean up
the really ambiguous APIs from pandas like indexing.
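
As a rough illustration of the indexing point (a sketch, not part of the
original message; the toy values and variable names are made up for the
example): with an integer index, plain `[]` on a pandas Series is a label
lookup, which is easy to misread as positional. xarray keeps plain `[]`
positional, as in NumPy, and spells label lookup as `.sel` and positional
lookup as `.isel`.

```python
import pandas as pd
import xarray as xr

values = [1.0, 2.0, 3.0]
labels = [2, 1, 0]  # integer labels that deliberately differ from positions

# pandas: with an integer index, s[0] looks up the *label* 0,
# not the first element.
s = pd.Series(values, index=labels)
plain_pd = s[0]             # label lookup here
by_label_pd = s.loc[0]      # explicit label lookup
by_position_pd = s.iloc[0]  # explicit positional lookup

# xarray: plain [] is positional, like NumPy; label lookup is
# always spelled out with .sel, positional lookup with .isel.
da = xr.DataArray(values, dims="x", coords={"x": labels})
plain_xr = da[0].item()               # positional
by_label_xr = da.sel(x=0).item()      # label lookup
by_position_xr = da.isel(x=0).item()  # positional lookup
```

In this toy case the label lookups return 3.0 and the positional lookups
return 1.0, so `s[0]` and `da[0]` give different answers for the same data.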