[Numpy-discussion] Fortran order in recarray.
Alex Rogozhnikov
alex.rogozhnikov at yandex.ru
Wed Feb 22 13:24:00 EST 2017
Hi Stephan,
thanks for the note. Progress over the last two years hasn't been impressive IMO, but I hope you'll manage.
As you suggest, I'll have a look at xarray too, now that I see it has xarray.Dataset.
I was sure that it didn't work with non-homogeneous data at all; clearly I need to update my opinion.
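For reference, a minimal sketch of what xarray.Dataset seems to allow here, assuming numpy and xarray are installed. The variable names (pt, n_tracks, is_signal) and the "event" dimension are made up for illustration and do not come from any real dataset:

```python
import numpy as np
import xarray as xr

# Hypothetical per-event columns with different dtypes, all sharing
# one made-up dimension called "event".
ds = xr.Dataset(
    {
        "pt": ("event", np.array([1.5, 2.5, 3.5])),                   # float64
        "n_tracks": ("event", np.array([3, 7, 2], dtype=np.int32)),   # int32
        "is_signal": ("event", np.array([True, False, True])),        # bool
    }
)

# Each variable keeps its own dtype; nothing is upcast to a common type.
dtypes = {name: var.dtype for name, var in ds.data_vars.items()}
print(dtypes)
```

Each variable is stored as its own array, so the dtypes stay independent, unlike a single homogeneous ndarray.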
> On 22 Feb 2017, at 20:55, Stephan Hoyer <shoyer at gmail.com> wrote:
>
> On Wed, Feb 22, 2017 at 8:57 AM, Alex Rogozhnikov <alex.rogozhnikov at yandex.ru> wrote:
> Pandas may be nice if you need a report and you need to get it done tomorrow. Then you'll throw away the code. When we initially used pandas as the main data storage in yandex/rep, it looked like a good idea, but a year later it was obvious that this was the wrong decision. When you build a data pipeline / research that should still work several years later (on some other installation, run by someone else), usage of pandas should be minimal.
>
> The pandas development team (myself included) is well aware of these issues. There are long term plans/hopes to fix this, but there's a lot of work to be done and some hard choices to make:
> https://github.com/pandas-dev/pandas/issues/10000
> https://github.com/pandas-dev/pandas/issues/13862
>
> That's why I am looking for a reliable pandas substitute with the following properties:
> - completely consistent with numpy, and it should fail when something isn't implemented or is impossible
> - fewer new abstractions; nobody wants to learn one-more-way-to-manipulate-the-data, especially other researchers
> - it may be less convenient for interactive data munging
> - in particular, fewer methods is OK
> - written code should be easy to interpret and hard to misinterpret
> - not super slow; 1-10 gigabyte datasets are a normal situation
>
> This has some overlap with our motivations for writing Xarray (http://xarray.pydata.org), so I encourage you to take a look. It still might be more complex than you're looking for, but we did try to clean up the really ambiguous APIs from pandas like indexing.