On 22 Feb 2017, at 20:55, Stephan Hoyer <shoyer@gmail.com> wrote:

> On Wed, Feb 22, 2017 at 8:57 AM, Alex Rogozhnikov <alex.rogozhnikov@yandex.ru> wrote:
>
>> Pandas may be nice if you need a report and you need to get it done tomorrow. Then you'll throw away the code. When we initially used pandas as the main data storage in yandex/rep, it looked like a good idea, but a year later it was obvious this was the wrong decision. When you build a data pipeline / research that should still be working several years later (using some other installation, by someone else), usage of pandas should be minimal.
>
> The pandas development team (myself included) is well aware of these issues. There are long term plans/hopes to fix this, but there's a lot of work to be done and some hard choices to make:
>
>> That's why I am looking for a reliable pandas substitute, which should be:
>> - completely consistent with numpy, and should fail when something isn't implemented / is impossible
>> - fewer new abstractions; nobody wants to learn one-more-way-to-manipulate-the-data, specifically other researchers
>> - it may be less convenient for interactive data munging
>> - in particular, fewer methods is ok
>> - written code should be easy to interpret and hard to misinterpret
>> - not super slow; 1-10 gigabyte datasets are a normal situation
>
> This has some overlap with our motivations for writing Xarray (http://xarray.pydata.org), so I encourage you to take a look. It still might be more complex than you're looking for, but we did try to clean up the really ambiguous APIs from pandas like indexing.

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion