
2017-02-22 16:30 GMT+01:00 Kiko <kikocorreoso@gmail.com>:
2017-02-22 16:23 GMT+01:00 Alex Rogozhnikov <alex.rogozhnikov@yandex.ru>:
Hi Francesc, thanks a lot for your reply and for your impressive work on bcolz!
bcolz seems to put the stress on compression, which is not of much interest to me, but the *ctable* and chunked operations look very appropriate to me now. (Of course, I'll need to test it a lot before I can say this for sure; that's just my current impression.)
You can disable compression for bcolz by default too: http://bcolz.blosc.org/en/latest/defaults.html#list-of-default-values
The strongest concern with bcolz so far is that it seems to be completely non-trivial to install on Windows systems, while pip provides numpy binaries for most (or all?) OSes. I haven't built pip binary wheels myself, but is it hard or impossible to cook pip-installable binaries?
http://www.lfd.uci.edu/~gohlke/pythonlibs/#bcolz Check whether this link solves the installation issue.
Yeah. Also, there are binaries for conda: http://bcolz.blosc.org/en/latest/install.html#installing-from-conda-forge
You can change shapes of numpy arrays, but that usually involves copies of the whole container.
Sure, but this is OK for me, as I plan to organize column editing in 'batches', so copying should be needed only seldom. It would be nice to see an example to understand how deep I need to go inside numpy.
Well, if copying is not a problem for you, then you can just create a new numpy container and do the copy by yourself. Francesc
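For illustration, the "create a new container and copy" approach could look roughly like the sketch below. The helper name `with_new_field` and the sample field names are made up for this example and are not part of numpy; the sketch assumes a structured array whose new column fits in memory alongside the old one.

```python
import numpy as np

def with_new_field(arr, name, dtype, values):
    """Return a copy of structured array `arr` with one extra field (hypothetical helper)."""
    # Rebuild the dtype from the existing fields plus the new one.
    fields = [(n, arr.dtype[n]) for n in arr.dtype.names]
    new_dtype = np.dtype(fields + [(name, dtype)])
    # Allocate the new container, then copy field by field.
    out = np.empty(arr.shape, dtype=new_dtype)
    for n in arr.dtype.names:
        out[n] = arr[n]
    out[name] = values
    return out

points = np.array([(1, 2.0), (2, 3.5)], dtype=[("id", "i4"), ("x", "f8")])
extended = with_new_field(points, "y", "f8", [10.0, 20.0])
```

The original array is left untouched, so each added column costs one full copy; adding several columns per batch keeps that cost down. numpy also ships `numpy.lib.recfunctions.append_fields`, which may cover the same need.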
Cheers, Alex.
On 22 Feb 2017, at 17:03, Francesc Alted <faltet@gmail.com> wrote:
Hi Alex,
2017-02-22 12:45 GMT+01:00 Alex Rogozhnikov <alex.rogozhnikov@yandex.ru>:
Hi Nathaniel,
pandas
yup, the idea was to have minimal pandas.DataFrame-like storage (which I was using for a long time), but without irritating problems with its row indexing and some other problems like interaction with matplotlib.
A dict of arrays?
That's what I started from and implemented, but at some point I decided that I was reinventing the wheel and that numpy has something already. In principle, I can ignore this 'column-oriented' storage requirement, but potentially it may turn out to be quite slow-ish if the dtype's size is large.
Suggestions are welcome.
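As a rough sketch of the dict-of-arrays idea discussed above: the class below is hypothetical (the name `ColumnTable` and its methods are invented for this example, not a numpy API). It keeps each column as its own 1-D array, has no row labels, and accepts a slice, boolean mask, or index array as a row selector.

```python
import numpy as np

class ColumnTable:
    """Hypothetical minimal column store: a dict of equal-length 1-D arrays."""

    def __init__(self, columns):
        self._cols = {}
        self._length = None
        for name, values in columns.items():
            self.add_column(name, values)

    def add_column(self, name, values):
        arr = np.asarray(values)
        if arr.ndim != 1:
            raise ValueError("columns must be one-dimensional")
        if self._length is None:
            self._length = arr.shape[0]
        elif arr.shape[0] != self._length:
            raise ValueError("column length mismatch")
        self._cols[name] = arr

    def drop_column(self, name):
        del self._cols[name]

    def __getitem__(self, key):
        # A string selects a column; anything else selects rows
        # (slice, boolean mask, or index array) in every column.
        if isinstance(key, str):
            return self._cols[key]
        return ColumnTable({n: c[key] for n, c in self._cols.items()})

    def __len__(self):
        return 0 if self._length is None else self._length

    @property
    def names(self):
        return list(self._cols)

table = ColumnTable({"x": np.arange(5), "label": np.array(list("abcde"))})
table.add_column("y", table["x"] * 0.5)   # adding a column copies nothing else
subset = table[table["x"] > 2]            # row selection without row labels
```

Adding or dropping a column only touches the dict, so the other columns are never copied; changing the number of rows still reallocates every column.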
You may want to try bcolz:
https://github.com/Blosc/bcolz
bcolz is a columnar storage, basically as you require, but data is compressed by default even when stored in-memory (although you can disable compression if you want to).
Another strange question: in general, it is considered that once a numpy.array is created, its shape does not change. But if I want to keep the same recarray and change its dtype and/or shape, is there a way to do this?
You can change shapes of numpy arrays, but that usually involves copies of the whole container. With bcolz you can change the length and add/delete columns without copies. If your containers are large, it is better to tell bcolz their estimated final size. See:
http://bcolz.blosc.org/en/latest/opt-tips.html
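To make the numpy side of this concrete, here is a small sketch (with made-up field names) showing that growing a structured array by a batch of rows produces a new container rather than extending the old one in place:

```python
import numpy as np

table = np.zeros(3, dtype=[("id", "i4"), ("x", "f8")])
batch = np.ones(2, dtype=table.dtype)

# np.concatenate allocates a new array and copies both inputs into it.
grown = np.concatenate([table, batch])
copied = not np.shares_memory(grown, table)
```

Here `copied` is true: `grown` does not share memory with `table`, which is why appending in large batches is cheaper than appending row by row.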
Francesc
Thanks, Alex.
On 22 Feb 2017, at 3:53, Nathaniel Smith <njs@pobox.com> wrote:
On Feb 21, 2017 3:24 PM, "Alex Rogozhnikov" <alex.rogozhnikov@yandex.ru> wrote:
Ah, got it. Thanks, Chris! I thought a recarray could only be one-dimensional (like tables with named columns).
Maybe it's better to ask directly what I was looking for: something that works like a table with named columns (but no labelling for rows), and keeps data (of different dtypes) in a column-by-column way (and this is numpy, not pandas).
Is there such a magic thing?
Well, that's what pandas is for...
A dict of arrays?
-n
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion
-- Francesc Alted