
2011/2/25 Gael Varoquaux <gael.varoquaux@normalesup.org>:
On Fri, Feb 25, 2011 at 10:36:42AM +0100, Fred wrote:
I have a big array (44 GB) I want to decimate.
But this array has a lot of NaN (in fact only 1/3 of it has values, so 2/3 is NaN).
If I decimate it "basically" (à la NumPy, i.e. data[::nx, ::ny, ::nz], for instance), the decimated array will also have a lot of NaN.
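To make the problem concrete, here is a small sketch of that naive decimation on a synthetic stand-in for the big array, assuming the NaN cells are scattered at random (the real NaN layout is not described in the thread). Strided slicing just samples cells, so it keeps roughly the same NaN fraction as the input.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the big array: roughly 2/3 of the cells are NaN
# (random layout is an assumption, not the poster's data)
data = rng.standard_normal((60, 60, 60))
data[rng.random(data.shape) < 2 / 3] = np.nan

nx, ny, nz = 3, 3, 3
# Naive decimation: keep every nx-th / ny-th / nz-th cell, whatever it holds
decimated = data[::nx, ::ny, ::nz]

print(decimated.shape)              # (20, 20, 20)
print(np.isnan(data).mean())        # fraction of NaN in the input
print(np.isnan(decimated).mean())   # roughly the same fraction
```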
What I would like to have in each cell of the decimated array is the nearest (for instance) non-NaN value from the big array. This is what I call a "condensated array".
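One way to read the "condensated array" request is sketched below in NumPy. It assumes that "nearest" means the non-NaN cell closest (squared Euclidean distance in index space) to the cell that data[::nx, ::ny, ::nz] would have sampled, searched only within that cell's own nx*ny*nz block. The function name `condense`, the per-block search, and dropping trailing cells that do not fill a whole block are all assumptions of this sketch, not something stated in the thread.

```python
import numpy as np

def condense(data, nx, ny, nz):
    """Hypothetical helper: decimate a 3-D array by (nx, ny, nz) blocks,
    keeping in each block the non-NaN value closest to the block's first
    cell. All-NaN blocks stay NaN. Incomplete trailing blocks are dropped."""
    X, Y, Z = (s // n for s, n in zip(data.shape, (nx, ny, nz)))
    d = data[:X * nx, :Y * ny, :Z * nz]
    # (X, nx, Y, ny, Z, nz) -> (X, Y, Z, nx*ny*nz): one row of cells per block
    blocks = d.reshape(X, nx, Y, ny, Z, nz).transpose(0, 2, 4, 1, 3, 5)
    blocks = blocks.reshape(X, Y, Z, nx * ny * nz)
    # Squared distance of each in-block offset from the block origin,
    # flattened in the same C order as the block rows above
    ix, iy, iz = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz),
                             indexing="ij")
    dist = (ix**2 + iy**2 + iz**2).ravel().astype(float)
    # NaN cells get infinite cost so they are never preferred
    cost = np.where(np.isnan(blocks), np.inf, dist)
    best = cost.argmin(axis=-1)
    # For an all-NaN block argmin returns 0, which is itself NaN, so the
    # output cell stays NaN as intended
    return np.take_along_axis(blocks, best[..., None], axis=-1)[..., 0]
```

For an array of 44 GB this would have to be applied slab by slab (for example over slices of a `np.memmap`, with slab thickness a multiple of nx) rather than on the whole array at once, since the reshape above makes a copy.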
What exactly do you mean by 'decimating'? To me it seems that you are looking for matrix factorization or matrix completion techniques, which are currently trendy topics in machine learning.
They are, however, a bit challenging, and I fear that you will have to read the papers and do some implementation, unless you have a clear application in mind that allows for simple tricks to solve it.
Indeed. The following paper by G. Martinsson also has a section on matrix summarization:
http://arxiv.org/abs/0909.4061
http://www.stanford.edu/group/mmds/slides2010/Martinsson.pdf
The scikit-learn randomized SVD implementation comes from this paper. It's pretty useful in practice.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
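For readers unfamiliar with randomized SVD, here is a minimal NumPy sketch of the basic two-stage idea (random projection to find an approximate range, then an exact SVD of a small matrix), with a couple of power iterations. It is not scikit-learn's code, which is available as `sklearn.utils.extmath.randomized_svd`. The function name and the defaults for `oversample`, `n_iter` and `seed` are illustrative assumptions.

```python
import numpy as np

def randomized_svd_sketch(A, k, oversample=10, n_iter=2, seed=0):
    """Hypothetical helper: approximate rank-k SVD of a dense 2-D array A."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    p = min(n, m, k + oversample)
    # Stage A: sample the range of A with a Gaussian test matrix
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, p)))
    for _ in range(n_iter):
        # Power iterations sharpen the spectrum; re-orthonormalize each step
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)
    # Stage B: exact SVD of the small p-by-n matrix B = Q^T A, then lift back
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k]
```

Note that an SVD needs a complete 2-D matrix, so the NaN-heavy 3-D array in the question would first have to be reshaped and have its missing entries handled (for instance by a matrix completion method) before something like this applies.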