Re: [Numpy-discussion] Condensing array...

25 Feb 2011

      2011/2/25 Gael Varoquaux <gael.varoquaux@normalesup.org>:
...
On Fri, Feb 25, 2011 at 10:36:42AM +0100, Fred wrote:
...
I have a big array (44 GB) I want to decimate.
...
But this array has a lot of NaNs (only 1/3 of it holds values, in fact, so
2/3 is NaN).
...
If I "basically" decimate it (à la NumPy, i.e. data[::nx, ::ny, ::nz], for
instance), the decimated array will also have a lot of NaNs.
...
What I would like to have in one cell of the decimated array is the
nearest (for instance) value in the big array. This is what I call a
"condensated array".
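
One possible reading of the request above, as a minimal NumPy sketch rather than anything posted in the thread: split the array into nx x ny x nz blocks and, for each block, keep the non-NaN value closest to the corner that data[::nx, ::ny, ::nz] would have sampled. The function name `condense` and the choice of "nearest to the block corner" are assumptions made for illustration. Unlike plain slicing, trailing elements that do not fill a whole block are dropped.

```python
import numpy as np

def condense(data, nx, ny, nz):
    """Block-wise decimation that prefers non-NaN values (hypothetical helper)."""
    # Number of complete blocks along each axis; incomplete trailing blocks are dropped.
    sx, sy, sz = data.shape[0] // nx, data.shape[1] // ny, data.shape[2] // nz
    trimmed = data[:sx * nx, :sy * ny, :sz * nz]

    # Rearrange so that the last axis enumerates the cells of one block.
    blocks = (trimmed.reshape(sx, nx, sy, ny, sz, nz)
                     .transpose(0, 2, 4, 1, 3, 5)
                     .reshape(sx, sy, sz, nx * ny * nz))

    # Sort the in-block cells by squared distance from the block corner (0, 0, 0).
    ii, jj, kk = np.indices((nx, ny, nz))
    order = np.argsort((ii**2 + jj**2 + kk**2).ravel(), kind="stable")
    blocks = blocks[..., order]

    # argmax on a boolean array returns the first True, i.e. the nearest valid cell.
    # If a block is all NaN, argmax returns 0 and the result stays NaN.
    first_valid = (~np.isnan(blocks)).argmax(axis=-1)
    return np.take_along_axis(blocks, first_valid[..., None], axis=-1)[..., 0]
```

The reshaping makes a full copy, so a 44 GB array would have to be processed slab by slab (for example through np.memmap, assuming the data lives in a flat binary file) rather than in one call.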
What exactly do you mean by 'decimating'? To me it seems that you are
looking for matrix factorization or matrix completion techniques, which
are trendy topics in machine learning currently.
They are, however, a bit challenging, and I fear that you will have to read
the papers and do some implementation, unless you have a clear
application in mind that allows for simple tricks to solve it.
Indeed, see the following paper by G. Martinsson, which also has a
section on matrix summarization:

  http://arxiv.org/abs/0909.4061
  http://www.stanford.edu/group/mmds/slides2010/Martinsson.pdf

The scikit-learn randomized SVD implementation comes from this paper.
It's pretty useful in practice.
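
For readers who want to see the idea, here is a minimal NumPy sketch of the basic randomized SVD scheme described in that paper: project onto a random subspace, orthonormalize, then take an exact SVD of the small projected matrix. This is not the scikit-learn code (which is exposed as sklearn.utils.extmath.randomized_svd); the function name and the default values for n_oversamples and n_iter are assumptions chosen for illustration. It expects a dense 2-D array without NaNs, so missing values would have to be handled beforehand.

```python
import numpy as np

def randomized_svd_sketch(A, k, n_oversamples=10, n_iter=2, seed=0):
    """Approximate rank-k SVD of A via random projection (illustrative sketch)."""
    rng = np.random.default_rng(seed)

    # Sample the range of A with a Gaussian test matrix of k + n_oversamples columns.
    omega = rng.standard_normal((A.shape[1], k + n_oversamples))
    Q, _ = np.linalg.qr(A @ omega)

    # A few power iterations sharpen the subspace when singular values decay slowly.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)

    # Exact SVD of the small projected matrix, then map back to the original space.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]
```

On a matrix whose rank is at most k, (U * s) @ Vt should reproduce the input up to rounding error; on general matrices it is only an approximation whose quality depends on how fast the singular values decay.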

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Olivier Grisel