[Numpy-discussion] Condensing array...
Olivier Grisel
olivier.grisel at ensta.org
Fri Feb 25 05:52:29 EST 2011
2011/2/25 Gael Varoquaux <gael.varoquaux at normalesup.org>:
> On Fri, Feb 25, 2011 at 10:36:42AM +0100, Fred wrote:
>> I have a big array (44 GB) I want to decimate.
>
>> But this array has a lot of NaN (only 1/3 has value, in fact, so 2/3 of
>> NaN).
>
>> If I "basically" decimate it (a la NumPy, ie data[::nx, ::ny, ::nz], for
>> instance), the decimated array will also have a lot of NaN.
>
>> What I would like to have in one cell of the decimated array is the
>> nearest (for instance) value in the big array. This is what I call a
>> "condensated array".
>
> What exactly do you mean by 'decimating'? To me it seems that you are
> looking for matrix factorization or matrix completion techniques, which
> are trendy topics in machine learning currently.
>
> They are, however, a bit challenging, and I fear that you will have to read
> the papers and do some implementation, unless you have a clear
> application in mind that allows simple tricks to solve it.
Indeed, see the following paper by G. Martinsson, where there is also a
section on matrix summarization:
http://arxiv.org/abs/0909.4061
http://www.stanford.edu/group/mmds/slides2010/Martinsson.pdf
The scikit-learn randomized SVD implementation comes from this paper.
It's pretty useful in practice.
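
For illustration, here is a minimal NumPy-only sketch of the randomized
range-finder idea behind that kind of SVD. It is not the scikit-learn code
(which lives in sklearn.utils.extmath.randomized_svd); the function name,
the parameter names and their defaults below are assumptions made up for
this example. It also assumes a dense array without NaN, so it would not
apply directly to an array that is 2/3 NaN.

```python
import numpy as np


def randomized_svd_sketch(A, rank, n_oversamples=10, n_power_iter=2, seed=0):
    """Approximate truncated SVD via random projection (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n_cols = A.shape[1]

    # Project A onto a few random directions to sample its column space.
    omega = rng.standard_normal((n_cols, rank + n_oversamples))
    Y = A @ omega

    # A few power iterations sharpen the estimate when singular values
    # decay slowly; re-orthonormalize at each step for numerical stability.
    for _ in range(n_power_iter):
        Q_y, _r = np.linalg.qr(Y)
        Q_z, _r = np.linalg.qr(A.T @ Q_y)
        Y = A @ Q_z

    # Orthonormal basis of the sampled range, then an exact SVD of the
    # much smaller projected matrix B = Q^T A.
    Q, _r = np.linalg.qr(Y)
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]


# Usage on a synthetic matrix of exact rank 5.
rng = np.random.default_rng(42)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 80))
U, s, Vt = randomized_svd_sketch(A, rank=5)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
```

The point of the construction is that the only full-size operations are
a handful of products with A; the QR and SVD steps run on matrices with
only rank + n_oversamples columns or rows.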
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel