[scikit-learn] Does NMF optimise over observed values
Raphael C
drraph at gmail.com
Mon Aug 29 14:11:42 EDT 2016
On Monday, August 29, 2016, Andreas Mueller <t3kcit at gmail.com> wrote:
>
>
> On 08/28/2016 01:16 PM, Raphael C wrote:
>
>
>
> On Sunday, August 28, 2016, Andy <t3kcit at gmail.com> wrote:
>
>>
>>
>> On 08/28/2016 12:29 PM, Raphael C wrote:
>>
>> To give a little context from the web, see e.g.
>> http://www.quuxlabs.com/blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation-in-python/
>> where it explains:
>>
>> "
>> A question might have come to your mind by now: if we find two matrices
>> P and Q such that P × Q approximates R, isn't that our predictions of all
>> the unseen ratings will all be zeros? In fact, we are not really trying
>> to come up with P and Q such that we can reproduce R exactly. Instead, we
>> will only try to minimise the errors of the observed user-item pairs.
>> "
>>
>> Yes, the sklearn interface is not meant for matrix completion but for
>> matrix factorization.
>> At some point there was a PR adding some matrix completion for
>> missing-value imputation.
>>
>> In general, scikit-learn doesn't really implement anything for
>> recommendation algorithms as that requires a different interface.
>>
>
> Thanks Andy. I just looked up that PR.
>
> I was thinking that simply producing a different factorisation, optimised
> only over the observed values, wouldn't need a new interface. That in
> itself would be hugely useful.
>
> Depends. Usually you don't want to complete all values, but only compute a
> factorization. What do you return? Only the factors?
>
> The PR implements completing everything, and that you can do with the
> transformer interface. I'm not sure what the status of the PR is,
> but doing that with NMF instead of SVD would certainly also be interesting.
>
I was thinking you would literally return W and H such that W·H ≈ X. The
user can then decide what to do with the factorisation, just as with SVD.
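
To make that concrete, here is a rough sketch of the kind of thing I have in
mind (plain NumPy with multiplicative updates, not an actual or proposed
scikit-learn API): the squared error is computed only over the observed
entries, marked by a binary mask M, and the caller simply gets W and H back.

import numpy as np

def masked_nmf(X, M, n_components=2, n_iter=500, eps=1e-9, seed=0):
    """NMF minimising ||M * (X - W H)||_F^2, i.e. only over observed entries."""
    rng = np.random.RandomState(seed)
    n_rows, n_cols = X.shape
    W = rng.rand(n_rows, n_components)
    H = rng.rand(n_components, n_cols)
    for _ in range(n_iter):
        # Weighted (masked) multiplicative updates: unobserved entries,
        # where M == 0, contribute nothing to numerator or denominator.
        H *= (W.T @ (M * X)) / (W.T @ (M * (W @ H)) + eps)
        W *= ((M * X) @ H.T) / ((M * (W @ H)) @ H.T + eps)
    return W, H

# Toy ratings matrix where 0 means "unobserved"; the mask keeps those cells
# out of the objective, so W @ H fills them in as predictions.
X = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.],
              [0., 1., 5., 4.]])
M = (X > 0).astype(float)
W, H = masked_nmf(X, M, n_components=2)
print(np.round(W @ H, 2))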
Raphael