# [scikit-learn] Does NMF optimise over observed values

Raphael C drraph at gmail.com
Sun Aug 28 12:29:59 EDT 2016

To give a little context from the web, see e.g.
http://www.quuxlabs.com/blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation-in-python/
where it explains:

"
A question might have come to your mind by now: if we find two matrices P
and Q such that P × Q approximates R, isn’t that our predictions of
all the unseen ratings will all be zeros? In fact, we are not really trying
to come up with P and Q such that we can reproduce R exactly. Instead, we
will only try to minimise the errors of the observed user-item pairs.
"

Raphael

On Sunday, August 28, 2016, Raphael C <drraph at gmail.com> wrote:

> Thank you for the quick reply. Just to make sure I understand: if X is
> sparse and n by n, with X[0,0] = 1 and X[n-1, n-1] = 0 explicitly set (that
> is, only two values are set in X), then for the purposes of the objective
> function this is treated the same as the all-zeros n by n matrix with
> X[0,0] set to 1? That is, all elements of X that are not specified
> explicitly are assumed to be 0?
>
> It would be really useful if it were possible to have a version of NMF
> where contributions to the objective function are only counted where the
> value is explicitly set in X.  This is AFAIK the standard formulation for
> collaborative filtering. Would there be any interest in doing this? In
> theory it should be a simple modification of the optimisation code.
>
> Raphael
>
>
>
> On Sunday, August 28, 2016, Arthur Mensch <arthur.mensch at inria.fr> wrote:
>
>> Zeros are considered as zeros in the objective function, not as missing
>> values, i.e. there is no mask in the loss function.
>> On 28 August 2016 at 16:58, "Raphael C" <drraph at gmail.com> wrote:
>>
>> What I meant was, how is the objective function defined when X is sparse?
>>
>> Raphael
>>
>>
>> On Sunday, August 28, 2016, Raphael C <drraph at gmail.com> wrote:
>>
>>> Reading the docs for
>>> http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html
>>> it says
>>>
>>> The objective function is:
>>>
>>> 0.5 * ||X - WH||_Fro^2
>>> + alpha * l1_ratio * ||vec(W)||_1
>>> + alpha * l1_ratio * ||vec(H)||_1
>>> + 0.5 * alpha * (1 - l1_ratio) * ||W||_Fro^2
>>> + 0.5 * alpha * (1 - l1_ratio) * ||H||_Fro^2
>>>
>>> Where:
>>>
>>> ||A||_Fro^2 = \sum_{i,j} A_{ij}^2 (Frobenius norm)
>>> ||vec(A)||_1 = \sum_{i,j} abs(A_{ij}) (Elementwise L1 norm)
>>>
>>> This seems to suggest that it is optimising over all values in X even if X is sparse.   When using NMF for collaborative filtering we need the objective function to be defined over only the defined elements of X. The remaining elements should effectively be regarded as missing.
>>>
>>>
>>> What is the true objective function NMF is using?
>>>
>>>
>>> Raphael
>>>
>>>