Re: [scikit-learn] Does NMF optimise over observed values
On Sunday, August 28, 2016, Arthur Mensch <arthur.mensch@inria.fr> wrote:

Zeros are considered as zeros in the objective function, not as missing values, i.e. there is no mask in the loss function.

On 28 August 2016 at 16:58, Raphael C <drraph@gmail.com> wrote:

What I meant was: how is the objective function defined when X is sparse?

Raphael

On Sunday, August 28, 2016, Raphael C <drraph@gmail.com> wrote:
Reading the docs for http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html, it says:
The objective function is:
0.5 * ||X - WH||_Fro^2
+ alpha * l1_ratio * ||vec(W)||_1
+ alpha * l1_ratio * ||vec(H)||_1
+ 0.5 * alpha * (1 - l1_ratio) * ||W||_Fro^2
+ 0.5 * alpha * (1 - l1_ratio) * ||H||_Fro^2
Where:
||A||_Fro^2 = \sum_{i,j} A_{ij}^2 (Frobenius norm)
||vec(A)||_1 = \sum_{i,j} abs(A_{ij}) (elementwise L1 norm)
This seems to suggest that it optimises over all values in X, even if X is sparse. When using NMF for collaborative filtering, we need the objective function to be defined only over the observed elements of X; the remaining elements should effectively be regarded as missing.
What is the true objective function NMF is using?
Raphael
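As a quick illustration (my sketch, not from the thread; it assumes numpy and scikit-learn are installed), with the default settings (no regularisation) the objective above reduces to its data-fit term, which can be evaluated by hand for a fitted model:

```python
# Illustrative sketch: the data-fit term 0.5 * ||X - WH||_Fro^2 is summed
# over *every* entry of X, not just a subset of observed entries.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(6, 4)))   # NMF requires non-negative input

model = NMF(n_components=2, init="nndsvda", random_state=0)
W = model.fit_transform(X)
H = model.components_
# The squared error is accumulated over all entries of X:
print(0.5 * np.sum((X - W @ H) ** 2))
```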
Thank you for the quick reply. Just to make sure I understand: if X is sparse and n by n, with X[0,0] = 1 and X[n-1,n-1] = 0 explicitly set (that is, only two values are stored in X), is this treated, for the purposes of the objective function, the same as the all-zeros n by n matrix with X[0,0] set to 1? In other words, is every element of X that is not explicitly stored assumed to be 0?

It would be really useful to have a version of NMF where contributions to the objective function are only counted where the value is explicitly set in X. This is, AFAIK, the standard formulation for collaborative filtering. Would there be any interest in doing this? In theory it should be a simple modification of the optimisation code.

Raphael
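A minimal sketch of that exact scenario (my illustration, assuming scipy and scikit-learn), building both matrices and checking that NMF fits them identically:

```python
# Sketch: an explicitly stored zero and an unstored (implicit) zero are
# treated the same way by sklearn's NMF.
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import NMF

n = 5
# X1 stores exactly two values: X[0, 0] = 1 and an explicit zero at X[n-1, n-1].
X1 = csr_matrix(([1.0, 0.0], ([0, n - 1], [0, n - 1])), shape=(n, n))
# X2 stores only X[0, 0] = 1; all other entries are unstored zeros.
X2 = csr_matrix(([1.0], ([0], [0])), shape=(n, n))

model = NMF(n_components=2, init="nndsvda", random_state=0)
W1 = model.fit_transform(X1)
W2 = model.fit_transform(X2)
print(np.allclose(W1, W2))  # expected: True, since both encode the same matrix
```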
To give a little context from the web, see e.g. http://www.quuxlabs.com/blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation-in-python/ where it explains:

"A question might have come to your mind by now: if we find two matrices \mathbf{P} and \mathbf{Q} such that \mathbf{P} \times \mathbf{Q} approximates \mathbf{R}, isn't it that our predictions of all the unseen ratings will all be zeros? In fact, we are not really trying to come up with \mathbf{P} and \mathbf{Q} such that we can reproduce \mathbf{R} exactly. Instead, we will only try to minimise the errors of the observed user-item pairs."

Raphael
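In that tutorial's spirit, here is a minimal sketch (my own illustration, and unconstrained rather than non-negative) of fitting factors by stochastic gradient descent over only the observed pairs:

```python
# Sketch: SGD matrix factorisation that touches only observed (i, j) pairs.
import numpy as np

def sgd_factorize(R, observed, k=2, steps=5000, lr=0.01, reg=0.02, seed=0):
    """Factorise R ~ P @ Q.T using only the (i, j) pairs listed in `observed`."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    P = rng.random((m, k))
    Q = rng.random((n, k))
    for _ in range(steps):
        for i, j in observed:
            err = R[i, j] - P[i] @ Q[j]   # error on this observed pair only
            p_old = P[i].copy()
            P[i] += lr * (err * Q[j] - reg * P[i])
            Q[j] += lr * (err * p_old - reg * Q[j])
    return P, Q

R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0]])
observed = [(0, 0), (0, 1), (1, 0), (1, 2)]   # the 0.0 cells are treated as missing
P, Q = sgd_factorize(R, observed)
print(np.round(P @ Q.T, 2))   # the missing cells now hold predictions, not zeros
```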
On Sunday, August 28, 2016, Andy <t3kcit@gmail.com> wrote:

Yes, the sklearn interface is not meant for matrix completion but for matrix factorization. There was a PR for matrix-completion-based missing-value imputation at some point.

In general, scikit-learn doesn't really implement anything for recommendation algorithms, as that requires a different interface.
Thanks Andy. I just looked up that PR. I was thinking that simply producing a different factorisation, optimised only over the observed values, wouldn't need a new interface; that in itself would be hugely useful. I can see that providing a full drop-in recommender system would involve more work.

Raphael
On Monday, August 29, 2016, Andreas Mueller <t3kcit@gmail.com> wrote:

Depends. Usually you don't want to complete all values but only compute a factorization. What do you return? Only the factors? The PR implements completing everything, and that you can do with the transformer interface. I'm not sure what the status of the PR is, but doing it with NMF instead of SVD would certainly also be interesting.
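Presumably "completing everything" through the transformer interface looks something like the following sketch (my guess at the shape of it, not the PR's actual code):

```python
# Sketch: "complete" a matrix by reconstructing it from the NMF factorisation.
import numpy as np
from sklearn.decomposition import NMF

X = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [1.0, 1.0, 5.0]])
model = NMF(n_components=2, init="nndsvda", random_state=0)
W = model.fit_transform(X)            # the transformer interface: fit, then transform
X_completed = W @ model.components_   # every cell gets a value, observed or not
print(np.round(X_completed, 2))
```

Of course, as discussed above, the unobserved zeros still enter the loss as zeros here, which is exactly the limitation under discussion.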
If X is sparse, explicit zeros and missing-value zeros are both considered as zeros in the objective function. Changing the objective function wouldn't need a new interface, yet I am not sure the code change would be completely trivial. The question is: do we want this new objective function in scikit-learn, given that we have no other recommendation-like algorithm? If we agree that it would be useful, feel free to send a PR.

Tom
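To make the proposed change concrete, a minimal sketch of such a masked objective, 0.5 * ||M * (X - WH)||_Fro^2 for a binary mask M of observed entries, using the classic multiplicative updates restricted to observed entries (an illustration of the idea, not scikit-learn code):

```python
# Sketch: NMF that minimises the reconstruction error over observed entries only.
import numpy as np

def masked_nmf(X, M, k=2, n_iter=500, eps=1e-9, seed=0):
    """Minimise 0.5 * ||M * (X - W @ H)||_Fro^2 with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        # Multiplicative updates with the mask applied to X and to WH, so
        # unobserved entries contribute nothing to numerator or denominator.
        W *= ((M * X) @ H.T) / ((M * (W @ H)) @ H.T + eps)
        H *= (W.T @ (M * X)) / (W.T @ (M * (W @ H)) + eps)
    return W, H

X = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [1.0, 1.0, 5.0]])
M = (X > 0).astype(float)   # 1 where a rating was observed, 0 where missing
W, H = masked_nmf(X, M)
print(np.round(W @ H, 2))   # unobserved cells are filled in by the factorisation
```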
I was thinking you would literally return W and H, so that WH approximates X. The user can then decide what to do with the factorisation, just as when doing SVD.

Raphael
participants (5)

- Andreas Mueller
- Andy
- Arthur Mensch
- Raphael C
- Tom DLT