[scikit-learn] Latent Semantic Analysis (LSA) and TruncatedSVD

Roman Yurchak rth.yurchak at gmail.com
Mon Aug 29 06:39:46 EDT 2016

Thank you for all your responses!

In LSA, I think the following two approaches are equivalent:
   - apply an L2 normalization (not the StandardScaler) after the LSA
and then compute the cosine similarity between document vectors simply
as a dot product;
   - do not apply the L2 normalization and call the `cosine_similarity`
function instead.
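
A minimal sketch of this equivalence, for illustration only. The toy
corpus, the TF-IDF step, and `n_components=2` are assumptions made for
this example and are not taken from the example discussed earlier in
the thread:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import normalize

# Hypothetical toy corpus, used only to have something to decompose.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
    "the stock market fell on monday",
]

# LSA: TF-IDF followed by a truncated SVD (no centering of the data).
X = TfidfVectorizer().fit_transform(docs)
X_lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

# Option 1: L2-normalize each document vector, then take dot products.
X_norm = normalize(X_lsa, norm="l2")
sim_dot = X_norm @ X_norm.T

# Option 2: skip the normalization and call cosine_similarity directly.
sim_cos = cosine_similarity(X_lsa)

# Both options should give the same similarity matrix up to rounding.
print(np.allclose(sim_dot, sim_cos))
```

Both variants divide each dot product by the norms of the two vectors
involved, so the only difference is whether that division happens once
up front or inside `cosine_similarity`.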

I have applied this normalization to the previous example, and it
does indeed produce equivalent results (i.e. it does not solve the
problem). I am opening an issue on this for further discussion.

Thanks for your feedback!

On 28/08/16 18:20, Andy wrote:
> If you do "with_mean=False" it should be the same, right?
> On 08/27/2016 12:20 PM, Olivier Grisel wrote:
>> I am not sure this is exactly the same because we do not center the
>> data in the TruncatedSVD case (as opposed to the real PCA case where
>> whitening is the same as calling StandardScaler).
>> Having an option to normalize the transformed data by sigma seems like
>> a good idea but we should probably not call that whitening.