[scikit-learn] Latent Semantic Analysis (LSA) and TruncatedSVD
Roman Yurchak
rth.yurchak at gmail.com
Mon Aug 29 06:39:46 EDT 2016
Thank you for all your responses!
In LSA, I think the following two approaches are equivalent:
- applying an L2 normalization (not the StandardScaler) after the LSA step
and then computing the cosine similarity between document vectors simply
as a dot product;
- not applying the L2 normalization and calling the `cosine_similarity`
function instead.
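A minimal sketch of the two options above, assuming scikit-learn's `TfidfVectorizer`, `TruncatedSVD`, `Normalizer` and `cosine_similarity`. The toy corpus and `n_components=2` are made up for illustration and are not the example discussed earlier in the thread.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import Normalizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus, made up for illustration only
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat and the dog are pets",
    "stock markets fell sharply today",
    "investors sold stocks as markets fell",
]

X = TfidfVectorizer().fit_transform(docs)  # sparse tf-idf matrix
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)  # LSA vectors

# Option 1: L2-normalize the LSA vectors, then use a plain dot product
Z_norm = Normalizer(norm="l2").fit_transform(Z)
sim_dot = Z_norm @ Z_norm.T

# Option 2: keep the raw LSA vectors and let cosine_similarity normalize them
sim_cos = cosine_similarity(Z)

print(np.allclose(sim_dot, sim_cos))  # prints True
```

Both paths divide each row by its L2 norm before taking dot products, which is why they give the same similarity matrix.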
I have applied this normalization to the previous example, and it
does indeed produce equivalent results (i.e. it does not solve the problem).
I am opening an issue on this for further discussion:
https://github.com/scikit-learn/scikit-learn/issues/7283
Thanks for your feedback!
--
Roman
On 28/08/16 18:20, Andy wrote:
> If you do "with_mean=False" it should be the same, right?
>
> On 08/27/2016 12:20 PM, Olivier Grisel wrote:
>> I am not sure this is exactly the same because we do not center the
>> data in the TruncatedSVD case (as opposed to the real PCA case where
>> whitening is the same as calling StandardScaler).
>>
>> Having an option to normalize the transformed data by sigma seems like
>> a good idea but we should probably not call that whitening.
>>
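The centering point above can be checked numerically. The sketch below assumes a recent scikit-learn in which `TruncatedSVD` exposes `singular_values_`, and uses a made-up toy corpus. It divides the LSA components by their singular values by hand (this is not an existing `TruncatedSVD` option) and compares that with `StandardScaler(with_mean=False)`. Because `TruncatedSVD` does not center the data, the scaler's per-column standard deviation is not simply sigma_j / sqrt(n), so the two rescalings differ, in general, by a column-dependent factor.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import StandardScaler

# Toy corpus, made up for illustration only
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat and the dog are pets",
    "stock markets fell sharply today",
    "investors sold stocks as markets fell",
]
X = TfidfVectorizer().fit_transform(docs)

svd = TruncatedSVD(n_components=2, random_state=0)
Z = svd.fit_transform(X)  # Z = U * sigma; X is NOT centered beforehand

# "Normalize by sigma", done by hand: every column of Z_sigma has unit L2 norm
Z_sigma = Z / svd.singular_values_

# with_mean=False skips the subtraction, but scale_ is still the standard
# deviation computed around each column's mean
scaler = StandardScaler(with_mean=False).fit(Z)

# If the columns of Z were centered, scale_ would equal sigma / sqrt(n)
# and this ratio would be 1 for every component
n = Z.shape[0]
ratio = np.sqrt(n) * scaler.scale_ / svd.singular_values_
print(ratio)
```

For non-negative tf-idf data the first component has a clearly nonzero mean, so its ratio falls below 1.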