[scikit-learn] Normalizer, l1 and l2 norms

Guillaume Lemaître g.lemaitre58 at gmail.com
Tue Sep 24 09:03:03 EDT 2019


One example where I have seen it used is the Scale-Invariant Feature
Transform (SIFT): normalizing each descriptor vector to unit length
compensates for affine changes in illumination between samples.
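
As a rough sketch of that invariance in plain numpy (the descriptor values
here are made up): multiplying a descriptor by a constant gain, as a uniform
illumination change would, leaves its l2-normalized form unchanged.

import numpy as np

descriptor = np.array([3.0, 4.0, 0.0])  # toy SIFT-like feature vector
brighter = 2.5 * descriptor             # uniform illumination gain

unit_a = descriptor / np.linalg.norm(descriptor)  # l2-normalize original
unit_b = brighter / np.linalg.norm(brighter)      # l2-normalize scaled copy

print(np.allclose(unit_a, unit_b))      # True: the gain cancels out
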
The use case given in scikit-learn would be something similar but with text
processing:

"Scaling inputs to unit norms is a common operation for text classification
or clustering for instance. For instance the dot product of two
l2-normalized TF-IDF vectors is the cosine similarity of the vectors and is
the base similarity metric for the Vector Space Model commonly used by the
Information Retrieval community."
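
A minimal sketch of that claim (with a made-up toy corpus): after l2
normalization, the plain dot product of two TF-IDF rows equals their cosine
similarity.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import Normalizer

corpus = ["the cat sat on the mat", "the dog sat on the log"]
tfidf = TfidfVectorizer(norm=None).fit_transform(corpus)  # raw TF-IDF rows

X = Normalizer(norm="l2").fit_transform(tfidf)  # rescale each row to unit length

dot = (X[0] @ X[1].T).toarray()[0, 0]  # dot product of the two unit rows
cos = cosine_similarity(tfidf)[0, 1]   # cosine similarity of the raw rows

print(np.isclose(dot, cos))            # True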

So basically, normalization cancels out a per-sample scaling factor, which
lets you compare samples with one another on an equal footing.
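
As for the question in the quoted mail below about storing parameters: there
is nothing to store, because the norm is a per-sample quantity rather than a
dataset statistic. A quick sketch with made-up data:

import numpy as np
from sklearn.preprocessing import Normalizer

normalizer = Normalizer(norm="l2").fit(np.ones((3, 2)))  # fit() is a no-op

X_new = np.array([[3.0, 4.0],
                  [1.0, 1.0]])
print(normalizer.transform(X_new))
# [[0.6        0.8       ]
#  [0.70710678 0.70710678]]  <- each row divided by its own l2 norm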

On Tue, 24 Sep 2019 at 14:04, Sole Galli <solegalli1 at gmail.com> wrote:

> Sorry, ignore my question, I get it now.
>
> It calculates the norm of each observation vector (across variables); the
> norm varies from observation to observation, which is why it has to be
> recalculated for each sample rather than stored.
>
> I would appreciate some articles / links with successful implementations
> of this technique and why it adds value to ML. Would you be able to point
> me to any?
>
> Cheers
>
> Sole
>
>
>
>
>
> On Tue, 24 Sep 2019 at 12:39, Sole Galli <solegalli1 at gmail.com> wrote:
>
>> Hello team,
>>
>> Quick question with respect to the Normalizer().
>>
>> My understanding is that this transformer divides each sample (row) by
>> its Euclidean (l2) or Manhattan (l1) norm.
>>
>> From the sklearn docs, I understand that the Normalizer() does not learn
>> the norms from the train set and store them. Rather, it normalises each
>> sample according to its own norm, which may or may not be the same in
>> test and train.
>>
>> Am I understanding this correctly?
>>
>> If so, what is the reason not to store these parameters in the Normalizer
>> and use them to scale future data?
>>
>> If I am not getting it right, what am I missing?
>>
>> Many thanks, and I would appreciate it if you have an article on this to share.
>>
>> Cheers
>>
>> Sole
>>
>>
>> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>


-- 
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/