[scikit-learn] Validating L2 - Least Squares - sum of squares, During a Normalization Function

Christopher Pfeifer chrispfeifer8557 at gmail.com
Sun Oct 8 01:08:12 EDT 2017


I am attempting to validate the output of an L2 normalization function:

from sklearn import preprocessing
data_l2 = preprocessing.normalize(data, norm='l2')   # raw data is at the end of this email

output:

array([[ 0.57649683,  0.53806371,  0.61492995],
       [-0.53806371, -0.57649683, -0.61492995],
       [ 0.3359268 ,  0.90089461, -0.2748492 ],
       [ 0.6676851 , -0.39566524, -0.63059148],
       [-0.70710678,  0.        ,  0.70710678],
       [-0.63116874,  0.45083482,  0.63116874]])


Each row is a set of three features for one observation.


My understanding is that the sum of the 'squared' values of an
instance (row) should be virtually equal to 1 (normalized).
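
A minimal sketch of that check, assuming the output array above is copied by hand into a variable named data_l2 (the name row_sums is only illustrative). It squares every element and adds up each row:

```python
import numpy as np

# Output of preprocessing.normalize(data, norm='l2'), copied from above.
data_l2 = np.array([
    [ 0.57649683,  0.53806371,  0.61492995],
    [-0.53806371, -0.57649683, -0.61492995],
    [ 0.3359268 ,  0.90089461, -0.2748492 ],
    [ 0.6676851 , -0.39566524, -0.63059148],
    [-0.70710678,  0.        ,  0.70710678],
    [-0.63116874,  0.45083482,  0.63116874],
])

# Square each element, then sum across the three features of each row.
row_sums = np.sum(np.square(data_l2), axis=1)
print(row_sums)
```

If the rows are L2-normalized, every entry of row_sums should be close to 1, up to the rounding of the printed values.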


*Problem - 1:*

The np.square() function appears to return the absolute value of the
square of a feature, even when I expect the square to be
negative.

np.square(-0.53806371) returns 0.28951255601896408; however,
(-0.53806371**2) returns -0.2895125560189641

I expected the square of -0.53806371 to be -0.2895125560189641 (a negative
number), which is what my 10 year old calculator returns.

I can find nothing in the numpy documentation that indicates
np.square() always returns the absolute value, instead of the
correctly signed value.

*Question:*

Is there a way to force np.square() to return the correctly signed
square value not the absolute value?


*Problem - 2:*

For some of the observations (rows), the sum of the squared values
(which should be virtually 1) is nowhere near 1.


print(0.57649683**2 + 0.53806371**2 + 0.61492995**2)      # row 1

0.9999999944260154  (this is virtually 1)


print(-0.63116874**2 + 0.45083482**2 + 0.63116874**2)    # row 6

0.203252034924   (*this is nowhere near 1*)
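
One thing worth checking here is operator precedence. In Python, ** binds more tightly than a unary minus on its left, so -0.63116874**2 is evaluated as -(0.63116874**2) rather than as (-0.63116874)**2. The sketch below compares the two forms for the row 6 values; the variable names are illustrative only:

```python
# Row 6 values, copied from the normalized output above.
a, b, c = -0.63116874, 0.45083482, 0.63116874

# Written with a literal: the minus sign is applied AFTER squaring.
literal_form = -0.63116874**2 + 0.45083482**2 + 0.63116874**2

# Written with parentheses: the negative number itself is squared.
parenthesised = (-0.63116874)**2 + 0.45083482**2 + 0.63116874**2

# Squaring a variable that already holds the negative value behaves
# like the parenthesised form.
from_variables = a**2 + b**2 + c**2

print(literal_form)
print(parenthesised)
print(from_variables)
```

The literal form reproduces the value near 0.2 printed above, while the other two forms come out close to 1.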


Again, the sum of the 'squared' values of an instance (row) should be virtually equal to 1.


*Question:*

Is preprocessing.normalize(data, norm='l2') messing up, or is the
raw data I am feeding into the normalization routine too unrealistic (I
made it up, using both positive and negative numbers)?


*Raw Data*

array([[ 1.5,  1.4,  1.6],
       [-1.4, -1.5, -1.6],
       [ 2.2,  5.9, -1.8],
       [ 5.4, -3.2, -5.1],
       [-1.4,  0. ,  1.4],
       [-1.4,  1. ,  1.4]])
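
For reference, a sketch of how the normalized output can be reproduced by hand from this raw data with NumPy alone, on the assumption that L2 normalization divides each row by its own Euclidean norm. The names norms, manual and expected are illustrative:

```python
import numpy as np

# Raw data, copied from above.
data = np.array([
    [ 1.5,  1.4,  1.6],
    [-1.4, -1.5, -1.6],
    [ 2.2,  5.9, -1.8],
    [ 5.4, -3.2, -5.1],
    [-1.4,  0. ,  1.4],
    [-1.4,  1. ,  1.4],
])

# Euclidean (L2) norm of each row, kept as a column so it broadcasts.
norms = np.linalg.norm(data, axis=1, keepdims=True)

# Divide every row by its own norm.
manual = data / norms

# Normalized output printed near the top of this email.
expected = np.array([
    [ 0.57649683,  0.53806371,  0.61492995],
    [-0.53806371, -0.57649683, -0.61492995],
    [ 0.3359268 ,  0.90089461, -0.2748492 ],
    [ 0.6676851 , -0.39566524, -0.63059148],
    [-0.70710678,  0.        ,  0.70710678],
    [-0.63116874,  0.45083482,  0.63116874],
])

print(np.allclose(manual, expected, atol=1e-7))
```

Dividing by the row norm keeps the sign of every feature, so mixing positive and negative values in the raw data does not by itself prevent the rows from being normalized.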

Thanks: Chris


P.S.: Not a real world problem, just trying to understand the
functionality of scikit-learn. Have only been working with the package
for two weeks.