[scikit-learn] Validating L2 - Least Squares - sum of squares, During a Normalization Function
Christopher Pfeifer
chrispfeifer8557 at gmail.com
Sun Oct 8 01:08:12 EDT 2017
I am attempting to validate the output of an L2 normalization function:
    data_l2 = preprocessing.normalize(data, norm='l2')  # raw data is at the end of this email
output:
array([[ 0.57649683, 0.53806371, 0.61492995],
[-0.53806371, -0.57649683, -0.61492995],
[ 0.3359268 , 0.90089461, -0.2748492 ],
[ 0.6676851 , -0.39566524, -0.63059148],
[-0.70710678, 0. , 0.70710678],
[-0.63116874, 0.45083482, 0.63116874]])
Each row is a set of three features of one observation.
My understanding is that the sum of the squared values of an
instance (row) should be virtually equal to 1 (normalized).
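A minimal sketch of that check, using only the normalized values posted above and plain Python (no numpy or scikit-learn calls are assumed). Each value is squared as `v * v`, so the result does not depend on how a leading minus sign is parsed. The names `data_l2` and `row_sums` are just illustrative.

```python
# Normalized rows exactly as posted in the output above.
data_l2 = [
    [ 0.57649683,  0.53806371,  0.61492995],
    [-0.53806371, -0.57649683, -0.61492995],
    [ 0.3359268 ,  0.90089461, -0.2748492 ],
    [ 0.6676851 , -0.39566524, -0.63059148],
    [-0.70710678,  0.        ,  0.70710678],
    [-0.63116874,  0.45083482,  0.63116874],
]

# Sum of squared features for each row (each observation).
row_sums = [sum(v * v for v in row) for row in data_l2]

for i, s in enumerate(row_sums, start=1):
    print("row %d: %.10f" % (i, s))
```

If the rows are L2-normalized, every printed value should be within rounding error of 1, since the posted values are rounded to eight decimal places.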
*Problem - 1:*
The np.square() function appears to return the absolute value of the
square of each feature, even when I expect the square to be negative.
np.square(-0.53806371) returns 0.28951255601896408; however,
(-0.53806371**2) returns -0.2895125560189641.
As I understand it, the correct square of -0.53806371 is
-0.2895125560189641 (a negative number); even my 10-year-old
calculator gives that result.
I can find nothing in the numpy documentation that indicates
np.square() always returns the absolute value, instead of the
correctly signed value.
*Question:*
Is there a way to force np.square() to return the correctly signed
square value, not the absolute value?
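One thing worth checking here is operator precedence. In Python, `**` binds more tightly than a unary minus, so `-0.53806371**2` is parsed as `-(0.53806371**2)`, and wrapping the whole expression in parentheses does not change that. The square of any real number is non-negative, so a positive result is the expected one. A small sketch in plain Python (the variable names are illustrative only):

```python
x = -0.53806371

# The minus sign is part of the value stored in x, so this squares the negative number.
square_of_variable = x ** 2

# Without parentheses around the negative literal, ** is applied first
# and the minus sign is applied to the result: -(0.53806371 ** 2).
negated_square = -0.53806371 ** 2

# Parenthesizing the negative literal squares the negative number itself.
square_of_literal = (-0.53806371) ** 2

print(square_of_variable)
print(negated_square)
print(square_of_literal)
```

The first and third values are positive and the second is their negation, which is consistent with the two results quoted in Problem 1.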
*Problem - 2:*
For some of the observations (rows), the sum of the squared values
(which should be virtually 1) is nowhere near 1.
print(0.57649683**2 + 0.53806371**2 + 0.61492995**2)    # row 1
0.9999999944260154 (this is virtually 1)
print(-0.63116874**2 + 0.45083482**2 + 0.63116874**2)   # row 6
0.203252034924 (*this is nowhere near 1*)
The sum of the squared values of an instance (row) should be virtually equal to 1.
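The row 6 figure can be reproduced and compared with a parenthesized version. In the sketch below (plain Python, illustrative variable names), the leading term `-0.63116874**2` is evaluated as `-(0.63116874**2)`, which cancels the third term and leaves only `0.45083482**2`.

```python
# Row 6 as typed in the print statement: the leading minus is applied
# after squaring, so the first and third terms cancel each other.
unparenthesized = -0.63116874**2 + 0.45083482**2 + 0.63116874**2

# Same row with the negative value parenthesized before squaring.
parenthesized = (-0.63116874)**2 + 0.45083482**2 + 0.63116874**2

# What remains when the first and third terms cancel.
middle_term_only = 0.45083482**2

print(unparenthesized)
print(middle_term_only)
print(parenthesized)
```

The first two values agree to within rounding error (about 0.2033), while the parenthesized sum is within rounding error of 1.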
*Question:*
Is preprocessing.normalize(data, norm='l2') messing up, or is the
raw data I am feeding into the normalization routine too unrealistic?
(I made it up from both positive and negative numbers.)
*Raw Data*
array([[ 1.5, 1.4, 1.6],
[-1.4, -1.5, -1.6],
[ 2.2, 5.9, -1.8],
[ 5.4, -3.2, -5.1],
[-1.4, 0. , 1.4],
[-1.4, 1. , 1.4]])
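For cross-checking the posted output by hand, here is a minimal sketch of row-wise L2 normalization in plain Python. It assumes that norm='l2' divides each row by its Euclidean norm (the square root of the sum of its squared values). The helper `l2_normalize_rows` is hypothetical and written only for this illustration; it is not part of scikit-learn.

```python
import math

# Raw data as posted above.
raw = [
    [ 1.5,  1.4,  1.6],
    [-1.4, -1.5, -1.6],
    [ 2.2,  5.9, -1.8],
    [ 5.4, -3.2, -5.1],
    [-1.4,  0.0,  1.4],
    [-1.4,  1.0,  1.4],
]

def l2_normalize_rows(rows):
    """Hypothetical helper: divide each row by its Euclidean (L2) norm."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        # Leave an all-zero row unchanged to avoid dividing by zero.
        out.append([v / norm if norm else v for v in row])
    return out

normalized = l2_normalize_rows(raw)

for row in normalized:
    # Print each normalized row and the sum of its squared values.
    print(["%.8f" % v for v in row], "%.12f" % sum(v * v for v in row))
```

Under that assumption, the printed rows should match the array returned by preprocessing.normalize to the displayed precision, and each sum of squares should be 1 up to floating-point rounding, including for rows that contain negative values.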
Thanks: Chris
P.S.: Not a real world problem, just trying to understand the
functionality of scikit-learn. Have only been working with the package
for two weeks.