[scikit-learn] Validating L2 - Least Squares - sum of squares, During a Normalization Function

Joel Nothman joel.nothman at gmail.com
Sun Oct 8 06:32:27 EDT 2017


(normalize(X) * normalize(X)).sum(axis=1) works fine here.
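
For concreteness, a numpy-only sketch of that check, using the raw data from the quoted mail below (`np.linalg.norm` over rows reproduces what `normalize(..., norm='l2')` divides by):

```python
import numpy as np

# Raw data from the quoted question below.
data = np.array([[ 1.5,  1.4,  1.6],
                 [-1.4, -1.5, -1.6],
                 [ 2.2,  5.9, -1.8],
                 [ 5.4, -3.2, -5.1],
                 [-1.4,  0. ,  1.4],
                 [-1.4,  1. ,  1.4]])

# Equivalent of preprocessing.normalize(data, norm='l2'):
# divide each row by its Euclidean norm.
data_l2 = data / np.linalg.norm(data, axis=1, keepdims=True)

# Element-wise square, then sum each row; every entry should be ~1.
row_norms = (data_l2 * data_l2).sum(axis=1)
print(np.allclose(row_norms, 1.0))  # True
```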

But I was unaware of these quirks in Python's implementation of pow:

Numpy seems to be consistent in returning nan when a negative float is
raised to a non-integer (or equivalent float) power. Because squaring
only ever computes an integer power, squaring a negative float returns a
positive value. I assume this follows C conventions?
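
To illustrate, np.square and integer powers behave differently from fractional powers on a negative float (value taken from the quoted output below):

```python
import numpy as np

x = np.array(-0.53806371)

print(np.square(x))     # 0.2895... : squaring a negative float gives a positive value
print(x ** 2)           # same value: the exponent 2 is an integer
print(x ** 2.1)         # nan (with a RuntimeWarning): negative base, non-integer exponent
print((x + 0j) ** 2.1)  # a complex base gives the complex power instead
```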

Python, on the other hand, seems to do strange things:

Numpy:
>>> np.array(-.6) ** 2.1
nan
>>> np.array(-.6+0j) ** 2.1
(0.32532987876940411+0.10570608538524294j)

Python 3.6.2 returns the negative of the norm of the complex power:
>>> -.6 ** 2.1
-0.3420720779420435
>>> (-.6 + 0j) ** 2.1
(0.3253298787694041+0.10570608538524294j)
>>> (((-.6 + 0j) ** 2.1).real ** 2 + ((-.6 + 0j) ** 2.1).imag ** 2) ** .5
0.3420720779420434

Putting the base in parentheses, on the other hand, performs the complex
power in Python:

>>> (-.6) ** 2.1
(0.3253298787694041+0.10570608538524294j)

At https://docs.python.org/3/reference/expressions.html:

Raising a negative number to a fractional power results in a complex
<https://docs.python.org/3/library/functions.html#complex> number. (In
earlier versions it raised a ValueError
<https://docs.python.org/3/library/exceptions.html#ValueError>.)

By "in earlier versions" it means Python 2. The parenthesised case differs
because of operator precedence, not a bug: ** binds more tightly than unary
minus, so -.6 ** 2.1 parses as -(.6 ** 2.1), whereas (-.6) ** 2.1 really
raises a negative base to a fractional power and yields a complex result.
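
A minimal sketch of the precedence behaviour, including the row-6 sum from the quoted mail:

```python
# ** binds more tightly than unary minus, so these parse differently:
a = -.6 ** 2.1      # parsed as -(0.6 ** 2.1): a plain negative float
b = (-.6) ** 2.1    # negative base to a fractional power: complex in Python 3

print(a)            # -0.3420720779420435
print(b)            # (0.3253298787694041+0.10570608538524294j)
print(abs(b))       # ~0.34207: the modulus of b matches -a (up to rounding)

# The same precedence trap affects the row-6 check in the quoted mail;
# with the base parenthesised, the squares do sum to ~1:
print((-0.63116874) ** 2 + 0.45083482 ** 2 + 0.63116874 ** 2)
```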

On 8 October 2017 at 16:08, Christopher Pfeifer <chrispfeifer8557 at gmail.com>
wrote:

> I am attempting to validate the output of an L2 normalization function:
>
> *data_l2 = preprocessing.normalize(data, norm='l2')*    # raw data is at the end of this email
>
> output:
>
> array([[ 0.57649683,  0.53806371,  0.61492995],
>        [-0.53806371, -0.57649683, -0.61492995],
>        [ 0.3359268 ,  0.90089461, -0.2748492 ],
>        [ 0.6676851 , -0.39566524, -0.63059148],
>        [-0.70710678,  0.        ,  0.70710678],
>        [-0.63116874,  0.45083482,  0.63116874]])
>
>
> Each row being a set of three features of an observation
>
>
> I am under the belief that the sum of the 'squared' values of an instance (row) should be virtually equal to 1 (normalized).
>
>
> *Problem - 1:*
>
> the np.square() function is returning the absolute value of the square, even when the square is clearly negative.
>
> np.square(-0.53806371) returns 0.28951255601896408    however, (-0.53806371**2)    returns    -0.2895125560189641
>
> The correct square of -0.53806371 is -0.2895125560189641 (a negative number); even my 10-year-old calculator gets it right.
>
> I can find nothing in the numpy documentation that indicates np.square() always returns the absolute value, instead of the correctly signed value.
>
> *Question:*
>
> Is there a way to force np.square() to return the correctly signed square value not the absolute value?
>
>
> *Problem - 2:*
>
> For some of the observations (rows), the sum of the squared values (which should be virtually 1), are nowhere near 1.
>
>
> print 0.57649683**2 + 0.53806371**2 + 0.61492995**2    # row 1
>
> 0.9999999944260154  (this is virtually 1)
>
>
> print -0.63116874**2 + 0.45083482**2 + 0.63116874**2    # row 6
>
> 0.203252034924   (*this is nowhere near 1*)
>
>
> sum of the 'squared' values of an instance (row) should be virtually equal to 1.
>
>
> *Question:*
>
> Is preprocessing.normalize(data, norm='l2') messing up, or is the raw data being fed into the normalization routine too unrealistic? (I made it up of both positive and negative numbers.)
>
>
> *Raw Data*
>
> array([[ 1.5,  1.4,  1.6],
>        [-1.4, -1.5, -1.6],
>        [ 2.2,  5.9, -1.8],
>        [ 5.4, -3.2, -5.1],
>        [-1.4,  0. ,  1.4],
>        [-1.4,  1. ,  1.4]])
>
> Thanks: Chris
>
>
> P.S.: Not a real world problem, just trying to understand the functionality of scikit-learn. Have only been working with the package for two weeks.
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>

