[SciPy-User] [Numpy-discussion] Why slicing Pandas column and then subtract gives NaN?

Thomas Kluyver takowl at gmail.com
Thu Feb 14 16:53:10 EST 2019


Maybe it's useful to look a bit more at what pandas is doing and why. The
'index' on a series or dataframe labels each row - e.g. if your series is
measuring total sales for each day, its index would be the dates. When you
combine (e.g. subtract) two series, pandas automatically lines up the
indices. So it will join up the numbers for February 14th, even if they're
not in the same position in the data.
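
As a rough sketch of that alignment (the dates and numbers are made up for
illustration, and the names sales and returns are hypothetical):

```python
import pandas as pd

# Hypothetical daily figures; the two series list the dates in different orders.
sales = pd.Series(
    [100, 120, 90],
    index=pd.to_datetime(["2019-02-13", "2019-02-14", "2019-02-15"]),
)
returns = pd.Series(
    [5, 7, 3],
    index=pd.to_datetime(["2019-02-15", "2019-02-13", "2019-02-14"]),
)

# Subtraction pairs rows by date label, not by position in the data.
net = sales - returns
print(net)
```

The value for February 14th is 120 - 3 = 117, even though those two numbers
sit at different positions in the two series.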

In your example, you haven't specified an index, so pandas generates an
integer index which doesn't really mean anything, and aligning on it
doesn't do what you want.
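
Using the numbers from your example, this small sketch shows where the NaNs
come from: slicing keeps the original integer labels, so the two slices only
share labels 1 and 2.

```python
import pandas as pd

a = pd.Series([85, 86, 87, 86])   # default index: 0, 1, 2, 3
b = pd.Series([15, 72, 2, 3])

# Slicing keeps the original labels; it does not renumber from 0.
print(list(a[1:4].index))   # [1, 2, 3]
print(list(b[0:3].index))   # [0, 1, 2]

# Only labels 1 and 2 appear in both slices, so 0 and 3 come out as NaN.
diff = a[1:4] - b[0:3]
print(diff)
```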

What are you trying to do? If Numpy does exactly what you want, then the
answer might be to use Numpy.
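
I'm guessing at the goal here, but if you simply want element-by-element
subtraction by position, either of these should do it. This is a sketch, not
the only way to write it:

```python
import pandas as pd

a = pd.Series([85, 86, 87, 86])
b = pd.Series([15, 72, 2, 3])

# Option 1: drop to Numpy arrays, which subtract purely by position.
by_position = a.to_numpy()[1:4] - b.to_numpy()[0:3]
print(by_position)                 # [71 15 84]

# Option 2: stay in pandas, but discard the old labels before combining.
left = a[1:4].reset_index(drop=True)
right = b[0:3].reset_index(drop=True)
print((left - right).tolist())     # [71, 15, 84]
```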

> Isn't Numpy built on top of Pandas?

It's the other way round: pandas is built on Numpy. Pandas indices are an
extra layer of functionality on top of what Numpy does.
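
A quick way to see the layering for yourself (a minimal illustration, nothing
more):

```python
import numpy as np
import pandas as pd

s = pd.Series([85, 86, 87, 86])

values = s.to_numpy()     # the underlying data as a Numpy array
print(type(values))       # <class 'numpy.ndarray'>
print(s.index)            # the labels pandas adds on top of the array

# Numpy functions accept a Series and hand back a Series with the same index.
halved = np.divide(s, 2)
print(type(halved))
```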

On Thu, 14 Feb 2019 at 20:22, C W <tmrsg11 at gmail.com> wrote:

> Hi Paul,
>
> Thanks for your response! I did not find a Pandas list for users, only for
> developers. I'd love to be on there.
>
> result = a.subtract(b.shift()).dropna()
>
> This seems verbose: several layers of parentheses followed by a dot method.
> I'm new to Python, and I thought Python code would be pithy and short. Is
> this what everyone would write?
>
> Thank you!
>
>
>
> On Wed, Feb 13, 2019 at 6:50 PM Paul Hobson <pmhobson at gmail.com> wrote:
>
>> This is more a question for the pandas list, but since I'm here I'll take
>> a crack.
>>
>>
>>    - numpy aligns arrays by position.
>>    - pandas aligns by label.
>>
>> So what you did in pandas is roughly equivalent to the following:
>>
>> import pandas
>>
>> a = pandas.Series([85, 86, 87, 86], name='a').iloc[1:4].to_frame()
>> b = pandas.Series([15, 72, 2, 3], name='b').iloc[0:3].to_frame()
>> result = a.join(b, how='outer').assign(diff=lambda df: df['a'] - df['b'])
>> print(result)
>>
>>       a     b  diff
>> 0   NaN  15.0   NaN
>> 1  86.0  72.0  14.0
>> 2  87.0   2.0  85.0
>> 3  86.0   NaN   NaN
>>
>> So what I think you want would be the following:
>>
>> a = pandas.Series([85, 86, 87, 86], name='a')
>> b = pandas.Series([15, 72, 2, 3], name='b')
>> result = a.subtract(b.shift()).dropna()
>> print(result)
>> 1    71.0
>> 2    15.0
>> 3    84.0
>> dtype: float64
>>
>>
>>
>> On Wed, Feb 13, 2019 at 2:51 PM C W <tmrsg11 at gmail.com> wrote:
>>
>>> Dear list,
>>>
>>> I have the following two Pandas Series: a, b. I want to slice and then
>>> subtract, like this: a[1:4] - b[0:3]. Why does it give me NaN? It works
>>> in Numpy.
>>>
>>> Example 1: did not work
>>> >>> a = pd.Series([85, 86, 87, 86])
>>> >>> b = pd.Series([15, 72, 2, 3])
>>> >>> a[1:4] - b[0:3]
>>> 0     NaN
>>> 1    14.0
>>> 2    85.0
>>> 3     NaN
>>> >>> type(a[1:4])
>>> <class 'pandas.core.series.Series'>
>>>
>>> Example 2: worked
>>> If I use the .values attribute, it's converted to a Numpy array. And it works!
>>> >>> a.values[1:4]-b.values[0:3]
>>> array([71, 15, 84])
>>> >>> type(a.values[1:4])
>>> <class 'numpy.ndarray'>
>>>
>>> What's the reason that Pandas in example 1 did not work? Isn't Numpy
>>> built on top of Pandas? So, why is everything ok in Numpy, but not in
>>> Pandas?
>>>
>>> Thanks in advance!
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at python.org
>>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at python.org
>> https://mail.python.org/mailman/listinfo/scipy-user
>>