Re: [Numpy-discussion] [SciPy-User] Why slicing Pandas column and then subtract gives NaN?
Thanks a lot, Thomas.
I don’t have an index when I read in the data. I just want to slice two series to the same length, and subtract. That’s it!
I also don’t want numpy methods wrapped within methods. They work, but are hard to understand.
How would you do it? In Matlab or R, it’s very simple, one line.
________________________________
From: SciPy-User <scipy-user-bounces+tmrsg11=gmail.com@python.org> on behalf of Thomas Kluyver <takowl@gmail.com>
Sent: Thursday, February 14, 2019 4:54 PM
To: SciPy Users List
Cc: Discussion of Numerical Python
Subject: Re: [SciPy-User] [Numpy-discussion] Why slicing Pandas column and then subtract gives NaN?
Maybe it's useful to look a bit more at what pandas is doing and why. The 'index' on a series or dataframe labels each row - e.g. if your series is measuring total sales for each day, its index would be the dates. When you combine (e.g. subtract) two series, pandas automatically lines up the indices. So it will join up the numbers for February 14th, even if they're not in the same position in the data.
In your example, you haven't specified an index, so pandas generates an integer index which doesn't really mean anything, and aligning on it doesn't do what you want.
What are you trying to do? If Numpy does exactly what you want, then the answer might be to use Numpy.
Isn't Numpy built on top of Pandas?
It's the other way round: pandas is built on Numpy. Pandas indices are an extra layer of functionality on top of what Numpy does.
On Thu, 14 Feb 2019 at 20:22, C W <tmrsg11@gmail.com> wrote:
Hi Paul,
Thanks for your response! I did not find a Pandas list for users, only for developers. I'd love to be on there.
    result = a.subtract(b.shift()).dropna()
This seems verbose: several layers of parentheses followed by a dot method. I'm new to Python, I thought Python code would be pithy and short. Is this what everyone will write?
Thank you!
On Wed, Feb 13, 2019 at 6:50 PM Paul Hobson <pmhobson@gmail.com> wrote:
This is more a question for the pandas list, but since I'm here I'll take a crack.
* numpy aligns arrays by position.
* pandas aligns by label.
So what you did in pandas is roughly equivalent to the following:
    a = pandas.Series([85, 86, 87, 86], name='a').iloc[1:4].to_frame()
    b = pandas.Series([15, 72, 2, 3], name='b').iloc[0:3].to_frame()
    result = a.join(b, how='outer').assign(diff=lambda df: df['a'] - df['b'])
    print(result)
          a     b  diff
    0   NaN  15.0   NaN
    1  86.0  72.0  14.0
    2  87.0   2.0  85.0
    3  86.0   NaN   NaN
So what I think you want would be the following:
    a = pandas.Series([85, 86, 87, 86], name='a')
    b = pandas.Series([15, 72, 2, 3], name='b')
    result = a.subtract(b.shift()).dropna()
    print(result)
    1    71.0
    2    15.0
    3    84.0
    dtype: float64
On Wed, Feb 13, 2019 at 2:51 PM C W <tmrsg11@gmail.com> wrote:
Dear list,
I have the following two Pandas Series: a, b. I want to slice and then subtract. Like this: a[1:4] - b[0:3]. Why does it give me NaN? But it works in Numpy.
Example 1: did not work
    >>> a = pd.Series([85, 86, 87, 86])
    >>> b = pd.Series([15, 72, 2, 3])
    >>> a[1:4] - b[0:3]
    0     NaN
    1    14.0
    2    85.0
    3     NaN
    >>> type(a[1:4])
    <class 'pandas.core.series.Series'>
Example 2: worked
If I use the .values attribute, it's converted to a Numpy object. And it works!
    >>> a.values[1:4] - b.values[0:3]
    array([71, 15, 84])
    >>> type(a.values[1:4])
    <class 'numpy.ndarray'>
What's the reason that Pandas in example 1 did not work? Isn't Numpy built on top of Pandas? So, why is everything ok in Numpy, but not in Pandas?
Thanks in advance!
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
_______________________________________________
SciPy-User mailing list
SciPy-User@python.org
https://mail.python.org/mailman/listinfo/scipy-user
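For readers following along, here is a minimal sketch of the difference discussed in this message, using the two series from the original post. It assumes only pandas and NumPy; the variable names `by_label`, `by_position` and `relabelled` are made up for illustration, and `.iloc` is used so that the slices are unambiguously positional.

```python
import pandas as pd

a = pd.Series([85, 86, 87, 86])
b = pd.Series([15, 72, 2, 3])

# Label alignment: the slices carry labels 1..3 and 0..2, so only
# labels 1 and 2 match; the unmatched labels 0 and 3 become NaN.
by_label = a.iloc[1:4] - b.iloc[0:3]

# Positional arithmetic: drop the labels first, then subtract.
by_position = a.iloc[1:4].to_numpy() - b.iloc[0:3].to_numpy()

# Staying in pandas: discard the old labels so both slices are
# relabelled 0..2 before the subtraction.
relabelled = (
    a.iloc[1:4].reset_index(drop=True) - b.iloc[0:3].reset_index(drop=True)
)

print(by_label)
print(by_position)
print(relabelled)
```

Which of the two positional variants is preferable probably depends on whether the result needs to stay a Series; this is a sketch, not a recommendation from the thread.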
I don’t have an index when I read in the data. I just want to slice two series to the same length, and subtract. That’s it!
I also don’t want numpy methods wrapped within methods. They work, but are hard to understand.
How would you do it? In Matlab or R, it’s very simple, one line.
Why are you using pandas at all? If you want the Matlab equivalent, use NumPy from the beginning (or as soon as possible). I personally agree with you that pandas is too verbose, which is why I mostly use NumPy for this kind of arithmetic, and reserve pandas for advanced data table type functionality (like groupbys and joining on indices).
As you saw yourself, a.values[1:4] - b.values[0:3] works great. If you read in your data into NumPy from the beginning, it’ll be a[1:4] - b[0:3] just like in Matlab. (Or even better: a[1:] - b[:-1]).
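A small sketch of the NumPy-only version described above, assuming the two columns are already plain arrays holding the values from the original post (the names `explicit` and `open_ended` are illustrative):

```python
import numpy as np

a = np.array([85, 86, 87, 86])
b = np.array([15, 72, 2, 3])

# Explicit bounds, as in the original question.
explicit = a[1:4] - b[0:3]

# Open-ended slices: "everything but the first" minus
# "everything but the last", so no lengths are hard-coded.
open_ended = a[1:] - b[:-1]

print(explicit)
print(open_ended)
```

The open-ended form keeps working if the arrays grow, which is presumably why it is suggested as the better spelling.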
The original data was in CSV format. I read it in using pd.read_csv(). It does have column names, but no row names. I don’t think numpy reads csv files.
And also, when I do a[2:5] - b[:3], it does not throw any “index out of range” error. I was able to catch that, but in both Matlab and R you get an error. This is frustrating!!
________________________________
From: NumPy-Discussion <numpy-discussion-bounces+tmrsg11=gmail.com@python.org> on behalf of Juan Nunez-Iglesias <jni.soma@gmail.com>
Sent: Friday, February 15, 2019 4:15 AM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] [SciPy-User] Why slicing Pandas column and then subtract gives NaN?
I don’t have an index when I read in the data. I just want to slice two series to the same length, and subtract. That’s it!
I also don’t want numpy methods wrapped within methods. They work, but are hard to understand.
How would you do it? In Matlab or R, it’s very simple, one line.
Why are you using pandas at all? If you want the Matlab equivalent, use NumPy from the beginning (or as soon as possible). I personally agree with you that pandas is too verbose, which is why I mostly use NumPy for this kind of arithmetic, and reserve pandas for advanced data table type functionality (like groupbys and joining on indices).
As you saw yourself, a.values[1:4] - b.values[0:3] works great. If you read in your data into NumPy from the beginning, it’ll be a[1:4] - b[0:3] just like in Matlab. (Or even better: a[1:] - b[:-1]).
The original data was in CSV format. I read it in using pd.read_csv(). It does have column names, but no row names. I don’t think numpy reads csv files
I routinely read csv files using numpy.loadtxt https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
And also, when I do a[2:5] - b[:3], it does not throw any “index out of range” error. I was able to catch that, but in both Matlab and R you get an error. This is frustrating!!
That's basic slicing behaviour of python. You might like it or not, but it's baked into the language:
    >>> [1,2][:10], [1,2][5:7]
    ([1, 2], [])
One would need very good reasons to break this in case of a third-party library.
András
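To make the slicing behaviour above concrete, here is a hedged sketch. `strict_slice` is a hypothetical helper written for this illustration (it is not part of Python, NumPy or pandas); it shows one way to get a Matlab-style error when that is what you want.

```python
import numpy as np


def strict_slice(seq, start, stop):
    """Hypothetical helper: return seq[start:stop], but raise IndexError
    if the requested bounds fall outside the sequence."""
    n = len(seq)
    if not (0 <= start <= stop <= n):
        raise IndexError(f"slice [{start}:{stop}] out of range for length {n}")
    return seq[start:stop]


a = np.array([85, 86, 87, 86])
b = np.array([15, 72, 2, 3])

# Built-in slicing silently clamps the stop bound to the array length.
clamped = a[2:5]
print(clamped)

# The helper raises instead of clamping.
try:
    strict_slice(a, 2, 5)
except IndexError as exc:
    print("caught:", exc)

# With plain NumPy arrays the clamped slice has length 2, not 3, so the
# subtraction itself fails because the shapes cannot be broadcast.
try:
    a[2:5] - b[:3]
except ValueError as exc:
    print("shape mismatch:", exc)
```

So with NumPy arrays the out-of-range slice does surface as an error, just one step later, at the arithmetic; with pandas Series the label alignment fills the gaps with NaN instead.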
On Fri, Feb 15, 2019 at 5:12 AM Mike C <tmrsg11@gmail.com> wrote:
The original data was in CSV format. I read it in using pd.read_csv(). It does have column names, but no row names. I don’t think numpy reads csv files.
If you read a file into a pandas structure, it will have row labels. The default labels are integers that correspond to the ordinal positions of the values.
Numpy reads files.
https://docs.scipy.org/doc/numpy1.15.0/reference/generated/numpy.loadtxt.ht...
https://docs.scipy.org/doc/numpy1.15.0/reference/generated/numpy.genfromtxt...
I prefer file IO in pandas, so I don't know which function will better suit your needs.
And also, when I do a[2:5] - b[:3], it does not throw any “index out of range” error. I was able to catch that, but in both Matlab and R you get an error. This is frustrating!!
That's a feature of python in general, not numpy in particular. Every language has its own quirks. The more you immerse yourself in them, the quicker you'll learn to adapt to them.
paul
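As an illustration of the two NumPy readers mentioned above, here is a minimal sketch. The CSV contents are made up for the example (a header row with column names `a` and `b`, no row labels), and an in-memory `io.StringIO` stands in for a real file path.

```python
import io

import numpy as np

# Hypothetical CSV contents: column names in the first row, no row labels.
csv_text = "a,b\n85,15\n86,72\n87,2\n86,3\n"

# loadtxt: skip the header row and get a plain 2-D float array.
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
a, b = data[:, 0], data[:, 1]
diff = a[1:] - b[:-1]
print(diff)

# genfromtxt: names=True reads the header, giving a structured array
# whose columns can be accessed by name.
rec = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)
print(rec["a"][1:] - rec["b"][:-1])
```

Either way the result is positional arithmetic with no index involved; which reader suits a given file better depends on things like missing values, which this sketch does not cover.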
Fair enough. Python has been called the #1 language for data science. If I'm slicing a[2:5] out of range, why not throw an error? This is disappointing! I mean, why would you design a language to slice outside of range? Also, no other language I know has this strange behavior.
On Fri, Feb 15, 2019 at 12:51 PM Paul Hobson <pmhobson@gmail.com> wrote:
On Fri, Feb 15, 2019 at 5:12 AM Mike C <tmrsg11@gmail.com> wrote:
The original data was in CSV format. I read it in using pd.read_csv(). It does have column names, but no row names. I don’t think numpy reads csv files.
If you read a file into a pandas structure, it will have row labels. The default labels are integers that correspond to the ordinal positions of the values.
Numpy reads files.
https://docs.scipy.org/doc/numpy1.15.0/reference/generated/numpy.loadtxt.ht...
https://docs.scipy.org/doc/numpy1.15.0/reference/generated/numpy.genfromtxt...
I prefer file IO in pandas, so I don't know which function will better suit your needs.
And also, when I do a[2:5] - b[:3], it does not throw any “index out of range” error. I was able to catch that, but in both Matlab and R you get an error. This is frustrating!!
That's a feature of python in general, not numpy in particular. Every language has its own quirks. The more you immerse yourself in them, the quicker you'll learn to adapt to them.
paul
On 15-02-2019 14:48, C W wrote:
Fair enough. Python has been called the #1 language for data science. If I'm slicing a[2:5] out of range, why not throw an error? This is disappointing!
No one here is trying to convince you to use Python. If you don't like it, don't use it. Complaining in this venue about how you don't like the language is not productive and it is not going to change Python's (or Numpy's) design. I suggest you instead invest the time to understand why things work the way they do.
Cheers, Dan
Dan,
No one here is trying to convince you to use Python. If you don't like it, don't use it.
The problem is not me, it's the language. No need to take it out on me personally. I've used other languages, Python is lacking in this area. I'm being very frank here, just think about it.
On Fri, Feb 15, 2019 at 6:53 PM Daniele Nicolodi <daniele@grinta.net> wrote:
On 15-02-2019 14:48, C W wrote:
Fair enough. Python has been called the #1 language for data science. If I'm slicing a[2:5] out of range, why not throw an error? This is disappointing!
No one here is trying to convince you to use Python. If you don't like it, don't use it. Complaining in this venue about how you don't like the language is not productive and it is not going to change Python's (or Numpy's) design. I suggest you instead invest the time to understand why things work the way they do.
Cheers, Dan
On Sat, Feb 16, 2019 at 8:42 PM C W <tmrsg11@gmail.com> wrote:
Dan,
No one here is trying to convince you to use Python. If you don't like it, don't use it.
The problem is not me, it's the language. No need to take it out on me personally. I've used other languages, Python is lacking in this area. I'm being very frank here, just think about it.
No one was taking it out on you personally. We're just stating that we're not interested in having a discussion about which semantics are best, much less convincing you that Python's choice is the right one. They have been that way for a long time, and the time for making those decisions is long past. We could not change them now if we wanted to.
Empirically, I've been on these lists for about 20 years now, and I have not seen this pop up as a frequent issue causing bugs in real code, so I would submit that if there is a lack (or benefit) compared to other languages, it is small.
That said, Python is not alone here. Perl and Ruby have Python's semantics. R introduces NAs but does not raise an error. Matlab and Julia do raise errors.
Robert Kern
participants (7)

Andras Deak

C W

Daniele Nicolodi

Juan Nunez-Iglesias

Mike C

Paul Hobson

Robert Kern