[Numpy-discussion] [SciPy-User] Why slicing Pandas column and then subtract gives NaN?

Avi Gross avigross at verizon.net
Sun Feb 17 19:24:17 EST 2019


Changing horses midstream is not generally a good idea. But Python is easy
to extend, so there may be other choices out there.

The reality is that what matters more than the particular choices a language
makes is consistency across the language. If you can learn those choices,
and even the quirks, and work with them, you can get things done. I have
used languages where vectors or lists are zero-based and some where they are
one-based. Both work, although you may need to adjust an algorithm to add or
subtract one from the index you use, or leave the first entry empty. If you
use a pandas data structure without specifying your own index, you are stuck
with the default: a zero-based integer index whose labels pandas uses to
align operations such as subtraction. If you want a one-based (or any other)
index, you can add it explicitly, with care.

In theory, you could create an alternative numpy or pandas, just as there
are many alternative graphics modules with very different designs, and there
probably already are some out there. Integrating them with other tools may
not be as simple. If you want to use the available machine learning tools,
some functions demand that the data be given as a pandas DataFrame or a
numpy array. Giving them something else, such as a Python list, or even an
object derived from one of those types, may well break things.
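
As an illustration, here is a minimal sketch built around a hypothetical
tool, `hypothetical_fit`, written to assume a NumPy-style input with a
`.shape` attribute; it is not any real library's API. A plain Python list
has no `.shape`, so passing it directly fails, while converting it with
`np.asarray` first works:

```python
import numpy as np

def hypothetical_fit(X):
    """Hypothetical stand-in for a tool that assumes a NumPy-style input.

    It reads X.shape, which NumPy arrays and pandas DataFrames provide
    but plain Python lists do not.
    """
    n_rows, n_cols = X.shape
    return n_rows, n_cols

data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Passing the plain list fails: lists have no .shape attribute.
try:
    hypothetical_fit(data)
    list_accepted = True
except AttributeError:
    list_accepted = False

# Converting explicitly first gives the tool the type it expects.
print(hypothetical_fit(np.asarray(data)))  # (3, 2)
```

Converting at the boundary keeps the rest of your code free to use
whatever structures suit it.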

This can make it very hard to port some code to or from other languages
with different design constraints. An algorithm that looks straightforward
in one language, because it goes with the grain of that language, may need
major surgery to fit into another, and might best be rewritten using a
different algorithm that fits better.

A different discussion elsewhere, about errors, was instructive. Many
programmers used to write complex code that checks for all kinds of errors
or conditions up front. Some Python programmers instead treat errors as
something you catch, so why bother checking in advance whether, say, an
array index exists? Well, if pandas does not raise an error you expect,
consider checking the condition yourself rather than relying on software
that does not enforce it. And some "errors" are not errors. As mentioned, in
R it is NOT a design error that using an NA value in a calculation silently
propagates the NA. Throughout the language you must either check for NA
with the is.na() function or tell routines to drop/ignore any NA, as in
sum(something, na.rm=TRUE). Many design decisions work this way. So, if your
language does something you don't want when it uses an implicit index, fine:
give it an explicit index. If it does not properly check the range you
provide for validity, do it yourself.
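
A minimal sketch of both points in pandas terms. Note that the defaults run
the other way from R: pandas reductions such as `Series.sum` skip NaN unless
you pass `skipna=False`, whereas R's sum() propagates NA unless you pass
`na.rm=TRUE`. The `checked_slice` helper is hypothetical, just one way of
doing the range check yourself:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Plain NumPy arithmetic propagates NaN, much like R's default for NA.
raw_total = np.sum(s.to_numpy())         # nan
# pandas reductions skip missing values by default (skipna=True),
# roughly like R's sum(x, na.rm=TRUE) ...
skipped_total = s.sum()                  # 4.0
# ... unless you ask for propagation explicitly.
strict_total = s.sum(skipna=False)       # nan

def checked_slice(series, start, stop):
    """Hypothetical helper: positional slice that raises IndexError
    instead of silently truncating when the bounds fall outside the data."""
    n = len(series)
    if not (0 <= start <= stop <= n):
        raise IndexError(f"slice [{start}:{stop}] out of range for length {n}")
    return series.iloc[start:stop]

short = pd.Series([10, 20, 30])
print(checked_slice(short, 1, 3).tolist())  # [20, 30]

# An out-of-range request is rejected rather than truncated.
try:
    checked_slice(short, 1, 10)
    out_of_range_rejected = False
except IndexError:
    out_of_range_rejected = True
```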

And if the language is truly unsuitable for your needs, you can often
switch. In particular, there are now ways to start your code in one
language, such as Python or R, pass some data structures across to the other
language so it can do things its own way, and, if needed, pass them back and
repeat. That may not work well for your application and, as noted, switching
rides midstream has its dangers.

From: NumPy-Discussion
<numpy-discussion-bounces+avigross=verizon.net at python.org> On Behalf Of
Robert Kern
Sent: Sunday, February 17, 2019 3:15 PM
To: Discussion of Numerical Python <numpy-discussion at python.org>
Subject: Re: [Numpy-discussion] [SciPy-User] Why slicing Pandas column and
then subtract gives NaN?

On Sat, Feb 16, 2019 at 8:42 PM C W <tmrsg11 at gmail.com> wrote:

> Dan,
>
> > No one here is trying to convince you to use Python. If you don't like
> > it, don't use it.
>
> The problem is not me, it's the language. No need to take it out on me
> personally. I've used other languages, Python is lacking in this area. I'm
> being very frank here, just think about it.

No one was taking it out on you personally. We're just stating that we're
not interested in having a discussion about which semantics are best, much
less convincing you that Python's choice is the right one. They have been
that way for a long time, and the time for making those decisions is long
past. We could not change them now if we wanted to. Empirically, I've been
on these lists for about 20 years now, and I have not seen this pop up as a
frequent issue causing bugs in real code, so I would submit that if there is
a lack (or benefit) compared to other languages, it is small.

That said, Python is not alone here. Perl and Ruby have Python's semantics.
R introduces NaNs but does not raise an error. Matlab and Julia do raise
errors.
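
To make the comparison concrete, a minimal Python sketch, assuming the
"semantics" being compared are how out-of-range slices are handled: Python
lists, NumPy arrays, and positional pandas slicing all truncate silently,
while an R-style result, padded with missing values, can be emulated with
`reindex`. The list `values` is made up for illustration:

```python
import numpy as np
import pandas as pd

values = [10, 20, 30]

# Out-of-range slices are truncated silently rather than raising.
list_slice = values[1:10]                        # [20, 30]
array_slice = np.array(values)[1:10]             # array([20, 30])
series_slice = pd.Series(values).iloc[1:10]      # two elements: 20, 30

# R-like padding: positions with no data come back as NaN.
padded = pd.Series(values).reindex(range(1, 10))  # 20, 30, then seven NaN

# A single out-of-range position, by contrast, does raise.
try:
    values[10]
    single_index_raises = False
except IndexError:
    single_index_raises = True
```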

-- 

Robert Kern
