[Python-ideas] Have max and min functions ignore None

Michael Selik mike at selik.org
Wed Dec 30 23:30:24 EST 2015


On Tue, Dec 29, 2015, 11:41 AM Chris Barker - NOAA Federal <
chris.barker at noaa.gov> wrote:

> but emit TypeError if
> > asked to cope with None as a value.
> >
> Well, sort of. Numpy arrays are homogeneous, you can't have a None in
> an array (other than an object-style one). All the Numpy "ufuncs" create
> an array from the input first -- that's where you get your ValueError.
>
> But the Numpy experience is informative -- there have been years of
> "we need a better masked array" discussions, but no consensus on what
> it should be.
>
> For floats, NaN can be used for missing values, but there is no such
> value for integers, and each use case has a different "obvious"
> interpretation. That's why it's explicit what you want with the nan*
> functions.
>
> I don't think python should decide for users what None means in this
> context.
>

None is obviously the sound of one hand clapping. When you understand its
proper use, you become Enlightened.


> -CHB
>
> > I think the fact both NumPy and pandas support R-style handling of
> > min() and max() counts in favour of having variants of those with
> > additional options for handling missing data values in the standard
> > library statistics module.
>

NumPy and pandas have a slightly different audience than Python core. The
scientific community often veers more practical than pure, in some cases to
the detriment of code clarity.

> Regards,
> > Nick.
> >
> > P.S. Another option might be to consider the question as part of a
> > general "data cleaning" strategy for the statistics module, similar to
> > the one discussed for pandas at
> > http://pandas.pydata.org/pandas-docs/stable/missing_data.html


I prefer this option. Why solve the special case of max/min when we can
solve (or help solve) the general case of missing data? There's already the
internal ``_coerce`` function. Maybe clean that up for public consumption, or
something like it, adding drop-missing functionality?
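
To make the drop-missing idea concrete, here is a rough sketch. The name
``drop_missing`` is hypothetical, not anything in the statistics module
today, and treating both None and float NaN as "missing" is just my
assumption for illustration:

```python
import math


def drop_missing(data):
    """Hypothetical helper: yield the items of ``data`` that are not missing.

    Assumption: None and float NaN both count as missing.
    """
    for x in data:
        if x is None:
            continue
        if isinstance(x, float) and math.isnan(x):
            continue
        yield x


values = [3, None, 1.5, float("nan"), 7]
cleaned = list(drop_missing(values))
# max/min now only ever see real numbers.
print(max(cleaned), min(cleaned))
```

The caller opts in to dropping values explicitly, so max and min themselves
never have to decide what None means.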

If that flies, then there might be room for an ``interpolate(sequence,
method='linear')`` which would be awesome.
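
Something like the following is what I have in mind, as a sketch only. No
such function exists in the standard library; using None as the gap marker
and leaving leading or trailing gaps unfilled are assumptions on my part:

```python
def interpolate(sequence, method='linear'):
    """Hypothetical sketch: fill None gaps between known values.

    Assumptions: None marks a missing value, and gaps at either end
    (with no known neighbour on one side) are left as None.
    """
    if method != 'linear':
        raise ValueError("only method='linear' is sketched here")
    result = list(sequence)
    # Indices of the values we actually have.
    known = [i for i, x in enumerate(result) if x is not None]
    # Walk each pair of neighbouring known values and fill between them.
    for left, right in zip(known, known[1:]):
        step = (result[right] - result[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = result[left] + step * (i - left)
    return result


print(interpolate([None, 0, None, None, 3, None]))
```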


> > Even if the statistics module itself doesn't provide the tools to
> > address those problems, it could provide some useful pointers on when
> > someone may want to switch from the standard library module to a more
> > comprehensive solution like pandas that better handles the messy
> > complications of working with real world data (and data formats).
> >
> > --
> > Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>