On Tue, Dec 29, 2015, 11:41 AM Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:

> but emit TypeError if
> asked to cope with None as a value.
>
Well, sort of. Numpy arrays are homogeneous: you can't have a None in
an array (other than an object-dtype array). All the Numpy "ufuncs" create
an array from the input first -- that's where you get your ValueError.
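
For concreteness (a quick illustration of the homogeneity point, typed
from memory rather than taken from the earlier thread):

    import numpy as np

    # A None in the input forces an object-dtype array: the array is
    # still homogeneous, it just holds generic Python objects instead
    # of numbers.
    np.array([1, 2, None]).dtype    # dtype('O')

    # NaN, by contrast, fits into an ordinary float array.
    np.array([1, 2, np.nan]).dtype  # dtype('float64')

    # Reductions like np.min() coerce their input to an array first,
    # so whatever error you see about the None comes out of that
    # coercion/comparison machinery, not from any dedicated
    # missing-value handling.
    try:
        np.min([1, 2, None])
    except Exception as exc:
        print(type(exc).__name__, exc)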

But the Numpy experience is informative -- there have been years of
"we need a better masked array" discussions, but no consensus on what
it should be.

For floats, NaN can be used for missing values, but there is no such
value for integers, and each use case has a different "obvious"
interpretation. That's why you have to be explicit about what you want
with the nan* functions.
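
For example, with a plain float array:

    import numpy as np

    data = np.array([1.0, np.nan, 2.0])

    np.min(data)     # nan -- the plain reductions propagate NaN
    np.sum(data)     # nan

    np.nanmin(data)  # 1.0 -- the nan* variants explicitly skip it
    np.nansum(data)  # 3.0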

I don't think Python should decide for users what None means in this context.

None is obviously the sound of one hand clapping. When you understand its proper use, you become Enlightened.
 
-CHB

> I think the fact both NumPy and pandas support R-style handling of
> min() and max() counts in favour of having variants of those with
> additional options for handling missing data values in the standard
> library statistics module.
 
NumPy and Pandas have a slightly different audience than Python core. The scientific community often veers more toward the practical than the pure, in some cases to the detriment of code clarity.

> Regards,
> Nick.
>
> P.S. Another option might be to consider the question as part of a
> general "data cleaning" strategy for the statistics module, similar to
> the one discussed for pandas at
> http://pandas.pydata.org/pandas-docs/stable/missing_data.html

I prefer this option. Why solve the special case of max/min when we can solve (or help solve) the general case of missing data? There's already the internal ``_coerce`` method. Maybe clean that up (or something like it) for public consumption, adding drop-missing functionality?
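
Roughly what I have in mind -- the name ``drop_missing`` and the details
below are only a sketch, not an existing statistics-module API:

    import math

    def drop_missing(data):
        """Yield only the non-missing values: skip None and float NaN."""
        for x in data:
            if x is None:
                continue
            if isinstance(x, float) and math.isnan(x):
                continue
            yield x

    # e.g. max(drop_missing([3, None, float('nan'), 7]))  ->  7

With something like that, min(), max(), mean() and friends could stay
blissfully ignorant of missing values.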

If that flies, then there might be room for an ``interpolate(sequence, method='linear')`` function, which would be awesome.
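
Purely as a sketch of the simplest case (linear fill of interior gaps
between known neighbours -- the function below is hypothetical, not an
existing API):

    def interpolate(sequence, method='linear'):
        """Fill interior None gaps by linear interpolation between the
        nearest non-missing neighbours (sketch only).
        """
        if method != 'linear':
            raise ValueError("only 'linear' is sketched here")
        values = list(sequence)
        known = [i for i, v in enumerate(values) if v is not None]
        for i, v in enumerate(values):
            if v is not None:
                continue
            left = max((k for k in known if k < i), default=None)
            right = min((k for k in known if k > i), default=None)
            if left is None or right is None:
                continue  # leading/trailing gaps stay missing
            frac = (i - left) / (right - left)
            values[i] = values[left] + frac * (values[right] - values[left])
        return values

    # interpolate([1.0, None, None, 4.0])  ->  [1.0, 2.0, 3.0, 4.0]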
 
> Even if the statistics module itself doesn't provide the tools to
> address those problems, it could provide some useful pointers on when
> someone may want to switch from the standard library module to a more
> comprehensive solution like pandas that better handles the messy
> complications of working with real world data (and data formats).
>
> --
> Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia