I've moved this to python-ideas, where it is more appropriate, as Chris notes.

On Thu, Oct 21, 2021, 8:42 PM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Oct 22, 2021 at 3:23 AM David Mertz, Ph.D.
<david.mertz@gmail.com> wrote:
>
> On Thu, Oct 21, 2021 at 2:52 AM Steven D'Aprano <steve@pearwood.info> wrote:
>>
>> On Tue, Oct 19, 2021 at 05:09:42PM -0700, Michael Selik wrote:
>> > None and its ilk often conflate too many qualities. For example, is it
>> > missing because it doesn't exist, it never existed, or because we never
>> > received a value, despite knowing it must exist?
>
>
>>
>> 30+ years later, and we cannot easily, reliably or portably use NAN
>> payloads. Most people don't care. If we offered them a dozen or a
>> thousand distinct sentinels for all the various kinds of missing data,
>> how many people would use them and how many would just stick to plain
>> old None?
>
>
> In data science, I have been frustrated by the sparsity of ways of spelling "missing value."

Might be worth redirecting this to -ideas.

> Besides the distinction Michael points out, and that Steven did in relation to NaNs with payloads, I encounter missingness of various other sorts as well.  Crucially, an important kind of missing data is data where the value I received seems unreliable and I have decided to *impute* missingness rather than accept a value I believe is unreliable.
>
> But there is also something akin to what Michael points out (maybe it's just an example).  For example, "middle name" is something that some people simply do not have, other people choose not to provide on a survey, and others still we just don't know anything beyond "it's not there."
>

And some people have more than one (I have a brother with two of
them). Not the best example to use, since names have WAY more
complexities than different types of absence, but there are other
cases where that sort of thing comes up. For instance, if someone says
on a survey that s/he is in Australia, and then you ask for a
postcode, then leaving it blank should be recorded as "chose not to
provide"; but if the country is listed as Timor-Leste / East Timor,
then "not applicable" would be appropriate, since the country doesn't
use postal codes.
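
A rough sketch of that distinction, in case it helps. The names `Absence`,
`NO_POSTCODE_COUNTRIES` and `classify_postcode` are made up for illustration,
and the country set simply takes the example above at face value; it is not
a real reference list.

```python
from enum import Enum


class Absence(Enum):
    """Hypothetical reasons a survey field may have no value."""
    NOT_PROVIDED = "chose not to provide"
    NOT_APPLICABLE = "not applicable"


# Assumption: an illustrative, deliberately incomplete set of countries
# treated as not using postal codes (taken from the example in this thread).
NO_POSTCODE_COUNTRIES = {"Timor-Leste"}


def classify_postcode(country, postcode):
    """Return the stripped postcode, or an Absence member saying why it is absent."""
    if postcode and postcode.strip():
        return postcode.strip()
    if country in NO_POSTCODE_COUNTRIES:
        return Absence.NOT_APPLICABLE
    return Absence.NOT_PROVIDED
```

The point is only that a blank field maps to different reasons depending on
context, which a bare None cannot express.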

> Of course, when I impute missingness, I can do so at various stages of data cleaning, and for various different reasons or confidences.  None (or NaN) are sort of OK, but carrying metadata as to the nature of missingness would be nice.
>

Right. Using postcodes as an example again, for someone in Australia,
a postcode of "E3B 0H8" doesn't make sense, as that isn't the format
we use. So you could wipe that out and replace it with "No postal
code, malformed data entered".
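
Something like the following could do that wiping. It is only a sketch:
`clean_au_postcode` and the `MALFORMED` label are hypothetical names, the
check is limited to "exactly four digits", and it assumes the field was
actually filled in (a blank answer would need its own handling).

```python
import re

# Australian postcodes are four digits; this checks the format only,
# not whether the postcode is actually allocated.
AU_POSTCODE = re.compile(r"\d{4}")

# Hypothetical reason label, using the wording from the message above.
MALFORMED = "No postal code, malformed data entered"


def clean_au_postcode(raw):
    """Return (postcode, None) if well formed, else (None, reason)."""
    value = raw.strip()
    if AU_POSTCODE.fullmatch(value):
        return value, None
    return None, MALFORMED
```

Returning the reason next to the value keeps the imputed missingness
inspectable later in the cleaning pipeline.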

> So my strawman suggestion is tagging None's.  I suppose spellings like `None[reason]` or `None(reason)` are appealing.
>
> An obvious problem that I recognize is that it's not obvious this can "play nice" with the common idiom `if mydata is not None: ...`.  None really is a singleton, and a "tagged singleton" or "annotated singleton" probably doesn't work well with Python's object model.
>
> My goal, of course, would be to have TaggedNone be a kind of subclass of None, in the same way that bool is a subclass of int, and hence True is a kind of 1.  However, I'd want a large number of custom None's, with some sort of accessible string or numeric code or something to inspect which one it was.
>

But this is where I start to disagree. None should remain a singleton,
but "no data available" could be its own thing, tied in with the way
that you do your data storage and stats. As such, you wouldn't be
checking it with 'is', so you wouldn't have that problem (the Python
'is' operator will only ever test for actual object identity).

Keep None simple and dependable, and then "Missing Data" can be an
entire class of values if you so desire.
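
To make that concrete, here is a minimal sketch, not a proposal for any
particular API. `Missing` and `is_missing` are hypothetical names, and making
instances falsy is just one possible design choice. The first part shows
that CPython does not allow subclassing NoneType at all, so a "TaggedNone"
subclass is not available in any case.

```python
# CPython rejects NoneType as a base class, so None cannot be "tagged"
# by subclassing.
try:
    class TaggedNone(type(None)):
        pass
except TypeError:
    can_subclass_none = False
else:
    can_subclass_none = True


class Missing:
    """Hypothetical 'no data available' value that carries a reason."""

    def __init__(self, reason):
        self.reason = reason

    def __repr__(self):
        return f"Missing({self.reason!r})"

    def __bool__(self):
        # Design choice for this sketch: falsy, like None.
        return False


def is_missing(value):
    """True for None or any Missing instance; uses isinstance, not 'is'."""
    return value is None or isinstance(value, Missing)
```

Code that does `if mydata is not None:` keeps working exactly as before,
while code that cares about the reason checks `isinstance(mydata, Missing)`
and reads `mydata.reason`.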

ChrisA
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/6NY5NQCJR3ROFBWWFOVD47HJFBQJC3IZ/
Code of Conduct: http://python.org/psf/codeofconduct/