On Tue, Oct 19, 2021 at 05:09:42PM -0700, Michael Selik wrote:
> None and its ilk often conflate too many qualities. For example, is it
> missing because it doesn't exist, it never existed, or because we never
> received a value, despite knowing it must exist?
30+ years later, and we cannot easily, reliably or portably use NAN
payloads. Most people don't care. If we offered them a dozen or a
thousand distinct sentinels for all the various kinds of missing data,
how many people would use them and how many would just stick to plain
old None?
In data science, I have been frustrated by the sparsity of ways of spelling "missing value."
Besides the distinction Michael points out, and the one Steven made in relation to NaNs with payloads, I encounter missingness of various other sorts as well. Crucially, one important kind of missing data is a value that I did receive but judged unreliable, so that I decided to *impute* missingness rather than accept it.
But there is also something akin to what Michael points out (maybe it's just an example of it). For example, "middle name" is something that some people simply do not have, that other people choose not to provide on a survey, and about which, for others still, we know nothing beyond "it's not there."
Of course, when I impute missingness, I can do so at various stages of data cleaning, and for various reasons or with various degrees of confidence. None (or NaN) is sort of OK, but carrying metadata about the nature of the missingness would be nice.
So my strawman suggestion is tagging None's. I suppose spellings like `None[reason]` or `None(reason)` are appealing.
One problem I recognize right away is that it's not clear this can "play nice" with the common idiom `if mydata is not None: ...`. None really is a singleton, and a "tagged singleton" or "annotated singleton" probably doesn't work well with Python's object model.
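To make the difficulty concrete, here is a small sketch of how this behaves in CPython today. The class names `TaggedNone` and `Lookalike` are hypothetical, made up only for illustration; the point is that `NoneType` refuses to be subclassed, and any separate falsy object still passes an `is not None` test.

```python
NoneType = type(None)

# Calling the type just hands back the one existing instance.
same_object = NoneType() is None

# CPython does not accept NoneType as a base class, so a direct
# subclass (hypothetical name TaggedNone) fails at class creation.
try:
    class TaggedNone(NoneType):
        pass
except TypeError as exc:
    subclass_error = exc
else:
    subclass_error = None


# A separate falsy object (hypothetical name Lookalike) is not None,
# so the usual identity idiom treats it as real data.
class Lookalike:
    def __bool__(self):
        return False


value = Lookalike()
passes_identity_check = value is not None
```

So any tagged variant would have to be a distinct object, and existing code written around `is not None` would see it as present data unless that code were changed.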
My goal, of course, would be to have TaggedNone be a kind of subclass of None, in the same way that bool is a subclass of int, and hence True is a kind of 1. However, I'd want a large number of custom None's, with some sort of accessible string or numeric code or something to inspect which one it was.
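What can be done in current Python is a rough approximation: a separate falsy sentinel class with one cached instance per reason. This is only a sketch under my own assumptions; `TaggedNone`, its `reason` attribute, and the `is_missing` helper are hypothetical names, and the class is not a subclass of `NoneType`, so callers must use the helper (or `isinstance`) instead of `is None`.

```python
class TaggedNone:
    """Hypothetical falsy sentinel carrying a reason for missingness."""

    _instances = {}  # one cached instance per reason

    def __new__(cls, reason):
        try:
            return cls._instances[reason]
        except KeyError:
            self = super().__new__(cls)
            self.reason = reason
            cls._instances[reason] = self
            return self

    def __bool__(self):
        # Behave like None in truth tests.
        return False

    def __repr__(self):
        return f"TaggedNone({self.reason!r})"


def is_missing(value):
    """Hypothetical helper: True for plain None or any tagged variant."""
    return value is None or isinstance(value, TaggedNone)


NOT_APPLICABLE = TaggedNone("not applicable")
DECLINED = TaggedNone("declined to answer")
IMPUTED = TaggedNone("imputed: value judged unreliable")

record = {"first": "Ada", "middle": NOT_APPLICABLE}
if is_missing(record["middle"]):
    why = getattr(record["middle"], "reason", "unknown")
```

Because instances are cached by reason, `TaggedNone("not applicable") is NOT_APPLICABLE` holds, so identity comparison works per tag even though it does not work against `None` itself.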