Fwd: [Python-Dev] Re: Semi-proposal: Tagged None
I actually thought this thread was already on python-ideas, which is where it now belongs. Chris quotes everything I wrote, and his comments are all useful, so let me move the discussion here, starting with his reply.
---------- Forwarded message ---------
From: Chris Angelico
On Thu, Oct 21, 2021 at 2:52 AM Steven D'Aprano wrote:
On Tue, Oct 19, 2021 at 05:09:42PM -0700, Michael Selik wrote:
None and its ilk often conflate too many qualities. For example, is it missing because it doesn't exist, it never existed, or because we never received a value, despite knowing it must exist?
30+ years later, and we cannot easily, reliably or portably use NaN payloads. Most people don't care. If we offered them a dozen or a thousand distinct sentinels for all the various kinds of missing data, how many people would use them and how many would just stick to plain old None?
In data science, I have been frustrated by the sparsity of ways of spelling "missing value."
Besides the distinction Michael points out, and the one Steven drew in relation to NaNs with payloads, I encounter missingness of various other sorts as well. Crucially, an important kind of missing data is data where the value I received seems unreliable and I have decided to *impute* missingness rather than accept a value I believe is unreliable.
Might be worth redirecting this to -ideas.
But there is also something akin to what Michael points out (maybe it's just an example). For example, "middle name" is something that some people simply do not have, other people choose not to provide on a survey, and others still we just don't know anything beyond "it's not there."
And some people have more than one (I have a brother with two of them). Not the best example to use, since names have WAY more complexities than different types of absence, but there are other cases where that sort of thing comes up. For instance, if someone says on a survey that s/he is in Australia, and then you ask for a postcode, then leaving it blank should be recorded as "chose not to provide"; but if the country is listed as Timor-Leste / East Timor, then "not applicable" would be appropriate, since the country doesn't use postal codes.
Of course, when I impute missingness, I can do so at various stages of data cleaning, and for various different reasons or confidences. None (or NaN) are sort of OK, but carrying metadata as to the nature of missingness would be nice.
Right. Using postcodes as an example again, for someone in Australia, a postcode of "E3B 0H8" doesn't make sense, as that isn't the format we use. So you could wipe that out and replace it with "No postal code, malformed data entered".
So my strawman suggestion is tagging Nones. I suppose spellings like `None[reason]` or `None(reason)` are appealing.
An obvious problem that I recognize is that it's not obvious this can "play nice" with the common idiom `if mydata is not None: ...`. None really is a singleton, and a "tagged singleton" or "annotated singleton" probably doesn't work well with Python's object model.
My goal, of course, would be to have TaggedNone be a kind of subclass of None, in the same way that bool is a subclass of int, and hence True is a kind of 1. However, I'd want a large number of custom Nones, with some sort of accessible string or numeric code or something to inspect which one it was.
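Both halves of that analogy can be checked directly in CPython: bool really does subclass int, but NoneType refuses to act as a base class, so a literal "subclass of None" is not available. A small check, assuming CPython 3; `can_subclass` is a hypothetical helper written only for this illustration:

```python
NoneType = type(None)  # also exposed as types.NoneType on Python 3.10+


def can_subclass(base: type) -> bool:
    """Return True if Python allows `base` to be used as a base class.

    `can_subclass` is a hypothetical helper for this illustration only.
    """
    try:
        type("Probe", (base,), {})  # dynamically create `class Probe(base)`
    except TypeError:
        return False
    return True


# bool really is a subclass of int, and True compares equal to 1.
print(issubclass(bool, int), True == 1)  # True True

# int can be extended, but CPython rejects NoneType as a base class,
# so a TaggedNone cannot literally inherit from None.
print(can_subclass(int), can_subclass(NoneType))  # True False
```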
But this is where I start to disagree. None should remain a singleton, but "no data available" could be its own thing, tied in with the way that you do your data storage and stats. As such, you wouldn't be checking it with 'is', so you wouldn't have that problem (the Python 'is' operator will only ever test for actual object identity). Keep None simple and dependable, and then "Missing Data" can be an entire class of values if you so desire.
ChrisA
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/6NY5NQCJ...
Code of Conduct: http://python.org/psf/codeofconduct/
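Chris's alternative can be sketched with today's Python: a small value class that records *why* data is missing and is tested with `isinstance` rather than `is`. This is a minimal sketch under stated assumptions, not anything proposed verbatim in the thread: `MissingData`, `clean_postcode` and the reason strings are hypothetical, the four-digit check is the Australian postcode format, and only the two countries from the postcode example are handled.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class MissingData:
    """Hypothetical 'missing value' marker that records why a value is absent."""
    reason: str


DECLINED = MissingData("chose not to provide")
NOT_APPLICABLE = MissingData("not applicable")
MALFORMED = MissingData("no postal code, malformed data entered")


def clean_postcode(country: str, raw: Optional[str]) -> Union[str, MissingData]:
    """Return a validated postcode, or a MissingData explaining its absence."""
    if country == "Timor-Leste":
        return NOT_APPLICABLE  # the country does not use postcodes
    if country != "Australia":
        raise ValueError(f"no postcode rule for {country!r} in this sketch")
    if not raw or not raw.strip():
        return DECLINED  # left blank on the survey
    if re.fullmatch(r"[0-9]{4}", raw.strip()):
        return raw.strip()  # four digits: a well-formed Australian postcode
    return MALFORMED  # e.g. "E3B 0H8": impute missingness instead


for country, raw in [("Australia", "2000"), ("Australia", ""),
                     ("Australia", "E3B 0H8"), ("Timor-Leste", None)]:
    result = clean_postcode(country, raw)
    # Test the *kind* of value, not identity with a singleton.
    if isinstance(result, MissingData):
        print(f"{country}: missing ({result.reason})")
    else:
        print(f"{country}: {result}")
```

Because the dataclass is frozen and compares by value, two markers with the same reason are equal, which fits Chris's point that such values would not be checked with `is`; None itself is left alone.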
I think this is a really good and interesting idea. I do have one question: how would you envision this interacting with the sentinel values from PEP 661 (https://www.python.org/dev/peps/pep-0661/)? Could 'Tagged None' and sentinel values be the same thing?
Thanks, Doug
This is indeed largely the same motivation as PEP 661, which I supported when it was raised, but which got no clear consensus.
"Missing" is definitely a family of related sentinels, although sentinels are more general.
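A "family of related sentinels" can be approximated today, without changing None, with an `enum.Enum`: each member is a unique singleton, so `is` checks keep working. A sketch only, assuming nothing beyond the standard library: the `Missing` class, its member names and `describe` are hypothetical (the members mirror the kinds of missingness raised earlier in the thread), and this is not the API PEP 661 proposes.

```python
from enum import Enum


class Missing(Enum):
    """Hypothetical family of 'missing' sentinels; every member is a singleton."""
    NEVER_EXISTED = "never existed"
    NOT_RECEIVED = "exists, but was never received"
    DECLINED = "chose not to provide"
    NOT_APPLICABLE = "not applicable"
    IMPUTED = "received, but judged unreliable and discarded"


def describe(value: object) -> str:
    """Report whether `value` is present, plain None, or a tagged kind of missing."""
    if value is None:
        return "plain None: no reason recorded"
    if isinstance(value, Missing):
        return f"missing ({value.name}): {value.value}"
    return f"present: {value!r}"


for v in (42, None, Missing.DECLINED, Missing.NOT_APPLICABLE):
    print(describe(v))
```

Since each member is unique, `value is Missing.DECLINED` behaves just like `value is None`, while the plain `if mydata is not None:` idiom is unaffected; the trade-off is that the set of reasons is fixed when the enum is defined.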
On Thu, Oct 21, 2021, 10:15 PM Doug Swarin wrote:
I think this is a really good and interesting idea. I do have one question: how would you envision this interacting with the sentinel values from PEP 661 (https://www.python.org/dev/peps/pep-0661/)? Could 'Tagged None' and sentinel values be the same thing?
Thanks, Doug
+1 from me on the usefulness of having a way to express "Missing Data". One common use case I encounter is in function signatures, when one wants arguments that are meant as overrides:

    class Foo:
        def __init__(self, bar=42, baz=None):
            self.bar = bar
            self.baz = baz

        def do_something(bar_override=None, baz_override=None) ...

It works if None is not a valid value for bar or baz, but if it is, you'd have to do something like this:

    class Missing:
        pass

    MISSING = object()

    class Foo:
        ...
        def do_something(bar_override=MISSING, baz_override=MISSING) ...

This quickly becomes problematic to type-hint: Union[Optional[int], Missing] would be the correct way (mypy seems to agree), but is technically wrong. Missing is a sentinel value, not a valid value for this argument. If MISSING were an instance of a subclass of None, then Optional[int] should (in theory) work and users of my method would be None the wiser (pun firmly intended :p).

PS: BTW, Missing/MISSING is how the dataclasses module does it.
Sorry, I posted a bit fast; my two examples above should read:

    class Foo:
        def __init__(self, bar=42, baz=None):
            self.bar = bar
            self.baz = baz

        def do_something(self, bar_override=None, baz_override=None): ...

And the second one:

    class Missing:
        pass

    MISSING = Missing()

    class Foo:
        ...
        def do_something(self, bar_override=MISSING, baz_override=MISSING): ...
participants (3)
- David Mertz, Ph.D.
- Doug Swarin
- Thomas Mc Kay