On Tue, May 26, 2020 at 12:02 PM Dominik Vilsmeier <dominik.vilsmeier@gmx.de> wrote:
The NumPy, deque, and lru_cache cases are all ones where None is a perfect sentinel and the hypothetical 'undef' syntax would have zero value.

For both `deque` and `lru_cache` None is a sensible argument so it can't act as a sentinel. It just happens that these two cases don't need to check if an argument was supplied or not, so they don't need a sentinel.
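A quick standard-library check of that point. This is a sketch against CPython's `collections.deque`. Passing `maxlen=None` explicitly behaves exactly like omitting it, so the function never needs to tell "not supplied" apart from "None".

```python
from collections import deque

# maxlen=None (also the default) means "no bound": the deque keeps everything.
unbounded = deque(range(10), maxlen=None)
print(len(unbounded), unbounded.maxlen)   # 10 None

# Any non-negative int is a real bound: older items fall off the left end.
bounded = deque(range(10), maxlen=3)
print(list(bounded))                      # [7, 8, 9]
```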

I think getting caught up in the specific functions is a bit of a rabbit hole.  I think the general point stands that None is usually fine as sentinel, and in the rare cases it's not, it's easy to define your own.
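For the rare case where None is itself a valid argument, a private `object()` is the usual hand-rolled sentinel. The sketch below is in the spirit of `dict.pop(key[, default])`. The helper `lookup` and the name `_MISSING` are hypothetical, chosen only for illustration.

```python
# A private module-level sentinel: a fresh object() can't collide with any
# value a caller might legitimately pass, including None.
_MISSING = object()

def lookup(mapping, key, default=_MISSING):
    """Hypothetical helper: re-raise KeyError unless a default was supplied."""
    try:
        return mapping[key]
    except KeyError:
        if default is _MISSING:
            raise            # no default given: propagate the KeyError
        return default       # an explicit default, possibly None, is honored

print(lookup({}, "x", None))   # None
```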

But lru_cache() and deque() seem kinda obvious here.  The default cache size is a finite number, but we can pass in any other positive integer for maxsize.  Therefore, nothing in the domain of positive integers can carry the sentinel meaning "unbounded" or "infinite."  We need a special value to signal different behavior (in other words, a sentinel).
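The same thing, seen through `functools.lru_cache`: the default `maxsize` is 128, and `maxsize=None` is the sentinel that switches to the unbounded code path. `square` and `cube` are just throwaway example functions.

```python
from functools import lru_cache

@lru_cache()                 # default maxsize: a finite number (128)
def square(n):
    return n * n

@lru_cache(maxsize=None)     # None: the "unbounded" sentinel
def cube(n):
    return n ** 3

for i in range(1000):
    square(i)
    cube(i)

print(square.cache_info().maxsize, square.cache_info().currsize)  # 128 128
print(cube.cache_info().maxsize, cube.cache_info().currsize)      # None 1000
```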

In C/C++, we'd probably use -1: it's an integer, but not one that makes sense as a size.  I suppose we might use float('inf') to mean unbounded, but then there are issues of converting that float to an int... or we'd really just be doing what currently happens, taking a different code path.  For every practical purpose, sys.maxsize would be fine.  It is not technically infinite, but it's far larger than the amount of memory you can possibly cache.  I do that relatively often myself... "really big" is enough like "infinite" for most practical purposes (I could make an even bigger integer in Python, of course, but sys.maxsize is mnemonic and plenty big).
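To see how those alternatives fare against a real API, here is a quick sketch using `collections.deque` as the test case. `sys.maxsize` is accepted as a "practically unbounded" length. The C-style -1 and float('inf') are both rejected.

```python
import sys
from collections import deque

# sys.maxsize: not technically infinite, but far beyond any real memory budget.
d = deque(range(5), maxlen=sys.maxsize)
print(len(d), d.maxlen == sys.maxsize)   # 5 True

# The C-style -1 "no limit" convention is rejected by deque...
try:
    deque(maxlen=-1)
except ValueError:
    print("maxlen=-1 rejected")

# ...and float('inf') can't be interpreted as an integer length.
try:
    deque(maxlen=float("inf"))
except TypeError:
    print("maxlen=inf rejected")
```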

Or we could use a string like "UNBOUNDED". Or an enumeration. Or a module constant.  But since there is just one special state/code path to signal, None is a perfect sentinel.
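For comparison, an enumeration spelling of that one special state next to the None spelling. `Limit`, `describe`, and `describe_none` are hypothetical names for this sketch. With only one special state, both branch the same way, which is the argument for just using None.

```python
from enum import Enum

class Limit(Enum):
    """Hypothetical enumeration naming the single special state explicitly."""
    UNBOUNDED = "UNBOUNDED"

def describe(maxsize=128):
    if maxsize is Limit.UNBOUNDED:        # enum member as the sentinel
        return "unbounded"
    return f"bounded at {maxsize}"

def describe_none(maxsize=128):
    if maxsize is None:                   # None carries the same meaning
        return "unbounded"
    return f"bounded at {maxsize}"

print(describe(Limit.UNBOUNDED), describe_none(None))   # unbounded unbounded
print(describe(), describe_none())                      # bounded at 128 bounded at 128
```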

If `keepdims` is not supplied, it won't be passed on to sub-classes; if it is set to None then the sub-class receives `keepdims=None` as well.

Yeah, OK, there's a slight difference if you subclass ndarray. I've never felt an urge to do that, and never seen code that did... but it's possible.
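A self-contained sketch of the forwarding behavior described above. NumPy's own reductions use a private sentinel (`np._NoValue`) along these lines. The `_NoValue`, `total`, and `LegacyArray` below are hypothetical stand-ins, not NumPy's code. The point is that omitting `keepdims` and passing `keepdims=None` reach the subclass differently.

```python
class _NoValueType:
    """Hypothetical stand-in for NumPy's private np._NoValue sentinel."""
    def __repr__(self):
        return "<no value>"

_NoValue = _NoValueType()

def total(a, axis=None, keepdims=_NoValue):
    """Forward keepdims to a.sum() only if the caller actually supplied it."""
    kwargs = {}
    if keepdims is not _NoValue:
        kwargs["keepdims"] = keepdims    # even keepdims=None gets forwarded
    return a.sum(axis=axis, **kwargs)

class LegacyArray:
    """Hypothetical array-like whose sum() predates the keepdims argument."""
    def __init__(self, values):
        self.values = list(values)

    def sum(self, axis=None):
        return sum(self.values)          # builtin sum over the stored values

print(total(LegacyArray([1, 2, 3])))     # 6: keepdims was never passed on
try:
    total(LegacyArray([1, 2, 3]), keepdims=None)
except TypeError:
    print("keepdims=None was passed on and rejected")
```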
 
For `np.concatenate` None is a meaningful argument to `axis` since it will flatten the arrays before concatenation.

This is again very similar to the sentinel in lru_cache().  It means "use a different approach" to the algorithm.  I'm not sure what the C code does, but in concept it's basically:

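import numpy as np

def concatenate(a, b, axis=0):
    sentinel = None
    if axis is sentinel:    # None selects the "flatten first" code path
        a = a.flatten()
        b = b.flatten()
        axis = 0
    # ... rest of logic (here simply delegated to NumPy) ...
    return np.concatenate((a, b), axis=axis)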
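A quick check of that concept against NumPy itself, assuming NumPy is installed. `axis=None` flattens the inputs and joins them end to end, which matches flattening by hand and concatenating along axis 0.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# axis=None: flatten first, then join end to end.
flat = np.concatenate((a, b), axis=None)
print(flat)                                   # [1 2 3 4 5 6]

# Same result as flattening by hand and using axis=0.
manual = np.concatenate((a.flatten(), b.flatten()), axis=0)
print(np.array_equal(manual, flat))           # True

# The default axis=0 keeps the 2-D structure instead.
print(np.concatenate((a, b)).shape)           # (3, 2)
```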
 
I was wondering if anyone would mention Pandas, which is great, but in many ways an abuse of Pythonic programming. There, None in an initializing collection (often) gets converted to NaN, and both are then taken to mean "missing", which is something different from what either normally means. This is kind of an abuse of both None and NaN... which they know, and they introduced an experimental pd.NA for exactly that reason... Unfortunately, so far, actually using pd.NA is cumbersome, but hopefully that gets better in the next version.
I wouldn't say it's an abuse; it's an interpretation of these values. Using NaN has the clear advantage that it fits into a float array, so it's memory efficient.
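To illustrate the memory argument without pulling in pandas, this sketch swaps in the standard-library `array` module as a stand-in for a float column. NaN fits in the packed C-double storage. None does not, because it's a Python object rather than a float.

```python
from array import array

# Packed C doubles: one fixed-size slot per element, no per-item Python objects.
values = array("d", [1.0, float("nan"), 3.0])
print(len(values), values.itemsize)    # 3 8

# NaN fits in the float storage; None does not.
try:
    array("d", [1.0, None, 3.0])
except TypeError:
    print("None can't be stored in a float array")
```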

I know why they did it.  But as a data scientist, I sometimes (even often) care about the difference between "this computation went wonky" and "this data was never collected."  NaN is being used for both meanings, but they are actually importantly different cases.
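A small sketch of why the two meanings collide. A NaN produced by a computation and a NaN used as a "never collected" placeholder are indistinguishable once stored. Keeping "missing" as its own sentinel (the hypothetical `MISSING` below, a plain `object()`) preserves the distinction, at the cost of leaving the packed float representation.

```python
import math

inf = float("inf")
computed = inf - inf          # NaN from a computation that "went wonky"
missing = float("nan")        # NaN used as the "never collected" placeholder

# Once both are stored as NaN, nothing distinguishes them.
print(math.isnan(computed), math.isnan(missing))   # True True

# Hypothetical alternative: keep "not collected" as a distinct sentinel.
MISSING = object()
readings = [1.5, inf - inf, MISSING]
n_wonky = sum(1 for r in readings if r is not MISSING and math.isnan(r))
n_missing = sum(1 for r in readings if r is MISSING)
print(n_wonky, n_missing)     # 1 1
```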
