[Python-ideas] Re: Add Scalar to collections.abc with str, bytes, etc. being both Scalar and Sequence (therefore also Container)

14 Oct 2019

      Chris Angelico wrote:
...
On Mon, Oct 14, 2019 at 8:48 AM Steve Jorgensen stevej@stevej.name wrote:
[snip]
You want to drill down into all containers except strings, so why not
just say that?
Scalar = (str, bytes)
A scalar is not just str or bytes though. It is also anything else that's not a collection, so numbers, dates, booleans, and instances of any other classes (standard or custom) that are not collections. A string is just one of the few examples of something that is a collection but should be treated as a scalar by anything that asks whether it is one.

If we were still talking about Python 2, then `unicode` would be another such type, but in Python 3 we do have `bytes`, `bytearray`, and `memoryview`. In addition, we might encounter subclasses of `collections.UserString` (which do not identify as subclasses of `str`).
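To illustrate the `UserString` point, here is a minimal check using only the standard library. The variable names are just for illustration.

```python
from collections import UserString
from collections.abc import Sequence

s = UserString("abc")

# UserString wraps a str rather than inheriting from it.
is_str = isinstance(s, str)

# It still satisfies the Sequence ABC, so a naive "is it a Sequence?" test
# would drill into it character by character.
is_sequence = isinstance(s, Sequence)

# bytes and bytearray are Sequences too, and neither is a str.
bytes_like = [
    isinstance(b, Sequence) and not isinstance(b, str)
    for b in (b"ab", bytearray(b"ab"))
]
```

So an `isinstance(x, str)` test alone misses all three of these.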

There is no guarantee that string-like classes are the only cases anyone would ever want treated this way. In fact, you pointed out `namedtuple` below, so that's another case. There can also be custom classes that one might want to register as `Scalar`. For instance, if someone has a date-like class whose instances can be treated as sequences of year, month, and day values, those members are still primarily attributes and only secondarily sequence items. This is something one might want to register as `Scalar` (even if it is not implemented as a `namedtuple` or a subclass thereof).
...
If a future Python version introduces a new string-like type, you
would have to make a decision as to whether it should be treated as a
string or as a collection of strings. (Assuming it isn't actually a
subclass of str or bytes, in which case it's been made clear.) Whether
something should be treated as atomic or iterable depends on usage,
not just the inherent attributes of the type; for instance, a
namedtuple is iterable (as a subclass of tuple), but many namedtuple
instances should be treated as atomic, and not drilled down into.
Well, if such a new type is truly string-like, and `Scalar` has been added to the standard `collections.abc`, then clearly it should be registered as `Scalar` in `collections.abc` just as `str` would be. As I mentioned above, I think it also makes sense for `namedtuple` to be scalar, since its members are more often primarily attributes than primarily values in a sequence.
...
(Classic example: a point, defined as (x,y), should almost certainly
be seen as atomic. Even more so if you define a point as (R, theta).)
Alternatively, do what str.startswith/str.endswith do and mandate that
a pair be one of a specific set of types. That's usually the easiest
plan when working with both sequences and mappings; you can define
sequence behaviour as applying ONLY to tuples and lists (and maybe
sets), mapping behaviour applying ONLY to dicts, and every other
object will be treated as a scalar. It's a small restriction on your
caller, and a massive simplicity. But if you truly want to support
arbitrary sequences (or perhaps arbitrary containers, or arbitrary
iterables), then just special-case whichever string types make sense
to you, and run with it.
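The special-casing approach quoted above could be sketched roughly as follows. This is only an illustration: `flatten` is a hypothetical helper, and the exact set of types drilled into is an assumption.

```python
def flatten(obj):
    """Yield leaf values, drilling ONLY into a fixed set of container types."""
    if isinstance(obj, (list, tuple, set)):
        for item in obj:
            yield from flatten(item)
    elif isinstance(obj, dict):
        # Mapping behaviour applies only to dicts; here we walk the values.
        for value in obj.values():
            yield from flatten(value)
    else:
        # Everything else, including str and bytes, is treated as a scalar.
        yield obj


result = list(flatten([1, "ab", (2.5, {"k": b"xy"}), [[3]]]))
```

Note that a `namedtuple` instance is an instance of `tuple`, so this sketch would still drill into it unless the check is tightened.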
Parsing out 2 different points from that…

The fact that `Scalar` can't clean up every case of this kind of ambiguity that we can come up with doesn't mean it's not potentially very helpful. After the discussion so far, I still feel that this is a compelling idea that I'd like to see implemented.

The problem with special-casing arises when interfacing between different bodies of code, that is, when using libraries or writing libraries for others to use. Having a central `Scalar` class with which different bodies of code can register their classes provides a clean way for them to share this information without having to know ahead of time which other bodies of code in an integrated application might need it.
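As a rough sketch of what that could look like: `Scalar` does not exist in `collections.abc`, so it is defined locally here as a plain marker ABC. The default registrations, the `Point` type, and the `flatten` helper are all assumptions made for illustration.

```python
from abc import ABC
from collections import namedtuple
from collections.abc import Iterable


class Scalar(ABC):
    """Hypothetical marker ABC meaning 'treat instances as atomic'."""


# Assumed default registrations for string-like types.
for _t in (str, bytes, bytearray):
    Scalar.register(_t)

# A library registers its own type; consumers need not know about it.
Point = namedtuple("Point", "x y")
Scalar.register(Point)


def flatten(obj):
    """Yield leaf values, drilling into any iterable not registered as Scalar."""
    if isinstance(obj, Scalar) or not isinstance(obj, Iterable):
        yield obj
    else:
        for item in obj:
            yield from flatten(item)


result = list(flatten(["ab", [Point(1, 2), 3], (4, b"xy")]))
```

Here `flatten` never names `Point`, `str`, or `bytes` directly. It only asks whether the object is a `Scalar`, which is the kind of decoupling between bodies of code I have in mind.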