We should have an explicit concept of emptiness for collections

Hi all,

The Programming Recommendations section in PEP-8 states "For sequences, (strings, lists, tuples), use the fact that empty sequences are false:"

    # Correct:
    if not seq:
    if seq:

    # Wrong:
    if len(seq):
    if not len(seq):

In the talk "When Python Practices Go Wrong", Brandon Rhodes makes a good point against this practice based on "explicit is better than implicit" (https://youtu.be/S0No2zSJmks?t=873). He advocates using

    if len(seq):

While that is as explicit as one can get within the current language, it could still be more explicit: semantically, we're not interested in the (zero) length of the sequence, but want to know if it is empty.

**Proposal**

Therefore, I propose syntax for an explicit empty check

    if isempty(seq):     (i)

or

    if seq.is_empty()    (ii)

This proposal is mainly motivated by the Zen verses "Explicit is better than implicit" and "Readability counts".

The two variants have slightly different advantages and disadvantages:

(i) would add an __isempty__ protocol and a corresponding isempty() builtin function. One could argue that this is somewhat redundant with __len__. However, this is a typical pattern in the collections.abc abstract base classes: there are only a few abstract methods, and further concepts are added as mixin methods (which are by default implemented using the abstract methods). https://docs.python.org/3/library/collections.abc.html#collections-abstract-...

(ii) would alternatively only implement a method on the collection. There's also precedent for predefined methods on collections, e.g. Sequence.index().

Advantages over the protocol approach are:

- It's a smaller language change than adding a protocol and a builtin function.
- The order seq.is_empty() matches the order used in spoken English ("if the sequence is empty"), which is more readable than isempty(seq).
- It's tab-completable (-> usability).

Disadvantages:

- Emptiness is similar to length, and people might be used to the builtin-method concept for such things.
- A protocol is more powerful than a method: one can support isempty() via len() even for objects that do not implement __isempty__. However, this is more a theoretical advantage. I assume that in practice all collections derive from the collections.abc classes, so if the is_empty() method is implemented there, all typical use cases should be covered.

I'm looking forward to your feedback!

Tim
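For readers who want to see variant (i) in concrete terms, here is a minimal sketch of how such a protocol could behave. Neither `isempty()` nor `__isempty__` exists in Python; both names are taken from the proposal, and the `Tree` class is a made-up example of a container that can answer the question without counting.

```python
def isempty(obj):
    """Hypothetical builtin: use __isempty__ if the type defines it, else len()."""
    # Look the hook up on the type, the way other special methods are resolved.
    hook = getattr(type(obj), "__isempty__", None)
    if hook is not None:
        return bool(hook(obj))
    # Fallback: any sized object is empty when its length is zero.
    return len(obj) == 0


class Tree:
    """Made-up container that knows whether it is empty without counting nodes."""

    def __init__(self, root=None):
        self.root = root

    def __isempty__(self):
        return self.root is None


print(isempty([]))        # falls back to len()
print(isempty(Tree()))    # uses the hypothetical __isempty__ hook
```

Objects that define neither the hook nor `__len__` (an integer, say) raise `TypeError` from the `len()` fallback, which matches how `len()` itself behaves.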

On Sun, Aug 22, 2021 at 10:28 PM Tim Hoffmann via Python-ideas <python-ideas@python.org> wrote:
I don't see that this gives anything above len(seq). Taking an example from the video you linked to:

    def unfriend(subject, users):
        if not users:
            return
        remove_edges('friend', subject, users)

"What is the type of users? The only hint you are given is that it is used in an 'if' statement."

Not true. You also have the fact that the name "users" is a plural. Based on that, and that alone, I would assume that it is some sort of collection. And then he goes on to say that it could be the integer 12. Okay, sure. But if you care about that distinction, "if len(users):" is absolutely fine here, and proves that your function will break if given an integer number of users rather than a collection of them.

In Python, isempty(x) is spelled bool(x). That's simply what it means. I ask you, for these types, what should isempty do, what does bool do, and what does len do?

* datetime.timedelta
* range
* slice
* an SQL query that has yet to be executed
* an SQL result set

Under what circumstances would bool(x) differ from isempty(x)? Under what circumstances should the distinction be made? And when it should be made, would bool(len(x)) be different from isempty(x)?

To be quite honest, my usual answer to "What is the type of X?" is "I don't care" (or "I don't care, as long as it is <quality>" eg iterable or subscriptable etc). The unfriend function shouldn't need to care what kind of thing it's been given, but if it does, Python has these things called type hints that can provide extra information to a static analyzer (and also to a human who's reading the code). But static analysis is often smart enough to not need them - and, quite frankly, static analysis is pretty near to magic with its ability to sniff out problems that programmers wouldn't even have thought to check for. (I've seen some Coverity reports and been pretty astonished at its detail.)

What's the problem being solved by isempty? Are there any situations that couldn't be solved by either running a type checker, or by using len instead of bool?

ChrisA
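For the three standard-library types in that list, the answers to "what does bool do, and what does len do?" can be checked directly. The helper name `describe` is only for illustration; the two SQL cases depend on the database library and are left out.

```python
from datetime import timedelta


def describe(obj):
    """Return (truth value, length), with None where len() is unsupported."""
    try:
        length = len(obj)
    except TypeError:
        length = None
    return bool(obj), length


print(describe(timedelta(0)))          # (False, None): zero duration is falsy, no len()
print(describe(timedelta(seconds=1)))  # (True, None)
print(describe(range(0)))              # (False, 0)
print(describe(range(3)))              # (True, 3)
print(describe(slice(None)))           # (True, None): neither __bool__ nor __len__
```

So `bool` and `len` already disagree about which of these objects they even apply to, which is the point of the question.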

I agree that determining the type is possible most of the time, either by type hints or a static analyzer. Using len is possible, with the limitation that you need a full `len(x) == 0` for numpy arrays (see discussion above). The type aspect was emphasized in the video. I'm not too worried about that explicitly. The video was more of a starting point for me to reconsider the idiom `if not users`. My conclusion (and thus proposal) differs from the video.

On a technical level, everything can be solved with the current language capabilities. The main advantage is clearer semantics (explicit is better / readability counts):

- Re bool: As experienced Python users we are used to translating `if not users` to "if users is empty" or "if we have no users", but it *is* less explicit than `if users.is_empty()`.
- Re len: `if not len(users)` or `if len(users) == 0` is more explicit, but it's semantically on a lower level. Counting elements is a more detailed operation than only checking if we have any element. That detail is not needed and is distracting if we are only interested in is_empty. This is vaguely similar to iterating over indices (`for i in range(len(users))`) vs. iterating over elements (`for user in users`). We don't iterate over indices because that's usually a detail we don't need.

I acknowledge that discussing readability can be vague and subjective, not least because we are used to the current idioms. I'm also aware that we should be very conservative on adding new API, in particular if it's not technically necessary. However, Python makes a strong point on readability, which IMHO is one of the major reasons for its success. I believe that in that context adding is_empty syntactic sugar would be a clear improvement.

On 8/23/21 2:06 PM, Tim Hoffmann via Python-ideas wrote:
On a technical level, everything can be solved with the current language capabilities. The main advantage is clearer semantics (explicit is better / readability counts)
Note that explicitness and readability are often at odds with each other. -- ~Ethan~

On 2021-08-23 at 21:06:46 -0000, Tim Hoffmann via Python-ideas <python-ideas@python.org> wrote:
I mentally translate "if not users" to "if there are not users" or "if there are no users." Whether users is a list (or some other sequence), a dict (or some other mapping), a set, or even some oddball collection type (e.g., the responses from a database query) is an implementation detail about which I don't care at that point.
Exactly. Asking whether a collection contains an element is on a slightly lower level than asking whether or not "there are any [Xs]."

Quoting the subject line: "We should have an explicit concept of emptiness for collections" We do. It's spelled: len(collection) == 0 You can't get more explicit than that. -- Steve

On 2021-08-22 17:36, Thomas Grainger wrote:
bool((len(collection) == 0) is True) == True and issubclass(True, bool)
'True' is a reserved word, so you don't need to check it. However, 'bool' might have been overridden, so: __builtins__.bool((len(collection) == 0) is True) == True Come to think of it, 'len' might have been overridden too, so: __builtins__.bool((__builtins__.len(collection) == 0) is True) == True

On Sun, Aug 22, 2021 at 07:01:28PM +0300, Serhiy Storchaka wrote:
(len(collection) == 0) is True
Ha ha, yes, very good, you got me. But the trouble is, if you don't trust the truth value of the predicate, it is hard to know when to stop: len(collection) == 0 (len(collection) == 0) is True ((len(collection) == 0) is True) is True (((len(collection) == 0) is True) is True) is True ((((len(collection) == 0) is True) is True)) is True # ... *wink* MRAB and Ricky: `__builtins__` is a CPython implementation detail and is reserved for the interpreter's private use. Other implementations may not even have it. The right way to write your code should be import builtins builtins.bool((builtins.len(collection) == 0) is True) is True -- Steve

Everyone in this thread should absolutely read Lewis Carroll's delightful "What the Tortoise Said to Achilles." It's a very short 3-page story that addressed exactly this topic in 1895... even before Guido's Time Machine. One free copy of the public domain work is at: https://wmpeople.wm.edu/asset/index/cvance/Carroll On Sun, Aug 22, 2021 at 8:30 PM Steven D'Aprano <steve@pearwood.info> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Mon, Aug 23, 2021 at 12:13 AM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
That he's mad, 'tis true, 'tis true 'tis pity, And pity 'tis, 'tis true -- Hamlet, Act 2, Scene 2 --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

In all seriousness this is an actual problem with numpy/pandas arrays where:

```
Python 3.8.10 (default, Jun 2 2021, 10:49:15)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
```
eg https://pandas.pydata.org/pandas-docs/version/1.3.0/user_guide/gotchas.html#using-if-truth-statements-with-pandas
> Should it be True because it’s not zero-length, or False because there are False values? It is unclear, so instead, pandas raises a ValueError:
I'm not sure I believe the author here - I think it's clear. It should be True because it's not zero-length.
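The failure mode described in the pandas documentation can be imitated without installing pandas. The class below is a hypothetical stand-in, not pandas code: its `__bool__` raises `ValueError` the way the quoted passage describes, while `len()` keeps working. The two helper functions are likewise just names for the two idioms under discussion.

```python
class AmbiguousSeries:
    """Hypothetical stand-in for an array type that refuses truth testing."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __bool__(self):
        # Mimics the behaviour described in the pandas "gotchas" page.
        raise ValueError("The truth value of this object is ambiguous.")


def is_empty_by_truth(obj):
    return not obj            # the PEP 8 idiom


def is_empty_by_len(obj):
    return len(obj) == 0      # the explicit length check


data = AmbiguousSeries([])
print(is_empty_by_len(data))  # True
try:
    is_empty_by_truth(data)
except ValueError as exc:
    print("truth test failed:", exc)
```

Code that must accept both builtin sequences and such array-like objects therefore cannot rely on `if not seq`, whatever one thinks the truth value ought to be.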

here's another fun one "A False midnight": https://lwn.net/Articles/590299/ https://bugs.python.org/issue13936#msg212771

On Mon, Aug 23, 2021 at 11:56 PM Thomas Grainger <tagrain@gmail.com> wrote:
here's another fun one "A False midnight": https://lwn.net/Articles/590299/ https://bugs.python.org/issue13936#msg212771
That was a consequence of a time value being an integer, and thus zero (midnight) was false. It was changed, but - as is fitting for a well-used language - backward compatibility was important. Modern versions of Python don't have that problem. ChrisA
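A quick check of the current behaviour, assuming Python 3.5 or later (the release in which midnight stopped being false): a wall-clock time is always true, while a zero-length duration is still false.

```python
import datetime

midnight = datetime.time(0, 0)
no_time_at_all = datetime.timedelta(0)

print(bool(midnight))        # True on Python 3.5+
print(bool(no_time_at_all))  # False: a zero duration is an "empty" interval
```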

On Mon, Aug 23, 2021 at 6:54 AM Thomas Grainger <tagrain@gmail.com> wrote:
This is a great example of the problem of the assumption of zero as representing false. I’ve always been ambivalent about Python’s concept of Truthiness (“something or nothing”). If I were to write my own language, I would probably require an actual Boolean for, e.g., an if statement.

The fact is that what defines falsiness is use case dependent. Numbers are the best example: zero can often be a perfectly meaningful number. Which is why Brandon suggested that testing the length of a sequence was a good way to be explicit about what you mean by false in a particular context.

But I see no reason to add a standardized way to check for an empty container. Again, “emptiness” may not be obviously defined either. NumPy arrays (or pandas DataFrames) are a good example here: there is more than one way to think of them as false, but maybe more than one way to think of them as empty too:

Is a shape () (scalar) array empty?
Or shape (100, 0)?
Or shape (0, 100)?
Or shape (0,0,0,0)?
Or a rank 1 array that’s all zeros? Or all false?

Anyway, the point is that “emptiness” may be situation specific, just like truthiness is. So explicitly specifying that you are looking for len(container) == 0 is more clear than isempty(container) would be, even if it did exist.

With duck typing, you may well not know what type you are dealing with, but you absolutely need to know how you expect that object to behave in the context of your code. So if having a zero length is meaningful in your code, then only objects with a length will work, which is just fine.

-CHB
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Tue, Aug 24, 2021 at 3:27 PM Christopher Barker <pythonchb@gmail.com> wrote:
More a problem with the assumption that midnight is zero. There's nothing at all wrong with zero being false - if, for instance, you're looking at a timedelta, a width of zero seconds is an empty time interval, and that can indeed be false. (Consider: What is the overlap between segment X and segment Y? Find the intersection between them; if the intersection has zero width, there is no overlap and the two time periods do not collide.)
I’ve always been ambivalent about Python’s concept of Truthiness (“something or nothing”). If I were to write my own language, I would probably require an actual Boolean for, e.g., an if statement.
That becomes very frustrating. Before you design your own language, I strongly recommend trying out REXX, C, Python, LPC or Pike, JavaScript, and SourcePawn, just because they have such distinctly different type systems. (Obviously you'll have tried some of those already, but try to complete the set.) And by "try out", I mean spend a good amount of time coding in them. Get to know what's frustrating about them. For instance, what does "if 0.0" mean? (Python: False. C: False. JavaScript: False. Pike: True. REXX: Error. SourcePawn: False, but can become True if the tag changes. SourcePawn doesn't have types, it has tags.) Similarly, what is the truth value of an empty array (or equivalent in each language)? (Python: False. C: True. JavaScript: True. REXX: Concept does not exist. SourcePawn: Error. Pike: True.) Not one of these languages is fundamentally *wrong*, but they disagree on what logical choices to make. (Yes, I'm being fairly generous towards SourcePawn here. Truth be told, it sucks.) If you're going to make changes from the way Python does things, be sure to have tried a language that already works that way, and see what the consequences are.
The fact is that what defines falsiness is use case dependent. Numbers are the best example: zero can often be a perfectly meaningful number.
Of course it's a perfectly meaningful number, but you won't be saying "if x:" to figure out whether x is a meaningful number. The question is, what *else* does it mean? For instance, if you're asking "where in this string does the letter w first occur?", then 0 is a perfectly meaningful result, indicating that it's found at the very start of the string. But that's not about whether it's a number; it's whether it's a string index.
Which is why Brandon suggested that testing the length of a sequence was a good way to be explicit about what you mean by false in a particular context.
When people say "explicit is better than implicit", they usually mean "code I like is better than code I don't like". And then they get (somewhat rightly) lampooned by people like Steven who interpret "explicit" to mean "using more words to mean nothing", which clearly violates the Zen. What ARE you being explicit about? Are you stating the concept "check if there are any users", or are you stating the concept "take the list of users, count how many people there are in it, and check if that number is greater than zero"? The first one is written "if users:" and the second "if len(users) > 0:". They are BOTH explicit, but they are stating different things. The point of a high level programming language is that we can express abstract concepts. We don't have to hold the interpreter's hand and say "to figure out how many people are in the list, take the past-end-of-list pointer, subtract the list base pointer, and divide by the size of a list element". We say "give me the length of the list". And one huge advantage is that the interpreter is free to give you that length in any way it likes (directly storing the length, using pointer arithmetic, or even iterating over a sparse list and counting the slots that are occupied). Only be "more explicit" (in the sense of using lower-level constructs) if you need to be.
So explicitly specifying that you are looking for len(container) == 0 is more clear than isempty(container) would be, even if it did exist.
I'm not sure that that's any better. One is asking "how much stuff is in the container? Zero items?" and the other is asking "is this container empty?". They're different concepts. They will (usually) give the same result, but conceptually they're different, and it's not a scale of "more explicit" to "less explicit".
With duck typing, you may well not know what type you are dealing with, but you absolutely need to know how you expect that object to behave in the context of your code. So if having a zero length is meaningful in your code — then only objects with a length will work, which is just fine.
Yes. And it should make perfect sense to write something like this:

    if stuff:
        print("My stuff:")
        for thing in stuff:
            print(thing)

where you omit the header if there are no elements to iterate over. What is the type of "stuff"? Do you care? Well, it can't be a generator, since those will always be true; but it can be any sort of collection.

Be explicit about what you care about. Don't shackle the code with unnecessary constraints, and don't hand-hold the interpreter unnecessarily.

ChrisA
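The generator caveat is easy to demonstrate. In this small sketch the function name `numbers` is invented for the example; the point is that a generator object is true even when it will yield nothing, so the header-printing idiom needs a real collection.

```python
def numbers(n):
    """Yield 0..n-1; with n == 0 the generator produces nothing."""
    for i in range(n):
        yield i


empty_gen = numbers(0)
print(bool(empty_gen))          # True: generators do not define __bool__ or __len__
print(bool(list(numbers(0))))   # False: materialising it gives an empty list

stuff = list(numbers(0))        # convert first if the header logic matters
if stuff:
    print("My stuff:")
    for thing in stuff:
        print(thing)
```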

On Mon, Aug 23, 2021 at 10:26:47PM -0700, Christopher Barker wrote:
That's not really a problem with zero representing false. It's a problem with representing time of day as a number where midnight is zero :-) *Durations* can be represented satisfactorily as a number (at least I can't think of any problems off the top of my head) but wall times (the number you see when you look at a clock) aren't really *numbers* in any real sense. You can't say "3:15pm times 2 is 6:30pm", 1:00am is not the multiplicative identity element and midnight is not the annihilating element (zero). We conventionally represent clock times as numbers, but they're more akin to ordinal data. They have an order, but you can't do arithmetic on them.
"Please sir, can we have some more Pascal" *wink* How about short-circuiting `or` and `and` operators? I'm not judging, just commenting. It surprises me that people who are extremely comfortable with duck-typing pretty much any other data type often draw the line at duck-typing bools.
The fact is that what defines falsiness is use case dependent. Numbers are the best example: zero can often be a perfectly meaningful number.
Fun fact: not only did the ancient Greek mathematicians not count zero as a number, but the Pythagoreans didn't count 1 as a number either. https://www.britannica.com/topic/number-symbolism/Pythagoreanism Zero is a perfectly meaningful number, but it is special: it is the identity element for addition and subtraction, and the annihilating element for multiplication, and it has no inverse in the Reals. Even in number systems which allow division by zero, you still end up with weird shit like 1/0 = 2/0 = 3/0 ... -- Steve

On Tue, Aug 24, 2021 at 4:31 PM Steven D'Aprano <steve@pearwood.info> wrote:
... midnight is not the annihilating element (zero).
Unless you're Cinderella, of course.
Which puts them in a similar category to degrees Celsius - you can compare them, but there's no fundamental zero point. (It makes sense to say "what would the temperature be if it were 3° higher", but not "what is double this temperature" - not in Celsius.)
To be fair, that's more a question of "what is a number". If you have "a thing", is that "a number of things", or are they separate concepts? We have settled on the idea that "a thing" is a special case of "a number of things" mathematically, but try going to someone and saying "I have a number of wives" and seeing whether they think that 1 is a number. (And if zero is a number, then I also have a number of wives. Oh, and I have a number of pet cobras, too, so don't trespass on my property.)
Numbers in general are useful concepts that help us with real-world problems, but it's really hard to pin them down. You can easily explain what "three apples" looks like, and you can show what "ten apples" looks like, and from that, you can intuit that the difference between them is the addition or removal of "seven apples". Generalizing that gives you a concept of numbers. But what is the operation that takes you from "three apples" to "three apples"? Well, obviously, it was removing zero elephants, how are you so dimwitted as to have not figured that out?!? In that sense, zero is very special, as it represents NOT doing anything, the LACK of a transition. (Negative numbers are a lot weirder. I have a cardboard box with three cats in it, and five cats climb out and run all over the room. Which clearly means that, if two cats go back into the box, it will be empty.) Division by zero has to be interpreted in a particular way. Calculus lets us look at nonsensical concepts like "instantaneous rate of change", which can be interpreted as the rise over the run where the run has zero length. In that sense, "dividing by zero" is really "find the limit of dividing smaller and smaller rises by their correspondingly smaller and smaller runs", and is just as meaningful as any other form of limit-based calculation (eg that 0.999999... is equal to 1). Mathematicians define "equal" and "divide" and "zero" etc in ways that are meaningful, useful, and not always intuitive. In programming, we get to do the same thing. So what IS zero? What does it mean? *IT DEPENDS*. Sometimes it's a basis point (like midnight, or ice water). Sometimes it's a scalar (like "zero meters"). Sometimes it indicates an absence. And that's why naively and blindly using programming concepts will inevitably trip you up. Oh, if only Python gave us a way to define our own data types with our own meanings, and then instruct the language in how to interpret them as "true" or "false".... 
that would solve all these problems..... ChrisA
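Python does, of course, give us exactly that via `__bool__`. As a sketch of the distinction drawn above, here are two made-up types (the class names are not from the thread): one where zero is just a basis point and so is always true, and one where zero really does mean "nothing".

```python
class Celsius:
    """Made-up type: 0 degrees is a reference point, not an absence."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __bool__(self):
        return True          # every temperature is "something"


class Distance:
    """Made-up type: zero metres means there is no distance at all."""

    def __init__(self, metres):
        self.metres = metres

    def __bool__(self):
        return self.metres != 0


print(bool(Celsius(0)))    # True
print(bool(Distance(0)))   # False
print(bool(Distance(5)))   # True
```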

Christopher Barker wrote:
Just like length is. It's a basic concept, and as with __bool__ and __len__, it should be up to the objects to specify what empty means.
So explicitly specifying that you are looking for len(container) == 0 is more clear than isempty(container) would be, even if it did exist.
As written in another post here, `len(container) == 0` is on a lower abstraction level than `isempty(container)`.

On Tue, 24 Aug 2021 at 12:07, Tim Hoffmann via Python-ideas <python-ideas@python.org> wrote:
Just like length is. It's a basic concept, and as with __bool__ and __len__, it should be up to the objects to specify what empty means.
It feels like these arguments in the abstract are mostly going round in circles. It's possible something has been mentioned earlier in this thread, but I don't recall if so - but is there any actual real-world code that would be substantially improved if we had built into the language a protocol that users could override in their classes to explicitly define what "is empty" meant for that class?

Some things to consider:

1. It would have to be a case where neither len(x) == 0 nor bool(x) did the right thing.
2. We can discount classes that maliciously have bizarre behaviour, I'm asking for *real world* use cases.
3. It would need to have demonstrable benefits over a user-defined "isempty" function (see below).
4. Use cases that *don't* involve numpy/pandas would be ideal - the scientific/data science community have deliberately chosen to use container objects that are incompatible in many ways with "standard" containers. Those incompatibilities are deeply rooted in the needs and practices of that ecosystem, and frankly, anyone working with those objects should be both well aware of, and comfortable with, the amount of special-casing they need.

To illustrate the third point, we can right now do the following:

    from functools import singledispatch

    import numpy

    @singledispatch
    def isempty(container):
        return len(container) == 0

    # If you are particularly wedded to special methods, you could even do
    #
    # @singledispatch
    # def isempty(container):
    #     if hasattr(container, "__isempty__"):
    #         return container.__isempty__()
    #     return len(container) == 0
    #
    # But frankly I think this is unnecessary. I may be in a minority here, though.

    @isempty.register
    def _(arr: numpy.ndarray):
        return len(arr.ravel()) == 0

So any protocol built into the language needs to be somehow better than that.

If someone wanted to propose that the above (default) definition of isempty were added to the stdlib somewhere, so that people could register specialisations for their own code, then that might be more plausible - at least it wouldn't have to achieve the extremely high bar of usefulness to warrant being part of the language itself. I still don't think it's sufficiently useful to be worth having in the stdlib, but you're welcome to have a go at making the case...

Paul
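The same `singledispatch` pattern can be tried without numpy. This variation is an illustration only, not part of Paul's message: it registers a specialisation for `queue.Queue`, a standard-library type that has no `__len__` but does offer an `empty()` method.

```python
import queue
from functools import singledispatch


@singledispatch
def isempty(container):
    """Default: anything with a length is empty when that length is zero."""
    return len(container) == 0


@isempty.register
def _(q: queue.Queue):
    # queue.Queue has no __len__, so len() would raise TypeError here.
    return q.empty()


q = queue.Queue()
print(isempty([]), isempty({"a": 1}), isempty(q))   # True False True
q.put("job")
print(isempty(q))                                    # False
```

Third-party types can be added the same way by whoever needs them, which is the bar a language-level protocol would have to beat.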

I also have the feeling that this is going round in circles. So let me get back to the core question:

**How do you check if a container is empty?**

IMHO the answer should not depend on the container. While emptiness may mean different things for different types, the check syntax can and should still be uniform.

Not a solution:

0) The current `if not seq` syntax. "Check falsiness instead of emptiness" is a simplification, which is not always possible.

Possible solutions:

1) Always use `if len(seq) == 0`. I think this would work. But would we want to write that in PEP-8 instead of `if not seq`? To me, this feels a bit too low level.
2) A protocol would formalize that concept by building respective syntax into the language. But I concede that it may be overkill.
3) The simple solution would be to add `is_empty()` methods to all stdlib containers and encourage third party libs to adopt that convention as well. That would give a uniform syntax by convention.

Reflecting the discussion in this thread, I now favor variant 3).

Tim
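To make variant 3) concrete, here is a rough sketch of an `is_empty()` method supplied as a mixin next to `collections.abc.Collection`. No such method exists on the real ABCs; `EmptinessMixin` and `Roster` are invented names, and the default simply defers to `__len__`, in the style of the existing mixin methods.

```python
from collections.abc import Collection


class EmptinessMixin:
    """Hypothetical mixin: derive is_empty() from __len__ by default."""

    def is_empty(self):
        return len(self) == 0


class Roster(EmptinessMixin, Collection):
    """Toy collection of user names."""

    def __init__(self, names=()):
        self._names = list(names)

    def __contains__(self, name):
        return name in self._names

    def __iter__(self):
        return iter(self._names)

    def __len__(self):
        return len(self._names)


users = Roster()
if users.is_empty():
    print("no users")
print(Roster(["ann"]).is_empty())   # False
```

A type with a cheaper or different notion of emptiness could override `is_empty()` while keeping the call syntax the same.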

Oh, if I'm going to be a smart-ass, I should probably remember that I need a `not` in there. No need to correct me, I saw it as soon as pressing send. Nonetheless, this is an unnecessary method or function. Truthiness is non-emptiness for most purposes. And where it's not, you need something more specialized to the purpose at hand. On Tue, Aug 24, 2021, 6:14 PM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:

On 8/24/21 3:03 PM, Tim Hoffmann via Python-ideas wrote:
**How do you check if a container is empty?**
IMHO the answer should not depend on the container.
I think this is the fly in the ointment -- just about everything, from len() to bool(), to add, to iter() /all/ depend on the container -- even equality depends on the container. `and`, `or`, and `not` partially depend on the container (via bool()). Only `is` is truly independent.
And since (3) is a method on the container, it absolutely "depends on the container". -- ~Ethan~

Ethan Furman wrote:
Sorry, I think you got me wrong: The meaning and thus implementation depends on the type (e.g. each container defines its own __len__()). However, the syntax for querying length is still uniformly `len(container)`. Likewise, I'd like to have a uniform syntax for emptiness, and not different syntax for different types (`if not container` / `if len(array) == 0` / `if dataframe.empty`).

Hi Tim, I'm sorry if this has been brought up before, but *aside from PEP 8* is there anything wrong with using "if len(a)" for nonempty, or "if not len(a)" for empty? It would seem to work for numpy and pandas arrays, and it works for builtin sequences. Also, it has the advantage of being 100% backwards compatible. :-) Surely conforming to PEP 8 shouldn't need an addition to the language or stdlib? Or does it not work? On Tue, Aug 24, 2021 at 3:42 PM Tim Hoffmann via Python-ideas < python-ideas@python.org> wrote:
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

On 2021-08-25 00:48, Guido van Rossum wrote:
What is the cost of 'len'? If it's always O(1), then it's not a problem, but if it's not O(1) (counting the items in a tree, for example) and you're not interested in how many items there are but only whether there's at least one, then...
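To illustrate the cost concern, here is a minimal singly linked list (written for this example, not taken from any library) in which `len()` has to walk every node while the emptiness check only looks at the head pointer.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


class LinkedList:
    """Toy list with O(N) __len__ and O(1) __bool__."""

    def __init__(self, values=()):
        self.head = None
        for value in reversed(list(values)):
            self.head = Node(value, self.head)

    def __len__(self):
        # Walks the whole chain: cost grows with the number of items.
        count = 0
        node = self.head
        while node is not None:
            count += 1
            node = node.next
        return count

    def __bool__(self):
        # Constant time: only asks whether there is a first node.
        return self.head is not None


items = LinkedList(range(1000))
print(bool(items), len(items))   # True 1000
print(bool(LinkedList()))        # False
```

With such a type, `if items:` answers the emptiness question cheaply, whereas `if len(items) == 0:` pays for a full count.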

I wanted to do a survey of various "aggregates" in Python to see if any stand out as making the usual `if stuff: ...` troublesome. I wrote a little script at https://github.com/DavidMertz/LanguagePractice/blob/main/python/aggregates.p... .

I'm being deliberately vague about an "aggregate." It might not be a collection strictly speaking, but it is something that might seem to "contain" values in some sense. Basically, I think that standard library and built-in stuff behaves per PEP8. Some other libraries go their own way. I throw in a linked-list implementation I found on PyPI. I've never used it beyond this script; but per what it is, it cannot implement `len()` in O(1) (well, it *could* if it does extra bookkeeping; but then it's kinda a different thing).

In NumPy land, the np.empty vs. np.zeros case is another oddball. On my system, my memory happened to have some prior value that wasn't zero; that could vary between runs, in principle.

These are the results:

Expr: '' | Value: ''
  Truth: False | Length: 0
Expr: list() | Value: []
  Truth: False | Length: 0
Expr: tuple() | Value: ()
  Truth: False | Length: 0
Expr: dict() | Value: {}
  Truth: False | Length: 0
Expr: set() | Value: set()
  Truth: False | Length: 0
Expr: bytearray() | Value: bytearray(b'')
  Truth: False | Length: 0
Expr: bytearray(1) | Value: bytearray(b'\x00')
  Truth: True | Length: 1
Expr: bytearray([0]) | Value: bytearray(b'\x00')
  Truth: True | Length: 1
Expr: array.array('i') | Value: array('i')
  Truth: False | Length: 0
Expr: array.array('i', []) | Value: array('i')
  Truth: False | Length: 0
Expr: Nothing() | Value: EmptyNamedTuple()
  Truth: False | Length: 0
Expr: deque() | Value: deque([])
  Truth: False | Length: 0
Expr: deque([]) | Value: deque([])
  Truth: False | Length: 0
Expr: ChainMap() | Value: ChainMap({})
  Truth: False | Length: 0
Expr: queue.Queue() | Value: <queue.Queue object at 0x7f0940dd2190>
  Truth: True | Length: No length
Expr: asyncio.Queue() | Value: <Queue at 0x7f0940dd2190 maxsize=0>
  Truth: True | Length: No length
Expr: multiprocessing.Queue() | Value: <multiprocessing.queues.Queue object at 0x7f0940dd2190>
  Truth: True | Length: No length
Expr: np.ndarray(1,) | Value: array([5.e-324])
  Truth: True | Length: 1
Expr: np.ndarray((1,0)) | Value: array([], shape=(1, 0), dtype=float64)
  Truth: False | Length: 1
Expr: np.empty((1,)) | Value: array([5.e-324])
  Truth: True | Length: 1
Expr: np.zeros((1,)) | Value: array([0.])
  Truth: False | Length: 1
Expr: np.zeros((2,)) | Value: array([0., 0.])
  Truth: No Truthiness | Length: 2
Expr: np.ones((1,)) | Value: array([1.])
  Truth: True | Length: 1
Expr: np.ones((2,)) | Value: array([1., 1.])
  Truth: No Truthiness | Length: 2
Expr: pd.Series() | Value: Series([], dtype: float64)
  Truth: No Truthiness | Length: 0
Expr: pd.DataFrame() | Value: Empty DataFrame
  Truth: No Truthiness | Length: 0
Expr: xr.DataArray() | Value: <xarray.DataArray ()>
  Truth: True | Length: No length
Expr: linkedadt.LinkedList() | Value: <linkedadt.LinkedList object at 0x7f08d8d77f40>
  Truth: False | Length: 0
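The linked script is not reproduced in the message, so the following is only a guess at its general shape, restricted to standard-library types so that it runs anywhere. The function name `survey` and the choice of samples are assumptions made for this sketch.

```python
import array
import queue
from collections import ChainMap, deque


def survey(obj):
    """Return (truthiness, length), using the labels from the results above."""
    try:
        truth = bool(obj)
    except Exception:          # some third-party types raise on bool()
        truth = "No Truthiness"
    try:
        length = len(obj)
    except TypeError:
        length = "No length"
    return truth, length


samples = {
    "''": "",
    "list()": list(),
    "bytearray(1)": bytearray(1),
    "array.array('i')": array.array("i"),
    "deque()": deque(),
    "ChainMap()": ChainMap(),
    "queue.Queue()": queue.Queue(),
}

for expr, obj in samples.items():
    truth, length = survey(obj)
    print(f"Expr: {expr} | Truth: {truth} | Length: {length}")
```

Among these samples only `queue.Queue()` departs from the PEP 8 expectation: it is always true and has no length, which agrees with the survey results.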

It seems the conversation has confused two related concepts:

1) The default bool() implementation (Truthiness). This is what the OP said was recommended by PEP 8: "For sequences, (strings, lists, tuples), use the fact that empty sequences are false:". There is some debate about whether this is a good recommendation or not, but I don't think that's the OP's point. Rather:

2) That there is no standard way to explicitly test containers for "emptiness". There is only length, with the assumption that len(something) == 0 or not len(something) is a good way to test for emptiness.

I still don't see why length is not perfectly adequate, but I wonder if there are any "containers", i.e. things that could be empty, in the std lib that don't support length. Looking in the ABCs, a Container is something that supports `in`, and a Sized is something that supports len(), so in theory, there could be a Container that does not have a length. Are there any in the std lib?

Perhaps the ABCs are instructive in another way here: if we were to add a __empty__ or some such dunder, what ABC would require it? Container? Or yet another one-method ABC?

-CHB

On Tue, Aug 24, 2021 at 8:39 PM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

“Container” is a kind of pun, it’s something with a __contains__ method. The thing you’re looking for is “Collection”, which is the base for sequences, mappings and sets. I also note that the discussion seems quite stuck. —Guido On Tue, Aug 24, 2021 at 21:55 Christopher Barker <pythonchb@gmail.com> wrote:
-- --Guido (mobile)
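
The distinction between Container and Collection can be checked directly against `collections.abc`. The following is only an illustrative sketch; `EvenNumbers` is a hypothetical class invented for this example and is not part of the standard library.

```python
from collections.abc import Collection, Container, Sized


class EvenNumbers:
    """Hypothetical type: supports `in`, but has no length and no iteration."""

    def __contains__(self, item):
        return isinstance(item, int) and item % 2 == 0


evens = EvenNumbers()

print(4 in evens)                     # membership test works
print(isinstance(evens, Container))   # it defines __contains__
print(isinstance(evens, Sized))       # it does not define __len__
print(isinstance(evens, Collection))  # Collection needs __len__, __iter__ and __contains__
print(isinstance([], Collection))     # lists provide all three
```

Under this reading, a type like `EvenNumbers` is a Container for which "empty" has no obvious meaning, while every Collection can be checked with `len()`.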

On Tue, Aug 24, 2021 at 10:12 PM Guido van Rossum <guido@python.org> wrote:
> “Container” is a kind of pun, it’s something with a __contains__ method. The thing you’re looking for is “Collection”.
Hmm, perhaps we should tweak the docs; the section is titled "Abstract Base Classes for Containers". But yes, Collection is what I (and probably the OP) was looking for, in which case all Collections support len(), so the way to explicitly check if they are empty is len(). Nothing to be done here.
> I also note that the discussion seems quite stuck.
Indeed. I'm done :-)
-CHB

On Tue, Aug 24, 2021 at 7:23 PM MRAB <python@mrabarnett.plus.com> wrote:
It's a pretty universal assumption that len() is O(1) -- something that doesn't do that probably shouldn't implement __len__(). (And yeah, there's probably some tree package around that does implement an O(N) __len__(). People do a lot of silly things though, we can't handle *everything*.)
It was pointed out to me that numpy allows arrays that have no elements but a nonzero first dimension. People could disagree about whether that should be considered empty.
I'm not sure about Pandas, but IIRC a DataFrame is always a table of rows, with all rows having the same number of columns. Here I'd say that if there's at least one row in the table, I'd call it non-empty, even if the rows have no columns. This conforms to the emptiness of [()]. It's possible that there's a common use case in the data science world where this should be counted as empty, but to me, that would be inconsistent -- a row with zero columns is still a row. (For numpy arrays my intuition is less clear, since there's more symmetry between the dimensions.)
So then the next question is, what's the use case? What code are people writing that may receive either a stdlib container or a numpy array, and which needs to do something special if there are no elements? Maybe computing the average? AFAICT Tim Hoffmann (the OP) never said.
PS. Why is anyone thinking that an array containing all zeros (and at least one zero) might be considered empty? That seems a totally different kind of test.
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

On Tue, Aug 24, 2021 at 9:50 PM Guido van Rossum <guido@python.org> wrote:
Indeed -- you can kinda-sorta map an array to nested lists. E.g., is [[], [], [], []] an empty list? It has a length of 4, and so does a (4, 0) sized numpy array. numpy is actually mostly consistent with the std lib in that regard.
I think there may have been confusion with boolean Falsiness indicating emptiness, as it does for Sequences. An array containing all zeros might well be considered False, though not empty. Of course, that's why numpy arrays raise a DeprecationWarning when used that way. Another issue with numpy arrays is that they are non resizable, so needing to check if one is empty is pretty rare -- there is no way that everything could have been removed, or nothing put in in the first place -- the emptiness could have, and likely would have been checked before it was created. -CHB

Guido van Rossum wrote:
There are two parts to the answer:
1) There are functions, e.g. in SciPy and Matplotlib, that accept both numpy arrays and lists of floats. Speaking from Matplotlib experience: While eventually we coerce that data to a uniform internal format, there are cases in which we need to keep the original data and only convert on a lower internal level. We often can return early in a function if there is no data, which is where the emptiness check comes in. We have to take extra care to not do the PEP-8 recommended emptiness check using `if not data`.
2) Even for cases that cannot have different types in the same code, it is unsatisfactory that I have to write `if not seq` but `if len(array) == 0` depending on the expected data. IMHO whatever the recommended syntax for emptiness checking is, it should be the same for lists and arrays and dataframes.

On Wed, 25 Aug 2021 at 14:13, Tim Hoffmann via Python-ideas <python-ideas@python.org> wrote:
You don't. You can write a local isempty() function in matplotlib, and add a requirement *in your own style guide* that all emptiness checks use this function. Why do people think that they can't write project-specific style guides, and everything must be in PEP 8? That baffles me.
> 2) Even for cases that cannot have different types in the same code, it is unsatisfactory that I have to write `if not seq` but `if len(array) == 0` depending on the expected data. IMHO whatever the recommended syntax for emptiness checking is, it should be the same for lists and arrays and dataframes.
You don't have to do any such thing. You can just write "if len(x) == 0" everywhere. No-one is *making* you write "if not seq". All I can see here is people making problems for themselves that they don't need to. Sorry if that's not how it appears to you, but I'm genuinely struggling to see why this is an issue that can't be solved by individual projects/users. The only exception I can see is the question "what's the best way to suggest to newcomers?" but that's more of a tutorial/documentation question, than one of standardisation, style guides or language features. Paul
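
A project-local helper of the kind suggested here could be as small as the following sketch. The names `isempty` and `plot` are hypothetical and chosen only for illustration; this is not code from Matplotlib or any other project.

```python
def isempty(collection):
    """Project-local emptiness check (hypothetical helper, not a builtin).

    Relies only on len(), so it works for anything that is Sized.
    """
    return len(collection) == 0


def plot(data):
    # Return early when there is nothing to draw.
    if isempty(data):
        return "nothing to plot"
    return f"plotting {len(data)} points"


print(plot([]))
print(plot([1.0, 2.5, 4.0]))
```

A project style guide could then require `isempty(x)` for all emptiness checks, independently of what PEP 8 recommends.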

I agree with Guido. The only problem here is third-party libraries that don't use bool() to indicate emptiness. If you need to support those, use len(). But this doesn't mean a change to the standard library, because those third-party libraries are, well, third-party. We don't need a more explicit way to specify emptiness. bool(seq) is fine. On Wed, Aug 25, 2021, 8:05 AM Guido van Rossum <guido@python.org> wrote:

Ok, I have no problem ignoring PEP-8 where it's not applicable. I've brought this topic up because I thought there could be an improvement either in PEP-8 and/or by adding something to the language. I still do think that, but I accept that the problem it solves is not considered relevant enough here to take further action. Anyway, thanks all for the discussion! Tim

On Tue, 24 Aug 2021 at 23:06, Tim Hoffmann via Python-ideas <python-ideas@python.org> wrote:
I will note that if we take things to extremes, that constraint simply cannot be adhered to - types can define bool, len, and any other special method we care to invent, however they like. With that in mind, I'd argue that if a collection defines bool and len in such a way that "not bool(c)" or "len(c) == 0" doesn't mean "c is empty", then it is a special case, and has deliberately chosen to be a special case. Yes, I know that makes numpy arrays and Pandas dataframes special cases. As I said, they have deliberately chosen to not follow normal conventions. Take it up with them if you care to. (IMO there's no point - they have reasonable justifications for their choices, and it's too late to change anyway). Based on the "obvious" intent of the classes in collections.abc, I'd say that if you test "len(c) == 0" then you can reasonably say that you cover all collections. If you want to support the weird multi-dimensional zero-sized numpy arrays that have len != 0, then special case them. But frankly, I'd wait until a user comes up with a reason why you need to support them, who can tell you what they expect your code to *do* with them in the first place... Again, practical use cases rather than abstract questions. Paul

Paul Moore wrote:
I can agree that "len(c) == 0" should mean "c is empty" for all practical purposes (and numpy and pandas conform with that). But that "bool(c)" should also indicate whether "c is empty" is an arbitrary additional constraint. For some types (e.g. numpy, pandas) that cannot be fulfilled in a logically consistent way. By requiring that additional constraint, the Python language forces these containers to be a special case.
To sum things up, I've written it down again:
Premise: It is valuable for the language and ecosystem to not have this special casing. I.e. I don't want to write `if not seq` on the one hand and `if len(array) == 0` on the other. The syntax for checking if a list or an array is empty should be the same.
Conclusion: If so, PEP-8 has to be changed because "For sequences, (strings, lists, tuples), use the fact that empty sequences are false:" is not a universal solution. The question then is what other syntax to use for an emptiness check.
Possible solutions:
1) The length check is a possible solution: "For sequences, (strings, lists, tuples), test emptiness by `if len(seq) == 0`". N.b. I think this is more readable than the variant `if not len(seq)` (but that could be discussed as well).
2) The further question is: If we change the PEP 8 recommendation anyway, can we do better than the length check? IMHO a length check is semantically on a lower level than an empty-check. Counting elements is a more detailed operation than only checking if we have any element. That detail is not needed and distracting if we are only interested in is_empty. This is vaguely similar to iterating over indices (`for i in range(len(users))`) vs. iterating over elements (`for user in users`). We don't iterate over indices because that's usually a detail we don't need. So one can argue we shouldn't count elements if we only want to know if there are any.
If we come to the conclusion that an explicit empty check is better than a length=0 check, there are again different ways that could be implemented.
My favorite solution now would be adding `is_empty()` methods to all standard containers and encourage numpy and pandas to add these methods as well. (Alternatively an empty protocol would be a more formal solution to an explicit check).
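
To make the protocol variant concrete, here is a rough sketch of how an `isempty()` function with an `__isempty__` hook might behave. Neither exists in Python; both names come from the proposal at the top of the thread, and the fallback to `len()` as well as the `JobQueue` class are assumptions made for this example.

```python
def isempty(obj):
    """Sketch of the proposed isempty() builtin (hypothetical, not in Python)."""
    # Look the hook up on the type, as Python does for other special methods.
    hook = getattr(type(obj), "__isempty__", None)
    if hook is not None:
        return bool(hook(obj))
    # Assumed fallback: derive emptiness from the length.
    return len(obj) == 0


class JobQueue:
    """Toy type without __len__ that still knows whether it is empty."""

    def __init__(self):
        self._pending = []

    def put(self, job):
        self._pending.append(job)

    def __isempty__(self):
        return not self._pending


jobs = JobQueue()
print(isempty(jobs))   # uses the __isempty__ hook
jobs.put("render")
print(isempty(jobs))
print(isempty([]))     # falls back to len()
```

Whether such a fallback is desirable is exactly the kind of design question the thread leaves open.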

On Wed, 25 Aug 2021 at 14:00, Tim Hoffmann via Python-ideas <python-ideas@python.org> wrote:
> I can agree that "len(c) == 0" should mean "c is empty" for all practical purposes (and numpy and pandas conform with that).
OK
> But that "bool(c)" should also indicate whether "c is empty" is an arbitrary additional constraint. For some types (e.g. numpy, pandas) that cannot be fulfilled in a logically consistent way. By requiring that additional constraint, the Python language forces these containers to be a special case.
It's another way of expressing the intent, which is potentially useful if you don't care about those special cases. We're all consenting adults here, there's no "constraint" involved.
The *language* (by which I assume you mean "Python") doesn't have any special casing. It just has behaviour. It's valuable for educating beginners, maybe, to have an easily expressed way of checking for emptiness, but for "the language and ecosystem"? I don't think so - it's valuable to allow people to express their intent in the way that is most natural to them, IMO.
> Conclusion: If so, PEP-8 has to be changed because "For sequences, (strings, lists, tuples), use the fact that empty sequences are false:" is not a universal solution.
"Has to be" is extremely strong here. PEP 8 is a set of *guidelines* that people should use with judgement and thought, not a set of rules to be slavishly followed. And in fact, I'd argue that describing a numpy array or a Pandas dataframe as a "sequence" is pretty inaccurate anyway, so assuming that the statement "use the fact that empty sequences are false" applies is fairly naive anyway. But if someone wants to alter PEP 8 to suggest using len() instead, I'm not going to argue, I *would* get cross, though, if the various PEP 8 inspired linters started complaining when I used "if seq" to test sequences for emptiness.
If someone is reading PEP 8 and can't make their own choice between "if len(seq) == 0" and "if not len(seq)" then they should go and read a Python tutorial, not expect the style guide to tell them what to do :-(
> 2) The further question is: If we change the PEP 8 recommendation anyway, can we do better than the length check? IMHO a length check is semantically on a lower level than an empty-check. Counting elements is a more detailed operation than only checking if we have any element. That detail is not needed and distracting if we are only interested in is_empty. This is vaguely similar to iterating over indices (`for i in range(len(users))`) vs. iterating over elements (`for user in users`). We don't iterate over indices because that's usually a detail we don't need. So one can argue we shouldn't count elements if we only want to know if there are any.
Not without either changing the language/stdlib or recommending a user-defined function or 3rd party library. To put that another way, the current Python language and stdlib doesn't currently have (or need, IMO) a better way of checking for emptiness. If you want to argue that it needs one, you need an argument far better than "to have something to put in PEP 8".
> If we come to the conclusion that an explicit empty check is better than a length=0 check, there are again different ways that could be implemented. My favorite solution now would be adding `is_empty()` methods to all standard containers and encourage numpy and pandas to add these methods as well. (Alternatively an empty protocol would be a more formal solution to an explicit check).
Well, that's "if false then ..." in my opinion. I.e., I don't think you've successfully argued that an explicit empty check is better, and I don't think you have a hope of doing so if your only justification is "so we can recommend it in PEP 8". Paul

On Mon, 2021-08-23 at 13:51 +0000, Thomas Grainger wrote:
> In all seriousness this is an actual problem with numpy/pandas arrays where:
<snip>
It must be undefined because the operators are elementwise operators and the concept of non-emptiness being True does not make sense for elementwise containers.
    arr = np.arange(5)
    if arr < 0:
        arr *= -1
is code that makes sense if `arr` was a number, but it is not meaningful for arrays.
The second, distinct, problem is that `len` is non-obvious for many N-D containers:
    arr = np.ones(shape=(5, 0))
    assert len(arr) == 5
    assert arr.size == 0  # it is empty.
NumPy breaks the contract that `len` is already the same as "size" [1]. (And we are stuck with it probably...) So the length definition of truth only works out for Python containers because `container == 0` is always obviously `False` already and you never have the `len != size` problem.
An argument that I will bring is that the bigger problem for arrays may be that we don't have a concept for "elementwise container" (or maybe higher dimensional container, but I think the elementwise is the important distinction). A "size" protocol would be useful to deal with NumPy's choice of `len`! But, an "has elementwise operations" protocol may be more generally useful to code dealing with a mix of NumPy arrays or Python sequences – and even NumPy itself. (E.g. it also tells you that `+` will not concatenate and it could tell NumPy whether it should try coercing to an array or not.)
Cheers, Sebastian
[1] I will not argue that this is the best way to define it, I don't like the list-of-lists analogy, so I think that `len(arr) == arr.size` and `arr.__iter__` iterating all elements would be a better definition. (Making the notion of "length" equivalent to "size".) Or even refusing `len` unless 1-D! That would make everyone who argues to use `len()` always correct, or at least never incorrect. But it is simply not what we got...
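
The `len` versus `size` mismatch can be reproduced without NumPy. The sketch below uses `ToyArray`, a hypothetical stand-in written for this example that only mimics the `shape`, `len()` and `size` behaviour described above; it is not NumPy's implementation.

```python
import math


class ToyArray:
    """Stand-in for an N-D array (NOT numpy); only models shape, len and size."""

    def __init__(self, shape):
        self.shape = tuple(shape)

    def __len__(self):
        # Like the behaviour described above: length is the first dimension.
        return self.shape[0]

    @property
    def size(self):
        # Total number of elements across all dimensions.
        return math.prod(self.shape)


arr = ToyArray((5, 0))
print(len(arr))   # 5
print(arr.size)   # 0
```

A length-based emptiness check reports this object as non-empty although it holds no elements, which is the case a separate "size" protocol would address.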

On Mon, Aug 23, 2021 at 01:51:17PM -0000, Thomas Grainger wrote:
> In all seriousness this is an actual problem with numpy/pandas arrays where:
Indeed. Third-party classes can do anything they like. It is a shame that numpy's API made the decision that they did, but I don't think it is very common to use numpy arrays in a context where they are expected to duck-type as collections. -- Steve

Steven D'Aprano wrote:
> I don't think it is very common to use numpy arrays in a context where they are expected to duck-type as collections.
Maybe not "numpy arrays duck-type as collections", but it is very common that arrays and sequences are used interchangeably. Numpy has created the term "array-like" for this. Well technically, an array-like is something that `np.array()` can turn into an array. But from a user point of view array-like effectively means array or sequence. If you want to write a function that accepts array-like `values`, you have to change a check `if values` to `if len(values) == 0`. That works for both but is against the PEP8 recommendation. This is a shortcoming of the language.
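
A small sketch of such a function follows. `StrictArray` is a hypothetical stand-in, not NumPy: it always refuses a truth test, which simplifies the real behaviour (the table earlier in the thread shows that some one-element arrays do have a truth value). The function name `mean` is likewise just an illustration.

```python
class StrictArray:
    """Stand-in for an array type with an ambiguous truth value (NOT numpy)."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __iter__(self):
        return iter(self._values)

    def __bool__(self):
        # Simplification: refuse every truth test.
        raise ValueError("truth value of an array is ambiguous")


def mean(values):
    # len() works for lists, tuples and the array stand-in alike,
    # whereas `if not values` would raise for StrictArray.
    if len(values) == 0:
        return None
    return sum(values) / len(values)


print(mean([1, 2, 3]))
print(mean(StrictArray([2, 4])))
print(mean(StrictArray([])))
```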

On 8/23/21 1:15 PM, Tim Hoffmann via Python-ideas wrote:
Numpy is not Python, but a specialist third-party package that has made specialist choices about basic operations -- that does not sound like a shortcoming of the language. It seems to me that the appropriate fix is for numpy to have an "is_empty()" function that knows how to deal with arrays and array-like structures, not force every container to grow a new method. -- ~Ethan~

Ethan Furman wrote:
> It seems to me that the appropriate fix is for numpy to have an "is_empty()" function that knows how to deal with arrays and array-like structures, not force every container to grow a new method.
Yes, numpy could and probably should have an "is_empty()" method. However, defining a method downstream breaks duck typing and, maybe even more importantly, authors have to mentally switch between the two empty-check variants `if users` and `if users.is_empty()` depending on the context.
Ethan Furman wrote:
The "specialist choices" `if len(values) == 0` in Numpy are the best you can do within the capabilities of the Python language if you want the code to function with lists and arrays. For Numpy to do better, Python would need to provide either the above-mentioned "has element-wise operations" protocol or an is_empty protocol. I consider the emptiness check a basic concept that should be consistent and easy to use across containers. Tim

On 8/23/21 2:31 PM, Tim Hoffmann via Python-ideas wrote:
Ethan Furman wrote:
The context being whether or not you are working with numpy? Is there generic code that works with both numpy arrays and other non-numpy data? Do we have any non-numpy examples of this problem?
Python has an emptiness-check and numpy chose to repurpose it -- that is not Python's problem nor a shortcoming in Python. Suppose we add an `.is_empty()` method, and five years down the road another library repurposes that method, that other library then becomes popular, and we then have two "emptiness" checks that are no longer consistent -- do we then add a third? -- ~Ethan~

Ethan Furman wrote:
E.g. SciPy and Matplotlib accept array-like inputs such as lists of numbers. They often convert internally to numpy arrays, but speaking for the case of Matplotlib, we sometimes need to preserve the original data and/or delay conversion, so we have to take extra care when checking inputs. Pandas has the same problem: "The truth value of a DataFrame is ambiguous". They have introduced a `DataFrame.empty` property. I suspect the same problem exists for other types like xarray or tensorflow tensors, but I did not check.
> Python has an emptiness-check and numpy chose to repurpose it -- that is not Python's problem nor a shortcoming in Python.
Python does not have an emptiness-check. Empty containers map to False and we suggest in PEP8 to use the False-check as a stand in for the emptiness-check. But logically falsiness and emptiness are two separate things. Numpy (IMHO with good justification) repurposed the False-check, but that left them without a standard emptiness check.
I don't think this is a valid argument. We would introduce the concept of emptiness. That can happen only once. What empty means for an object is determined by the respective library and implemented there, similar to what `len` means. There can't be any inconsistency.

On 24Aug2021 06:55, tim.hoffmann@mailbox.org <tim.hoffmann@mailbox.org> wrote:
In my mind PEP8 says that the emptiness check should be expressed as a False-check for Pythonic containers. That is a good thing to me. It does not mean we've no notion of emptiness, it says that if you've got something which can be empty it should be possible to check that by doing a False-check. And _that_ implies that for containers, the False-check _is_ emptiness. Defined, to me. Numpy has made a different decision, probably because a heap of operators on their arrays map the operator onto the array elements. To them, that is useful enough to lose some Python-idiom coherence for the wider gain _in that special domain_. If we're chasing rough edges, consider queue.Queue:
I would often like to treat Queues as a container of queued items, basically because I'd like to be able to probe for the presence of queued items via the emptiness idiom. But I can't. It does have a .empty() method. I don't even know what my point is here :-( But I am definitely -1 on weakening the bool(container) idiom as a test for empty/nonempty, and also on asking every container implementation on the planet to grow a new is_empty() method. Cheers, Cameron Simpson <cs@cskk.id.au>
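
The `queue.Queue` behaviour mentioned here is easy to reproduce with the standard library alone. This is a minimal sketch; the item `"job"` is an arbitrary placeholder.

```python
import queue

q = queue.Queue()

print(bool(q))     # True: Queue defines neither __bool__ nor __len__
print(q.empty())   # True: the dedicated method reports emptiness

try:
    len(q)
except TypeError as exc:
    print("no len():", exc)

q.put("job")
print(q.empty())   # False
```

So neither `if not q` nor `len(q) == 0` works as an emptiness check for a Queue; only `q.empty()` (or `q.qsize()`) does.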

On 21.08.2021 23:33, Tim Hoffmann via Python-ideas wrote:
I assume your function would first check that the argument is a sequence and then check its length to determine emptiness. That doesn't strike me as more explicit. It's just shorter than first doing the type check and then testing the length.
For the method case, it's completely up to the object to define what "empty" means, e.g. could be a car object which is fully fueled but doesn't have passengers. That's very flexible, but also requires all sequences to play along, which is hard.
When you write "if not seq: ..." in a Python application, you already assume that seq is a sequence, so the type check is implicit (you can make it explicit by adding a type annotation and applying a type checker; either static or dynamic) and you can assume that seq is empty if the boolean test returns False.
Now, you can easily add a helper function which implements your notion of "emptiness" to your applications. The question is: would it make sense to add this as a builtin? My take on this is: not really, since it just adds a type check and not much else. This is not enough to warrant the added complexity for people learning Python. You may want to propose adding a new operator.is_empty() function which does this, though.
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Aug 24 2021)

On Sun, Aug 22, 2021 at 10:28 PM Tim Hoffmann via Python-ideas <python-ideas@python.org> wrote:
I don't see that this gives anything above len(seq). Taking an example from the video you linked to:
    def unfriend(subject, users):
        if not users:
            return
        remove_edges('friend', subject, users)
"What is the type of users? The only hint you are given is that it is used in an 'if' statement." Not true. You also have the fact that the name "users" is a plural. Based on that, and that alone, I would assume that it is some sort of collection. And then he goes on to say that it could be the integer 12. Okay, sure. But if you care about that distinction, "if len(users):" is absolutely fine here, and proves that your function will break if given an integer number of users rather than a collection of them.
In Python, isempty(x) is spelled bool(x). That's simply what it means. I ask you, for these types, what should isempty do, what does bool do, and what does len do?
* datetime.timedelta
* range
* slice
* an SQL query that has yet to be executed
* an SQL result set
Under what circumstances would bool(x) differ from isempty(x)? Under what circumstances should the distinction be made? And when it should be made, would bool(len(x)) be different from isempty(x)?
To be quite honest, my usual answer to "What is the type of X?" is "I don't care" (or "I don't care, as long as it is <quality>" eg iterable or subscriptable etc). The unfriend function shouldn't need to care what kind of thing it's been given, but if it does, Python has these things called type hints that can provide extra information to a static analyzer (and also to a human who's reading the code). But static analysis is often smart enough to not need them - and, quite frankly, static analysis is pretty near to magic with its ability to sniff out problems that programmers wouldn't even have thought to check for. (I've seen some Coverity reports and been pretty astonished at its detail.)
What's the problem being solved by isempty? Are there any situations that couldn't be solved by either running a type checker, or by using len instead of bool?
ChrisA
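
For the standard-library types in that list, what `bool` and `len` currently do can be checked directly. The helper `describe` below is a hypothetical name used only for this illustration; the SQL cases are left out because they depend on the database library.

```python
import datetime


def describe(obj):
    """Return (truthiness, length), with None where len() is unsupported."""
    try:
        length = len(obj)
    except TypeError:
        length = None
    return bool(obj), length


print(describe(range(0)))                    # (False, 0)
print(describe(range(3)))                    # (True, 3)
print(describe(slice(0, 0)))                 # (True, None)
print(describe(datetime.timedelta(0)))       # (False, None)
print(describe(datetime.timedelta(days=1)))  # (True, None)
```

A slice is always truthy and has no length, while a zero timedelta is falsy without having a length, so `bool` and `len` already answer different questions for these types.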

I agree that determining the type is possible most of the time, either by type hints or a static analyzer. Using len is possible, with the limitation that you need a full `len(x) == 0` for numpy arrays (see discussion above). The type aspect was emphasized in the video. I'm not too worried about that explicitly. The video was more of a starting point for me to reconsider the idiom `if not users`. My conclusion (and thus proposal) differs from the video.
On a technical level, everything can be solved with the current language capabilities. The main advantage is clearer semantics (explicit is better / readability counts):
- Re bool: As experienced Python users we are used to translating `if not users` to "if users is empty" or "if we have no users", but it *is* less explicit than `if users.is_empty()`.
- Re len: `if not len(users)` or `if len(users) == 0` is more explicit, but it's semantically on a lower level. Counting elements is a more detailed operation than only checking if we have any element. That detail is not needed and distracting if we are only interested in is_empty. This is vaguely similar to iterating over indices (`for i in range(len(users))`) vs. iterating over elements (`for user in users`). We don't iterate over indices because that's usually a detail we don't need.
I acknowledge that discussing readability can be vague and subjective, not least because we are used to the current idioms. I'm also aware that we should be very conservative on adding new API, in particular if it's not technically necessary. However, Python makes a strong point on readability, which IMHO is one of the major reasons for its success. I believe that in that context adding is_empty syntactic sugar would be a clear improvement.

On 8/23/21 2:06 PM, Tim Hoffmann via Python-ideas wrote:
> On a technical level, everything can be solved with the current language capabilities. The main advantage is clearer semantics (explicit is better / readability counts)
Note that explicitness and readability are often at odds with each other. -- ~Ethan~

On 2021-08-23 at 21:06:46 -0000, Tim Hoffmann via Python-ideas <python-ideas@python.org> wrote:
I mentally translate "if not users" to "if there are not users" or "if there are no users." Whether users is a list (or some other sequence), a dict (or some other mapping), a set, or even some oddball collection type (e.g., the responses from a database query) is an implementation detail about which I don't care at that point.
Exactly. Asking whether a collection contains an element is on a slightly lower level than asking whether or not "there are any [Xs]."

Quoting the subject line: "We should have an explicit concept of emptiness for collections" We do. It's spelled: len(collection) == 0 You can't get more explicit than that. -- Steve

On 2021-08-22 17:36, Thomas Grainger wrote:
> bool((len(collection) == 0) is True) == True and issubclass(True, bool)
'True' is a reserved word, so you don't need to check it. However, 'bool' might have been overridden, so:
    __builtins__.bool((len(collection) == 0) is True) == True
Come to think of it, 'len' might have been overridden too, so:
    __builtins__.bool((__builtins__.len(collection) == 0) is True) == True

On Sun, Aug 22, 2021 at 07:01:28PM +0300, Serhiy Storchaka wrote:
> (len(collection) == 0) is True
Ha ha, yes, very good, you got me. But the trouble is, if you don't trust the truth value of the predicate, it is hard to know when to stop:
    len(collection) == 0
    (len(collection) == 0) is True
    ((len(collection) == 0) is True) is True
    (((len(collection) == 0) is True) is True) is True
    ((((len(collection) == 0) is True) is True)) is True
    # ...
*wink*
MRAB and Ricky: `__builtins__` is a CPython implementation detail and is reserved for the interpreter's private use. Other implementations may not even have it. The right way to write your code should be
    import builtins
    builtins.bool((builtins.len(collection) == 0) is True) is True
-- Steve

Everyone in this thread should absolutely read Lewis Carroll's delightful "What the Tortoise Said to Achilles." It's a very short 3-page story that addressed exactly this topic in 1895... even before Guido's Time Machine. One free copy of the public domain work is at: https://wmpeople.wm.edu/asset/index/cvance/Carroll
On Sun, Aug 22, 2021 at 8:30 PM Steven D'Aprano <steve@pearwood.info> wrote:

On Mon, Aug 23, 2021 at 12:13 AM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
That he's mad, 'tis true, 'tis true 'tis pity, And pity 'tis, 'tis true -- Hamlet, Act 2, Scene 2 --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

In all seriousness this is an actual problem with numpy/pandas arrays where: ``` Python 3.8.10 (default, Jun 2 2021, 10:49:15) [GCC 9.4.0] on linux Type "help", "copyright", "credits" or "license" for more information.
eg https://pandas.pydata.org/pandas-docs/version/1.3.0/user_guide/gotchas.html#using-if-truth-statements-with-pandas
> Should it be True because it’s not zero-length, or False because there are False values? It is unclear, so instead, pandas raises a ValueError:
I'm not sure I believe the author here - I think it's clear. It should be True because it's not zero-length.

here's another fun one "A False midnight": https://lwn.net/Articles/590299/ https://bugs.python.org/issue13936#msg212771

On Mon, Aug 23, 2021 at 11:56 PM Thomas Grainger <tagrain@gmail.com> wrote:
> here's another fun one "A False midnight": https://lwn.net/Articles/590299/ https://bugs.python.org/issue13936#msg212771
That was a consequence of a time value being an integer, and thus zero (midnight) was false. It was changed, but - as is fitting for a well-used language - backward compatibility was important. Modern versions of Python don't have that problem. ChrisA
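
The current behaviour can be confirmed with a one-line check. This sketch assumes Python 3.5 or later, where a `datetime.time` is no longer falsy at midnight.

```python
import datetime

midnight = datetime.time(0, 0)

# On Python 3.5 and later every time object is truthy, including midnight.
print(bool(midnight))
```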

On Mon, Aug 23, 2021 at 6:54 AM Thomas Grainger <tagrain@gmail.com> wrote:
This is a great example of the problem of the assumption of zero as representing false. I’ve always been ambivalent about Python’s concept of Truthiness (“something or nothing”). If I were to write my own language, I would probably require an actual Boolean for, e.g., an if statement. The fact is that what defines falsiness is use case dependent. Numbers are the best example: zero can often be a perfectly meaningful number. Which is why Brandon suggested that testing the length of a sequence was a good way to be explicit about what you mean by false in a particular context.
But I see no reason to add a standardized way to check for an empty container; again, “emptiness” may not be obviously defined either. Numpy arrays (or Pandas Dataframes) are a good example here: there is more than one way to think of them as false, but maybe more than one way to think of them as empty too:
Is a shape () (scalar) array empty?
Or shape (100, 0)?
Or shape (0, 100)?
Or shape (0,0,0,0)?
Or a rank 1 array that’s all zeros? Or all false?
Anyway, the point is that “emptiness” may be situation specific, just like truthiness is. So explicitly specifying that you are looking for len(container) == 0 is more clear than isempty(container) would be, even if it did exist. With duck typing, you may well not know what type you are dealing with, but you absolutely need to know how you expect that object to behave in the context of your code. So if having a zero length is meaningful in your code, then only objects with a length will work, which is just fine. -CHB
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Tue, Aug 24, 2021 at 3:27 PM Christopher Barker <pythonchb@gmail.com> wrote:
More a problem with the assumption that midnight is zero. There's nothing at all wrong with zero being false - if, for instance, you're looking at a timedelta, a width of zero seconds is an empty time interval, and that can indeed be false. (Consider: What is the overlap between segment X and segment Y? Find the intersection between them; if the intersection has zero width, there is no overlap and the two time periods do not collide.)
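The overlap check described above can be sketched with `datetime.timedelta`, whose zero value is false. The `overlap` helper and the sample times are hypothetical, chosen only to illustrate the point.

```python
from datetime import datetime, timedelta

def overlap(start_a, end_a, start_b, end_b):
    """Return the width of the intersection of two intervals as a timedelta."""
    width = min(end_a, end_b) - max(start_a, start_b)
    # A negative width means the intervals are disjoint; clamp it to zero.
    return max(width, timedelta(0))

nine = datetime(2021, 8, 24, 9, 0)
ten = datetime(2021, 8, 24, 10, 0)
eleven = datetime(2021, 8, 24, 11, 0)

touching = overlap(nine, ten, ten, eleven)      # the segments share only an endpoint
colliding = overlap(nine, eleven, ten, eleven)  # the segments share a full hour

if not touching:
    print("no collision")
if colliding:
    print("collision of", colliding)
```

Here a zero-width intersection being false reads naturally: an empty interval is "nothing".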
I’ve always been ambivalent about Python’s concept of Truthiness (“something or nothing”). If I were to write my own language, I would probably require an actual Boolean for, e.g., an if statement.
That becomes very frustrating. Before you design your own language, I strongly recommend trying out REXX, C, Python, LPC or Pike, JavaScript, and SourcePawn, just because they have such distinctly different type systems. (Obviously you'll have tried some of those already, but try to complete the set.) And by "try out", I mean spend a good amount of time coding in them. Get to know what's frustrating about them. For instance, what does "if 0.0" mean? (Python: False. C: False. JavaScript: False. Pike: True. REXX: Error. SourcePawn: False, but can become True if the tag changes. SourcePawn doesn't have types, it has tags.) Similarly, what is the truth value of an empty array (or equivalent) in each language? (Python: False. C: True. JavaScript: True. REXX: Concept does not exist. SourcePawn: Error. Pike: True.) Not one of these languages is fundamentally *wrong*, but they disagree on what logical choices to make. (Yes, I'm being fairly generous towards SourcePawn here. Truth be told, it sucks.) If you're going to make changes from the way Python does things, be sure to have tried a language that already works that way, and see what the consequences are.
The fact is that what defines falsiness is use case dependent. Numbers are the best example: zero can often be a perfectly meaningful number.
Of course it's a perfectly meaningful number, but you won't be saying "if x:" to figure out whether x is a meaningful number. The question is, what *else* does it mean? For instance, if you're asking "where in this string does the letter w first occur?", then 0 is a perfectly meaningful result, indicating that it's found at the very start of the string. But that's not about whether it's a number; it's whether it's a string index.
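The string-index case can be made concrete with `str.find`, which returns 0 for a match at the start and -1 for no match. A small sketch with illustrative variable names:

```python
text = "world wide web"

index = text.find("w")    # 0: found at the very start
missing = text.find("z")  # -1: str.find signals "not found" with -1

# Buggy truth tests: 0 is a valid index but false, -1 means "not found" but is true.
buggy_found = bool(index)
buggy_missing_found = bool(missing)

# Explicit tests compare against the sentinel that str.find documents.
found = index != -1
missing_found = missing != -1

print(buggy_found, buggy_missing_found)
print(found, missing_found)
```

The truth test gets both cases backwards here, because the question being asked is about a string index and not about "something or nothing".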
Which is why Brandon suggested that testing the length of a sequence was a good way to be explicit about what you mean by false in a particular context.
When people say "explicit is better than implicit", they usually mean "code I like is better than code I don't like". And then they get (somewhat rightly) lampooned by people like Steven who interpret "explicit" to mean "using more words to mean nothing", which clearly violates the Zen. What ARE you being explicit about? Are you stating the concept "check if there are any users", or are you stating the concept "take the list of users, count how many people there are in it, and check if that number is greater than zero"? The first one is written "if users:" and the second "if len(users) > 0:". They are BOTH explicit, but they are stating different things. The point of a high level programming language is that we can express abstract concepts. We don't have to hold the interpreter's hand and say "to figure out how many people are in the list, take the past-end-of-list pointer, subtract the list base pointer, and divide by the size of a list element". We say "give me the length of the list". And one huge advantage is that the interpreter is free to give you that length in any way it likes (directly storing the length, using pointer arithmetic, or even iterating over a sparse list and counting the slots that are occupied). Only be "more explicit" (in the sense of using lower-level constructs) if you need to be.
So explicitly specifying that you are looking for len(container) == 0 is more clear than isempty(container) would be, even if it did exist.
I'm not sure that that's any better. One is asking "how much stuff is in the container? Zero items?" and the other is asking "is this container empty?". They're different concepts. They will (usually) give the same result, but conceptually they're different, and it's not a scale of "more explicit" to "less explicit".
With duck typing, you may well not know what type you are dealing with, but you absolutely need to know how you expect that object to behave in the context of your code. So if having a zero length is meaningful in your code — then only objects with a length will work, which is just fine.
Yes. And it should make perfect sense to write something like this:
    if stuff:
        print("My stuff:")
        for thing in stuff:
            print(thing)
where you omit the header if there are no elements to iterate over. What is the type of "stuff"? Do you care? Well, it can't be a generator, since those will always be true; but it can be any sort of collection. Be explicit about what you care about. Don't shackle the code with unnecessary constraints, and don't hand-hold the interpreter unnecessarily. ChrisA
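The remark about generators can be demonstrated with a variant of the snippet above that collects lines instead of printing them. The `report` function is a hypothetical wrapper added only so the results can be inspected.

```python
def report(stuff):
    """Return the lines to show: a header only if `stuff` is truthy."""
    lines = []
    if stuff:
        lines.append("My stuff:")
        for thing in stuff:
            lines.append(str(thing))
    return lines

empty_gen = (x for x in [])

print(report([]))          # empty list: no header
print(report(["a", "b"]))  # header followed by the items
print(report(empty_gen))   # generator objects are always true: header with no items
```

A generator defines neither `__bool__` nor `__len__`, so the truth test cannot tell whether it will yield anything.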

On Mon, Aug 23, 2021 at 10:26:47PM -0700, Christopher Barker wrote:
That's not really a problem with zero representing false. It's a problem with representing time of day as a number where midnight is zero :-) *Durations* can be represented satisfactorily as a number (at least I can't think of any problems off the top of my head) but wall times (the number you see when you look at a clock) aren't really *numbers* in any real sense. You can't say "3:15pm times 2 is 6:30pm", 1:00am is not the multiplicative identity element and midnight is not the annihilating element (zero). We conventionally represent clock times as numbers, but they're more akin to ordinal data. They have an order, but you can't do arithmetic on them.
"Please sir, can we have some more Pascal" *wink* How about short-circuiting `or` and `and` operators? I'm not judging, just commenting. It surprises me that people who are extremely comfortable with duck-typing pretty much any other data type often draw the line at duck-typing bools.
The fact is that what defines falsiness is use case dependent. Numbers are the best example: zero can often be a perfectly meaningful number.
Fun fact: not only did the ancient Greek mathematicians not count zero as a number, but the Pythagoreans didn't count 1 as a number either. https://www.britannica.com/topic/number-symbolism/Pythagoreanism Zero is a perfectly meaningful number, but it is special: it is the identity element for addition and subtraction, and the annihilating element for multiplication, and it has no inverse in the Reals. Even in number systems which allow division by zero, you still end up with weird shit like 1/0 = 2/0 = 3/0 ... -- Steve

On Tue, Aug 24, 2021 at 4:31 PM Steven D'Aprano <steve@pearwood.info> wrote:
... midnight is not the annihilating element (zero).
Unless you're Cinderella, of course.
Which puts them in a similar category to degrees Celsius - you can compare them, but there's no fundamental zero point. (It makes sense to say "what would the temperature be if it were 3° higher", but not "what is double this temperature" - not in Celsius.)
To be fair, that's more a question of "what is a number". If you have "a thing", is that "a number of things", or are they separate concepts? We have settled on the idea that "a thing" is a special case of "a number of things" mathematically, but try going to someone and saying "I have a number of wives" and seeing whether they think that 1 is a number. (And if zero is a number, then I also have a number of wives. Oh, and I have a number of pet cobras, too, so don't trespass on my property.)
Numbers in general are useful concepts that help us with real-world problems, but it's really hard to pin them down. You can easily explain what "three apples" looks like, and you can show what "ten apples" looks like, and from that, you can intuit that the difference between them is the addition or removal of "seven apples". Generalizing that gives you a concept of numbers. But what is the operation that takes you from "three apples" to "three apples"? Well, obviously, it was removing zero elephants, how are you so dimwitted as to have not figured that out?!? In that sense, zero is very special, as it represents NOT doing anything, the LACK of a transition. (Negative numbers are a lot weirder. I have a cardboard box with three cats in it, and five cats climb out and run all over the room. Which clearly means that, if two cats go back into the box, it will be empty.) Division by zero has to be interpreted in a particular way. Calculus lets us look at nonsensical concepts like "instantaneous rate of change", which can be interpreted as the rise over the run where the run has zero length. In that sense, "dividing by zero" is really "find the limit of dividing smaller and smaller rises by their correspondingly smaller and smaller runs", and is just as meaningful as any other form of limit-based calculation (eg that 0.999999... is equal to 1). Mathematicians define "equal" and "divide" and "zero" etc in ways that are meaningful, useful, and not always intuitive. In programming, we get to do the same thing. So what IS zero? What does it mean? *IT DEPENDS*. Sometimes it's a basis point (like midnight, or ice water). Sometimes it's a scalar (like "zero meters"). Sometimes it indicates an absence. And that's why naively and blindly using programming concepts will inevitably trip you up. Oh, if only Python gave us a way to define our own data types with our own meanings, and then instruct the language in how to interpret them as "true" or "false".... 
that would solve all these problems..... ChrisA
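The facility alluded to above is the `__bool__` special method. A minimal sketch, using two hypothetical classes (`ClockTime` and `Duration`) to show zero meaning different things for different types:

```python
class ClockTime:
    """Hypothetical wall-clock time; midnight is stored as minute 0."""

    def __init__(self, hour, minute=0):
        self.minutes = hour * 60 + minute

    def __bool__(self):
        # Zero is only a basis point here, so every clock time counts as true.
        return True


class Duration:
    """Hypothetical length of time in minutes; zero means 'no time at all'."""

    def __init__(self, minutes):
        self.minutes = minutes

    def __bool__(self):
        # Zero indicates an absence, so an empty duration is false.
        return self.minutes != 0


print(bool(ClockTime(0)), bool(Duration(0)), bool(Duration(90)))
```

Both classes store the same integer zero, but each one tells the language what that zero means in a truth test.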

Christopher Barker wrote:
Just like length is. It's a basic concept and like __bool__ and __len__ it should be upon the objects to specify what empty means.
So explicitly specifying that you are looking for len(container) == 0 is more clear than isempty(container) would be, even if it did exist.
As written in another post here, `len(container) == 0` is on a lower abstraction level than `isempty(container)`.

On Tue, 24 Aug 2021 at 12:07, Tim Hoffmann via Python-ideas <python-ideas@python.org> wrote:
Just like length is. It's a basic concept and like __bool__ and __len__ it should be upon the objects to specify what empty means.
It feels like these arguments in the abstract are mostly going round in circles. It's possible something has been mentioned earlier in this thread, but I don't recall if so - but is there any actual real-world code that would be substantially improved if we had built into the language a protocol that users could override in their classes to explicitly define what "is empty" meant for that class? Some things to consider:
1. It would have to be a case where neither len(x) == 0 nor bool(x) did the right thing.
2. We can discount classes that maliciously have bizarre behaviour, I'm asking for *real world* use cases.
3. It would need to have demonstrable benefits over a user-defined "isempty" function (see below).
4. Use cases that *don't* involve numpy/pandas would be ideal - the scientific/data science community have deliberately chosen to use container objects that are incompatible in many ways with "standard" containers. Those incompatibilities are deeply rooted in the needs and practices of that ecosystem, and frankly, anyone working with those objects should be both well aware of, and comfortable with, the amount of special-casing they need.
To illustrate the third point, we can right now do the following:
    from functools import singledispatch

    import numpy

    @singledispatch
    def isempty(container):
        return len(container) == 0

    # If you are particularly wedded to special methods, you could even do
    #
    # @singledispatch
    # def isempty(container):
    #     if hasattr(container, "__isempty__"):
    #         return container.__isempty__()
    #     return len(container) == 0
    #
    # But frankly I think this is unnecessary. I may be in a minority here, though.

    @isempty.register
    def _(arr: numpy.ndarray):
        return len(arr.ravel()) == 0
So any protocol built into the language needs to be somehow better than that.
If someone wanted to propose that the above (default) definition of isempty were added to the stdlib somewhere, so that people could register specialisations for their own code, then that might be more plausible - at least it wouldn't have to achieve the extremely high bar of usefulness to warrant being part of the language itself. I still don't think it's sufficiently useful to be worth having in the stdlib, but you're welcome to have a go at making the case... Paul

I also have the feeling that this is going round in circles. So let me get back to the core question: **How do you check if a container is empty?** IMHO the answer should not depend on the container. While emptiness may mean different things for different types, the check syntax can and should still be uniform.
Not a solution:
0) The current `if not seq` syntax. "Check falsiness instead of emptiness" is a simplification, which is not always possible.
Possible solutions:
1) Always use `if len(seq) == 0`. I think this would work. But would we want to write that in PEP-8 instead of `if not seq`? To me, this feels a bit too low level.
2) A protocol would formalize that concept by building respective syntax into the language. But I concede that it may be overkill.
3) The simple solution would be to add `is_empty()` methods to all stdlib containers and encourage third party libs to adopt that convention as well. That would give a uniform syntax by convention.
Reflecting the discussion in this thread, I now favor variant 3).
Tim
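Variant 3) could look roughly like the following in user code. This is only a sketch under stated assumptions: `EmptinessMixin` and `Inbox` are hypothetical names, no such method exists on stdlib containers today, and the mixin simply derives `is_empty()` from `__len__` in the way `collections.abc` derives its mixin methods from abstract ones.

```python
from collections.abc import Sized


class EmptinessMixin(Sized):
    """Hypothetical mixin: derives is_empty() from the abstract __len__."""

    def is_empty(self):
        return len(self) == 0


class Inbox(EmptinessMixin):
    """Hypothetical container that only has to provide __len__."""

    def __init__(self, messages=()):
        self._messages = list(messages)

    def __len__(self):
        return len(self._messages)


print(Inbox().is_empty())
print(Inbox(["hello"]).is_empty())
```

A type with a cheaper or different notion of emptiness could override `is_empty()` without touching `__len__` or `__bool__`.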

Oh, if I'm going to be a smart-ass, I should probably remember that I need a `not` in there. No need to correct me, I saw it as soon as pressing send. Nonetheless, this is an unnecessary method or function. Truthiness is non-emptiness for most purposes. And where it's not, you need something more specialized to the purpose at hand. On Tue, Aug 24, 2021, 6:14 PM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:

On 8/24/21 3:03 PM, Tim Hoffmann via Python-ideas wrote:
**How do you check if a container is empty?**
IMHO the answer should not depend on the container.
I think this is the fly in the ointment -- just about everything, from len() to bool(), to add, to iter() /all/ depend on the container -- even equality depends on the container. `and`, `or`, and `not` partially depend on the container (via bool()). Only `is` is truly independent.
And since (3) is a method on the container, it absolutely "depends on the container". -- ~Ethan~

Ethan Furman wrote:
Sorry, I think you got me wrong: The meaning, and thus the implementation, depends on the type (e.g. each container defines its own __len__()). However, the syntax for querying length is still uniformly `len(container)`. Likewise I'd like to have a uniform syntax for emptiness, and not different syntax for different types (`if not container` / `if len(array) == 0` / `if dataframe.empty`).

Hi Tim, I'm sorry if this has been brought up before, but *aside from PEP 8* is there anything wrong with using "if len(a)" for nonempty, or "if not len(a)" for empty? It would seem to work for numpy and pandas arrays, and it works for builtin sequences. Also, it has the advantage of being 100% backwards compatible. :-) Surely conforming to PEP 8 shouldn't need an addition to the language or stdlib? Or does it not work? On Tue, Aug 24, 2021 at 3:42 PM Tim Hoffmann via Python-ideas < python-ideas@python.org> wrote:
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

On 2021-08-25 00:48, Guido van Rossum wrote:
What is the cost of 'len'? If it's always O(1), then it's not a problem, but if it's not O(1) (counting the items in a tree, for example) and you're not interested in how many items there are but only whether there's at least one, then...
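The cost concern can be illustrated with a container whose length is O(n) but whose emptiness is O(1). The `LinkedList` below is a hypothetical toy written for this example, not the PyPI package mentioned elsewhere in the thread.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


class LinkedList:
    """Hypothetical singly linked list that keeps no element counter."""

    def __init__(self, values=()):
        self.head = None
        for value in reversed(list(values)):
            self.head = Node(value, self.head)

    def __len__(self):
        # O(n): has to walk every node to count them.
        count = 0
        node = self.head
        while node is not None:
            count += 1
            node = node.next
        return count

    def __bool__(self):
        # O(1): only has to look at the head.
        return self.head is not None


items = LinkedList([0, 0, 0])
print(len(items), bool(items), bool(LinkedList()))
```

For such a type `if items:` answers "is there at least one element?" without counting, while `len(items) == 0` pays for a full traversal.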

I wanted to do a survey of various "aggregates" in Python to see if any stand out as making the usual `if stuff: ...` troublesome. I wrote a little script at https://github.com/DavidMertz/LanguagePractice/blob/main/python/aggregates.p... . I'm being deliberately vague about an "aggregate." It might not be a collection strictly speaking, but it is something that might seem to "contain" values in some sense. Basically, I think that standard library and built-in stuff behaves per PEP8. Some other libraries go their own way. I throw in a linked-list implementation I found on PyPI. I've never used it beyond this script; but per what it is, it cannot implement `len()` in O(1) (well, it *could* if it does extra bookkeeping; but then it's kinda a different thing). In NumPy land, the np.empty vs. np.zeros case is another oddball. On my system, my memory happened to have some prior value that wasn't zero; that could vary between runs, in principle. These are the results:
Expr: '' | Value: ''
  Truth: False | Length: 0
Expr: list() | Value: []
  Truth: False | Length: 0
Expr: tuple() | Value: ()
  Truth: False | Length: 0
Expr: dict() | Value: {}
  Truth: False | Length: 0
Expr: set() | Value: set()
  Truth: False | Length: 0
Expr: bytearray() | Value: bytearray(b'')
  Truth: False | Length: 0
Expr: bytearray(1) | Value: bytearray(b'\x00')
  Truth: True | Length: 1
Expr: bytearray([0]) | Value: bytearray(b'\x00')
  Truth: True | Length: 1
Expr: array.array('i') | Value: array('i')
  Truth: False | Length: 0
Expr: array.array('i', []) | Value: array('i')
  Truth: False | Length: 0
Expr: Nothing() | Value: EmptyNamedTuple()
  Truth: False | Length: 0
Expr: deque() | Value: deque([])
  Truth: False | Length: 0
Expr: deque([]) | Value: deque([])
  Truth: False | Length: 0
Expr: ChainMap() | Value: ChainMap({})
  Truth: False | Length: 0
Expr: queue.Queue() | Value: <queue.Queue object at 0x7f0940dd2190>
  Truth: True | Length: No length
Expr: asyncio.Queue() | Value: <Queue at 0x7f0940dd2190 maxsize=0>
  Truth: True | Length: No length
Expr: multiprocessing.Queue() | Value: <multiprocessing.queues.Queue object at 0x7f0940dd2190>
  Truth: True | Length: No length
Expr: np.ndarray(1,) | Value: array([5.e-324])
  Truth: True | Length: 1
Expr: np.ndarray((1,0)) | Value: array([], shape=(1, 0), dtype=float64)
  Truth: False | Length: 1
Expr: np.empty((1,)) | Value: array([5.e-324])
  Truth: True | Length: 1
Expr: np.zeros((1,)) | Value: array([0.])
  Truth: False | Length: 1
Expr: np.zeros((2,)) | Value: array([0., 0.])
  Truth: No Truthiness | Length: 2
Expr: np.ones((1,)) | Value: array([1.])
  Truth: True | Length: 1
Expr: np.ones((2,)) | Value: array([1., 1.])
  Truth: No Truthiness | Length: 2
Expr: pd.Series() | Value: Series([], dtype: float64)
  Truth: No Truthiness | Length: 0
Expr: pd.DataFrame() | Value: Empty DataFrame
  Truth: No Truthiness | Length: 0
Expr: xr.DataArray() | Value: <xarray.DataArray ()>
  Truth: True | Length: No length
Expr: linkedadt.LinkedList() | Value: <linkedadt.LinkedList object at 0x7f08d8d77f40>
  Truth: False | Length: 0

It seems the conversation has confused two related concepts:
1) The default bool() implementation (Truthiness) -- this is what the OP said was recommended by PEP 8: "For sequences, (strings, lists, tuples), use the fact that empty sequences are false:" -- there is some debate about whether this is a good recommendation or not, but I don't think that's the OP's point. Rather:
2) That there is no standard way to explicitly test containers for "emptiness" -- there is only length, with the assumption that len(something) == 0 or not len(something) is a good way to test for emptiness.
I still don't see why length is not perfectly adequate, but I wonder if there are any "containers", i.e. things that could be empty, in the std lib that don't support length. Looking in the ABCs, a Container is something that supports `in`, and a Sized is something that supports len() -- so in theory, there could be a Container that does not have a length. Are there any in the std lib? Perhaps the ABCs are instructive in another way here -- if we were to add a __empty__ or some such dunder, what ABC would require it? Container? Or yet another one-method ABC? -CHB
On Tue, Aug 24, 2021 at 8:39 PM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

“Container” is a kind of pun, it’s something with a __contains__ method. The thing you’re looking for is “Collection”, which is the base for sequences, mappings and sets. I also note that the discussion seems quite stuck. —Guido On Tue, Aug 24, 2021 at 21:55 Christopher Barker <pythonchb@gmail.com> wrote:
-- --Guido (mobile)

On Tue, Aug 24, 2021 at 10:12 PM Guido van Rossum <guido@python.org> wrote:
“Container” is a kind of pun, it’s something with a __contains__ method. The thing you’re looking for is “Collection”.
Hmm, perhaps we should tweak the docs; the section is titled "Abstract Base Classes for Containers". But yes, Collection is what I (and probably the OP) was looking for, in which case all Collections support len(), so the way to explicitly check if they are empty is len(). Nothing to be done here.
I also note that the discussion seems quite stuck.
indeed. I'm done :-) -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
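The distinction between Container, Sized and Collection discussed above can be checked with `isinstance`. In this sketch `EvenNumbers` is a hypothetical class that supports `in` but has no length; the three ABCs are the real ones from `collections.abc`.

```python
from collections.abc import Collection, Container, Sized


class EvenNumbers:
    """Hypothetical container: supports `in`, but has no length and no iteration."""

    def __contains__(self, item):
        return isinstance(item, int) and item % 2 == 0


evens = EvenNumbers()

print(4 in evens)                     # membership works
print(isinstance(evens, Container))   # recognised via __contains__
print(isinstance(evens, Sized))       # no __len__, so not Sized
print(isinstance(evens, Collection))  # Collection needs __len__, __iter__ and __contains__
print(isinstance([], Collection))     # built-in lists qualify
```

So a Container without a length is possible, but anything that is a Collection can be tested with `len(c) == 0`.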

On Tue, Aug 24, 2021 at 7:23 PM MRAB <python@mrabarnett.plus.com> wrote:
It's a pretty universal assumption that len() is O(1) -- something that doesn't do that probably shouldn't implement __len__(). (And yeah, there's probably some tree package around that does implement an O(N) __len__(). People do a lot of silly things though, we can't handle *everything*.)
It was pointed out to me that numpy allows arrays that have no elements but a nonzero first dimension. People could disagree about whether that should be considered empty. I'm not sure about Pandas, but IIRC a Dataframe is always a table of rows, with all rows having the same number of columns. Here I'd say that if there's at least one row in the table, I'd call it non-empty, even if the rows have no columns. This conforms to the emptiness of [()]. It's possible that there's a common use case in the data science world where this should be counted as empty, but to me, that would be inconsistent -- a row with zero columns is still a row. (For numpy arrays my intuition is less clear, since there's more symmetry between the dimensions.) So then the next question is, what's the use case? What code are people writing that may receive either a stdlib container or a numpy array, and which needs to do something special if there are no elements? Maybe computing the average? AFAICT Tim Hoffmann (the OP) never said. PS. Why is anyone thinking that an array containing all zeros (and at least one zero) might be considered empty? That seems a totally different kind of test.
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

On Tue, Aug 24, 2021 at 9:50 PM Guido van Rossum <guido@python.org> wrote:
indeed -- you can kinda-sorta map an array to nested lists, e.g., is [[],[],[],[]] an empty list? It has a length of 4, and so does a (4, 0) sized numpy array. numpy is actually mostly consistent with the std lib in that regard.
I think there may have been confusion with boolean Falsiness indicating emptiness, as it does for Sequences. An array containing all zeros might well be considered False, though not empty. Of course, that's why numpy arrays raise a DeprecationWarning when used that way. Another issue with numpy arrays is that they are non resizable, so needing to check if one is empty is pretty rare -- there is no way that everything could have been removed, or nothing put in in the first place -- the emptiness could have, and likely would have been checked before it was created. -CHB
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
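The nested-list analogy for a (4, 0) array can be tried with built-in types alone. This sketch uses a list of empty tuples as a stand-in for "rows with no columns"; it makes no claim about numpy itself.

```python
rows_without_columns = [(), (), (), ()]

# The outer list has four items, so it is neither zero-length nor false.
print(len(rows_without_columns))
print(bool(rows_without_columns))

# Counting the innermost elements is a separate question with a different answer.
total_cells = sum(len(row) for row in rows_without_columns)
print(total_cells)
```

Whether "four rows, zero cells" counts as empty is exactly the situation-specific judgement discussed above; `len()` only answers the question about the outermost level.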

Guido van Rossum wrote:
There are two parts to the answer:
1) There are functions, e.g. in scipy and matplotlib, that accept both numpy arrays and lists of floats. Speaking from matplotlib experience: While eventually we coerce that data to a uniform internal format, there are cases in which we need to keep the original data and only convert on a lower internal level. We often can return early in a function if there is no data, which is where the emptiness check comes in. We have to take extra care not to do the PEP-8 recommended emptiness check using `if not data`.
2) Even for cases that cannot have different types in the same code, it is unsatisfactory that I have to write `if not seq` but `if len(array) == 0` depending on the expected data. IMHO whatever the recommended syntax for emptiness checking is, it should be the same for lists and arrays and dataframes.

On Wed, 25 Aug 2021 at 14:13, Tim Hoffmann via Python-ideas <python-ideas@python.org> wrote:
You don't. You can write a local isempty() function in matplotlib, and add a requirement *in your own style guide* that all emptiness checks use this function. Why do people think that they can't write project-specific style guides, and everything must be in PEP 8? That baffles me.
2) Even for cases that cannot have different types in the same code, it is unsatisfactory that I have to write `if not seq` but `if len(array) == 0` depending on the expected data. IMHO whatever the recommended syntax for emptiness checking is, it should be the same for lists and arrays and dataframes.
You don't have to do any such thing. You can just write "if len(x) == 0" everywhere. No-one is *making* you write "if not seq". All I can see here is people making problems for themselves that they don't need to. Sorry if that's not how it appears to you, but I'm genuinely struggling to see why this is an issue that can't be solved by individual projects/users. The only exception I can see is the question "what's the best way to suggest to newcomers?" but that's more of a tutorial/documentation question, than one of standardisation, style guides or language features. Paul
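A project-local helper of the kind suggested above might be sketched as follows. Everything here is hypothetical: `AmbiguousArray` is a pure-Python stand-in for an array type that refuses truth tests, and `is_empty` and `summarize` are illustrative names, not matplotlib functions.

```python
class AmbiguousArray:
    """Hypothetical stand-in for an array type whose truth value is refused."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __bool__(self):
        raise ValueError("truth value is ambiguous")


def is_empty(data):
    """Project-local emptiness check: relies on len() only, never on bool()."""
    return len(data) == 0


def summarize(data):
    """Return None early when there is no data, else the number of items."""
    if is_empty(data):
        return None
    return len(data)


# The truth test fails for the stand-in, while the length-based helper works.
try:
    bool(AmbiguousArray([0, 0]))
    truth_test_failed = False
except ValueError:
    truth_test_failed = True

print(truth_test_failed, summarize([]), summarize(AmbiguousArray([0, 0])))
```

A project style guide can then require `is_empty(x)` everywhere, independently of what PEP 8 recommends for plain sequences.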

I agree with Guido. The only problem here is third-party libraries that don't use bool() to indicate emptiness. If you need to support those, use len(). But this doesn't mean a change to the standard library, because those third-party libraries are, well, third-party. We don't need a more explicit way to specify emptiness. bool(seq) is fine. On Wed, Aug 25, 2021, 8:05 AM Guido van Rossum <guido@python.org> wrote:

Ok, I have no problem ignoring PEP-8 where it's not applicable. I've brought this topic up, because I thought there could be a improvement either in PEP-8 and/or by adding something to the language. I still do think that, but I accept that the problem it solves is not considered relevant enough here to take further action. Anyway, thanks all for the discussion! Tim

On Tue, 24 Aug 2021 at 23:06, Tim Hoffmann via Python-ideas <python-ideas@python.org> wrote:
I will note that if we take things to extremes, that constraint simply cannot be adhered to - types can define bool, len, and any other special method we care to invent, however they like. With that in mind, I'd argue that if a collection defines bool and len in such a way that "not bool(c)" or "len(c) == 0" doesn't mean "c is empty", then it is a special case, and has deliberately chosen to be a special case. Yes, I know that makes numpy arrays and Pandas dataframes special cases. As I said, they have deliberately chosen to not follow normal conventions. Take it up with them if you care to. (IMO there's no point - they have reasonable justifications for their choices, and it's too late to change anyway). Based on the "obvious" intent of the classes in collections.abc, I'd say that if you test "len(c) == 0" then you can reasonably say that you cover all collections. If you want to support the weird multi-dimensional zero-sized numpy arrays that have len != 0, then special case them. But frankly, I'd wait until a user comes up with a reason why you need to support them, who can tell you what they expect your code to *do* with them in the first place... Again, practical use cases rather than abstract questions. Paul

Paul Moore wrote:
I can agree that "len(c) == 0" should mean "c is empty" for all practical purposes (and numpy and pandas conform with that). But that "not bool(c)" should mean "c is empty" is an arbitrary additional constraint. For some types (e.g. numpy, pandas) that cannot be fulfilled in a logically consistent way. By requiring that additional constraint, the Python language forces these containers to be a special case. To sum things up again:
Premise: It is valuable for the language and ecosystem to not have this special casing. I.e. I don't want to write `if not seq` on the one hand and `if len(array) == 0` on the other. The syntax for checking if a list or an array is empty should be the same.
Conclusion: If so, PEP-8 has to be changed because "For sequences, (strings, lists, tuples), use the fact that empty sequences are false:" is not a universal solution. The question then is what other syntax to use for an emptiness check.
Possible solutions:
1) The length check is a possible solution: "For sequences, (strings, lists, tuples), test emptiness by `if len(seq) == 0`." N.b. I think this is more readable than the variant `if not len(seq)` (but that could be discussed as well).
2) The further question is: If we change the PEP 8 recommendation anyway, can we do better than the length check? IMHO a length check is semantically on a lower level than an empty-check. Counting elements is a more detailed operation than only checking if we have any element. That detail is not needed and distracting if we are only interested in is_empty. This is vaguely similar to iterating over indices (`for i in range(len(users))`) vs. iterating over elements (`for user in users`). We don't iterate over indices because that's usually a detail we don't need. So one can argue we shouldn't count elements if we only want to know if there are any. If we come to the conclusion that an explicit empty check is better than a length=0 check, there are again different ways that could be implemented.
My favorite solution now would be adding `is_empty()` methods to all standard containers and encourage numpy and pandas to add these methods as well. (Alternatively an empty protocol would be a more formal solution to an explicit check).
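As an illustration of the method variant favored above, here is a minimal sketch of how an `is_empty()` method could be derived from `__len__` by default, in the spirit of mixin methods. The names `EmptinessMixin` and `NameList` are hypothetical and exist only for this example; nothing like them is in the standard library.

```python
class EmptinessMixin:
    """Hypothetical mixin: derives is_empty() from __len__ by default."""

    def is_empty(self):
        # Default implementation; a container could override this with
        # something cheaper or more appropriate than counting elements.
        return len(self) == 0


class NameList(EmptinessMixin, list):
    """A list subclass that gains the hypothetical is_empty() method."""


users = NameList()
print(users.is_empty())   # True
users.append("alice")
print(users.is_empty())   # False
```

The call site reads `if users.is_empty():`, which is the spelling the message argues for; whether built-in containers should grow such a method is exactly what the thread disputes.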

On Wed, 25 Aug 2021 at 14:00, Tim Hoffmann via Python-ideas <python-ideas@python.org> wrote:
I can agree that "len(c) == 0" should mean "c is empty" for all practical purposes (and numpy and pandas conform with that).
OK
But that "bool(c)" should mean "c is empty" is an arbitrary additional constraint. For some types (e.g. numpy, pandas) that cannot be fulfilled in a logically consistent way. By requiring that additional constraint, the Python language forces these containers to be a special case.
It's another way of expressing the intent, which is potentially useful if you don't care about those special cases. We're all consenting adults here, there's no "constraint" involved.
The *language* (by which I assume you mean "Python") doesn't have any special casing. It just has behaviour. It's valuable for educating beginners, maybe, to have an easily expressed way of checking for emptiness, but for "the language and ecosystem"? I don't think so - it's valuable to allow people to express their intent in the way that is most natural to them, IMO.
Conclusion: If so, PEP-8 has to be changed because "For sequences, (strings, lists, tuples), use the fact that empty sequences are false:" is not a universal solution.
"Has to be" is extremely strong here. PEP 8 is a set of *guidelines* that people should use with judgement and thought, not a set of rules to be slavishly followed. And in fact, I'd argue that describing a numpy array or a Pandas dataframe as a "sequence" is pretty inaccurate, so assuming that the statement "use the fact that empty sequences are false" applies is fairly naive anyway. But if someone wants to alter PEP 8 to suggest using len() instead, I'm not going to argue. I *would* get cross, though, if the various PEP 8 inspired linters started complaining when I used "if seq" to test sequences for emptiness.
If someone is reading PEP 8 and can't make their own choice between "if len(seq) == 0" and "if not len(seq)" then they should go and read a Python tutorial, not expect the style guide to tell them what to do :-(
2) The further question is: If we change the PEP 8 recommendation anyway, can we do better than the length check? IMHO a length check is semantically on a lower level than an empty-check. Counting elements is a more detailed operation than only checking if we have any element. That detail is not needed and is distracting if we are only interested in is_empty. This is vaguely similar to iterating over indices (`for i in range(len(users))`) vs. iterating over elements (`for user in users`). We don't iterate over indices because that's usually a detail we don't need. So one can argue we shouldn't count elements if we only want to know if there are any.
Not without either changing the language/stdlib or recommending a user-defined function or 3rd party library. To put that another way, the Python language and stdlib don't currently have (or need, IMO) a better way of checking for emptiness. If you want to argue that they need one, you need an argument far better than "to have something to put in PEP 8".
If we come to the conclusion that an explicit empty check is better than a length=0 check, there are again different ways that could be implemented. My favorite solution now would be adding `is_empty()` methods to all standard containers and encouraging numpy and pandas to add these methods as well. (Alternatively, an empty protocol would be a more formal solution to an explicit check.)
Well, that's "if false then ..." in my opinion. I.e., I don't think you've successfully argued that an explicit empty check is better, and I don't think you have a hope of doing so if your only justification is "so we can recommend it in PEP 8". Paul

On Mon, 2021-08-23 at 13:51 +0000, Thomas Grainger wrote:
In all seriousness this is an actual problem with numpy/pandas arrays where:
<snip>
It must be undefined because the operators are elementwise operators and the concept of non-emptiness being True does not make sense for elementwise containers.
    import numpy as np
    arr = np.arange(5)
    if arr < 0:
        arr *= -1
is code that makes sense if `arr` was a number, but it is not meaningful for arrays.
The second, distinct, problem is that `len` is non-obvious for many N-D containers:
    arr = np.ones(shape=(5, 0))
    assert len(arr) == 5
    assert arr.size == 0  # it is empty.
NumPy breaks the contract that `len` is already the same as "size" [1]. (And we are stuck with it probably...)
So the length definition of truth only works out for Python containers because `container == 0` is always obviously `False` already and you never have the `len != size` problem.
An argument that I will bring is that the bigger problem for arrays may be that we don't have a concept for "elementwise container" (or maybe higher dimensional container, but I think the elementwise is the important distinction).
A "size" protocol would be useful to deal with NumPy's choice of `len`! But a "has elementwise operations" protocol may be more generally useful to code dealing with a mix of NumPy arrays or Python sequences – and even NumPy itself. (E.g. it also tells you that `+` will not concatenate and it could tell NumPy whether it should try coercing to an array or not.)
Cheers,
Sebastian
[1] I will not argue that this is the best way to define it. I don't like the list-of-lists analogy, so I think that `len(arr) == arr.size` and `arr.__iter__` iterating all elements would be a better definition. (Making the notion of "length" equivalent to "size".) Or even refusing `len` unless 1-D! That would make everyone who argues to use `len()` always correct, or at least never incorrect. But it is simply not what we got...
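To make the `len` versus "size" distinction concrete without depending on NumPy, the following sketch uses a toy stand-in. The `Grid` class and the `total_size()` helper are hypothetical names invented for this illustration; `total_size()` is only one guess at what a "size" protocol might look like, not an existing API.

```python
import math


class Grid:
    """Toy stand-in for an N-D array: stores only a shape, no data."""

    def __init__(self, shape):
        self.shape = tuple(shape)

    def __len__(self):
        # Mirrors the convention described above: len() is the first axis.
        return self.shape[0]

    @property
    def size(self):
        # Total number of elements: the product of all axis lengths.
        return math.prod(self.shape)


def total_size(obj):
    """Hypothetical 'size' helper: prefer .size, fall back to len()."""
    size = getattr(obj, "size", None)
    return len(obj) if size is None else size


grid = Grid((5, 0))
print(len(grid))            # 5
print(total_size(grid))     # 0
print(total_size([1, 2]))   # 2
```

For the `(5, 0)` shape, a length-based emptiness test says "not empty" while the element count says "empty", which is the mismatch the message describes.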

On Mon, Aug 23, 2021 at 01:51:17PM -0000, Thomas Grainger wrote:
In all seriousness this is an actual problem with numpy/pandas arrays where:
Indeed. Third-party classes can do anything they like. It is a shame that numpy's API made the decision that they did, but I don't think it is very common to use numpy arrays in a context where they are expected to duck-type as collections. -- Steve

Steven D'Aprano wrote:
I don't think it is very common to use numpy arrays in a context where they are expected to duck-type as collections.
Maybe not "numpy arrays duck-type as collections", but it is very common that arrays and sequences are used interchangeably. Numpy has created the term "array-like" for this. Well, technically, an array-like is something that `np.array()` can turn into an array. But from a user's point of view array-like effectively means array or sequence. If you want to write a function that accepts array-like `values`, you have to change a check `if values` to `if len(values) == 0`. That works for both but is against the PEP8 recommendation. This is a shortcoming of the language.
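A small sketch of the situation described here: a function that accepts either a plain list or an array-like object has to fall back to the length check. `ElementwiseArray` is a hypothetical toy class, not NumPy; it always refuses truth testing, which is a simplification loosely modeled on the "truth value is ambiguous" errors mentioned in this thread.

```python
class ElementwiseArray:
    """Toy stand-in for an array type that refuses truth testing."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __bool__(self):
        # Simplification: always refuse, regardless of the contents.
        raise ValueError("truth value of an array is ambiguous")


def average(values):
    """Return the mean of array-like `values`, or None if empty."""
    if len(values) == 0:      # works for lists and the stand-in alike
        return None
    return sum(values) / len(values)


print(average([]))                         # None
print(average([1, 2, 3]))                  # 2.0
print(average(ElementwiseArray([2, 4])))   # 3.0
```

Writing `if not values:` inside `average()` would work for the list inputs but raise `ValueError` for the stand-in, which is why the length check is used.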

On 8/23/21 1:15 PM, Tim Hoffmann via Python-ideas wrote:
Numpy is not Python, but a specialist third-party package that has made specialist choices about basic operations -- that does not sound like a shortcoming of the language. It seems to me that the appropriate fix is for numpy to have an "is_empty()" function that knows how to deal with arrays and array-like structures, not force every container to grow a new method. -- ~Ethan~

Ethan Furman wrote:
It seems to me that the appropriate fix is for numpy to have an "is_empty()" function that knows how to deal with arrays and array-like structures, not force every container to grow a new method.
Yes, numpy could and probably should have an "is_empty()" method. However, defining a method downstream breaks duck typing and, maybe even more importantly, authors have to mentally switch between the two empty-check variants `if users` and `if users.is_empty()` depending on the context.
Ethan Furman wrote:
The "specialist choices" `if len(values) == 0` in Numpy are the best you can do within the capabilities of the Python language if you want the code to function with lists and arrays. For Numpy to do better, Python would need to either provide the above-mentioned "has element-wise operations" protocol or an is_empty protocol. I consider an emptiness check a basic concept that should be consistent and easy to use across containers.
Tim

On 8/23/21 2:31 PM, Tim Hoffmann via Python-ideas wrote:
Ethan Furman wrote:
The context being whether or not you are working with numpy? Is there generic code that works with both numpy arrays and other non-numpy data? Do we have any non-numpy examples of this problem?
Python has an emptiness-check and numpy chose to repurpose it -- that is not Python's problem nor a shortcoming in Python. Suppose we add an `.is_empty()` method, and five years down the road another library repurposes that method, that other library then becomes popular, and we then have two "emptiness" checks that are no longer consistent -- do we then add a third? -- ~Ethan~

Ethan Furman wrote:
E.g. SciPy and Matplotlib accept array-like inputs such as lists of numbers. They often convert internally to numpy arrays, but speaking for the case of Matplotlib, we sometimes need to preserve the original data and/or delay conversion, so we have to take extra care when checking inputs. Pandas has the same problem: "The truth value of a DataFrame is ambiguous". They have introduced a `DataFrame.empty` property. I suspect the same problem exists for other types like xarray or tensorflow tensors, but I did not check.
Python has an emptiness-check and numpy chose to repurpose it -- that is not Python's problem nor a shortcoming in Python.
Python does not have an emptiness-check. Empty containers map to False and we suggest in PEP8 to use the False-check as a stand-in for the emptiness-check. But logically, falsiness and emptiness are two separate things. Numpy (IMHO with good justification) repurposed the False-check, but that left them without a standard emptiness check.
I don't think this is a valid argument. We would introduce the concept of emptiness. That can happen only once. What empty means for an object is determined by the respective library and implemented there, similar to what `len` means. There can't be any inconsistency.
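The idea that each library decides what "empty" means for its own objects, the way it does for `len`, could look roughly like the sketch below. Both the `isempty()` function and the `__isempty__` hook are hypothetical (they are the names used in the original proposal, not anything Python provides), and `Table` is a toy class invented for the example.

```python
def isempty(obj):
    """Hypothetical builtin: ask the object, else fall back to len()."""
    # Look the hook up on the type, as is done for other special methods.
    hook = getattr(type(obj), "__isempty__", None)
    if hook is not None:
        return bool(hook(obj))
    return len(obj) == 0


class Table:
    """Toy 2-D container that defines its own notion of emptiness."""

    def __init__(self, rows, columns):
        self.rows = rows
        self.columns = columns

    def __len__(self):
        return self.rows

    def __isempty__(self):
        # Empty when it holds no cells, even if it has rows.
        return self.rows * self.columns == 0


print(isempty([]))            # True
print(isempty("abc"))         # False
print(isempty(Table(5, 0)))   # True
```

Under this assumed design, existing containers keep working through the `len()` fallback, and only types with an unusual notion of emptiness need to implement the hook.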

On 24Aug2021 06:55, tim.hoffmann@mailbox.org <tim.hoffmann@mailbox.org> wrote:
In my mind PEP8 says that the emptiness check should be expressed as a False-check for Pythonic containers. That is a good thing to me. It does not mean we've no notion of emptiness, it says that if you've got something which can be empty it should be possible to check that by doing a False-check. And _that_ implies that for containers, the False-check _is_ emptiness. Defined, to me. Numpy has made a different decision, probably because a heap of operators on their arrays map the operator onto the array elements. To them, that is useful enough to lose some Python-idiom coherence for the wider gain _in that special domain_. If we're chasing rough edges, consider queue.Queue:
I would often like to treat Queues as a container of queued items, basically because I'd like to be able to probe for the presence of queued items via the emptiness idiom. But I can't. It does have a .empty() method. I don't even know what my point is here :-( But I am definitely -1 on weakening the bool(container) idiom as a test for empty/nonempty, and also on asking every container implementation on the planet to grow a new is_empty() method.
Cheers,
Cameron Simpson <cs@cskk.id.au>
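The `queue.Queue` rough edge mentioned above can be reproduced with the standard library alone. This is an independent illustration, not the snippet from the original message: `Queue` defines neither `__bool__` nor `__len__`, so the truthiness idiom says nothing about its contents.

```python
import queue

q = queue.Queue()

# Queue defines neither __bool__ nor __len__, so it is always truthy.
print(bool(q))      # True, even though nothing is queued
print(q.empty())    # True
print(q.qsize())    # 0

q.put("job")
print(bool(q))      # True
print(q.empty())    # False
```

Note that in multithreaded code the results of `empty()` and `qsize()` are only a snapshot and may be stale by the time they are used.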

On 21.08.2021 23:33, Tim Hoffmann via Python-ideas wrote:
I assume your function would first check that the argument is a sequence and then check its length to determine emptiness. That doesn't strike me as more explicit. It's just shorter than first doing the type check and then testing the length.
For the method case, it's completely up to the object to define what "empty" means, e.g. it could be a car object which is fully fueled but doesn't have passengers. That's very flexible, but also requires all sequences to play along, which is hard.
When you write "if not seq: ..." in a Python application, you already assume that seq is a sequence, so the type check is implicit (you can make it explicit by adding a type annotation and applying a type checker, either static or dynamic) and you can assume that seq is empty if the boolean test returns False.
Now, you can easily add a helper function which implements your notion of "emptiness" to your applications. The question is: would it make sense to add this as a builtin? My take on this is: not really, since it just adds a type check and not much else. This is not enough to warrant the added complexity for people learning Python. You may want to propose adding a new operator.is_empty() function which does this, though.
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Aug 24 2021)
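The helper described in the message above (an explicit type check followed by a length test) might be sketched as follows. The function name `is_empty` and the choice of `collections.abc.Sequence` for the type check are assumptions made for this example; no `operator.is_empty()` exists today.

```python
from collections.abc import Sequence


def is_empty(obj):
    """Hypothetical helper: explicit type check, then a length test."""
    if not isinstance(obj, Sequence):
        raise TypeError(f"{type(obj).__name__!r} object is not a sequence")
    return len(obj) == 0


print(is_empty([]))        # True
print(is_empty("text"))    # False
print(is_empty(()))        # True
```

Checking against `Sequence` rejects dicts and sets; swapping in `collections.abc.Sized` would widen the helper to any object with a length, at the cost of a weaker type check.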
participants (19)
- 2QdxY4RzWzUUiLuE@potatochowder.com
- Cameron Simpson
- Chris Angelico
- Christopher Barker
- David Mertz, Ph.D.
- Ethan Furman
- Finn Mason
- Guido van Rossum
- Marc-Andre Lemburg
- MRAB
- Paul Moore
- Ricky Teachey
- Sebastian Berg
- Serhiy Storchaka
- Steven D'Aprano
- Thomas Grainger
- Tim Hoffmann
- tim.hoffmann@mailbox.org
- Valentin Berlier