[Python-ideas] Predicate Sets

Andrew Barnert abarnert at yahoo.com
Mon Jan 20 08:56:49 CET 2014


From: Daniel da Silva <var.mail.daniel at gmail.com>
Sent: Sunday, January 19, 2014 3:41 PM


>Overview:
>    Sets in mathematics can be defined by a list of elements without repetitions, and alternatively by a predicate (function) that determines inclusion.

The whole point of modern set theory is that sets cannot be defined by a predicate alone; only by a predicate _and a set to apply it over_. Which we already have in set comprehensions.

And your suggestion has the exact same problem that naive set theory had:

>>> myset = predicateset(lambda s: s.startswith('a'))
>>> 'xyz' in myset
False

>>> russellset = predicateset(lambda s: s not in s)
>>> russellset in russelset

Presumably this should cause the computer to scream "DOES NOT COMPUTE!" and blow up, which I think would be hard to implement in CPython.

Still, this could be useful despite not being mathematically consistent. Python functions don't have to be mathematical functions, and you could easily just state that using a predicateset that turns out to be a proper class as undefined behavior, so it's perfectly acceptable if an implementation wants to hang forever or fail with a recursion error or whatever.

Anyway, the way you've designed this, as far as I can tell, there's nothing stopping it from being a module on PyPI that you can come back and propose for inclusion in the stdlib if a lot of people start using it. So I'd say go for it. (And you can even propose syntax, a comprehension with no for clause: {x if expression(x)}, if it's popular enough that seems warranted.)

Also, this isn't a Set in Python terms—or an Iterable or a Sized; it's just a Container. Which is perfectly reasonable, and means len(s) and iter(s) failing is exactly what you should expect. But the name could lead people to expect it to be a Set. Then again, "predicatecontainer" sounds horrible, so maybe the small potential for confusion is fine.

You still need to work out the details. Most of them seem easy, but there are some interesting questions.

 * It's presumably immutable, and therefore Hashable. (It can fail if its predicate isn't—which most callables are, but that's not guaranteed—but I believe that's fine for Hashables.)

 * Is the predicate callable accessible through a public name, or do you have to access it through __contains__?
 * Presumably intersection, union, difference, and symmetric_difference with another predicateset do the obvious thing (or/and/and not/xor the predicates). Or is there something more efficient you could do? There are some modules on PyPI that deal with boolean combinations of predicates; maybe just borrow the design or even import the implementation from one of them?
 * intersection with a set or other Iterable can return a set, equivalent to {x for x in s if x in ps}. And __rand__ allows it to work in the wrong direction when using the operator. But set.intersection(predicateset) will raise a TypeError, and there's not much you can do about that. (And the same goes for the other methods.)
 * union, difference, and symmetric difference with an Iterable presumably turns the other argument into a predicateset(x in s) and then operates on that? Or is there a better way to do it?
 * isdisjoint with a set or other iterable is easy, but what about with another predicateset? An error?
 * issubset and issuperset don't seem implementable, except in the special case that one predicate is made by intersection or union from the other; do they just not exist?

 * Do you want other operations from naive set theory that don't make sense for Python sets, like the unconstrained complement? They could all be implemented with the existing operations and a set of all things (e.g., self.complement() is just predicateset(lambda x: True)).difference(self)), so maybe not. But they might be convenient. (Again, tying in with the boolean-predicates libraries, most of them have a "not" type operation.)

The big problem is coming up with a compelling use case. This one doesn't sell me:

    bar_files = search_files('bar', exclude=predicateset(lambda fname: not fname.endswith('~'))) 


It seems like it make more sense to have exclude take a function, so you could just write:

    bar_files = search_files('bar', exclude=lambda fname: not fname.endswith('~'))

In general, calling a function is just as easy, natural, and readable as testing membership; calling filter or using a comprehension would generally be simpler than creating a predicateset just to use intersection; etc.

And in cases where sometimes a container is useful, but sometimes a function is better… well, look at re.sub or BeautifulSoup.find. I've seen people who didn't know that you could pass a function to re.sub, but nobody who, on seeing it, had any trouble understanding what it did.

Maybe there's a use for "legacy" APIs that were designed around containers and would be hard to change. For example, many file-picker dialogs let you specify the acceptable extensions, but not a filter function. But in most cases, that's because they're ultimately calling some underlying C/ObjC/.NET/whatever function that needs an array, and a predicateset won't help there anyway. (Or, put another way, they're not designed around containers, they're designed around iterables.)


More information about the Python-ideas mailing list