[Python-ideas] a set of enum.Enum values rather than the construction of bit-sets as the "norm"?

Franklin? Lee leewangzhong+python at gmail.com
Sat Dec 30 23:50:33 EST 2017


On Fri, Dec 29, 2017 at 11:25 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Sat, Dec 30, 2017 at 02:56:46AM +1100, Chris Angelico wrote:
>> On Sat, Dec 30, 2017 at 2:38 AM, Steven D'Aprano <steve at pearwood.info> wrote:
>> > The lack of support for the `in` operator is a major difference, but
>> > there's also `len` (equivalent to "count the one bits"), superset
>> > and subset testing, various in-place mutator methods, etc. Java has a
>> > BitSet class, and you can see the typical sorts of operations
>> > commonly required:
>> >
>> > https://docs.oracle.com/javase/8/docs/api/java/util/BitSet.html
>>
>> Okay. A subclass of int could easily add a few more. Counting the 1
>> bits isn't difficult; superset and subset testing are actually the
>> same as 'contains' but with more than one bit at a time. (In fact,
>> checking if a set contains a subset is *easier* with ints than with
>> actual sets!) Are in-place mutators that big a deal? I'm sure there
>> are sets in languages with no mutables.
>
> We seem to be talking at cross-purposes.
>
> Obviously we can and do already use ints as if they were set-like data
> structures. For example, the re module already does so. If you want to
> call that a kind of "bit set", I'm okay with that, but Wikipedia
> suggests that "bit array" is the canonical name:
>
>     "bit array (also known as bit map, bit set, bit string, or
>     bit vector)"
>
> https://en.wikipedia.org/wiki/Bit_array
>
> The obvious reason why is that sets are unordered but arrays of bits are
> not: 0b1000 is not the same "set" as 0b0010.

I think "bit set" was used deliberately, because the term carries semantic
meaning in this context: the int is being used to represent a set.

In your example, it is not the bits that are ordered but the values,
which have a canonical order (or, more generally, a specified order).
0b1000 represents the set {3}, while 0b0010 represents the set {1}. A
bit set representation is, in fact, unordered, since {1,3} and {3,1}
are both represented by the same int. The values of a bit array are
the bits themselves, but the values of a bit set are the indices of
the bits that are 1.
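A minimal Python sketch of the bit set reading, assuming the elements are non-negative ints. The helper names `to_bits`, `from_bits`, `contains`, `is_subset`, and `count` are hypothetical, not from any standard module. It also shows the int operations Chris mentions: membership, subset testing, and counting the one bits.

```python
def to_bits(indices):
    """Encode non-negative ints as a bit set: bit i is 1 iff i is present."""
    bits = 0
    for i in indices:
        bits |= 1 << i
    return bits

def from_bits(bits):
    """Decode a bit set back into the set of indices whose bit is 1."""
    return {i for i in range(bits.bit_length()) if (bits >> i) & 1}

def contains(bits, i):
    """Membership test: is bit i set?"""
    return bool((bits >> i) & 1)

def is_subset(small, big):
    """Subset test: every 1 bit of `small` is also a 1 bit of `big`."""
    return (small & big) == small

def count(bits):
    """Number of 1 bits, i.e. the len() of the represented set."""
    return bin(bits).count("1")

# The insertion order of the values does not matter: same int either way.
assert to_bits([1, 3]) == to_bits([3, 1]) == 0b1010
```

Note that 0b1000 and 0b0010 differ as bit arrays, but as bit sets they are simply {3} and {1}; what is "ordered" is the index assigned to each value, not the set itself.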

> I think I have beaten this dead horse enough. This was a minor point
> about the terminology being used, so I think we're now just waiting on
> Paddy to clarify what his proposal means in concrete terms.

Paddy might want something like this:
- For existing APIs which take int or IntFlag flags, allow them to
also take a set (or perhaps any collection) of flags.
- In new APIs, accept sets of Enum flags rather than making the flags
IntFlag.
- Documentation should prefer sets of Enum flags, and tutorials should
pass sets.
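One possible shape of those suggestions, as a hedged sketch only: `Mode`, `Option`, `legacy_api`, and `new_api` are hypothetical names invented for illustration, not existing stdlib APIs. The legacy function keeps accepting an int or IntFlag and additionally folds any iterable of `Mode` members into one flag value; the new function takes a set of plain `Enum` members, which are not ints.

```python
from enum import Enum, IntFlag, auto

class Mode(IntFlag):
    """Hypothetical flag type of an existing int/IntFlag-based API."""
    READ = 1
    WRITE = 2
    APPEND = 4

def legacy_api(flags):
    """Hypothetical existing API: an int/IntFlag, or (proposed) a collection of Mode."""
    if not isinstance(flags, int):  # IntFlag members are ints, so they skip this
        combined = Mode(0)
        for flag in flags:
            combined |= flag
        flags = combined
    return Mode(flags)

class Option(Enum):
    """Hypothetical plain Enum for a new API; its members are not ints."""
    VERBOSE = auto()
    DRY_RUN = auto()

def new_api(options=frozenset()):
    """Hypothetical new API taking a set of Option members."""
    options = frozenset(options)
    unknown = [o for o in options if not isinstance(o, Option)]
    if unknown:
        raise TypeError(f"not Option members: {unknown!r}")
    return options
```

Because plain `Enum` members are not ints, a caller cannot pass a bare integer to `new_api` by accident; the trade-off is that options are combined with set operations such as `{Option.VERBOSE} | {Option.DRY_RUN}`, since `|` between the members themselves raises `TypeError`.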
