On Fri, Dec 29, 2017 at 11:25 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Dec 30, 2017 at 02:56:46AM +1100, Chris Angelico wrote:
On Sat, Dec 30, 2017 at 2:38 AM, Steven D'Aprano <steve@pearwood.info> wrote:
The lack of support for the `in` operator is a major difference, but there's also `len` (equivalent to "count the one bits"), superset and subset testing, various in-place mutator methods, etc. Java has a BitSet class, and you can see the typical sorts of operations commonly required:
https://docs.oracle.com/javase/8/docs/api/java/util/BitSet.html
Okay. A subclass of int could easily add a few more. Counting the 1 bits isn't difficult; superset and subset testing are actually the same as 'contains' but with more than one bit at a time. (In fact, checking if a set contains a subset is *easier* with ints than with actual sets!) Are in-place mutators that big a deal? I'm sure there are set types in languages with no mutable values.
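To make that concrete, here is a minimal sketch of the int-as-bit-set idiom (the variable names are just for illustration):

    # Using ints as bit sets: element i is represented by bit 1 << i.
    a = 0b1011              # the "set" {0, 1, 3}
    b = 0b0011              # the "set" {0, 1}

    # Membership: is element 1 in a?
    print(bool(a & (1 << 1)))    # True

    # Subset test: b <= a iff every bit set in b is also set in a.
    print(a & b == b)            # True

    # Superset test is the same check the other way around.
    print(b & a == a)            # False

    # "len": count the one bits (3.10+ also has int.bit_count()).
    print(bin(a).count("1"))     # 3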
We seem to be talking at cross-purposes.
Obviously we can and do already use ints as if they were set-like data structures. For example, the re module already does so. If you want to call that a kind of "bit set", I'm okay with that, but Wikipedia suggests that "bit array" is the canonical name:
"bit array (also known as bit map , bit set, bit string, or bit vector)"
https://en.wikipedia.org/wiki/Bit_array
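For concreteness, the re module's flags already work this way (they have been an IntFlag since Python 3.6):

    import re

    # Combining flags is a bitwise "union" of int values.
    flags = re.IGNORECASE | re.MULTILINE

    # "Membership" testing is a bitwise AND.
    print(bool(flags & re.IGNORECASE))   # True
    print(bool(flags & re.DOTALL))       # False

    print(bool(re.search(r"^spam", "Spam\nand eggs", flags)))   # True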
The obvious reason why is that sets are unordered but arrays of bits are not: 0b1000 is not the same "set" as 0b0010.
I think "bit-set" was used because it has semantic meaning in this context. In your example, it is not the bits that are ordered, but the values, which have a canonical order (or, more generally, a specified order). 0b1000 represents the set {3}, while 0b0010 represents the set {1}. A bit set representation is, in fact, unordered, since {1,3} and {3,1} are both represented by the same int. The values of a bit array are the bits themselves, but the values of a bitset are the indices which have a 1.
I think I have beaten this dead horse enough. This was a minor point about the terminology being used, so I think we're now just waiting on Paddy to clarify what his proposal means in concrete terms.
Paddy might want something like this:

- For existing APIs which take int or IntFlag flags, allow them to also take a set (or perhaps any collection) of flags (see the sketch below).
- In new APIs, take sets of Enum flags, and don't make them IntFlag.
- Documentation should show preference toward using sets of Enum flags. Tutorials should pass sets.
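For the first point, a minimal sketch of what an API accepting either form might look like (the Color enum and the paint() function are hypothetical, not an existing API):

    from enum import Flag, auto
    from functools import reduce
    from operator import or_

    class Color(Flag):                  # hypothetical flag enum
        RED = auto()
        GREEN = auto()
        BLUE = auto()

    def paint(flags):                   # hypothetical API entry point
        """Accept either a combined Flag value or any collection of flags."""
        if not isinstance(flags, Color):
            flags = reduce(or_, flags, Color(0))
        return flags

    # Both call styles produce the same combined value.
    print(paint(Color.RED | Color.BLUE) == paint({Color.RED, Color.BLUE}))   # True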