
Ok, so at the risk of muddying the waters even more, I've put together yet another possible way to do enums, and would be interested to hear comments..

Based on the list of required/desired/undesired properties laid out in the enum PEP thread, I believe a lot of the back and forth to date has been due to the fact that all of the proposed implementations fall short of fulfilling at least some of the desired properties for a general-purpose enum implementation in different ways.  I've put together something based on ideas from Tim's, Barry's, other things thrown out in discussion, and a few of my own which I think comes closer to ticking off most of the boxes.

(The code and some examples are available at https://github.com/foogod/pyenum)

Enums/groups are defined by creating subclasses of Enum, similarly to Barry's implementation.  The distinction is that the base "Enum" class does not have any associated values (they're just singleton objects with names).  At a basic level, the values themselves are defined like so:

    class Color (Enum):
        RED = __
        GREEN = __
        BLUE = __

As has been (quite correctly) pointed out before, the single-underscore (_) has a well-established meaning in most circles related to gettext/etc, so for this application I picked the next best thing, the double-underscore (__) instead.  I think this is reasonably mnemonic as a "fill in the blank" placeholder, and also not unduly verbose.

One advantage of using __ over, say, ellipsis (...) is that since it involves a name resolution, we can add (just a little!) magic to generate distinct __ objects on each reference, so, for example, the following actually works as the user probably expects it to:

    class Color (Enum):
        RED = CRIMSON = __
        BLUE = __

(RED and CRIMSON actually become aliases referring to the same enum value, but BLUE is a different enum value)

One other rather notable advantage to __ is that we can define a special multiplication behavior for it which allows us to make the syntax much more compact:

    class Color (Enum):
        RED, GREEN, BLUE, ORANGE, VIOLET, BEIGE, ULTRAVIOLET, ULTRABEIGE = __ * 8

(as an aside, I have an idea about how we might be able to get rid of the "* 8" altogether, but it requires a different (I think generally useful) change to the language which I will propose separately)

Each enum value is actually an instance of its defining class, so you can determine what type of enum something is with simple inheritance checks:

    >>> isinstance(Color.RED, Color)
    True

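
The "fresh __ on each reference" behavior described above can be approximated with a metaclass whose __prepare__ hands the class body a custom namespace. What follows is only a minimal sketch under that assumption, not the pyenum code: the names _Blank, _BlankNamespace and SketchEnumMeta are hypothetical, and the real implementation may work quite differently.

```python
class _Blank:
    """Placeholder produced each time the name __ is looked up."""

    def __mul__(self, count):
        # __ * 3 expands to three distinct placeholders for tuple unpacking.
        return tuple(_Blank() for _ in range(count))


class _BlankNamespace(dict):
    """Class-body namespace that invents a fresh placeholder for __."""

    def __missing__(self, key):
        if key == "__":
            return _Blank()
        # Any other unknown name falls through to globals/builtins as usual.
        raise KeyError(key)


class SketchEnumMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        return _BlankNamespace()

    def __new__(mcls, name, bases, namespace, **kwargs):
        blanks = {k: v for k, v in namespace.items() if isinstance(v, _Blank)}
        plain = {k: v for k, v in namespace.items() if not isinstance(v, _Blank)}
        cls = super().__new__(mcls, name, bases, plain)
        members = {}  # placeholder -> enum value, so aliases share one object
        for attr, blank in blanks.items():
            if blank not in members:
                member = object.__new__(cls)
                member.name = attr
                members[blank] = member
            setattr(cls, attr, members[blank])
        return cls


class Enum(metaclass=SketchEnumMeta):
    def __repr__(self):
        return f"<{type(self).__name__}.{self.name}>"


class Color(Enum):
    RED = CRIMSON = __               # one lookup, so both names share a value
    BLUE = __                        # a second lookup gives a distinct value
    GREEN, ORANGE, VIOLET = __ * 3   # multiplication yields three more
```

Here Color.RED and Color.CRIMSON are the same object because the chained assignment resolves __ only once, while every other reference to __ produces a new value.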
For enums which need to have int values, we can use IntEnum instead of Enum:

    class Errno (IntEnum):
        EPERM = 1
        ENOENT = 2
        ESRCH = 3
        EINTR = 4

(Technically, IntEnum is just a special case of the more generic TypeEnum class:

    class IntEnum (TypeEnum, basetype=int):
        pass

..which means that theoretically you could define enums based on (almost) any base type (examples.py has an example using floats))

Anyway, as expected, enums have reasonable strs/reprs, and can be easily translated to/from base values using traditional "casting" syntax:
You can also look up enums by name using index notation:

    >>> Errno['EPERM']
    <__main__.Errno.EPERM (1)>

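
For readers who want to try the cast and lookup operations, the standard library's enum module (which arrived after this thread) offers the same two spellings. The sketch below uses the stdlib enum.IntEnum purely as a stand-in; it is not the pyenum IntEnum, and its reprs and internals differ.

```python
# Stand-in sketch: this is the standard library's IntEnum, not pyenum's.
from enum import IntEnum


class Errno(IntEnum):
    EPERM = 1
    ENOENT = 2
    ESRCH = 3
    EINTR = 4


# "Casting" a base value returns the existing member, not a new object.
assert Errno(1) is Errno.EPERM
# Casting back to the base type yields a plain int.
assert int(Errno.ENOENT) == 2
# Lookup by name uses index notation on the class.
assert Errno["EPERM"] is Errno.EPERM
# Members are ints, so they compare equal to their base values.
assert Errno.EPERM == 1

# A value with no matching member is rejected rather than created.
try:
    Errno(99)
except ValueError as exc:
    rejected = str(exc)
```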
It's worth noting here that TypeEnums are actually subclasses of their basetype, so IntEnums are also ints, and can be used as drop-in replacements for any existing code which is expecting an int argument.  They can also be compared directly as if they were ints:

    if exc.errno == Errno.EPERM:
        do_something()

TypeEnums enforce uniqueness:
However, you can define "aliases" for enums, if you want to:
Of course, pure (valueless) Enum instances are logical singletons (i.e. they only compare equal to themselves), so there's no potential for duplication there. Finally, a special feature of __ is that it can also be called with parameters to create enum values with docstrings or other metadata:
A few other notable properties:

- Enum values are hashable, so they can be used as dict keys, etc.
- Though enum values will compare equal to their associated basetype values (Errno.EPERM == 1), enum values from different classes never compare equal to each other (Foo.A != Errno.EPERM).  This prevents enum-aware code from accidentally mistaking one enum for a completely different enum (with a completely different meaning) simply because they map to the same int value.
- Using the class constructor does not create new enums, it just returns the existing singleton enum associated with the passed value (Errno(1) returns Errno.EPERM, not a new Errno).  If there is no predefined enum matching that value, a ValueError is raised, thus using the constructor is basically a "cast" operation, and only works within the supported range of values.  (New enum values can be created using the .new() method on the class, however, if desired)

So... thoughts?

--Alex

On 02/22/2013 11:25 AM, Alex Stewart wrote:
> Ok, so at the risk of muddying the waters even more, I've put together yet another possible way to do enums, and would be interested to hear comments..
Largely similar to my own implementation -- so of course I like it! :)
> A few other notable properties:
> * Enum values are hashable, so they can be used as dict keys, etc.
Problem here: should we have our enums hash the same as the underlying value?  Consider:

--> import yaenum

--> class Color(yaenum.Enum):
...     black
...     red
...     green
...     blue
...
--> class Literature(yaenum.Enum):
...     scifi
...     fantasy
...     mystery
...     pop
...
--> Color.black
Color('black', value=0)
--> Literature.scifi
Literature('scifi', value=0)
--> black = Color.black
--> scifi = Literature.scifi
--> black == 0
True
--> hash(black)
0
--> scifi == 0
True
--> hash(scifi)
0
--> black == scifi
False
--> hash(0)
0
--> huh = dict()
--> huh[black] = 9
--> huh
{Color('black', value=0): 9}
--> huh[0]
9
--> huh[scifi]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: Literature('scifi', value=0)
--> huh[scifi] = 11
--> huh
{Color('black', value=0): 9, Literature('scifi', value=0): 11}
--> huh[0]
9
--> del huh[0]
--> huh[0]
11

From a practicality standpoint the question is: How likely is it to use different enum classes as keys?

--
~Ethan~
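
The session above can be reproduced without yaenum installed. The sketch below assumes a hypothetical IntMember class (not part of yaenum or pyenum) that has the same three properties: it equals its int value, it hashes like that int, and it never equals a value from a different enum class.

```python
class IntMember(int):
    """Hypothetical int-backed enum value: equal to its int, not to other enums."""

    def __eq__(self, other):
        if isinstance(other, IntMember):
            # Values from different enum classes never compare equal.
            return type(self) is type(other) and int(self) == int(other)
        return int(self) == other

    def __ne__(self, other):
        # int defines its own __ne__, so it has to be overridden as well.
        return not self == other

    # Equal objects must hash alike, so reuse the int hash.
    __hash__ = int.__hash__


class Color(IntMember):
    pass


class Literature(IntMember):
    pass


black = Color(0)
scifi = Literature(0)

huh = {}
huh[black] = 9
first_lookup = huh[0]             # finds the Color key, because black == 0
scifi_missing = scifi not in huh  # black != scifi, so there is no match yet
huh[scifi] = 11
size_with_both = len(huh)         # both keys coexist despite the shared hash
del huh[0]                        # removes the first key that matches 0
second_lookup = huh[0]            # the remaining key also equals 0
```

The surprising part is only the plain-int lookup: each enum key on its own behaves consistently, but 0 matches whichever key the dict happens to probe first.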

On Fri, Feb 22, 2013 at 3:45 PM, Alexandre Zani <alexandre.zani@gmail.com> wrote:
Actually, not really.  The only requirement in Python is that if two objects compare equal, they need to have the same hash value, but there is no requirement in the other direction (indeed, that would be a problem in many cases as the domain of hash values is inherently much smaller than the domain of all possible object data, so it must be expected that you will get hash duplication from time to time, no matter what method you use for calculating the hashes).

In the case of pure Enums, they're just singleton object instances, so they hash the same as any other object, which works pretty much as one would expect.  In the case of TypeEnums, since they do compare equal to the underlying value, they must also hash the same.

However, hashes are just a "hint": two objects can hash the same but not compare equal, in which case they will be considered to be different objects when used as dictionary keys, etc.

There is actually an example of this in the examples.py I put up in github with the reference code.  It creates a dictionary with two keys, which are two different Enums, which both use the same int value.  Because they do not compare equal, they are treated as different keys by Python, as I think most people would expect..

--Alex
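
The one-directional rule is easy to check in isolation. This is a small sketch with a hypothetical Tag class (unrelated to examples.py in the repository) whose instances all share a single hash value yet remain separate dictionary keys because they compare unequal.

```python
class Tag:
    """Hypothetical key type whose instances all share one hash value."""

    def __init__(self, label):
        self.label = label

    def __hash__(self):
        return 1  # every Tag collides on purpose

    def __eq__(self, other):
        return isinstance(other, Tag) and self.label == other.label


# Same hash, different labels: the dict keeps two separate entries.
table = {Tag("a"): "first", Tag("b"): "second"}
```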

On Fri, Feb 22, 2013 at 3:16 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
> From a practicality standpoint the question is: How likely is it to use different enum classes as keys?
Having the same hash value isn't the problem.
The problem is that black == 0 and scifi == 0.  So when the hash values collide, it chains them and then uses == to compare the 0 to find the matching value in the table.

To avoid this problem ensure that hash(enum) != hash(int(enum)) [or whatever the base type of the enum is].

Actually, I'm not sure that works without looking at the implementation of dict.  When there are multiple values in a bucket, does it compare hash values first or does it jump right to ==?

--- Bruce

On Fri, Feb 22, 2013 at 3:57 PM, Bruce Leban <bruce@leapyear.org> wrote:
> To avoid this problem ensure that hash(enum) != hash(int(enum)) [or whatever the base type of the enum is].
Never mind.  That's a bad idea.  As others pointed out, if two objects compare equal, they have to have the same hash value.

This weird behavior is a side effect of having non-transitive equality (a == b, b == c but a != c) and objects like dicts are not designed to work with objects with non-transitive equality.

Not having transitivity when it's expected leads to weird behavior.  For example, if you have non-transitive inequality (a < b < c < a, as is the case in PHP, for example) then sorting may not work and some sorting code can even get into infinite loops.

--- Bruce
Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
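
To see why a cyclic "less than" leaves sorting without a correct answer, here is a small sketch using a hypothetical rock-paper-scissors Hand class (an illustration only, not taken from any of the enum implementations discussed). It checks every arrangement of three values and finds that none of them is consistently ordered.

```python
from itertools import permutations

# Each hand is "less than" exactly the hand that beats it: a 3-cycle.
BEATEN_BY = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


class Hand:
    def __init__(self, name):
        self.name = name

    def __lt__(self, other):
        return BEATEN_BY[self.name] == other.name

    def __repr__(self):
        return f"Hand({self.name!r})"


def is_sorted(items):
    """True if no later item compares less than an earlier one."""
    return all(
        not (later < earlier)
        for i, earlier in enumerate(items)
        for later in items[i + 1:]
    )


hands = [Hand("rock"), Hand("paper"), Hand("scissors")]
consistent_orders = [p for p in permutations(hands) if is_sorted(p)]
```

Since consistent_orders comes out empty, whatever order a sort routine returns will violate the comparison somewhere.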

On 02/22/2013 03:57 PM, Bruce Leban wrote:
True -- the problem is using two different enum classes with the same underlying int value as keys and then using ints instead of the enums to try to access the values.
If we do this, then a plain int won't be able to look up an enum key, which would mean that int enums are not drop-in replacements for ints currently being used in enum-type contexts.

I'm inclined to leave the problem as is, though, unless somebody has a use case where they are using two different enum classes in a single dictionary?
> Actually, I'm not sure that works without looking at the implementation of dict.  When there are multiple values in a bucket, does it compare hash values first or does it jump right to ==?
dict uses hash first -- after all, it's a hash table.  :)

--
~Ethan~
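
The "hash first" behavior can be observed by counting equality calls. This is a sketch with a hypothetical Probe class; the zero-call result reflects how CPython's dict behaves and is an implementation detail rather than a language guarantee.

```python
class Probe:
    """Hypothetical key that counts how often __eq__ is invoked."""

    eq_calls = 0

    def __init__(self, hash_value):
        self.hash_value = hash_value

    def __hash__(self):
        return self.hash_value

    def __eq__(self, other):
        Probe.eq_calls += 1
        return self is other


stored = Probe(1)
table = {stored: "value"}

# A key with a different hash is rejected without ever calling __eq__.
Probe.eq_calls = 0
found_other_hash = Probe(9) in table
calls_for_other_hash = Probe.eq_calls

# A key with the same hash forces a real equality comparison.
Probe.eq_calls = 0
found_same_hash = Probe(1) in table
calls_for_same_hash = Probe.eq_calls
```

This is also why giving an enum a hash different from its int value would stop a plain int from finding it: the equality check that would have matched is never reached.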

On Friday, February 22, 2013 4:41:00 PM UTC-8, stoneleaf wrote:
I'll admit the behavior in this case is a little unconventional, but to be honest I think it's something that most people would find made a reasonable amount of sense if they stopped and thought about it for a minute, so I don't think it's that horrible (and seems like it'd be pretty rare anyway).. In my opinion it's far less disturbing than some of the behavior we already have in the language for certain types of keys:
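
Two well-known cases of surprising key behavior in today's Python are shown below. They are offered only as illustrations and are not necessarily the examples the message originally had in mind.

```python
# 1, 1.0 and True compare equal and hash alike, so they are one dict key.
merged = {1: "int"}
merged[1.0] = "float"
merged[True] = "bool"

# NaN never compares equal to itself, so only the identical object finds it.
nan = float("nan")
nan_table = {nan: "stored"}
same_object_found = nan in nan_table
new_nan_found = float("nan") in nan_table
```

In the first case the dict keeps the original key 1 and only the value is overwritten; in the second, a freshly created NaN cannot retrieve an entry stored under another NaN.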
--Alex

participants (4)
- Alex Stewart
- Alexandre Zani
- Bruce Leban
- Ethan Furman