Change magic strings to enums

A bit ago I was reading the Python docs for the warnings module ( https://docs.python.org/3.6/library/warnings.html ), and I noticed a table of magic strings. I can think of a few other places where magic strings are used - for example, string encoding/decoding names and error-handling strictness, and probably a number of others. Since Python 3.4 we've had enums. Wouldn't it be cleaner to use enums by default instead of those magic strings? For example, for the warnings filter actions (section 29.5.2, quite near the top of the page), you could declare in the warnings module:

    class FilterAction(Enum):
        Error = 'error'
        Ignore = 'ignore'
        Always = 'always'
        Default = 'default'
        Module = 'module'
        Once = 'once'

and document that people should use the enum. For as long as a transition period lasts, any entry point into the module where the magic string is currently used could transform it with a single line:

    action = FilterAction(action)  # transforms the old magic string into the shiny new enum member

Then, whenever enough people have shifted / 4.0 comes around, the string-argument version can be deprecated.

Pros:
- no magic strings
- more clarity
- easier to get all possible values, e.g. in type-checking editors (because of the type annotation)

Cons:
- implementation effort (as long as the modules are in pure Python, I could perhaps help; I'm not too confident about my C skills, though)

Backwards compatibility wouldn't be an issue, because of the easy transformation from the old strings, as long as we use those strings as the enum values. Of course, the precise names of the enums/members are up for debate. I personally prefer the above version over ALLCAPS, but that might be my comparative lack of C experience - such named constants are generally done in all caps, but that's often also for lack of a simple enum class.

I tried to search for this, but couldn't find any discussion about it. Apologies if this has been rejected before.

Jacco
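The transition shim in the proposal can be exercised end to end. This is a minimal sketch; `simplefilter` here is just an illustrative stand-in for a real warnings entry point, not the actual stdlib function:

```python
from enum import Enum

class FilterAction(Enum):
    Error = 'error'
    Ignore = 'ignore'
    Always = 'always'
    Default = 'default'
    Module = 'module'
    Once = 'once'

def simplefilter(action):
    # Accept either the old magic string or the new member: Enum's
    # constructor maps a value to its member, and passes members
    # through unchanged, so one line covers both call styles.
    return FilterAction(action)

assert simplefilter('ignore') is FilterAction.Ignore
assert simplefilter(FilterAction.Ignore) is FilterAction.Ignore
```

Because the old strings are kept as the enum values, `FilterAction(action)` is idempotent, which is what makes the transition period cheap.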

On 24 April 2018 at 22:52, Jacco van Dorp <j.van.dorp@deonet.nl> wrote:
"It's cleaner" isn't a user problem though. The main justification for using enums is that they're easier to interpret in log messages and exception tracebacks than opaque constants, and that argument is much weaker for well-chosen string constants than it is for other constants (like the numeric constants in the socket and errno modules). For backwards compatibility reasons, we'd want to keep accepting the plain string versions anyway (implicitly converting them to their enum counterparts). At a less philosophical level, many of the cases where we use magic strings are in code that has to work even when the import system isn't working yet - that's relatively straightforward to achieve when the code is only relying on strings with particular contents, but *much* harder if they're relying on a higher level type like enum objects. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

2018-04-24 15:58 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
I guess we could add inconsistency as a con, then, since the import system isn't working in some of the places where you'd like to use the enums (some of them can't even execute Python code?). This would mean that, to the casual observer, it would look arbitrary where they could be used. I wonder how many of these are in places used by most people, though. I don't mind putting in some time to figure it out, but I have no idea where to start. Is there any easily searchable place where I could scan the standard library for occurrences of magic strings?

On Wed, Apr 25, 2018 at 01:18:10AM +1000, Chris Angelico wrote:
It shouldn't be self-evident, because the use of strings in the warnings module doesn't match the most commonly accepted meaning of magic strings. https://en.wikipedia.org/wiki/Magic_string Possibly Jacco was thinking of "magic constants": https://en.wikipedia.org/wiki/Magic_number_%28programming%29#Unnamed_numeric... (although in this case, they're text constants, not numerical). But this seems like a fairly benign example to my eyes: the strings aren't likely to change their values. As discussed here: https://softwareengineering.stackexchange.com/questions/221034/usage-of-magi... not all uses of literals are harmful. A fairly lightweight change would be to add named constants to the warnings module: ERROR = 'error' etc, and refactor the module to use the named constants instead of hard-coded strings. I'm surprised that it doesn't already do that. That would be 100% backwards compatible, without the cost of importing and creating enums, but would allow consumers of the warnings module to inspect it and import symbolic names if they so choose: from warnings import ERROR By the way, I notice that the warnings module makes heavy use of assert to check user-supplied input. That's dangerous since asserts can be disabled, and also poor API design: it means that even if the asserts trigger on error (which isn't guaranteed), they raise the wrong kind of exception: AssertionError instead of TypeError for bad types or ValueError for bad values. -- Steve

On Wed, Apr 25, 2018 at 3:19 AM, Steven D'Aprano <steve@pearwood.info> wrote:
I assumed this to be the case, yes. Many people decry "magic numbers" where the number 7 means one thing, and the number 83 means something else; but I'm less convinced that the textual equivalent is as problematic. (If not "magic strings", what would you call them?)
Yeah, that would be one way to do it. But I'd still like to know what problems are being solved by this, as a means of determining whether they're being solved adequately. Is it the risk of misspellings? Because that can happen just as easily with the imported name as with a string literal. (The from-import pollutes your namespace, and "warnings.EROR" is a run-time failure just as much as a misspelled string literal would be.) Is it the need to list all the possible strings? That could be done with something like __future__.all_feature_names or the way the logging module translates level names into numbers. Something else? ChrisA

On 25 April 2018 at 01:06, Jacco van Dorp <j.van.dorp@deonet.nl> wrote:
Running './python -X importtime -Wd -c "pass"' with Python 3.7 gives a pretty decent list of the parts of the standard library that constitute the low level core that we try to keep independent of everything else (there's a slightly smaller core that omits the warnings module and its dependencies - leaving "-Wd" off the command line will give that list).
I wonder how many of these would be in places used by most people, though.
Searching the documentation for :data: fields, and then checking those to see which ones had already been converted to enums would likely be your best bet. You wouldn't be able to get a blanket approval for "Let's convert all the magic strings to Enums" though - you'd need to make the case that each addition of a new Enum provided a genuine API improvement for the affected module (e.g. I suspect a plausible case could potentially be made for converting some of the inspect module state introspection APIs over to StringEnum, so it was easier to iterate over the valid states in a consistent way, but even there I'd need to see a concrete proposal before I made up my mind). Making the case for IntEnum usage tends to be much easier, simply due to the runtime introspection benefits that it brings. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 04/24/2018 10:32 AM, Antoine Pitrou wrote:
Also beware the import time cost of having a widely-used module like "warnings" depend on the "enum" module and its own dependencies.
With all the recent changes to Python, I should go through and see which dependencies are no longer needed. -- ~Ethan~

On 25 April 2018 at 04:56, Ethan Furman <ethan@stoneleaf.us> wrote:
I was checking this with "./python -X importtime -c 'import enum'", and the overall import time was around 9 ms with a cold disk cache, and 2 ms with a warm one. In both cases, importing "types" and "_collections" accounted for around a third of the time, with the bulk of the execution time being enum's own module level code. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

One of my main reasons would be the type-checking from tools like Pycharm, which is the one I use. If I don't remember the exact strings, I won't have to switch to my browser to look up the documentation, but instead I type the enum name, and the typechecker will give me the members with correct spelling - all I need to remember is a vague idea of what option did what. The option names will be reminders instead of the thing to remember. Perhaps the string encode/decode would be a better case, tho. Is it latin 1 or latin-1 ? utf-8 or UTF-8 ? They might be fast to look up if you know where to look (probably the top result of googling "python string encoding utf 8", and it's the second and first option respectively IIRC. But I shouldn't -have- to recall correctly), but it's still a lot faster if you can type "Encoding.U" and it gives you the option. I'll go and see if I can make a small list of modules using these kind of strings that aren't of the essential core when I get home this evening. My apologies if magic strings isn't the correct word. Despite that, I believe everyone knows what I intend to say.

On Wed, Apr 25, 2018 at 6:06 PM, Jacco van Dorp <j.van.dorp@deonet.nl> wrote:
There are so many encodings that I don't think an enum would be practical. Also, their canonical names are not valid identifiers, so you would have to futz around just as much - is it Encoding.ISO_8859_1 or Encoding.ISO88591 or something else? Perhaps an alternative tool in PyCharm is the solution. There's no reason that you can't have tab completion inside string literals; imagine, for instance, if >> open("/usr/lo << could tab-complete "/usr/local" and let you fill in a valid path name from your file system. Tab-completing a set of common encodings would be reasonably easy. Tab-completing a set of constant strings for the warnings module, even easier. Maybe there could be a way to store this info on a function object, and then PyCharm just has to follow instructions ("this arg of this function uses these strings")? Possibly as a type annotation, even - instead of saying "this takes a string", it can say "this takes a string drawn from these options"? The strings themselves don't have to be any different. ChrisA
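Chris's closing idea - an annotation meaning "a string drawn from these options", with the strings themselves unchanged - is essentially what later landed as typing.Literal (PEP 586, Python 3.8). A sketch of how it looks; `simplefilter` is an illustrative stand-in, not the real warnings function:

```python
from typing import Literal

# Static checkers and IDEs can complete and validate these literals;
# at runtime the argument is still just a plain string.
Action = Literal['error', 'ignore', 'always', 'default', 'module', 'once']

def simplefilter(action: Action) -> str:
    return action

# mypy/PyCharm would flag simplefilter('eror') as an error;
# at runtime nothing changes for existing callers:
assert simplefilter('error') == 'error'
```

Note that, unlike an enum, Literal performs no runtime conversion or validation - it only informs tooling, which matches Chris's "the strings themselves don't have to be any different".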

2018-04-25 10:30 GMT+02:00 Chris Angelico <rosuav@gmail.com>:
Which is where the auto-completion comes in. Type Encoding.IS - and you'd have a list of options. If the function is annotated with the type of the enum, it'll even suggest that to you. I'll freely admit that the number of encodings might make this a bad idea. On the other hand... it might make it a good idea as well. See a list of possibilities, and the IDE can filter it as you type. IIRC though, you can add encodings at runtime, while you can't add Enum members. If you actually can, then that might make the Enum solution unsuitable. (Looking at the docs 7.2:codecs though, the error handling strings look... tasty ;p)
But where would you get a list of these strings, and how would you define that list? It's currently commonly accepted that annotations are for types, and the later PEPs about the subject seem to assume this without question. Another annotation-like functionality? Hardcode the list of possibilities inside PyCharm? Also, perhaps I want to select the string first, store it in a variable, then pass that variable into a function. How would any checker ever know what I'm going to feed that string into? But an Enum, I can use that as a type annotation myself, and then it'll know without question which are the legal arguments.
PyCharm doesn't execute your code - it scans it. It won't know what you store on a function object. As for whether type annotations can mean "a string from one of these options" - that would require an interpretation of type annotations currently unknown to me, and probably incompatible with what most people currently use them for. Forgive me if I misunderstand you, but aren't you really just trying to use those strings as enum members when you define a function as "takes one of these strings as argument"? Because as far as I know, apart from some fluff, that's exactly what enums are and are intended for - a unique set of keys that all have special meaning.

On Wed, Apr 25, 2018 at 7:12 PM, Jacco van Dorp <j.van.dorp@deonet.nl> wrote:
PyCharm doesn't execute your code - it scans it. It won't know what you store on a function object.
How does it currently know what type something is? If you define something using an enum, how is PyCharm going to know what the members of that enum are?
That's *one of* the things you can do with an enum. There are many other features of enumerations, and I'm trying to boil your idea down to its most compact form. You don't need all the power of Enum - you just want to be able to list off the valid options. ChrisA

On Wed, Apr 25, 2018, 02:13 Jacco van Dorp <j.van.dorp@deonet.nl> wrote:
... Which is where the auto-completion comes in. ...
Designing the language with auto-complete in mind feels wrong to me. It assumes a very sophisticated IDE and may lead to lazy design compromises. --Guido

Even if not just for the autocompletion, it would be more explicit that it's not just a random string like you'd pass to print() - it has a specific meaning. Something in PEP 20 about explicit and implicit? Autocompletion might be a good advantage, but 1) the IDE would need to know what to autocomplete it to, and you probably shouldn't special-case the stdlib like you'd need to with strings, and 2) enums -are- more explicit. When there's a distinct and limited set of options, they're just the tool for the job (or at least a much better tool for this job than strings you have to remember - compare the Color examples used all over the enum documentation). I'm naming auto-completion here as a solid reason, but since clean code itself can be considered a solid reason, I think that's probably the real reason. 2018-04-26 1:11 GMT+02:00 Soni L. <fakedme+py@gmail.com>:

On Wed, Apr 25, 2018 at 10:06:56AM +0200, Jacco van Dorp wrote:
Perhaps the string encode/decode would be a better case, tho. Is it latin 1 or latin-1 ? utf-8 or UTF-8 ?
py> 'abc'.encode('latin 1') == 'abc'.encode('LATIN-1') True py> 'abc'.encode('utf8') == 'abc'.encode('UTF 8') == 'abc'.encode('UtF_8') True Encoding names are normalised before being used.
If you did this with Encodings.ISO you would get a couple of dozen possibilities. ISO-8859-1 ISO-8859-7 ISO-8859-14 ISO-8859-15 etc, just to pick a few at random. How do you know which one you want? In general, there's not really much *practical* use-case for code completion on encodings, aside from just exploratory mucking about in the interactive interpreter. There are too many codecs (multiple dozen), the names are too similar and not self-explanatory, and they can have aliases. It would be like doing code-completion on an object and getting a couple of dozen methods looking like method1245 method1246 method1247 method2390 method2395 Besides, aside from UTF-16, UTF-8 and ASCII, we shouldn't encourage the use of most codecs except for legacy data. And when working with legacy data, we really need to know ahead of time what the encoding is, and declare it as constant or application option. (Or, worst case, we've used chardet or another encoding guesser, and stored the name of the encoding in a variable.) I don't really see a big advantage aside from laziness for completing on encodings. And while laziness is a virtue in programmers, that only goes so far before it becomes silly. Having to type import encodings enc <tab> .Enc <tab> .u <tab> arrow arrow arrow arrow arrow arrow enter (19 key presses, plus the import) to save from having to type 'utf8' (six keypresses) is not what I would call efficient use of programmer time and effort. (Why so many arrows? Since you'll have to tab past at least utf16 utf16be utf16le utf32 utf32be utf32le utf7 before you get to utf8.) But the biggest problem is that they aren't currently available for introspection anywhere. You can register new codecs, but there's no API for querying the list of currently registered codecs or their aliases. I think that problem would need to be solved first, in which case code completion will then be either easy, or irrelevant. 
(I'd be perfectly satisfied with an API I could call from the interactive interpreter.) -- Steve
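On Steven's introspection point: there is no API for querying codecs registered at runtime, but the stdlib does ship a static alias table, encodings.aliases, which covers the built-in codecs. A sketch of what's already inspectable today, assuming only stdlib behaviour:

```python
import codecs
import encodings.aliases

# Static table mapping normalised alias -> canonical codec name.
# It does NOT include anything added later via codecs.register().
aliases = encodings.aliases.aliases
assert aliases['utf8'] == 'utf_8'

# codecs.lookup() applies the normalisation Steven describes
# (case-folding, hyphens/spaces/underscores treated alike):
assert codecs.lookup('UtF_8').name == 'utf-8'

# All known stdlib aliases for one codec:
latin1_names = sorted(a for a, c in aliases.items() if c == 'latin_1')
```

So "easy" code completion over the stdlib codecs would be possible from this table today; it's the runtime-registered codecs that remain invisible, as Steven says.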

25.04.18 10:57, Ivan Levkivskyi пише:
Creating a new function is very cheap - just around 50 ns on my computer. Creating a new class is over two orders of magnitude more costly - around 7 us for an empty class on my computer. Creating a new Enum class is much more costly still - around 40 us for an empty class (or 50 us for IntEnum), plus 7 us per member. Creating a new namedtuple type has the same cost as creating an Enum class (it was much more costly before 3.7). Thus creating a typical Enum class with 3-5 members is like creating 10 normal classes. Not many modules have 10 classes.
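Serhiy's comparison is easy to reproduce with timeit. A rough sketch; the absolute numbers are machine-dependent, so only the ratio between the two is meaningful:

```python
import timeit

# Time 1000 creations of a plain class versus an equivalent Enum
# class, each with three members.
plain = timeit.timeit(
    "class C:\n    A = 1\n    B = 2\n    C = 3",
    number=1000)
enum_cls = timeit.timeit(
    "class C(Enum):\n    A = 1\n    B = 2\n    C = 3",
    setup="from enum import Enum",
    number=1000)

print(f"plain class: {plain * 1e6 / 1000:.1f} us each, "
      f"Enum class: {enum_cls * 1e6 / 1000:.1f} us each, "
      f"ratio: {enum_cls / plain:.1f}x")
```

On any machine, the Enum class should come out several times more expensive than the plain class, which is the startup-time cost being weighed against the API benefit.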

On 25 April 2018 at 11:03, Serhiy Storchaka <storchaka@gmail.com> wrote:
Hm, this is what I wanted to know. I think by rewriting EnumMeta in C we can reduce the creation time of an Enum class (almost) down to the creation time of a normal class, which may be a 4-5x speed-up. What do you think? -- Ivan

On 25 April 2018 at 12:01, Serhiy Storchaka <storchaka@gmail.com> wrote:
I think we can do something similar to ABCMeta, i.e. the metaclass itself will stay defined in Python, but the "hottest" parts of its methods will be replaced with helper functions written in C. This way we can limit complexity of the C code while still getting almost all the performance benefits. -- Ivan

-- Ivan
Adding a C speedup module has a high bar, especially when the performance gain is only in startup time. Personally speaking, I want a speedup module for `re.compile` more than for enum. In the case of enum, I feel CONSTANT = "magic" is enough for most cases. re doesn't have such a workaround, and using a third party regular expression library has a very high bar, especially when using it in the stdlib. But there were some -1s on adding a speedup module for re, and I think it's the same for enum. So I think we should:

* Not use enum blindly; use it only when it's very much better than CONST = "magic".
* Add a faster API which bypasses some slow parts, especially for IntEnum.convert() or IntFlag.convert() in the socket module.

Regards, -- INADA Naoki <songofacandy@gmail.com>

I'm kind of curious why everyone here seems to want to use IntFlag and the other mixins. The docs themselves say that their use should be minimized, and tbh I agree with them. Backwards compatibility can be maintained by allowing the old value and internally converting it to the enum. Combinability is inherent to enum.Flag. There'd be no real reason to use mixins as far as I can see? 2018-04-26 10:52 GMT+02:00 INADA Naoki <songofacandy@gmail.com>:
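Jacco's point about combinability can be sketched directly: a plain enum.Flag already supports combining members with `|`, without inheriting int behaviour the way IntFlag does. `Perm` here is an illustrative example, not any stdlib type:

```python
from enum import Flag, auto

class Perm(Flag):
    READ = auto()
    WRITE = auto()
    EXEC = auto()

combo = Perm.READ | Perm.WRITE

# Membership testing works on combined flags:
assert Perm.READ in combo
assert Perm.EXEC not in combo

# Unlike an IntFlag member, a plain Flag is not an int, so it can't
# silently leak into arithmetic or old int-based APIs:
assert not isinstance(combo, int)
```

The flip side, per the surrounding discussion, is exactly that non-leakage: code that expects the old plain value (an int, a string) breaks on a pure Flag/Enum, which is what the mixins paper over.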

On 26 April 2018 at 19:37, Jacco van Dorp <j.van.dorp@deonet.nl> wrote:
Serialisation formats are a good concrete example of how problems can arise by switching out concrete types on people:
The mixin variants basically say "If you run into code that doesn't natively understand enums, act like an instance of this type". Since most of the standard library has been around for years, and sometimes even decades, we tend to face a *lot* of backwards compatibility requirements along those lines. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
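Nick's serialisation example can be made concrete with json, which doesn't natively understand enums: an IntEnum member serialises like the int it also is, while a plain Enum member is rejected. The class names here are illustrative:

```python
import json
from enum import Enum, IntEnum

class PlainAction(Enum):
    ERROR = 'error'

class IntLevel(IntEnum):
    LOW = 1

# The int mixin means json sees an ordinary int:
assert json.dumps(IntLevel.LOW) == '1'

# A plain Enum member is an opaque object to json:
try:
    json.dumps(PlainAction.ERROR)
except TypeError:
    pass  # "Object of type PlainAction is not JSON serializable"
```

This is the "act like an instance of this type" fallback in action: swapping a plain Enum in where code used to pass raw ints or strings breaks third-party consumers, whereas the mixin variants keep them working.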

2018-04-26 15:26 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
However, as the docs note, plain enum members won't compare equal to the old values, which should surface such mistakes as errors. Since this is only a problem for serializing in this case (the functions would accept the old str arguments for probably ever), shouldn't this be something caught when you upgrade the version you run your script under? It's also a rather simple fix: json.dumps(Enum.a.value) would work just fine. Now I'm aware that most people don't have 100% test coverage and such. I also rather lack the amount of experience you guys have. Guess I'm just a bit behind on the practicality-beats-purity here :) Jacco

On 24 April 2018 at 22:52, Jacco van Dorp <j.van.dorp@deonet.nl> wrote:
"It's cleaner" isn't a user problem though. The main justification for using enums is that they're easier to interpret in log messages and expection tracebacks than opaque constants, and that argument is much weaker for well-chosen string constants than it is for other constants (like the numeric constants in the socket and errno modules). For backwards compatibility reasons, we'd want to keep accepting the plain string versions anyway (implicitly converting them to their enum counterparts). At a less philosophical level, many of the cases where we use magic strings are in code that has to work even when the import system isn't working yet - that's relatively straightforward to achieve when the code is only relying on strings with particular contents, but *much* harder if they're relying on a higher level type like enum objects. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I 2018-04-24 15:58 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
I guess we could add inconsistency as a con, then, since if the import system isn't working at places where you'd like to use the Enums (or even executing python code ?). This would mean that to the casual observer, it'd be arbitrary where they could be used instead. I wonder how many of these would be in places used by most people, though. I don't mind putting in some time to figure it out, but I have no idea where to start. Is there any easily searchable place where I could scan the standard library for occurences of magic strings ?

On Wed, Apr 25, 2018 at 01:18:10AM +1000, Chris Angelico wrote:
It shouldn't be self-evident, because the use of strings in the warnings module doesn't match the most common accepted meaning of magic strings. https://en.wikipedia.org/wiki/Magic_string Possibly Jacco was thinking of "magic constants": https://en.wikipedia.org/wiki/Magic_number_%28programming%29#Unnamed_numeric... (although in this case, they're text constants, not numerical). But this seems like a fairly benign example to my eyes: the strings aren't likely to change their values. As discussed here: https://softwareengineering.stackexchange.com/questions/221034/usage-of-magi... not all uses of literals are harmful. A fairly lightweight change would be to add named constants to the warning module: ERROR = 'error' etc, and refactor the module to use the named constants instead of hard-coded strings. I'm surprised that it doesn't already do that. That would be 100% backwards compatible, without the cost of importing and creating enums, but allow consumers of the warnings module to inspect it and import symbolic names if they so choose: from warnings import ERROR By the way, I notice that the warnings module makes heavy use of assert to check user-supplied input. That's dangerous since asserts can be disabled, and also poor API design: it means that even if the asserts trigger on error (which isn't guaranteed), they raise the wrong kind of exception: AssertionError instead of TypeError for bad types or ValueError for bad values. -- Steve

On Wed, Apr 25, 2018 at 3:19 AM, Steven D'Aprano <steve@pearwood.info> wrote:
I assumed this to be the case, yes. Many people decry "magic numbers" where the number 7 means one thing, and the number 83 means something else; but I'm less convinced that the textual equivalent is as problematic. (If not "magic strings", what would you call them?)
Yeah, that would be one way to do it. But I'd still like to know what problems are being solved by this, as a means of determining whether they're being solved adequately. Is it the risk of misspellings? Because that can happen just as easily with the imported name as with a string literal. (The from-import pollutes your namespace, and "warnings.EROR" is a run-time failure just as much as a misspelled string literal would be.) Is it the need to list all the possible strings? That could be done with something like __future__.all_feature_names or the way the logging module translates level names into numbers. Something else? ChrisA

On 25 April 2018 at 01:06, Jacco van Dorp <j.van.dorp@deonet.nl> wrote:
Running './python -X importtime -Wd -c "pass"' with Python 3.7 gives a pretty decent list of the parts of the standard library that constitute the low level core that we try to keep independent of everything else (there's a slightly smaller core that omits the warning module and it's dependencies - leaving "-Wd" off the command line will give that list).
I wonder how many of these would be in places used by most people, though.
Searching the documentation for :data: fields, and then checking those to see which ones had already been converted to enums would likely be your best bet. You wouldn't be able to get a blanket approval for "Let's convert all the magic strings to Enums" though - you'd need to make the case that each addition of a new Enum provided a genuine API improvement for the affected module (e.g. I suspect a plausible case could potentially be made for converting some of the inspect module state introspection APIs over to StringEnum, so it was easier to iterate over the valid states in a consistent way, but even there I'd need to see a concrete proposal before I made up my mind). Making the case for IntEnum usage tends to be much easier, simply due to the runime introspection benefits that it brings. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 04/24/2018 10:32 AM, Antoine Pitrou wrote:
Also beware the import time cost of having a widely-used module like "warnings" depend on the "enum" module and its own dependencies.
With all the recent changes to Python, I should go through and see which dependencies are no longer needed. -- ~Ethan~

On 25 April 2018 at 04:56, Ethan Furman <ethan@stoneleaf.us> wrote:
I was checking this with "./python -X importtime -c 'import enum'", and the overall import time was around 9 ms with a cold disk cache, and 2 ms with a warm one. In both cases, importing "types" and "_collections" accounted for around a 3rd of the time, with the bulk of the execution time being enum's own module level code. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

One of my main reasons would be the type-checking from tools like Pycharm, which is the one I use. If I don't remember the exact strings, I won't have to switch to my browser to look up the documentation, but instead I type the enum name, and the typechecker will give me the members with correct spelling - all I need to remember is a vague idea of what option did what. The option names will be reminders instead of the thing to remember. Perhaps the string encode/decode would be a better case, tho. Is it latin 1 or latin-1 ? utf-8 or UTF-8 ? They might be fast to look up if you know where to look (probably the top result of googling "python string encoding utf 8", and it's the second and first option respectively IIRC. But I shouldn't -have- to recall correctly), but it's still a lot faster if you can type "Encoding.U" and it gives you the option. I'll go and see if I can make a small list of modules using these kind of strings that aren't of the essential core when I get home this evening. My apologies if magic strings isn't the correct word. Despite that, I believe everyone knows what I intend to say.

On Wed, Apr 25, 2018 at 6:06 PM, Jacco van Dorp <j.van.dorp@deonet.nl> wrote:
There are so many encodings that I don't think an enum would be practical. Also, their canonical names are not valid identifiers, so you would have to futz around just as much - is it Encoding.ISO_8859_1 or Encoding.ISO88591 or something else? Perhaps an alternative tool in PyCharm is the solution. There's no reason that you can't have tab completion inside string literals; imagine, for instance, if >> open("/usr/lo << could tab-complete "/usr/local" and let you fill in a valid path name from your file system. Tab-completing a set of common encodings would be reasonably easy. Tab-completing a set of constant strings for the warnings module, even easier. Maybe there could be a way to store this info on a function object, and then PyCharm just has to follow instructions ("this arg of this function uses these strings")? Possibly as a type annotation, even - instead of saying "this takes a string", it can say "this takes a string drawn from these options"? The strings themselves don't have to be any different. ChrisA

2018-04-25 10:30 GMT+02:00 Chris Angelico <rosuav@gmail.com>:
Which is where the auto-completion comes in. type Encoding.IS - and you'd have a list of options. If the function is annotated with the type of the enum, it'll even suggest you that. I'll freely admit that the amount of encodings might make this a bad idea. On the other hand....it might make it a good idea as well. See a list of possibilities, and the IDE can filter it as you type. IIRC tho, you can add encodings at runtime, while you can't add Enum members. If you actually can, than this might be unsuitable for Enum solution. (looking at the docs 7.2:codecs though, the error handling strings look...tasty ;p)
But where would you get a list of these strings, and how'd you define that list of strings ? It's currently commonly accepted that annotations are for types, and the later PEP's about the subject seem to assume this without question. Another annotation-like functionality ? hardcode the list of possibilities inside pycharm ? Also, perhaps I want to select the string first, store in into a variable, then pass that variable into a function. How would any checker ever know what im going to feed that string into ? But an Enum, I can use that as type annotation myself and then it'll know without question which are legal arguments.
Pycharm doesn't execute your code - it scans it. It wont know what you store on a function object. As whether type annotations can mean "a string from either of these options" would require a currently not known to me interpretation of type annotations - and probably incompatible with what most people currenlty use them for. Forgive me if I misunderstand you, but aren't you really just trying to use those strings as enum members when you define a function like "takes one of these strings as argument" ? Because as far as I know, except from some fluff, that's exactly what enums are and are intended for - a unique set of keys that all have special meaning.

On Wed, Apr 25, 2018 at 7:12 PM, Jacco van Dorp <j.van.dorp@deonet.nl> wrote:
Pycharm doesn't execute your code - it scans it. It wont know what you store on a function object.
How does it currently know what type something is? If you define something using an enum, how is PyCharm going to know what the members of that enum are?
That's *one of* the things you can do with an enum. There are many other features of enumerations, and I'm trying to boil your idea down to its most compact form. You don't need all the power of Enum - you just want to be able to list off the valid options. ChrisA

On Wed, Apr 25, 2018, 02:13 Jacco van Dorp <j.van.dorp@deonet.nl> wrote:
... Which is where the auto-completion comes in. ...
Designing the language with auto-complete in mind feels wrong to me. It assumes a very sophisticated IDE and may lead to lazy design compromises. --Guido

Even if not just for the autocompletion, it would be more explicit that it's not just a random string like you'd pass to print(), but it has a specific meaning. Something in PEP 20 about explicit and implicit ? Autocompletion might be a good advantage, but 1) the IDE would need to know what to autocomplete it to, and you probably shouldn't special-case the stdlib like you'd need to with strings, and 2) enums -are- more explicit. When there's a distinct and limited set of options, they're just the tool for the job. (or at least, a much better tool for this job than to remember colors, which is used all over their documentation). Im naming auto-completion here as a solid reason, but when clean code itself can be considered a solid reason, I think that's probably the real reason. 2018-04-26 1:11 GMT+02:00 Soni L. <fakedme+py@gmail.com>:

On Wed, Apr 25, 2018 at 10:06:56AM +0200, Jacco van Dorp wrote:
Perhaps the string encode/decode would be a better case, tho. Is it latin 1 or latin-1 ? utf-8 or UTF-8 ?
py> 'abc'.encode('latin 1') == 'abc'.encode('LATIN-1')
True
py> 'abc'.encode('utf8') == 'abc'.encode('UTF 8') == 'abc'.encode('UtF_8')
True

Encoding names are normalised before being used.
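[That normalisation can also be observed directly through codecs.lookup, which resolves any accepted spelling to the codec's canonical name:]

```python
import codecs

# Codec names are case-folded and have spaces/hyphens/underscores unified
# before lookup, so all of these spellings resolve to the same codec.
assert codecs.lookup('latin 1').name == codecs.lookup('LATIN-1').name
assert codecs.lookup('utf8').name == codecs.lookup('UtF_8').name
assert 'abc'.encode('utf8') == 'abc'.encode('UTF 8')
```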
If you did this with Encodings.ISO you would get a couple of dozen possibilities: ISO-8859-1, ISO-8859-7, ISO-8859-14, ISO-8859-15, etc., just to pick a few at random. How do you know which one you want?

In general, there's not really much *practical* use-case for code completion on encodings, aside from just exploratory mucking about in the interactive interpreter. There are too many codecs (multiple dozen), the names are too similar and not self-explanatory, and they can have aliases. It would be like doing code-completion on an object and getting a couple of dozen methods looking like method1245, method1246, method1247, method2390, method2395.

Besides, aside from UTF-16, UTF-8 and ASCII, we shouldn't encourage the use of most codecs except for legacy data. And when working with legacy data, we really need to know ahead of time what the encoding is, and declare it as a constant or application option. (Or, worst case, we've used chardet or another encoding guesser, and stored the name of the encoding in a variable.)

I don't really see a big advantage aside from laziness for completing on encodings. And while laziness is a virtue in programmers, that only goes so far before it becomes silly. Having to type

    import encodings
    enc <tab> .Enc <tab> .u <tab> arrow arrow arrow arrow arrow arrow enter

(19 key presses, plus the import) to save having to type 'utf8' (six keypresses) is not what I would call an efficient use of programmer time and effort. (Why so many arrows? Since you'll have to tab past at least utf16, utf16be, utf16le, utf32, utf32be, utf32le and utf7 before you get to utf8.)

But the biggest problem is that they aren't currently available for introspection anywhere. You can register new codecs, but there's no API for querying the list of currently registered codecs or their aliases. I think that problem would need to be solved first, in which case code completion will then be either easy, or irrelevant.
(I'd be perfectly satisfied with an API I could call from the interactive interpreter.) -- Steve

25.04.18 10:57, Ivan Levkivskyi пише:
Creating a new function is very cheap -- just around 50 ns on my computer. Creating a new class is over two orders of magnitude more costly -- around 7 us for an empty class on my computer. Creating a new Enum class is much more costly still -- around 40 us for an empty class (or 50 us for IntEnum), plus 7 us per member. Creating a new namedtuple type has the same cost as creating an Enum class. It was much more costly before 3.7. Thus creating a typical Enum class with 3-5 members is like creating 10 normal classes. Not many modules have 10 classes.
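[A rough way to reproduce this kind of measurement with timeit; the absolute numbers depend heavily on the machine and Python version, and only the relative ordering is expected to hold:]

```python
import timeit

# Cost of defining a function vs. an empty class vs. a 3-member Enum class.
t_func = timeit.timeit("def f(): pass", number=1000)
t_class = timeit.timeit("class C: pass", number=1000)
t_enum = timeit.timeit(
    "class E(Enum): A = 1; B = 2; C = 3",
    setup="from enum import Enum",
    number=1000,
)

# Function creation is the cheapest, class creation is orders of magnitude
# slower, and Enum class creation is slower again.
assert t_func < t_class < t_enum
```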

On 25 April 2018 at 11:03, Serhiy Storchaka <storchaka@gmail.com> wrote:
Hm, this is what I wanted to know. I think by rewriting EnumMeta in C we can reduce the creation time of an Enum class (almost) down to the creation time of a normal class, which may be a 4-5x speed-up. What do you think? -- Ivan

On 25 April 2018 at 12:01, Serhiy Storchaka <storchaka@gmail.com> wrote:
I think we can do something similar to ABCMeta, i.e. the metaclass itself will stay defined in Python, but the "hottest" parts of its methods will be replaced with helper functions written in C. This way we can limit complexity of the C code while still getting almost all the performance benefits. -- Ivan

-- Ivan
Adding a C speedup module has a high bar, especially when the performance gain is only in startup time. Personally speaking, I want a speedup module for `re.compile` more than for enum. In the case of enum, I feel CONSTANT = "magic" is enough for most cases. re doesn't have such a workaround, and using a third-party regular expression library has a very high bar, especially when using it in the stdlib. But there were some -1 on adding a speedup module for re. I think it's the same for enum. So I think we should:

* Not use enum blindly; use it only when it's very far better than CONST = "magic".
* Add a faster API which bypasses some slow parts, especially for IntEnum.convert() or IntFlag.convert() in the socket module.

Regards, -- INADA Naoki <songofacandy@gmail.com>

I'm kind of curious why everyone here seems to want to use IntFlag and other mixins. The docs themselves say that their use should be minimized, and tbh I agree with them. Backwards compatibility can be maintained by allowing the old value and internally converting it to the enum. Combinability is inherent to enum.Flag. There'd be no real reason to use mixins as far as I can see? 2018-04-26 10:52 GMT+02:00 INADA Naoki <songofacandy@gmail.com>:
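[The point about combinability can be demonstrated with a plain enum.Flag, which supports | and membership tests without any int behaviour; Perm is a made-up example class:]

```python
from enum import Flag, auto

class Perm(Flag):
    READ = auto()
    WRITE = auto()
    EXEC = auto()

# Members combine with | and support containment checks...
rw = Perm.READ | Perm.WRITE
assert Perm.READ in rw
assert Perm.EXEC not in rw

# ...but unlike IntFlag, the combined value is not an int in disguise.
assert not isinstance(rw, int)
```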

On 26 April 2018 at 19:37, Jacco van Dorp <j.van.dorp@deonet.nl> wrote:
Serialisation formats are a good concrete example of how problems can arise by switching out concrete types on people:
The mixin variants basically say "If you run into code that doesn't natively understand enums, act like an instance of this type". Since most of the standard library has been around for years, and sometimes even decades, we tend to face a *lot* of backwards compatibility requirements along those lines. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
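[A small illustration of that point, using json as the "code that doesn't natively understand enums"; the class names are made up for the example:]

```python
import json
from enum import Enum, IntEnum

class PlainStatus(Enum):
    OK = 200

class IntStatus(IntEnum):
    OK = 200

# A serialiser that predates enums chokes on a plain Enum member...
try:
    json.dumps(PlainStatus.OK)
    raised = False
except TypeError:
    raised = True
assert raised

# ...but an IntEnum member "acts like an instance of int", so legacy
# code keeps working unchanged.
assert json.dumps(IntStatus.OK) == '200'
```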

2018-04-26 15:26 GMT+02:00 Nick Coghlan <ncoghlan@gmail.com>:
However, as the docs note, they will be comparable with each other, which should throw an error. Since this is only a problem for this case when serializing (since the functions would allow the old str arguments for probably ever), shouldn't this be something caught when you upgrade the version you run your script under? It's also a rather simple fix: json.dumps(Enum.a.value) would work just fine. Now I'm aware that most people don't have 100% test coverage and such. I also rather lack the amount of experience you guys have. Guess I'm just a bit behind on the practicality-beats-purity here :) Jacco
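[The .value fix mentioned here, sketched out; FilterAction is the hypothetical enum from the original proposal:]

```python
import json
from enum import Enum

class FilterAction(Enum):
    IGNORE = 'ignore'

# Serialising the member's .value sidesteps the mixin question entirely:
assert json.dumps(FilterAction.IGNORE.value) == '"ignore"'

# And because the enum values are the old magic strings, the serialised
# form round-trips straight back into the enum on load:
assert FilterAction(json.loads('"ignore"')) is FilterAction.IGNORE
```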
participants (11)
- Antoine Pitrou
- Chris Angelico
- Ethan Furman
- Guido van Rossum
- INADA Naoki
- Ivan Levkivskyi
- Jacco van Dorp
- Nick Coghlan
- Serhiy Storchaka
- Soni L.
- Steven D'Aprano