Re: NAN handling in statistics functions

On Mon, Aug 30, 2021 at 10:22 AM Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
Seriously? you are arguing that Enums are better because they are self documenting, when you have to poke at a dunder to get the information?!? I'm actually kind of surprised there isn't an obvious way to check that.

> 2. The class attribute __doc__ is treated specially: it does not
> become an Enum member, and it is treated as you would expect by help().

OK, so this is one advantage (for the use case at hand): you can put the docs for what the NaNFlags are in the Enum docstring, and then it is documented for all functions that use that flag. However, this is helpful (and DRY) for the author of the package -- but I think less useful for the users of the package. I want to (in iPython) do:

    statistics.median?

and see everything I need to know to use it, not get a reference to an Enum that I then need to go look up.

> But what they do is create a burden of extra code to read and write.

Exactly -- this is a fair bit more awkward than:

    from statistics import median
    result = median(the_data, nans="omit")

I suppose less so if you are using more than a couple calls to statistics functions, but how common is that?

> Python 3.11 Enums have the global_enum decorator, which injects the

hmm, I suppose I'd follow numpy tradition, and do:

    import statistics as st
    result = st.median(the_data, nans=st.OMIT)

but that only makes sense for numpy because we tend to use a LOT of numpy once we are using any :-)

But: if an ENUM is used, do please put all the flags in the statistics namespace.

-CHB

Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
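
For illustration, one way a module could expose enum members at module level, so that `st.OMIT` works, is to copy the members into the module's globals. This is only a sketch: `NanPolicy` and its member names are hypothetical and are not part of the real statistics module, and "raise" in particular is an assumed option.

```python
from enum import Enum


class NanPolicy(Enum):
    """Hypothetical NaN-handling flags; not part of the real statistics module."""
    OMIT = "omit"
    RAISE = "raise"
    IGNORE = "ignore"


# Copy every member into the enclosing module's namespace, so that callers
# can write `st.OMIT` instead of `st.NanPolicy.OMIT`.
globals().update(NanPolicy.__members__)

# The module-level name and the enum member are the same object.
print(OMIT is NanPolicy.OMIT)
```

With this in place the flags and the functions that take them live in one namespace, at the cost of a few extra module-level names.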

On Mon, Aug 30, 2021 at 10:37:28PM -0700, Christopher Barker wrote:
Seems pretty easy to me. It has to be a dunder, or a sunder at least, because regular non-underscore names are reserved for the enumerations themselves.

How do you programmatically get the information about which encoding error handlers the `open` function takes?

I don't know how to get them programmatically, but today I learned how to get them from the documentation:

* Start with help(open)
* Page down two pages.
* import codecs
* help(codecs.Codec)

That gives you the predefined error handlers, assuming the docs are up to date. I still don't know how to find out what extra handlers have been installed.

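
To make the point about error handlers concrete: as far as I know, the codecs module lets you look up a handler by name with `codecs.lookup_error()` but offers no public call that lists every registered name. A minimal sketch, assuming the handler names below (typed in by hand from the documentation, not queried from anywhere):

```python
import codecs

# Handler names documented for the codecs module (copied by hand from the
# docs; this list is an assumption of the sketch, not something queried).
documented = [
    "strict", "ignore", "replace", "xmlcharrefreplace",
    "backslashreplace", "namereplace", "surrogateescape", "surrogatepass",
]


def is_registered(name):
    """Return True if an error handler is registered under this name."""
    try:
        codecs.lookup_error(name)
    except LookupError:
        return False
    return True


registered = {name: is_registered(name) for name in documented}
print(registered)
print(is_registered("no-such-handler"))
```

So a known name can be tested one at a time, but the set of valid strings still has to come from the docs.
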
Okay, so if the API is (say) this:

    def median(data, *, nans='ignore'): ...

will iPython give you a list of all the other possible strings that are accepted? How does it know?

-- 
Steve
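
A rough sketch of what introspection can and cannot see in the two designs. The `NanPolicy` enum, its member names, and both `median_*` stubs are hypothetical; only `inspect.signature` and the `Enum` machinery are real.

```python
import inspect
from enum import Enum


class NanPolicy(Enum):
    """Hypothetical NaN-handling flags (names are assumptions)."""
    IGNORE = "ignore"
    OMIT = "omit"
    RAISE = "raise"


def median_str(data, *, nans="ignore"):
    """Stub with a plain string default; the body is irrelevant here."""


def median_enum(data, *, nans=NanPolicy.IGNORE):
    """Stub with an enum default; the body is irrelevant here."""


def accepted_values(func, param):
    """Return the values reachable from a parameter's default, if any."""
    default = inspect.signature(func).parameters[param].default
    if isinstance(default, Enum):
        # The default's class enumerates every valid member.
        return [member.value for member in type(default)]
    # A bare string carries no link to its sibling options.
    return None


print(accepted_values(median_str, "nans"))
print(accepted_values(median_enum, "nans"))
```

With a string default the signature only reveals the one default value; with an enum default the other choices are reachable from the default's class, though a tool still has to go and look for them.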

Can someone explain why enum-vs-string is being discussed as if it is an either-or choice? Why not just call the enum class using the input so that you can supply a string or enum?

    NanChoice(nan_choice_input)

I understand this would not be a really great choice for a flags enum or int enum, but for single-choice strings, it seems like a reasonable approach to me.

In the example below from my own code I just call the enums using the input strings so that whether I decide to pass in a string, or import and use an actual enum object, it will work fine either way.

    from enum import Enum
    import pandas as pd

    class LoadType(Enum):
        dead = "D"
        live = "L"
        snow = "S"
        wind = "W"
        seismic = "E"
        D = "D"
        L = "L"
        S = "S"
        W = "W"
        E = "E"

    class RiskCategory(Enum):
        """Risk Category of Buildings and Other Structures for Flood, Wind, Snow, Earthquake, and Ice Loads.

        cf. Table 1.5-1
        """
        I = "I"
        II = "II"
        III = "III"
        IV = "IV"

    _Table_1_pt_5_dsh_2 = pd.DataFrame({LoadType.snow: [0.80, 1.00, 1.10, 1.20],
                                        LoadType.seismic: [1.00, 1.00, 1.25, 1.50]},
                                       index=list(RiskCategory))

    def importance_factor(load_type_input):
        return lambda risk_input: _Table_1_pt_5_dsh_2[LoadType(load_type_input)][RiskCategory(risk_input)]

Maybe this idea sucks? I don't have any way of knowing since nobody reads my code other than future me (who happens to be somewhat dim), but in the past, I have used enums like this in my own code mostly as

1. a documentation feature so I can easily remember what the valid choices are, and
2. an easy error catcher since you get a nice exception when you supply an invalid string to the enum class:

Could this approach be useful for handling nans in the stats module? Maybe the overhead of calling the enum would be considered too high or something?

---
Ricky.

"I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
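
Applied to the NaN case, the string-or-enum idea might look roughly like the sketch below. Everything named here is hypothetical: `NanPolicy`, its members, and this `median` wrapper are illustrations only, not the statistics module's API, and the NaN handling is deliberately simplistic.

```python
import math
import statistics
from enum import Enum


class NanPolicy(Enum):
    """Hypothetical NaN-handling choices (not a real statistics API)."""
    OMIT = "omit"
    RAISE = "raise"


def _is_nan(x):
    """True only for float NaN values."""
    return isinstance(x, float) and math.isnan(x)


def median(data, *, nans="raise"):
    """Median wrapper accepting either a string or a NanPolicy member."""
    # Calling the enum normalises both input forms: NanPolicy("omit") and
    # NanPolicy(NanPolicy.OMIT) both return NanPolicy.OMIT, while an
    # unknown value raises ValueError.
    policy = NanPolicy(nans)
    values = list(data)
    if policy is NanPolicy.RAISE and any(_is_nan(x) for x in values):
        raise ValueError("data contains NaN")
    if policy is NanPolicy.OMIT:
        values = [x for x in values if not _is_nan(x)]
    return statistics.median(values)


data = [1.0, float("nan"), 3.0, 2.0]
print(median(data, nans="omit"))          # string form
print(median(data, nans=NanPolicy.OMIT))  # enum form

try:
    median(data, nans="skip")             # not a valid choice
except ValueError as exc:
    print("rejected:", exc)
```

The one extra enum call per invocation is the overhead in question; in exchange the function body only ever deals with enum members, and invalid strings fail at the boundary.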

On Tue, Aug 31, 2021 at 9:17 AM Ricky Teachey <ricky@teachey.org> wrote:
Whoops, the repl session was incorrect, apologies. Hopefully it was clear what I meant. Corrected version below.
---
Ricky.

"I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

participants (4)
- Chris Angelico
- Christopher Barker
- Ricky Teachey
- Steven D'Aprano