On Mon, 20 Jun 2022 at 05:02, David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
On Sun, Jun 19, 2022 at 2:24 PM Chris Angelico <rosuav@gmail.com> wrote:
def frobnicate(data, verbose=os.environ.get('LEVEL') == loglevel.DEBUG): ...
Is there any value in not putting that into a global constant?
Probably not. I was just inventing an ad hoc example to show what I meant. I didn't search any actual repos I work on for real-life examples.
Ah okay. Well, if that WERE a real example, I would recommend giving it a name. (Also, it's probably going to end up using >= rather than ==, so that the verbosity of any function can be set to a minimum level, so there'd be more complexity, thus making it even more useful to make it some sort of constant.)
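Concretely, the refactor might look something like this (a sketch only: the constant name is invented, and treating LEVEL as a level name like 'DEBUG' is my assumption - Chris's >= variant would compare numeric levels instead, so anything at least as verbose as DEBUG turns the flag on):

import os

# Compute the flag once, at module level, under a descriptive name,
# instead of burying the expression in the signature.
VERBOSE_BY_DEFAULT = os.environ.get('LEVEL', '') == 'DEBUG'

def frobnicate(data, verbose=VERBOSE_BY_DEFAULT):
    ...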
Regardless, the @ operator is now available *everywhere* in Python. Does it quadratically increase cognitive load?
Yeah, probably about that much. Other than NumPy and closely related array libraries, I don't know of many other uses. I think I saw something on PyPI that used it for an email-related purpose, where the @ symbol obviously has some familiarity. But in that case, the lines it occurs on probably contain no more than one or two other sigils.
In the numeric stuff, if I have:
newarray = (A @ B) | (C / D) + (E - F)
That's @, |, /, +, and -. So 5 operators, and (squaring, per the quadratic claim) 25 "complexity points". If I added one more operator, 36 "complexity points" seems reasonable. And if I removed one of those operators, 16 "complexity points" feels about right.
For my part, I would say that it's quite the opposite. This is three parenthesized tokens, each of which contains two things combined in a particular way. That's six 'things' combined in particular ways. Cognitive load is very close to this version:

newarray = (A * B) + (C * D) + (E * F)

even though this uses a mere two operators. It's slightly more, but not multiplicatively so. (The exact number of "complexity points" will depend on what A through F represent, but the difference between "all multiplying and adding" and "five distinct operators" is only about three points.) So unless you have a study showing this, I would say we each have a single data point - ourselves - and it's basically useless data.
In a function signature "def bisect(stuff, lo=0, hi=None)", you don't know what the hi value actually defaults to. Even if it's obvious that it is late-bound, the signature doesn't tell you what it will be bound to.
Sure, knowing what `hi` defaults to *could be useful*. I'm sure if I used that function I would often want to know... and also often just assume the default is "something sensible." I just don't think that "could be useful" benefit comes close to outweighing the cost of a new sigil and new semantics added to the cognitive load of Python.
Yes, but "something sensible" could be "len(stuff)", "len(stuff)-1", or various other things. Knowing exactly which of those will tell you exactly how to use the function. Would you say that knowing that lo defaults to 0 is useful information? You could just have a function signature that merely says which arguments are mandatory and which are optional, and force people to use the documentation to determine the behaviour of omitted arguments. If you accept that showing "lo=0" gives useful information beyond simply that lo is optional, then is it so hard to accept that "hi=>len(stuff)" is also immensely valuable?
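Side by side, the two spellings look like this (the bisect body is a sketch; PEP 671's => form is proposed syntax and does not run on any released Python):

# Today: the signature says only that hi is optional; the real
# default hides behind a sentinel in the body.
def bisect(stuff, lo=0, hi=None):
    if hi is None:
        hi = len(stuff)  # could just as well be len(stuff)-1, etc.
    ...

# Under PEP 671 (proposed syntax only), the same default would be
# visible right in the signature:
# def bisect(stuff, lo=0, hi=>len(stuff)): ...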
For example, it also "could be useful" to have syntax that indicated the (expected) big-O complexity of that function. But whatever that syntax was, I really doubt it would be worth the extra complexity in the language vs. just putting that info in the docstring.
That's true; there's always far more that could go into a function's docstring than can fit into its signature. Perhaps, if it's of value to your project, you could use a function decorator that redefines the return value annotation to (also or instead) carry the complexity. But for information about a single argument, the only useful place to put it is on the argument itself - either in the signature, or in a duplicated block in the docstring. And function defaults are far broader in value than algorithmic complexity, which is irrelevant to a huge number of functions.
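A rough sketch of that decorator idea (the decorator name and the tuple convention are invented here for illustration, not an established API):

def complexity(big_o):
    def decorate(func):
        # "Also or instead": fold the note into the return annotation,
        # keeping any existing annotation alongside it.
        func.__annotations__['return'] = (
            func.__annotations__.get('return'), big_o)
        return func
    return decorate

@complexity("O(log n)")
def bisect(stuff, lo=0, hi=None) -> int:
    ...

# The note now travels with the function object:
# bisect.__annotations__['return'] == (int, "O(log n)")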
Let's look at a function that has a lot of late-bound default arguments:
pd.read_csv(
    filepath_or_buffer: 'FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str]',
    sep=<no_default>,
    delimiter=None,
    header='infer',
    names=<no_default>,
    index_col=None,
    usecols=None,
    squeeze=None,
    prefix=<no_default>,
    mangle_dupe_cols=True,
    dtype: 'DtypeArg | None' = None,
    engine: 'CSVEngine | None' = None,
    converters=None,
    true_values=None,
    false_values=None,
    skipinitialspace=False,
    skiprows=None,
    skipfooter=0,
    nrows=None,
    na_values=None,
    keep_default_na=True,
    na_filter=True,
    verbose=False,
    skip_blank_lines=True,
    parse_dates=None,
    infer_datetime_format=False,
    keep_date_col=False,
    date_parser=None,
    dayfirst=False,
    cache_dates=True,
    iterator=False,
    chunksize=None,
    compression: 'CompressionOptions' = 'infer',
    thousands=None,
    decimal: 'str' = '.',
    lineterminator=None,
    quotechar='"',
    quoting=0,
    doublequote=True,
    escapechar=None,
    comment=None,
    encoding=None,
    encoding_errors: 'str | None' = 'strict',
    dialect=None,
    error_bad_lines=None,
    warn_bad_lines=None,
    on_bad_lines=None,
    delim_whitespace=False,
    low_memory=True,
    memory_map=False,
    float_precision=None,
    storage_options: 'StorageOptions' = None,
)
I'd have to look through the implementation, but my guess is that quite a few of the 25 late-bound defaults require more than one line of code to compute. I really don't WANT to know more than "this parameter is calculated according to some logic, perhaps complex logic" ... well, unless I think it pertains to something I genuinely want to configure, in which case I'll read the docs.
Actually, I would guess that most of these default to something that's set elsewhere. Judging only by the documentation, not actually reading the source, here's what I can say:

delimiter=>sep,  # It's an alias for sep
engine=>???,  # seems the default is set elsewhere
na_values=>_DEFAULT_NA_VALUES,  # there is a default in the docs
on_bad_lines='error'  # seems this has a simple default

For the rest, though, these _do not have_ defaults. Not default values, not default expressions. There is no code that could be placed at the top of the function to assign behaviour to them. The None default value actually means something different from passing in some other value - for instance, "callable or None" means it actually won't be calling any function if None is provided.

This function isn't a good showcase of PEP 671 - neither its strengths nor its weaknesses - because it simply doesn't work with argument defaults in that way. It might be able to take advantage of it for a couple of them, but it's certainly not going to change the sheer number of None-default arguments that it has. Maybe I'm wrong on that, and maybe you could show the lines of code at the top of the function that could potentially be converted into argument defaults, but otherwise, this is simply a function that potentially does a lot of stuff, and only does the stuff for the arguments you pass in. (It could potentially benefit from a way to know whether the argument was passed or not, but since None is a fine sentinel for all of these args, there wouldn't be much to gain.)

ChrisA
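To make that distinction concrete, here is a sketch with two hypothetical functions (invented for illustration, not pandas internals):

# None standing in for a real, computable default - the kind of
# parameter PEP 671 could move into the signature:
def head(stuff, hi=None):
    if hi is None:
        hi = len(stuff)
    return stuff[:hi]

# None as a behaviour switch - there is no default expression at all;
# None means "skip this step entirely" ("callable or None"):
def read_rows(rows, converter=None):
    for row in rows:
        if converter is not None:
            row = converter(row)  # only called if one was supplied
        yield row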