Proposal: "?" Documentation Operator and easy reference to argument types/defaults/docstrings
Dear all,

Despite the general beauty of Python, I find myself constantly violating the "don't repeat yourself" maxim when trying to write clear, fully documented code. Take the following example:

    def func_1(a: int = 1, b: float = 2.5) -> float:
        """
        Something about func_1
        :param a: Something about param a
        :param b: Something else about param b
        :return: Something about return value of func_1
        """
        return a*b

    def func_2(c: float = 3.4, d: bool = True) -> float:
        """
        Something about func_2
        :param c: Something about param c
        :param d: Something else about param d
        :return: Something about return value
        """
        return c if d else -c

    def main_function(a: int = 1, b: float = 2.5, d: bool = True) -> float:
        """
        Something about main_function
        :param a: Something about param a
        :param b: Something else about param b
        :param d: Something else about param d
        :return: Something about return value
        """
        return func_2(func_1(a=a, b=b), d=d)

This has the following problems:

- Defaults are defined in multiple places, which very easily leads to bugs (I'm aware of **kwargs, but it obfuscates function interfaces and usually does more harm than good).
- Types are defined in multiple places.
- Documentation is copy-pasted when referencing a single thing from different places. (I can't count the number of times I've written ":param img: A (size_y, size_x, 3) RGB image"; I could now just reference a single RGB_IMAGE_DOC variable.)
- Argument names need to be written twice, in the header and in the documentation, and it's up to the user / IDE to make sure they stay in sync.

I propose to resolve this with the following changes:

- Argument/return documentation can be made inline with a new "?" operator. Documentation becomes a first-class citizen.
- An argument's type/default/doc can be referenced by "func.args.<arg_name>.type" / "func.args.<arg_name>.default" / "func.args.<arg_name>.doc". Positional reference, e.g. "func.args[1].default", is also allowed. If not specified, they take a special, built-in "Undefined" value (because None may have another meaning for defaults). The return type/doc can be referenced with "func.return.type" / "func.return.doc".

This would result in the following syntax:

    def func_1(
        a: int = 1 ? 'Something about param a',
        b: float = 2.5 ? 'Something else about param b',
    ) -> float ? 'Something about return value of func_1':
        """
        Something about func_1
        """
        return a*b

    def func_2(
        c: float = 3.4 ? 'Something about param c',
        d: bool = True ? 'Something else about param d',
    ) -> float ? 'Something about return value':
        """
        Something about func_2
        """
        return c if d else -c

    def main_function(
        a: func_1.args.a.type = func_1.args.a.default ? func_1.args.a.doc,
        b: func_1.args.b.type = func_1.args.b.default ? func_1.args.b.doc,
        d: func_2.args.d.type = func_2.args.d.default ? func_2.args.d.doc,
    ) -> func_2.return.type ? func_2.return.doc:
        """
        Something about main_function
        """
        return func_2(func_1(a=a, b=b), d=d)

If the main_function header seems repetitious (it does), we could allow for an optional shorthand notation like:

    def main_function(
        a :=? func_1.args.a,
        b :=? func_1.args.b,
        d :=? func_2.args.d,
    ) ->? func_2.return:
        """
        Something about main_function
        """
        return func_2(func_1(a=a, b=b), d=d)

Where "a :=? func_1.args.a" means "argument 'a' takes the same type/default/documentation as argument 'a' of func_1".

So what do you say? Yes, it's a bold move, but I think in the long term it's badly needed. Perhaps something similar has been proposed already that I'm not aware of.
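For comparison, part of "func.args.<arg_name>.default" / ".type" is already reachable in current Python through inspect.signature(), which exposes each parameter's default and annotation; there is no slot for per-parameter documentation, which is the piece the "?" operator would add. A minimal sketch, assuming the goal is only to keep each default and type written once in the source (the _p1/_p2 helper names are illustrative, not part of the proposal):

```python
import inspect


def func_1(a: int = 1, b: float = 2.5) -> float:
    """Something about func_1"""
    return a * b


def func_2(c: float = 3.4, d: bool = True) -> float:
    """Something about func_2"""
    return c if d else -c


# Existing-Python analogue of "func_1.args.a.default" / "func_1.args.a.type":
# look the parameters up once and reuse their defaults and annotations.
_p1 = inspect.signature(func_1).parameters
_p2 = inspect.signature(func_2).parameters


def main_function(
    a: _p1['a'].annotation = _p1['a'].default,
    b: _p1['b'].annotation = _p1['b'].default,
    d: _p2['d'].annotation = _p2['d'].default,
) -> inspect.signature(func_2).return_annotation:
    """Something about main_function"""
    return func_2(func_1(a=a, b=b), d=d)


print(inspect.signature(main_function))
# (a: int = 1, b: float = 2.5, d: bool = True) -> float
```

The header is arguably noisier than the original, which is part of the motivation for dedicated syntax; the sketch only shows that the values themselves can already be shared.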
On Fri, Apr 26, 2019 at 8:42 AM Peter O'Connor <peter.ed.oconnor@gmail.com> wrote:
Which has the following problems: - Defaults are defined in multiple places, which very easily leads to bugs (I'm aware of **kwargs but it obfuscates function interfaces and usually does more harm than good)
I'd actually rather explore fixing this problem than the other. We have functools.wraps() for the case where you do nothing other than pass through *a, **kw, but when you want to add or remove an argument, I don't think there's an easy way to say "that function's signature, but with these changes". That way, you aren't obfuscating the interface (since the called function's signature is incorporated into the wrapper's), and you're not duplicating defaults or anything. It shouldn't need to be all that complicated to use (although I'm sure it'll be complicated to implement). Something like:

    @functools.passes_args(f)
    def wrapper(spam, ham, *a, **kw):
        f(*a, **kw)

There would need to be parameters to indicate the addition of parameters, but it could detect the removal (which is common for wrappers) just from the function's own signature.

If that were implemented, would it remove the need for this new syntax you propose?

ChrisA
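functools.passes_args is a hypothetical name; nothing like it exists in functools. One way to approximate the idea today is to compute a combined signature with inspect and attach it as `__signature__`, which inspect.signature() (and therefore help()) honours. In this sketch the wrapper's own named parameters come first, and f's parameters that the wrapper does not already name stand in for its *a/**kw. Making the forwarded parameters keyword-only is a design choice of the sketch (it sidesteps the "non-default after default" ordering rule), not part of Chris's suggestion, and it assumes f has no positional-only parameters.

```python
import inspect

VAR_KINDS = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)


def passes_args(f):
    """Hypothetical decorator (not in functools): give the wrapper a signature
    made of its own named parameters plus f's parameters it does not name."""
    f_params = list(inspect.signature(f).parameters.values())

    def decorate(wrapper):
        w_sig = inspect.signature(wrapper)
        # Keep the wrapper's own named parameters; drop its *a / **kw.
        own = [p for p in w_sig.parameters.values() if p.kind not in VAR_KINDS]
        own_names = {p.name for p in own}
        # Forward f's remaining parameters as keyword-only.
        forwarded = [
            p.replace(kind=inspect.Parameter.KEYWORD_ONLY)
            for p in f_params
            if p.name not in own_names and p.kind not in VAR_KINDS
        ]
        wrapper.__signature__ = w_sig.replace(parameters=own + forwarded)
        return wrapper

    return decorate


def f(x: int = 1, y: float = 2.5):
    return x * y


@passes_args(f)
def wrapper(spam, ham, *a, **kw):
    return f(*a, **kw)


print(inspect.signature(wrapper))  # (spam, ham, *, x: int = 1, y: float = 2.5)
```

Defaults and types now live only in f, but, as Peter points out later in the thread, a rename in f silently changes wrapper's interface.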
Looks like a more complicated way to say:

    def f(x:'int : which does stuff' = 5, y:'int : which does more stuffs')

The code reading the annotations (like the linter) might then parse it simply using .split.

robertvandeneynde.be

On Fri, 26 Apr 2019 at 00:41, Peter O'Connor <peter.ed.oconnor@gmail.com> wrote:
So what do you say? Yes it's a bold move, but I think in the long term it's badly needed. Perhaps something similar has been proposed already that I'm not aware of.

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
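Robert's ".split" idea can be sketched directly against a function's `__annotations__`. Two assumptions here: the separator is " : " exactly, and y is given a default of 0, because a parameter without a default cannot follow one that has one (the quoted one-liner would otherwise be a SyntaxError). The helper name split_annotations is illustrative.

```python
def f(x: 'int : which does stuff' = 5, y: 'int : which does more stuff' = 0):
    return x + y


def split_annotations(func, sep=' : '):
    """Split 'type : doc' string annotations into (type_name, doc) pairs.

    Annotations that are not strings, or lack the separator, are skipped."""
    parsed = {}
    for name, ann in func.__annotations__.items():
        if isinstance(ann, str) and sep in ann:
            type_name, doc = ann.split(sep, 1)
            parsed[name] = (type_name.strip(), doc.strip())
    return parsed


print(split_annotations(f))
# {'x': ('int', 'which does stuff'), 'y': ('int', 'which does more stuff')}
```

The cost, raised in Peter's reply, is that tools now see the string 'int' rather than the type int.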
On Thu, Apr 25, 2019 at 6:42 PM Peter O'Connor <peter.ed.oconnor@gmail.com> wrote:
Despite the general beauty of Python, I find myself constantly violating the "don't repeat yourself" maxim when trying to write clear, fully documented code. Take the following example:
You do know that OPTIONAL type annotations are optional, right? There's no requirement to repeat yourself if you don't want to. But in general, comments or docstrings can do something very different from just annotating one variable at a time. If I want to describe the interaction of `a` and `b`, that simply cannot fit in an annotation/comment per parameter. Or if one argument switches the relevance or meaning of another, etc.
- Argument/return documentation can be made inline with a new "?" operator. Documentation becomes a first class citizen.
We already have a first-class citizen in annotations, this seems like extra burden for little reason.
    def func_1(
        a: int = 1 ? 'Something about param a',
        b: float = 2.5 ? 'Something else about param b',
    ) -> float ? 'Something about return value of func_1':
        """
        Something about func_1
        """
        return a*b
Why not just this in existing Python:

    def func_1(
        a: int = 1  # 'Something about param a',
        b: float = 2.5  # 'Something else about param b',
    ) -> float:
        """Something about func_1

        a and b interact in this interesting way.
        a should be in range 0 < a < 125
        floor(b) should be a prime number

        Something about return value of func_1
        returns a multiplication
        """
        return a*b
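Related to "we already have a first-class citizen in annotations": typing.Annotated (PEP 593, available since Python 3.9) lets a documentation string travel with a type without hiding the type inside a string. A minimal sketch; the convention that the first metadata item is the parameter's documentation, and the param_docs helper, are assumptions rather than any standard. It does not provide a way to reference another function's default.

```python
from typing import Annotated, get_args, get_type_hints


def func_1(
    a: Annotated[int, 'Something about param a'] = 1,
    b: Annotated[float, 'Something else about param b'] = 2.5,
) -> Annotated[float, 'Something about return value of func_1']:
    """Something about func_1"""
    return a * b


def param_docs(func):
    """Return {name: (type, doc)} for each Annotated[type, 'doc'] hint."""
    hints = get_type_hints(func, include_extras=True)  # keep Annotated metadata
    docs = {}
    for name, hint in hints.items():
        args = get_args(hint)  # e.g. (int, 'Something about param a')
        if len(args) >= 2 and isinstance(args[1], str):
            docs[name] = (args[0], args[1])
    return docs


print(param_docs(func_1)['a'])  # (<class 'int'>, 'Something about param a')
```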
David Mertz writes:
Why not just this in existing Python:
def func_1(
If I understand the OP's POV, one reason is that the following comments are not available to help() in existing Python. (From the Picky-Picky-Picky Dept: It's a syntax error since a and b are not separated by the comma in the comment!) Did you mean "use existing Python syntax and have future interpreters use the comments"?
    a: int = 1  # 'Something about param a',
    b: float = 2.5  # 'Something else about param b',
) -> float:
    """Something about func_1
a and b interact in this interesting way.
More from the Picky-Picky-Picky Dept: Again from the OP's POV, the following two lines belong in the signature I believe. I tend to differ, because they are (in principle) checkable, though only at runtime. See (2) below.
a should be in range 0 < a < 125
floor(b) should be a prime number
What I would rather see is

(1) Comment syntax "inside" (fvo "inside" including any comment after the colon but before docstring or other code) the signature is interpreted as parameter documentation strings, which are available to help(). This is backward compatible; older Pythons will ignore them. The only problem would be if programs introspect their own docstrings and change behavior based on the result, which is so perverse that I'm happy to go with "if you do that, you deserve to suffer."

(2) asserts involving parameters lexically are available to help(). I equate "assert" to your "should" since both have indeterminate semantics if violated at runtime. Of course this is subject to the usual caveats about use of assert to check user input at runtime! Perhaps argument-checking code not marked by an assert could be marked in some other way?

So, after

    def featherfall(  # not for use with parrots, dead or alive
        a: int = 1,  # number of swallows in flock
        b: float = 2.5  # mean feather density of swallows
    ) -> float:  # volume of feathers released when a flock
                 # of swallows is stooped on by an F-35
        """
        The interaction of flock size and feather density may not be
        fully captured by multiplication.

        Also reports F-35's friend-or-foe ID to NORAD.

        Unimplemented: generalization to differential feather densities
        between African and European swallows and multiple disjoint flocks.

        Implemented for DoD contract #DEADBEEF666.
        """
        assert 0 < a < 255      # not essential that these are
        assert prime(floor(b))  # the first code in the function
        c = a & floor(b)
        assert c != 0           # NOT documented in help()
        return a*b

help() could produce

    featherfall (not for use with parrots, dead or alive)

    Parameters:
    a, an int, represents number of swallows in flock.
        Defaults to 1 and satisfies 0 < a < 255.
    b, a float, represents mean feather density of swallows.
        Defaults to 2.5 and satisfies prime(floor(b)).

    Returns:
    A float, whose value is the volume of feathers released when a
    flock of swallows is stooped on by an F-35.

    The interaction of flock size and feather density may not be
    fully captured by multiplication.

    Also reports F-35's friend-or-foe ID to NORAD.

    Unimplemented: generalization to differential feather densities
    between African and European swallows and multiple disjoint flocks.

    Implemented for DoD contract #DEADBEEF666.

I'm not sure what to do with asserts involving multiple parameters. They could be repeated for each parameter involved, or put in a separate set of "Satisfies" conditions at the bottom of the Parameters: section.

The main objection I see to the whole idea (and it may kill it) is that the comments aren't syntactically code, so that the formatting matters. (I'm also unsure whether they fall afoul of Guido's "No pragmas" diktat.) You could satisfy "must be code" with

    def featherfall(
        a: int = (1, 'number of swallows in flock')[0],
        b: float = (2.5, 'mean feather density of swallows')[0]
    ) -> float:  # volume of feathers released when a flock
                 # of swallows is stooped on by an F-35

but I think that falls into the "too ugly to live" category.

NB: Any comment on the signature would be attached as documentation to the return value. It would also be possible to attach each comment in the signature to the most recently read component (function name, parameter, return type) so that

    def featherfall(  # not for use with parrots, dead or alive
        a: int = 1,
        b: float = 2.5  # mean feather density of swallows
    ) -> float:

would attach documentation to the function name and the parameter b, but not to the parameter a and the return type. This is well-defined, but maybe too pedantic to survive?

The help text is of course infinitely bikesheddable. I suppose some people would prefer more backward compatibility, i.e., instead of the function name with comment as header, the header would be the signature. What that would look like is left as an exercise for the reader, but I would expect that the signature in the help output doesn't include the comment data.

Personally, I don't really understand the "fails DRY" argument. If it's in the signature already, don't put it in the docstring. (If the project uses stub files, I guess this would require the ordinary compiler to look for them, which might or might not be an acceptable tradeoff if it doesn't do that already.)

Steve
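Stephen's "attach each comment to the most recently read component" rule can be prototyped with the standard tokenize module, working on source text rather than inside the compiler. A rough sketch under stated assumptions: the name signature_comments and the keys 'function' and 'return' are illustrative, it expects an undecorated def, and comments between the header's colon and the first body token go to 'return', per his "inside" definition.

```python
import io
import tokenize


def signature_comments(source):
    """Attach each comment in a def header to the component read just before
    it: 'function', a parameter name, or 'return'."""
    docs = {}
    current = 'function'  # component that the next comment documents
    depth = 0             # bracket nesting level
    prev = None           # text of the previous significant token
    header_done = False
    skip = (tokenize.NL, tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            docs.setdefault(current, []).append(tok.string.lstrip('#').strip())
            continue
        if tok.type in skip:
            continue
        if header_done:
            break  # first real token of the body, e.g. the docstring
        if tok.type == tokenize.OP and tok.string in ('(', '[', '{'):
            depth += 1
        elif tok.type == tokenize.OP and tok.string in (')', ']', '}'):
            depth -= 1
            if depth == 0:
                current = 'return'  # the parameter list just closed
        elif tok.type == tokenize.OP and tok.string == ':' and depth == 0:
            header_done = True      # the colon that ends the def header
        elif tok.type == tokenize.NAME and depth == 1 and prev in ('(', ',', '*', '**'):
            current = tok.string    # start of a new parameter
        prev = tok.string
    return {key: ' '.join(parts) for key, parts in docs.items()}


FEATHERFALL = '''\
def featherfall(  # not for use with parrots, dead or alive
    a: int = 1,  # number of swallows in flock
    b: float = 2.5  # mean feather density of swallows
) -> float:  # volume of feathers released when a flock
             # of swallows is stooped on by an F-35
    """The interaction of flock size and feather density ..."""
    return a * b
'''

print(signature_comments(FEATHERFALL)['b'])  # mean feather density of swallows
```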
Thanks all for the responses. I read through them carefully and address each below.

I don't think any fully address the core problem: the "Argument" (the tuple of type, default and documentation) is currently not a first-class entity. Because there is no way to reference an Argument, there is much copypasta and dangerous default duplication. The idea of this proposal is that it should be possible to define an argument in one place, and simply bind a different name to it in each function signature.

To recap, the points of the proposal are:

- Allow documentation to be bound to an argument: "func(arg_a: int = 3 ? 'doc about arg', ...)" or "func(arg_a: int = 3 # 'doc about arg', ...)"
- Allow reference to an argument: "outer_func(new_arg_a_name: func.args.arg_a.type = func.args.arg_a.default ? 'new doc for arg a', ...)"
- (Optionally) a shorthand syntax for reusing the type/doc/default of an argument: "def outer_func(new_arg_a_name :=? func.args.arg_a, ...):"

Below I have responded to each comment; please let me know if I missed something.

----------
On Thu, Apr 25, 2019 at 3:59 PM Robert Vanden Eynde <robertve92@gmail.com> wrote:
Looks like a more complicated way to say : def f(x:'int : which does stuff' = 5, y:'int : which does more stuffs')
I hadn't thought of incorporating documentation into the type; that's a nice idea. I think it's an OK "for now" solution, but:

- doing it this way loses the benefits of type inspection (built in to most IDEs now),
- it does not allow you to, for instance, keep the type definition in the wrapper while changing the documentation,
- it provides no easy way to reference the documentation (f.args.x.documentation), which is a main point of the proposal.

----------
On Thu, Apr 25, 2019 at 3:58 PM Chris Angelico <rosuav@gmail.com> wrote:
    @functools.passes_args(f)
    def wrapper(spam, ham, *a, **kw):
        f(*a, **kw)
    ....
If that were implemented, would it remove the need for this new syntax you propose?
This does indeed allow defaults and types to be passed on, but I find that this approach still has the basic flaw of using **kwargs:

- It only really applies to "wrappers", functions that wrap another function. The goal here is to address the common case of a function passing args to a number of functions within it.
- It is assumed that the wrapper should use the same argument names as the wrapped function. A name should bind a function to an argument, but here the name is packaged with the argument.
- It remains difficult to determine the arguments of "wrapper" by simple inspection. Added syntax for removal of certain arguments only complicates the task and seems fragile (lest the wrapped function's argument names change).
- Renaming an argument of "f" will change the arguments of wrapper, but in a way that's not easy to inspect. For instance, if you have a call "y = wrapper(arg_1=4)", and you change "f(arg_1=...)" to "f(arg_one=...)", no IDE will catch that and make the appropriate change to "y = wrapper(arg_one=4)".
- What happens if you're not simply wrapping one sub-function but calling several? What about when those sub-functions have arguments with the same name?

----------
On Thu, Apr 25, 2019 at 5:50 PM David Mertz <mertz@gnosis.cx> wrote:
Why not just this in existing Python:

    def func_1(
        a: int = 1  # 'Something about param a',
        b: float = 2.5  # 'Something else about param b',
    ) -> float:
        """Something about func_1

        a and b interact in this interesting way.
        a should be in range 0 < a < 125
        floor(b) should be a prime number

        Something about return value of func_1
        returns a multiplication
        """
        return a*b
- This would currently be a syntax error (because "#" is before the comma), but sure, we could put it after the comma.
- It does not address the fact that we cannot reference "func_1.args.a.default", which is one of the main points of this proposal.
- I'm fine with "#" being the documentation operator instead of "?", but figured people would not like it because it breaks the precedent of anything after "#" being ignored by the compiler.

---------------
On Thu, Apr 25, 2019 at 9:04 PM Anders Hovmöller <boxed@killingar.net> wrote:
Maybe something like...

    def foo(**kwargs):
        """
        @signature_by: full.module.path.to.a.signature_function(pass_kwargs_to=bar, hardcoded=['quux'])
        """
        return bar(**kwargs, quux=3)
This makes it difficult to see what the names of arguments to "foo" are, at a glance. And what happens if (as in the example) "foo" does not simply wrap a function, but distributes arguments to multiple subfunctions? (This is a common case.)

----------------------------
On Fri, Apr 26, 2019 at 2:18 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
What I would rather see is
(1) Comment syntax "inside" (fvo "inside" including any comment after the colon but before docstring or other code) .....
(2) asserts involving parameters lexically are available to help().....
(1) I'm fine with "#" being used instead of "?" as the "documentation operator", but I figured it would be rejected for breaking the precedent that everything after "#" is ignored by the compiler.

(2) This would be a nice addition ... if this proposal were actually implemented, you'd have a built-in "Argument" object, and in that case you could do e.g.:

    RGB_IMAGE = Argument(type=np.ndarray, doc='An RGB image', check=lambda img: (img.ndim==3 and img.shape[2]==3))

    def brighten_image(image :=? RGB_IMAGE, ...):
        ...
Peter O'Connor writes:
On Fri, Apr 26, 2019 at 2:18 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
What I would rather see is
(1) Comment syntax "inside" (fvo "inside" including any comment after the colon but before docstring or other code) .....
(2) asserts involving parameters lexically are available to help().....
(1) I'm fine with "#" being used instead of "?" as the "documentation operator", but I figured it would be rejected for breaking the precedent that everything after "#" is ignored by the compiler.
This is not quite true, depending on your definition of "compiler", because of PEP 263, which allows you to specify the program's encoding in a comment at the beginning (first or second line), and because of type hints themselves, which are recognized in comments by parsing tools like mypy. The reason for not having semantically significant comments, AIUI, is not so much that the *compiler* ignore it as that the *human reader* be able to ignore it. The way I think about it, this is what justifies the PEP 263 coding cookies, since humans *will* ignore the cookies in favor of detecting mojibake "by eye", while compilers need them to construct string literals correctly. I'm not sure how the authorities on Pythonicity will come down on this, though.
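The standard library itself illustrates the point that comments are not invisible to every tool: since Python 3.8, ast.parse(..., type_comments=True) keeps PEP 484 per-argument type comments on the syntax tree, while the default parse discards them. A small sketch; the helper name arg_type_comments is illustrative.

```python
import ast

SOURCE = '''\
def func_1(
    a=1,    # type: int
    b=2.5,  # type: float
):
    return a * b
'''


def arg_type_comments(source, type_comments=True):
    """Return {argument name: type comment} for the first function in source."""
    func = ast.parse(source, type_comments=type_comments).body[0]
    return {arg.arg: arg.type_comment for arg in func.args.args}


print(arg_type_comments(SOURCE))                       # {'a': 'int', 'b': 'float'}
print(arg_type_comments(SOURCE, type_comments=False))  # {'a': None, 'b': None}
```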
(2) This would be a nice addition ... if this proposal were actually implemented, you'd have a built in "Argument" object, and in that case you could do e.g.:

    RGB_IMAGE = Argument(type=np.ndarray, doc = 'An RGB image', check = lambda img: (img.ndim==3 and img.shape[2]==3))

    def brighten_image(image :=? RGB_IMAGE, ...):
        ...
If the "(2)" refers to my proposal that asserts be available to help(), I don't understand the example. Are you proposing that instead of an explicit expression, asserts of this kind be written

    assert image.check()

In any case, as far as I know you can already use the syntax suggested in the quotation with standard type hints as long as "Argument" is a class *factory*. (Caveat: I'm not sure what semantics you attach to ":=?".) I'm not arguing against the desire for the *builtin* feature, simply recognizing the practicality that already implemented features, or approximations to what you really want, get more support more quickly.

Thinking aloud, you're probably way ahead of me, but just in case: I'm not so clear on whether the reuse of RGB_IMAGE suggested by the assignment in the example is so useful. I would expect something more like

    RGB_IMAGE = Argument(type=np.ndarray, doc='An RGB image', check=lambda img: (img.ndim==3 and img.shape[2]==3))

    def brighten_image(image: RGB_IMAGE(doc='Image to brighten in-place'), ...):

because "image" already tells you what the argument *is*, and "RGB_IMAGE" tells you its specific *type*, while the docstring tells you its *role* in the function, and that it's being mutated. For various reasons, these distinctions aren't so salient for RGB images, I guess, but they're crucial for types like int and float.

It seems to me, therefore, that the Argument object (whether an instance or a class) is likely to have 'doc' and 'check' attributes that vary with role, i.e., as the arguments they describe get passed from function to function. I don't think there's a lot to say about the variability of 'doc' except "live with it", but the variability of 'check' sounds like something that a contracting functionality would deal with (if you don't know him already, Anders Hovmöller is expert on contracts in Python, and there were long threads on that a few months ago, which he could probably summarize for you).

Steve
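Stephen's "class factory" reading can be sketched with no new syntax: an Argument value used directly as an annotation, called with overrides to get a role-specific variant, plus a decorator that runs each check at call time. Everything here is hypothetical (the Argument dataclass, its `__call__` override mechanism, and check_arguments), and a plain positive-int check stands in for the numpy RGB check so the sketch has no dependencies. It does not reproduce the proposed :=? default sharing.

```python
import functools
import inspect
from dataclasses import dataclass, replace
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Argument:
    """Hypothetical argument descriptor usable directly as an annotation."""
    type: Any
    doc: str = ''
    check: Optional[Callable[[Any], bool]] = None

    def __call__(self, **changes):
        # Role-specific variant, e.g. RGB_IMAGE(doc='Image to brighten in-place').
        return replace(self, **changes)


def check_arguments(func):
    """Run the check of every Argument-annotated parameter on each call."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        for name, value in bound.arguments.items():
            spec = sig.parameters[name].annotation
            if isinstance(spec, Argument) and spec.check and not spec.check(value):
                raise ValueError(f'{name}={value!r} fails check ({spec.doc})')
        return func(*args, **kwargs)

    return wrapper


# Dependency-free stand-in for RGB_IMAGE.
POSITIVE_INT = Argument(type=int, doc='A positive integer', check=lambda n: n > 0)


@check_arguments
def repeat(word: str, times: POSITIVE_INT(doc='How many copies to make') = 1) -> str:
    return word * times


print(repeat('ab', 2))  # abab
```

The shared object carries type and check, while each function overrides only the role-specific doc, which is the split Stephen describes.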
On 26 Apr 2019, at 00:41, Peter O'Connor <peter.ed.oconnor@gmail.com> wrote:
- Defaults are defined in multiple places, which very easily leads to bugs (I'm aware of **kwargs, but it obfuscates function interfaces and usually does more harm than good).
- Types are defined in multiple places.
- Documentation is copy-pasted when referencing a single thing from different places. (I can't count the number of times I've written ":param img: A (size_y, size_x, 3) RGB image"; I could now just reference a single RGB_IMAGE_DOC variable.)
- Argument names need to be written twice, in the header and in the documentation, and it's up to the user / IDE to make sure they stay in sync.
We have this exact problem in many places in tri.form, tri.query, tri.table and the code bases that use them. I would really like a solution to these! But you don't seem to address these problems at all in the rest of your email, which makes me confused.

In general I think what we want is an agreed-upon way to specify argument names, counts and defaults for use by static analysis tools, documentation generation tools and IDEs, in a programmatic way. This could solve the problems you reference above, and also the issue of how to supply autocomplete for something like Django's query language (where you can do SomeTable.objects.filter(foreignkey__anotherforeignkey__value=3), which is great!).

Maybe something like...

    def foo(**kwargs):
        """
        @signature_by: full.module.path.to.a.signature_function(pass_kwargs_to=bar, hardcoded=['quux'])
        """
        return bar(**kwargs, quux=3)

    def signature_function(f, pass_kwargs_to=None, hardcoded=None, **_):
        signature = inspect.signature(f)
        if pass_kwargs_to is not None:
            signature_nested = inspect.signature(pass_kwargs_to)
            signature.remove_kwargs()
            signature = signature.merge(signature_nested)
        if hardcoded is not None:
            for h in hardcoded:
                signature.parameters.remove(h)
        return signature

Some of the above is pseudo code obviously. What do you think?

/ Anders
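Anders's signature_function is explicitly pseudo code: inspect.Signature has no remove_kwargs() or merge() method, and Signature.parameters is a read-only mapping. A runnable approximation using only Signature.replace() and Parameter.replace() might look like the following. It attaches the result as `__signature__` rather than reading an @signature_by docstring directive (that directive stays hypothetical), and forwarding only positional-or-keyword and keyword-only parameters, as keyword-only, is an assumption based on the fact that **kwargs cannot fill positional-only ones.

```python
import inspect

FORWARDABLE = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)


def signature_function(f, pass_kwargs_to=None, hardcoded=()):
    """Effective signature of f when it forwards **kwargs to pass_kwargs_to
    and supplies the names in hardcoded itself."""
    sig = inspect.signature(f)
    params = list(sig.parameters.values())
    if pass_kwargs_to is not None:
        # Drop f's own **kwargs and splice in the callee's parameters.
        params = [p for p in params if p.kind is not p.VAR_KEYWORD]
        taken = {p.name for p in params}
        params += [
            p.replace(kind=p.KEYWORD_ONLY)
            for p in inspect.signature(pass_kwargs_to).parameters.values()
            if p.name not in taken and p.kind in FORWARDABLE
        ]
    # Arguments that f hard-codes are not part of its public interface.
    params = [p for p in params if p.name not in hardcoded]
    return sig.replace(parameters=params)


def bar(x, y=2, quux=0):
    return x + y + quux


def foo(**kwargs):
    return bar(**kwargs, quux=3)


# help() and inspect.signature() both honour __signature__.
foo.__signature__ = signature_function(foo, pass_kwargs_to=bar, hardcoded=['quux'])
print(inspect.signature(foo))  # (*, x, y=2)
```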
Participants (6):
- Anders Hovmöller
- Chris Angelico
- David Mertz
- Peter O'Connor
- Robert Vanden Eynde
- Stephen J. Turnbull