Suggestion for standardized annotations

Hi all,

I wanted to make a suggestion for a convention on how to write annotations. I know that PEP 8 says that the standard library won't use them in order to avoid setting a convention, but that could lead to confusion and conflicts in how they are used in non-standard libraries. I'm thinking about this because of how docstrings are currently used. If you use PLY (http://www.dabeaz.com/ply/) and also want to use something like Sphinx (http://sphinx-doc.org/), you're going to have a problem: PLY stores its rules in the docstring, while Sphinx parses it for documentation. I want to prevent this problem for annotations.

My thought is that all annotations should be dictionaries. The keys should all be unicode strings that are uppercase UUIDs (basically, what 'uuid.uuid1().hex.upper()' returns), and the values can be anything the programmer wants. Each project can generate one (or more) UUIDs to put into the dictionary, and publicly document what each UUID means (preferably somewhere a search engine can find it). The advantage is that since UUIDs are effectively unique, searching for one should turn up very few false positives; I've tested this on a different mailing list I'm on, and the UUID I generated for it has 0 false positives, while picking up the complete discussion involving it.

As an example, if project A had chosen the UUID B3D2AFE8A45A11E3AE24D49A20C52EF2 and project B chose the UUID C02D7C64A45A11E39DAFD49A20C52EF2, we might annotate a function as follows:

    def foo(first : {"B3D2AFE8A45A11E3AE24D49A20C52EF2" : {"doc" : "English documentation"},
                     "C02D7C64A45A11E39DAFD49A20C52EF2" : {"doc" : "expression : MINUS expression %prec UMINUS"}},
            second) -> {"B3D2AFE8A45A11E3AE24D49A20C52EF2" : {"type" : int, "doc" : "Typechecking for a linter"}}:
        pass

You can already see the downside of this approach: it's really, really verbose.
However, at least it avoids outright conflicts that prevent certain tools/projects from being used together.

Note that I did consider using names as keys directly (e.g. 'doc'). However, that requires a strong, universal convention on what each key means. Since we can't seem to figure that out for the docstring, I don't see why we should expect to be able to figure it out for any of the proposed keys. Moreover, the set of keys would need to be documented somewhere, the documentation kept up to date, etc. It becomes a management nightmare. UUIDs have the advantage that we just tell everyone how to generate their own, and let them go at it. If someone wants to use a given project's UUID for their own purposes, it is up to them to keep the meaning the same.

Thoughts/suggestions?

Thanks,
Cem Karan
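A minimal sketch of how the proposed convention might look in practice, assuming keys are plain strings in the `uuid.uuid1().hex.upper()` format. Binding the keys to names (`PROJECT_A`, `PROJECT_B`) is only for readability, and `project_annotations` is a hypothetical helper, not part of any existing tool. Each tool reads only the entry under its own key, so a PLY-style grammar rule and Sphinx-style documentation can sit on the same parameter.

```python
import uuid

# The two keys from the example above, in the uuid.uuid1().hex.upper() format;
# a real project would generate its own once and publish it.
PROJECT_A = "B3D2AFE8A45A11E3AE24D49A20C52EF2"
PROJECT_B = "C02D7C64A45A11E39DAFD49A20C52EF2"


def foo(first: {PROJECT_A: {"doc": "English documentation"},
                PROJECT_B: {"doc": "expression : MINUS expression %prec UMINUS"}},
        second) -> {PROJECT_A: {"type": int, "doc": "Typechecking for a linter"}}:
    pass


def project_annotations(func, project_key):
    """Hypothetical helper: {name: entry} for one project's key only.

    Annotations that are not dicts, or dicts without this key, are skipped,
    so other projects' entries are never read or modified.
    """
    return {name: annotation[project_key]
            for name, annotation in func.__annotations__.items()
            if isinstance(annotation, dict) and project_key in annotation}


new_key = uuid.uuid1().hex.upper()  # 32 uppercase hex characters
print(project_annotations(foo, PROJECT_B))
print(project_annotations(foo, PROJECT_A)["return"]["type"])
```

Parameters that carry no annotation (here `second`) simply do not appear in either project's view.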

On 5 March 2014 11:53, Cem Karan <cfkaran2@gmail.com> wrote:
Thoughts/suggestions?
I think the core/stdlib position is that agreeing conventions would be better done once some real world experience of the practical issues and benefits of annotations has been established. So while a proposal like this is not without merit, it needs to be considered in the light of how projects actually use annotations. Personally, I'm not aware of any libraries that make significant use of annotations, so a good first step would be to survey existing use, and summarise it here. That would allow you to clarify your proposal in terms of exactly how existing projects would need to modify their current code.

Of course, there's likely a chicken and egg problem here - projects may be holding off using annotations through fear of issues caused by clashes. But I'm not sure that a UUID-based proposal like the above (which as you admit is very verbose, and not particularly user friendly) would be more likely to encourage use.

If I were developing a library that would benefit from annotations, at this point in time I'd probably just choose whatever conventions suited me and go with those - likely marking the feature as "subject to change" initially. Then, when people raised bug reports or feature requests that asked for better interoperability, I'd look at how to achieve that in conjunction with the other project(s) that clashed with me.

Paul

You could use a slight modification of sigtools.modifiers.annotate [1] to create different objects with different function annotations. That way programmers who only use one library have nothing to change, and those who use more only have to add a few lines, without the verbosity of UUIDs.

[1] http://sigtools.readthedocs.org/en/latest/#sigtools.modifiers.annotate

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
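One way to read the suggestion above (different objects carrying different annotations) is sketched below. This is not sigtools' own API: `annotated_copy` is a hypothetical helper that builds a second function object sharing the original's code, globals, defaults and closure, but with its own `__annotations__` dict. Someone using a single library keeps writing ordinary annotations; only code that needs two libraries creates per-library copies.

```python
import inspect
import types


def annotated_copy(func, **annotations):
    """Hypothetical helper: a new function object that shares func's code,
    globals, defaults and closure but has an independent __annotations__."""
    clone = types.FunctionType(func.__code__, func.__globals__, func.__name__,
                               func.__defaults__, func.__closure__)
    clone.__kwdefaults__ = func.__kwdefaults__
    clone.__qualname__ = func.__qualname__
    clone.__doc__ = func.__doc__
    clone.__annotations__ = dict(annotations)  # fresh dict, never shared
    return clone


def parse(expression, strict=False):
    """Strip surrounding whitespace (stand-in for real work)."""
    return expression.strip()


# One view per library; the original `parse` is left untouched.
for_docs = annotated_copy(parse, expression="The text to parse",
                          strict="Reject malformed input")
for_types = annotated_copy(parse, expression=str, strict=bool,
                           **{"return": str})

print(inspect.signature(for_types))
print(for_docs("  x + 1 "))
```

A plain wrapper built with `functools.wraps` would be less suitable here, because `inspect.signature` follows the `__wrapped__` attribute back to the original function's annotations.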

On 5 Mar 2014 22:51, "Paul Moore" <p.f.moore@gmail.com> wrote:
I think the core/stdlib position is that agreeing conventions would be better done once some real world experience of the practical issues and benefits of annotations has been established.
MyPy uses function annotations for optional static typing, which is pretty much the use case Guido originally had in mind and the main reason that PEP 8 doesn't make combining annotations with an associated decorator mandatory: http://www.mypy-lang.org/

You do still have to import a particular module to indicate that all annotations in the importing module are to be interpreted as type annotations.

Beyond that, the guidance in PEP 8 stands:

"""It is recommended that third party experiments with annotations use an associated decorator to indicate how the annotation should be interpreted."""

Cheers,
Nick.
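The PEP 8 recommendation quoted above can be sketched as follows, assuming a tool only trusts annotations on functions that carry its marker. `interpret_annotations_as`, `annotations_for`, the `annotation_scheme` attribute and the scheme name `"example-typechecker"` are all hypothetical, chosen for illustration (mypy, as described above, relies on a module-level import instead).

```python
def interpret_annotations_as(scheme):
    """Hypothetical decorator: record which scheme owns func's annotations."""
    def decorator(func):
        func.annotation_scheme = scheme
        return func
    return decorator


def annotations_for(func, scheme):
    """What a tool for `scheme` would do: read annotations only from
    functions explicitly marked for it, and ignore everything else."""
    if getattr(func, "annotation_scheme", None) == scheme:
        return dict(func.__annotations__)
    return {}


@interpret_annotations_as("example-typechecker")
def add(x: int, y: int) -> int:
    return x + y


# Not marked, so a type checker should leave this free-text annotation alone.
def greet(name: "the person to greet"):
    return "Hello, " + name


print(annotations_for(add, "example-typechecker"))
print(annotations_for(greet, "example-typechecker"))
```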

On Wed, Mar 5, 2014 at 8:45 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
"""It is recommended that third party experiments with annotations use an associated decorator to indicate how the annotation should be interpreted."""
I like what all of you are suggesting; decorators are the way to go. If a project defines its own annotation decorators, as sigtools.modifiers.annotate <http://sigtools.readthedocs.org/en/latest/_modules/sigtools/modifiers.html#a...>, mypy <http://www.mypy-lang.org/> or pyanno <http://www.fightingquaker.com/pyanno> do, then it shouldn't be too hard for a project to add its own UUID to the annotation dictionary. I'll spend a little while this weekend seeing if I can come up with some proof-of-concept code to make this work in a portable way.

If the signatures looked vaguely like the following:

    @type_annotation('some_arg', int)
    @return_type_annotation(int)
    def function(some_arg):
        pass

Would that be appealing to everyone?

One question though: are we allowed to modify a function's __annotations__ dictionary directly? I know that I can do it, I just don't know if it is discouraged.

Thanks,
Cem Karan
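A possible shape for the two decorators proposed above, sketched under the assumption that each project stores its data under its own key inside a per-name dict. `type_annotation`, `return_type_annotation` and `PROJECT_KEY` are hypothetical; the parameter name is passed as a string, since a bare `some_arg` would not be defined at decoration time. On the closing question: the Python data model lists a function's `__annotations__` as a writable attribute, and this sketch simply updates that dict in place.

```python
import inspect
import uuid

# Hypothetical project key; a real project would generate and publish its own.
PROJECT_KEY = uuid.UUID("B3D2AFE8A45A11E3AE24D49A20C52EF2")


def _project_slot(func, name):
    """This project's sub-dict for one annotation, created on first use.

    Assumes any existing annotation for `name` is already a dict.
    """
    # Function __annotations__ is a writable dict, so it can be updated in place.
    entry = func.__annotations__.setdefault(name, {})
    return entry.setdefault(PROJECT_KEY, {})


def type_annotation(param_name, type_):
    def decorator(func):
        if param_name not in inspect.signature(func).parameters:
            raise TypeError(f"{func.__name__}() has no parameter {param_name!r}")
        _project_slot(func, param_name)["type"] = type_
        return func
    return decorator


def return_type_annotation(type_):
    def decorator(func):
        _project_slot(func, "return")["type"] = type_
        return func
    return decorator


@type_annotation("some_arg", int)
@return_type_annotation(int)
def function(some_arg):
    return some_arg * 2


print(function.__annotations__)
```

Rejecting unknown parameter names early catches typos that would otherwise leave a silently unused annotation behind.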

CFK wrote:
then it shouldn't be too hard for a project to add its own UUID to the annotation dictionary.
I can't see what benefit there is in bringing UUIDs into the picture.

If you're suggesting that people write the UUIDs directly into their code as literals, that's *not* going to fly. It would be appallingly ugly and error-prone.

The only way to make it usable would be to import the UUIDs from somewhere under human-readable names. But then there's no need to use UUIDs, any set of unique sentinel objects will do.

-- Greg
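This alternative can be sketched with plain `object()` sentinels, assuming each project exports its keys from a module under readable names (the module path in the comment is hypothetical). Uniqueness comes from object identity within one running program, which is all an annotation lookup needs.

```python
# Each project would define its sentinels in its own module, and other code
# would import them by name, e.g. "from project_a.keys import DOC" (a
# hypothetical module); they are defined inline here to keep this runnable.
PROJECT_A_DOC = object()
PROJECT_B_GRAMMAR = object()


def foo(first: {PROJECT_A_DOC: "English documentation",
                PROJECT_B_GRAMMAR: "expression : MINUS expression %prec UMINUS"},
        second):
    pass


# Lookups use the imported object itself; object() instances hash and
# compare by identity, so no two projects' keys can ever be equal.
doc = foo.__annotations__["first"].get(PROJECT_A_DOC)
grammar = foo.__annotations__["first"].get(PROJECT_B_GRAMMAR)
print(doc)
```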

I've written some sample code that can do the type of annotations that I'm suggesting; is there a particular way that code is shared on the list, or should I just put it on the web somewhere?

On Mar 5, 2014, at 11:46 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I can't see what benefit there is in bringing UUIDs into the picture.
There are a number of reasons I'm suggesting UUIDs instead of simple strings:

- Since they are effectively unique, and since the hex string doesn't form words in any human language, you can enter the string in your favorite search engine to get more information about it. The probability of false positives is almost 0, which is something that a string like 'doc' simply won't give you.
- They don't require any coordination between projects. That means you don't have to worry that the name you choose for your project conflicts with a name that someone else chooses. You also don't have to worry about changing the name of your project; UUIDs don't carry any inherent meaning to people, so you can keep them even when your project changes.
- They have fixed lengths and formats. Annotations have been around for a while, which means that any automated tooling is going to have to find a way to deal with a new convention. If a parser can convert all the keys into UUID instances, then that gives it high confidence that the code is using this convention. This is much more difficult to do when the universe of keys is more or less arbitrary.

As for being ugly and error prone, I agree that putting it all in by hand would be horrible. The code that I wrote demonstrates a different way, which only requires decorators. It is basically a library that does the following:

    import annotizer
    import uuid

    # Load your UUID in some manner into project_ID.
    project_annotizer = annotizer.annotizer(project_ID)

    # From now on, you can use project_annotizer's decorators everywhere.
    @project_annotizer.parameter_decorator(a="blah")
    def func(a):
        pass

The example code I've written doesn't use strings; it uses uuid.UUID instances directly, but that is easily changed. With some work, it can be extended for a large variety of useful annotations, all while hiding the complexities of dealing with UUIDs completely.

Thanks,
Cem Karan
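The third point (a parser checking whether every key converts to a UUID) might look like the sketch below. `uses_uuid_convention` is a hypothetical name; it relies only on `uuid.UUID` raising `ValueError` for malformed strings. As the point itself says, this gives confidence rather than proof: any string of 32 hex digits will pass.

```python
import uuid


def uses_uuid_convention(annotation):
    """Hypothetical check: True if `annotation` is a non-empty dict whose keys
    are all uuid.UUID instances or strings that uuid.UUID() can parse."""
    if not isinstance(annotation, dict) or not annotation:
        return False
    for key in annotation:
        if isinstance(key, uuid.UUID):
            continue
        if not isinstance(key, str):
            return False
        try:
            # Accepts 32 hex digits, optionally with hyphens, braces or a
            # "urn:uuid:" prefix; raises ValueError otherwise.
            uuid.UUID(key)
        except ValueError:
            return False
    return True


print(uses_uuid_convention({"B3D2AFE8A45A11E3AE24D49A20C52EF2": {"doc": "x"}}))
print(uses_uuid_convention({"doc": "English documentation"}))
```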

On Sun, Mar 9, 2014 at 10:24 AM, Cem Karan <cfkaran2@gmail.com> wrote:
There are a number of reasons I'm suggesting UUIDs instead of simple strings:
But against that is that they're extremely long. In most other places where long hex strings are used, humans won't use the full thing. With git and hg, revisions are identified by hash - but you can use a prefix, and in most repos, 6-8 hex digits will uniquely identify something. When humans look at SSH key fingerprints, how many actually read the whole thing? That's why randart was produced. Tell me, which of these is different?

    def f(x:"f0e4c2f76c58916ec258f246851bea091d14d4247a2fc3e18694461b1816e13b"): pass
    def asdf(x:"f0e4c2f76c58916ec258f246851bea091d14d4247a2fc3e18694461b1816e13b"): pass
    def hello_world(x:"f0e4c2f76c58916ec258f246851bea891d14d4247a2fc3e18694461b1816e13b"): pass
    def testing(x:"f0e4c2f76c58916ec258f246851bea091d14d4247a2fc3e18694461b1816e13b"): pass
    def longer_function_name(x:"f0e4c2f76c58916ec258f246851bea091d14d4247a2fc3e18694461b1816e13b"): pass

As uniqueness guarantors, they're a bit unwieldy, and that makes them pretty unhelpful.

ChrisA
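The point about prefixes can be made concrete with a small sketch. `shortest_unique_prefix` is a hypothetical helper, loosely in the spirit of git's abbreviated hashes but not git's actual algorithm. Applied to two of the near-identical strings above, it needs 31 of their 64 characters, which is why spotting the odd one out by eye is so hard.

```python
def shortest_unique_prefix(ids, minimum=4):
    """Smallest prefix length (at least `minimum`) that keeps every id distinct.

    Loosely modelled on abbreviated VCS hashes; assumes the ids are all
    different from one another.
    """
    ids = list(ids)
    longest = max(len(i) for i in ids)
    for length in range(minimum, longest + 1):
        if len({i[:length] for i in ids}) == len(ids):
            return length
    return longest


a = "f0e4c2f76c58916ec258f246851bea091d14d4247a2fc3e18694461b1816e13b"
b = "f0e4c2f76c58916ec258f246851bea891d14d4247a2fc3e18694461b1816e13b"
print(shortest_unique_prefix(["a1b2c3", "a1b2d4", "ffff00"]))
print(shortest_unique_prefix([a, b]))
```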

On Mar 8, 2014, at 6:38 PM, Chris Angelico <rosuav@gmail.com> wrote:
As uniqueness guarantors, they're a bit unwieldy, and that makes them pretty unhelpful.
Agreed... for human beings. However, the purpose isn't just for people; it's for automated systems that use annotations in possibly incompatible ways. My original example was the problem of docstrings between projects like PLY (http://www.dabeaz.com/ply/) and Sphinx (http://sphinx-doc.org/). For this pair of projects, you must choose to either document your code or use the parser; not both. My proposal circumvents the problem.

Nick Coghlan suggested using decorators instead, which I implemented. In my test code (posted below), you can stack decorators, which completely hide the UUID from human eyes. If an automated tool requires the annotations, it can look for its own UUID.

BTW, my apologies if the test code isn't 100% perfect; I threw it together to make sure that the decorator idea would work well. Also, I'm using actual UUID instances instead of strings; it was easier to see what I was doing that way, but would need to be changed in real code.

Thanks,
Cem Karan

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    __docformat__ = "restructuredtext en"

    import inspect
    import uuid
    import pprint

    ##############################################################################
    ##############################################################################
    ### Helper classes
    ##############################################################################
    ##############################################################################

    class _parameter_decorator(object):
        """
        This class adds a documentation string to the annotations of a function
        or class instance's parameter. You can add multiple documentation
        strings, or you can add one at a time. Note that it is intended to
        only be used by the annotizer class.
        """

        def __init__(self, ID, *args, **kwargs):
            self._ID = ID
            self._args = args
            self._kwargs = kwargs
        # End of __init__()

        def __call__(self, f):
            sig = inspect.signature(f)
            func_args = {x for x in sig.parameters}
            decorator_args = {x for x in self._kwargs}
            # Only annotate names that are real parameters of f.
            keys = func_args.intersection(decorator_args)
            for key in keys:
                if key not in f.__annotations__:
                    f.__annotations__[key] = dict()
                if self._ID not in f.__annotations__[key]:
                    f.__annotations__[key][self._ID] = dict()
                f.__annotations__[key][self._ID]["doc"] = self._kwargs[key]
            return f
        # End of __call__()
    # End of class _parameter_decorator

    class _return_decorator(object):
        """
        This class adds a documentation string to the annotations of a function
        or class instance's return value.
        """

        def __init__(self, ID, *args, **kwargs):
            self._ID = ID
            self._args = args
            self._kwargs = kwargs
        # End of __init__()

        def __call__(self, f):
            sig = inspect.signature(f)
            key = 'return'
            if sig.return_annotation is inspect.Signature.empty:
                f.__annotations__[key] = dict()
            if self._ID not in f.__annotations__[key]:
                f.__annotations__[key][self._ID] = dict()
            f.__annotations__[key][self._ID]["doc"] = self._args[0]
            return f
        # End of __call__()
    # End of class _return_decorator

    ##############################################################################
    ##############################################################################
    ### annotizer
    ##############################################################################
    ##############################################################################

    class annotizer(object):
        """
        This assists in making annotations by providing decorators that are
        aware of your project's internal `UUID <http://en.wikipedia.org/wiki/UUID>`_.
        Using this class will ensure that your annotations are separate from
        the annotations that others use, even if the keys they use are the
        same as your keys. Thus, projects **A** and **B** can both use the key
        ``doc`` while annotating the same function, without accidentally
        overwriting the other project's use.
        """

        def __init__(self, ID,
                     parameter_decorator_class=_parameter_decorator,
                     return_decorator_class=_return_decorator):
            """
            :param ID: This is your project's UUID. The easiest way to generate
                this is to use the `uuid <http://docs.python.org/3/library/uuid.html>`_
                module, and then store the ID somewhere convenient.
            :type ID: ``str`` that can be used to initialize a
                `UUID <http://docs.python.org/3/library/uuid.html#uuid.UUID>`_
                instance, or a
                `UUID <http://docs.python.org/3/library/uuid.html#uuid.UUID>`_
                instance
            """
            if isinstance(ID, uuid.UUID):
                self._ID = ID
            else:
                self._ID = uuid.UUID(ID)
            self.parameter_decorator_class = parameter_decorator_class
            self.return_decorator_class = return_decorator_class
        # End of __init__()

        def ID():
            doc = ("This is the ID of your project. It is a " +
                   "`UUID <http://docs.python.org/3/library/uuid.html#uuid.UUID>`_ " +
                   "instance.")
            def fget(self):
                return self._ID
            return locals()
        ID = property(**ID())

        def parameter_decorator_class():
            doc = ("Instances of this class are used to decorate "
                   "a function's parameters.")
            def fget(self):
                return self._parameter_decorator_class
            def fset(self, value):
                self._parameter_decorator_class = value
            def fdel(self):
                self._parameter_decorator_class = _parameter_decorator
            return locals()
        parameter_decorator_class = property(**parameter_decorator_class())

        def return_decorator_class():
            doc = "The return_decorator_class property."
            def fget(self):
                return self._return_decorator_class
            def fset(self, value):
                self._return_decorator_class = value
            def fdel(self):
                self._return_decorator_class = _return_decorator
            return locals()
        return_decorator_class = property(**return_decorator_class())

        def parameter_decorator(self, *args, **kwargs):
            decorator = self.parameter_decorator_class(self.ID, *args, **kwargs)
            return decorator
        # End of parameter_decorator()

        def return_decorator(self, *args, **kwargs):
            decorator = self.return_decorator_class(self.ID, *args, **kwargs)
            return decorator
        # End of return_decorator()
    # End of class annotizer

    ##############################################################################
    ##############################################################################
    ### Main
    ##############################################################################
    ##############################################################################

    if __name__ == "__main__":
        ID1 = uuid.uuid1()
        an1 = annotizer(ID1)
        ID2 = uuid.uuid1()
        an2 = annotizer(ID2)

        @an1.parameter_decorator(a="a", b="b", c="c")
        @an2.parameter_decorator(a="A", b="B", c="C")
        @an1.return_decorator("Doesn't return anything of value")
        @an2.return_decorator("Does not return a value")
        def func(a, b, c):
            print("a = {0!s}, b = {1!s}, c = {2!s}".format(a, b, c))

        pprint.pprint(func.__annotations__)

The output is:

    $ ./annotizer.py
    {'a': {UUID('f2e91c2c-a721-11e3-9535-d49a20c52ef2'): {'doc': 'a'},
           UUID('f2ec5608-a721-11e3-b494-d49a20c52ef2'): {'doc': 'A'}},
     'b': {UUID('f2e91c2c-a721-11e3-9535-d49a20c52ef2'): {'doc': 'b'},
           UUID('f2ec5608-a721-11e3-b494-d49a20c52ef2'): {'doc': 'B'}},
     'c': {UUID('f2e91c2c-a721-11e3-9535-d49a20c52ef2'): {'doc': 'c'},
           UUID('f2ec5608-a721-11e3-b494-d49a20c52ef2'): {'doc': 'C'}},
     'return': {UUID('f2e91c2c-a721-11e3-9535-d49a20c52ef2'): {'doc': "Doesn't return anything of value"},
                UUID('f2ec5608-a721-11e3-b494-d49a20c52ef2'): {'doc': 'Does not return a value'}}}
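The test code stores `uuid.UUID` instances as keys, while the original proposal used uppercase hex strings. One way a consuming tool could tolerate both, sketched under that assumption, is to normalize every key before comparing. `normalize_key` and `entry_for` are hypothetical helpers; the two UUIDs are the ones from the output above.

```python
import uuid


def normalize_key(key):
    """Map a uuid.UUID, or any string uuid.UUID() accepts, to uppercase hex."""
    if not isinstance(key, uuid.UUID):
        key = uuid.UUID(key)  # raises for anything that is not UUID-like
    return key.hex.upper()


def entry_for(annotation, project_id):
    """Hypothetical reader: this project's sub-dict from one annotation dict,
    whether its key was stored as a UUID instance or as a hex string."""
    wanted = normalize_key(project_id)
    for key, value in annotation.items():
        try:
            if normalize_key(key) == wanted:
                return value
        except (ValueError, TypeError, AttributeError):
            continue  # not UUID-shaped: some other convention owns this key
    return None


# One annotation in the annotizer layout, but mixing key styles.
stored = {
    uuid.UUID("f2e91c2c-a721-11e3-9535-d49a20c52ef2"): {"doc": "a"},
    "F2EC5608A72111E3B494D49A20C52EF2": {"doc": "A"},
    "doc": "a plain, non-UUID key",
}
print(entry_for(stored, "f2e91c2c-a721-11e3-9535-d49a20c52ef2"))
print(entry_for(stored, uuid.UUID("f2ec5608-a721-11e3-b494-d49a20c52ef2")))
```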

Cem Karan wrote:
There are a number of reasons I'm suggesting UUIDs instead of simple strings:
I'm not talking about strings, I'm talking about objects created and exported by the module defining the annotations, and compared by identity. The Python module namespace then ensures they have unique names within any given program. That's all you need, because there's no requirement to persist them from one program execution to another. -- Greg

On Mar 8, 2014, at 7:05 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I'm not talking about strings, I'm talking about objects created and exported by the module defining the annotations, and compared by identity.
The Python module namespace then ensures they have unique names within any given program. That's all you need, because there's no requirement to persist them from one program execution to another.
-- Greg
I see your point, and I think you're right. I can see a lot of really good additional tricks we can play (adding in metadata to the sentinel, etc.). I've spent the time since I saw this message trying to come up with good counterarguments, and I honestly can't come up with any good ones. I can only think of two counterarguments, both of which are really weak.

First, if you want to mock the module, you don't need the module's sentinel; you just need the UUID, and then you never need to load the module at all.

Second, there are cases where you want to warn people that although the signature of a function has remained the same, the semantics have shifted. In this case, a module user might want to key off of the UUID. If the ID changes, then the meaning has shifted.

Both of those counterarguments are really weak though, so unless anyone can think of a reason NOT to use sentinels as Greg suggests, I'd like to shift my proposal to using sentinels instead of UUIDs.

Thanks,
Cem Karan
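The "metadata on the sentinel" idea, and the second counterargument about shifted semantics, could both be handled by a sentinel class along these lines. `AnnotationKey`, its `version` field and `require_version` are hypothetical; a version number stands in for the role a changed UUID would otherwise play.

```python
class AnnotationKey:
    """Hypothetical sentinel carrying metadata.

    __eq__ and __hash__ are inherited from object, so keys still compare by
    identity: two keys with identical metadata remain distinct.
    """

    def __init__(self, owner, name, version=1):
        self.owner = owner
        self.name = name
        self.version = version  # bumped when the meaning of the values changes

    def __repr__(self):
        return (f"AnnotationKey({self.owner!r}, {self.name!r}, "
                f"version={self.version})")


# What a project's (hypothetical) keys module might export.
DOC = AnnotationKey("example_project", "doc", version=2)


def require_version(key, expected):
    """A consumer can detect a change in semantics by checking the metadata."""
    if key.version != expected:
        raise RuntimeError(f"{key!r} has changed meaning; expected version {expected}")
    return key


def f(x: {DOC: "An integer to double"}):
    return 2 * x


print(f.__annotations__["x"][require_version(DOC, 2)])
```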

Is there any further interest in standardized annotations, or should the idea be abandoned?

Thanks,
Cem Karan

Cem Karan writes:
Is there any further interest in standardized annotations, or should the idea be abandoned?
Obviously there's interest; standards are a good thing when you're trying to share. But not if they end up getting in the way of sharing because they're too limited, or you end up with a bunch of standards such that no program can conform with all of them.

To avoid the latter, you need to provide an implementation and show that it's useful by waiting for it to be used. You're not going to get a standard into the stdlib at this point because there's not enough usage of *any* proposed annotation standard.

If you want to make progress on this, just do it, and worry about getting it into the stdlib later.

To see what it takes to go directly into the stdlib, consider the PEP 461 debate. There was no need to provide an implementation and wait for usage to follow *because %-formatting for binary was already in widespread practical use in Python 2*. It was pretty clear that the default was going to be "just like Python 2", and that's how it ended up -- with the exception of "%r" because that would do the wrong thing in the intended use case (and "%a" does an equivalent right thing).
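For readers unfamiliar with the outcome, the bytes %-formatting that came out of PEP 461 (Python 3.5 and later) behaves roughly as sketched here. It illustrates why `%a` suits the binary use case: it inserts an ASCII-only repr already encoded to bytes, whereas `%s` only accepts bytes-like objects.

```python
# Requires Python 3.5 or later, where PEP 461 added %-formatting to bytes.
header = b"%s: %d\r\n" % (b"Content-Length", 42)

# %a inserts ascii(obj) encoded to bytes, so it also works on str values.
debug = b"value=%a" % "caf\u00e9"

# %s (an alias of %b) only accepts bytes-like objects or objects with
# __bytes__; a str is rejected rather than silently encoded.
try:
    b"%s" % "caf\u00e9"
except TypeError:
    str_rejected = True
else:
    str_rejected = False

print(header, debug, str_rejected)
```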

Oh, I didn't think it would get into the standard library in one shot, that's for sure! I just wanted to gauge interest to see if I should continue working on it and promoting it. I'll go ahead and do so, and put it up on PyPI.

Thanks,
Cem Karan
participants (8)
- Cem Karan
- CFK
- Chris Angelico
- Greg Ewing
- Nick Coghlan
- Paul Moore
- Stephen J. Turnbull
- Yann Kaiser