[Python-ideas] Suggestion for standardized annotations

Cem Karan cfkaran2 at gmail.com
Wed Mar 5 12:53:10 CET 2014


Hi all, I wanted to make a suggestion for a convention on how to write annotations.  I know that PEP 8 says that the standard library won't use them in order to avoid setting a convention, but that could lead to confusion and conflicts in how they are used in non-standard libraries.  The reason I'm thinking about this is because of how docstrings are currently used.  If you use PLY (http://www.dabeaz.com/ply/) and also want to use something like Sphinx (http://sphinx-doc.org/), you're going to have a problem; PLY stores its rules in the docstring, while Sphinx parses it for documentation.  I want to prevent this problem for annotations.

My thought is that all annotations should be dictionaries.  The keys should all be unicode strings that are uppercase UUIDs (basically, what 'uuid.uuid1().hex.upper()' returns), and the values can be anything the programmer wants.  Each project can generate one (or more) UUIDs to put into the dictionary, and publicly document what the UUID's meaning is (preferably somewhere where a search engine can find it).  The advantage is that since UUIDs are unique, the number of false positives you'll get while searching for it should be low; I've tested this on a different mailing list I'm on, and the UUID I generated for it has 0 false positives, while picking up the complete discussion involving it.  

As an example, if project A had chosen the UUID B3D2AFE8A45A11E3AE24D49A20C52EF2 and project B chose the UUID C02D7C64A45A11E39DAFD49A20C52EF2, we might annotate a function as follows:

def foo(first : {
                    B3D2AFE8A45A11E3AE24D49A20C52EF2 : {"doc" : "English documentation"},
                    C02D7C64A45A11E39DAFD49A20C52EF2 : {"doc" : "expression : MINUS expression %prec UMINUS"}
                },
        second) -> {B3D2AFE8A45A11E3AE24D49A20C52EF2: {"type" : int, "doc" : "Typechecking for a linter"}}:

    pass

You can already see the downside of this approach; it's really, really verbose.  However, at least it avoids outright conflicts in usage that prevent usage of certain tools/projects together.

Note that I did consider using names as keys directly (e.g. 'doc').  However, that requires a strong, universal convention on what each key means.  Since we can't seem to figure that out for the docstring, I don't see why we should expect to be able to figure it out for any of proposed keys.  Moreover, the set of keys would need to be documented somewhere, the documentation kept up to date, etc.  It becomes a management nightmare.  UUIDs have the advantage the we just tell everyone how to generate their own, and let them go at it.  If someone wants to use a given project's docstring for their own purposes, it is up to them to keep the meaning the same.

Thoughts/suggestions?

Thanks,
Cem Karan


More information about the Python-ideas mailing list