[Python-ideas] Was: Annotations (and static typing), Now:Sharing annotations

Sun Aug 24 15:32:38 CEST 2014

On Aug 24, 2014, at 7:39 AM, Ed Kellett <edk141 at gmail.com> wrote:

> I have a few questions:
> 
> - How often do multiple kinds of annotation end up on the same function?

In this thread, we've already talked about type checkers and documentation generators, both of which can use the __annotations__ dictionary legitimately.  Now imagine that you are an end user that has installed a documentation generator and a static type analyzer.  If both tools were to use the __annotations__ dictionary, then right now you could choose one or the other tool, but not both.  However, if the standard I'm proposing was adopted, then each tool would choose its own UUID as its key, which would mean they could share entries in the annotations dictionary.
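
A minimal sketch of what this sharing might look like. The two UUIDs and the lookup() helper are placeholders invented for illustration; they are not keys or APIs that any real tool uses.

```python
import uuid

# Placeholder keys for illustration only; real tools would each publish their own.
DOC_TOOL = uuid.UUID("00000000-0000-4000-8000-000000000001")
TYPE_TOOL = uuid.UUID("00000000-0000-4000-8000-000000000002")


def scale(value: {DOC_TOOL: "The number to scale", TYPE_TOOL: int},
          factor: {TYPE_TOOL: float}) -> {TYPE_TOOL: float}:
    return value * factor


def lookup(func, param, tool_key, default=None):
    """Return the entry one tool stored for one parameter, if any."""
    return func.__annotations__.get(param, {}).get(tool_key, default)


# Each tool reads only its own entries and never sees the other tool's data.
doc_text = lookup(scale, "value", DOC_TOOL)
declared_type = lookup(scale, "value", TYPE_TOOL)
```

Here the documentation generator would only ever ask for DOC_TOOL and the type analyzer only for TYPE_TOOL, so neither needs to know the other exists.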

> - Why UUIDs (rather than for e.g. PyPI packages, or some other namespace)?

Multiple reasons:

- Using PyPI means that every programmer that is trying to follow the standard (including anyone who is just learning Python) will create some name while practicing.  That name will need to be pushed up to PyPI to ensure uniqueness.  Since most of these names are only for learning, PyPI will immediately get flooded with a bunch of project names that are probably going to be abandoned almost immediately (I'm thinking of beginning programming classes especially).  This would significantly degrade the utility of PyPI, which I want to avoid.  All similar centrally-managed systems will suffer from the same problem.  UUIDs don't have this problem; create and abandon them at will.

- All other namespace systems will either suffer from the possibility of collisions, require a centrally managed repository of names, or will eventually reinvent UUIDs.  We already have UUIDs, let's skip the first couple of headaches and just solve the problem.

- Centrally managed systems have a much higher barrier to entry than simple UUIDs.  Getting a new UUID to experiment with is trivial; "import uuid; uuid.uuid4()" is our complete program, requires no management on the part of PyPI (or any other third party), doesn't require internet access, etc.  

- UUIDs have no built-in human significance; it is VERY unlikely that multiple projects will accidentally choose the same UUID.  E.g., it is likely that programmers developing different type checkers would choose 'type checker' as a key, and each project will have incompatible meanings/values for the 'type checker' key.  This doesn't sound too bad until you start pulling in multiple frameworks from different sources, each of which uses a different, mutually incompatible type checker system.  At that point, running any type checker will cause a crash as the type checker tries to read information from the frameworks you've just pulled in.

- Google for a UUID.  Any UUID.  If you've just generated the UUID, you are unlikely to get any hits.  Now google for 'UUID('f9bbc165-d904-4452-b858-fc5c9f104c87')'.  I've just added it to the README for my annotizer project, and I expect googlebot to pick it up in the next few days.  At that point, the only two places you should find mention of that UUID are on GitHub and in this thread.  If you google for 'type checker', etc., how many hits do you get?  How many of them relate to mypy, or even this thread?  Once people are used to the standard, they'll know that to find out which project is associated with a given UUID they just need to google for it.  This is a big win.  Actually, do this as an experiment: don't look at the URL below, instead, wait a few days and google for the UUID above.  See what comes up.

- Making a standard for __annotations__ at this point isn't easy; we need a simple way of deciding if someone is complying with the standard.  This is pretty easy if we adopt some UUID as a required key as I mentioned earlier.
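
To make the last point concrete, here is a rough sketch of such a compliance check. The marker pair is the one from my earlier message quoted below ("PEPXXXX" is a placeholder there as well); the function name follows_standard() is made up for this example.

```python
import uuid

# Marker pair from the earlier message; "PEPXXXX" stands in for a real PEP number.
MARKER_KEY = uuid.UUID("1ad60d50-8237-4b98-b2b1-69fd08ed575c")
MARKER_VALUE = "PEPXXXX"


def follows_standard(annotation):
    """Guess whether a single annotation value follows the proposed convention."""
    return (
        isinstance(annotation, dict)
        and all(isinstance(key, uuid.UUID) for key in annotation)
        and annotation.get(MARKER_KEY) == MARKER_VALUE
    )


# A fresh key for experimenting costs nothing and needs no registry.
my_key = uuid.uuid4()

compliant = {MARKER_KEY: MARKER_VALUE, my_key: "anything the project likes"}
legacy = int                      # an ordinary, pre-existing annotation
mixed = {"type checker": int}     # a dict, but keyed by a plain string
```

Only the first of the three values would be treated as following the convention; the other two would be left alone, which is what keeps this from stomping on existing uses.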

> - What is the point? A decorator could process the annotations and put
> some information in func._projectname__something instead of doing the
> UUID dance

Again, this isn't for consumption within a project, it is for users across projects.  What if I want to use Sphinx (http://sphinx-doc.org/) and mypy (http://www.mypy-lang.org/) at the same time in my project?  What happens in the following code?

"""
@sphinx_decorator("a", "Some documentation about a")
@mypy_decorator("a", int)
def foo(a):
    pass
"""

Is it the same as:

"""
@mypy_decorator("a", int)
@sphinx_decorator("a", "Some documentation about a")
def foo(a):
    pass
"""

Right now, as I understand it, the last applied decorator would win, which means 'func._projectname__something' would be set to either sphinx or mypy.  That means that order matters for completely orthogonal concepts.  This is bad.  UUIDs solve this, and all the earlier problems.
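
A rough sketch of how UUID-keyed entries make the order irrelevant. The annotate() decorator factory and both keys are hypothetical stand-ins, not anything Sphinx or mypy actually provides, and the sketch assumes each parameter is either unannotated or already holds a dictionary.

```python
import uuid

# Placeholder keys; stand-ins for whatever a documentation tool and a type checker would publish.
DOC_KEY = uuid.UUID("00000000-0000-4000-8000-000000000001")
TYPE_KEY = uuid.UUID("00000000-0000-4000-8000-000000000002")


def annotate(tool_key, **entries):
    """Build a decorator that files each entry under tool_key for the named parameter."""
    def decorator(func):
        for param, value in entries.items():
            # Create the per-parameter dictionary on first use, then add our entry.
            slot = func.__annotations__.setdefault(param, {})
            slot[tool_key] = value
        return func
    return decorator


@annotate(DOC_KEY, a="Some documentation about a")
@annotate(TYPE_KEY, a=int)
def foo(a):
    pass


@annotate(TYPE_KEY, a=int)
@annotate(DOC_KEY, a="Some documentation about a")
def bar(a):
    pass
```

Because each decorator only writes under its own key, foo and bar end up with equal __annotations__ dictionaries regardless of which decorator ran last.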

> - What would pydoc print for the function signature?

As I mentioned earlier, certain UUIDs might become de facto or de jure standards.  In this case, projects that have common goals could settle on a common standard and publish a common UUID.  Pydoc would know about these UUIDs (they would be published), and would know what to do for them.  For UUIDs it doesn't understand, it could raise a warning, or simply ignore them.
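
As an illustration only, a pydoc-like tool might handle this along the following lines. This is not how pydoc works today; the KNOWN registry and render_param() are invented for the sketch, and the type-information UUID is the example value from my earlier message quoted below.

```python
import uuid
import warnings

# Example "official type information" UUID from the earlier message.
TYPE_INFO = uuid.UUID("2cca6238-9fca-4053-aa3d-db9050e6b26b")

# Hypothetical registry mapping published UUIDs to formatters.
KNOWN = {TYPE_INFO: lambda value: getattr(value, "__name__", repr(value))}


def render_param(name, annotation):
    """Render one parameter, skipping entries whose UUID is not recognized."""
    parts = []
    for key, value in annotation.items():
        formatter = KNOWN.get(key)
        if formatter is None:
            warnings.warn(f"ignoring annotation for unknown UUID {key}")
            continue
        parts.append(formatter(value))
    return f"{name}: {', '.join(parts)}" if parts else name
```

With this, render_param("a", {TYPE_INFO: int}) would produce "a: int", while an entry under an unpublished UUID would trigger a warning and otherwise be ignored.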

Before you take my comments above as proof that we don't need UUIDs, consider the fact that we are currently discussing type systems, and our thoughts may change in the future.  I don't mean that there will be successive standards, I mean that there may be competing standards, at least until we really know what the best one will be.  This is a case where creating and abandoning UUIDs will be trivial, but where using 'type checker' is going to lead to confusion.

> On 24 August 2014 03:18, Cem Karan <cfkaran2 at gmail.com> wrote:
>> 
>> On Aug 20, 2014, at 11:08 AM, Paul Moore <p.f.moore at gmail.com> wrote:
>> 
>>> Sigh. I go away for a week and come back to a mega-thread I can never
>>> hope to catch up on :-)
>>> 
>>> TL; DR; - Although mypy looks interesting, I think it's too soon to
>>> close the door on all other uses of annotations. Let's find a solution
>>> that allows exploration of alternative uses for a while longer.
>>> 
>>> OK, can I just make some points regarding the static typing thread.
>>> First of all, I have no issue with the idea of static typing, and in
>>> fact I look forward to seeing what benefits it might have (if nothing
>>> else, the pointer to mypy, which I'd never heard of before, is
>>> appreciated). It won't be something I use soon (see below) but that's
>>> fine.
>>> 
>>> But Guido seems to be saying (on a number of occasions) that nobody is
>>> really using annotations, so he wants to focus on the static typing
>>> use case alone. I think this is a mistake. First of all, I see no
>>> reason why functions using typing annotations could not be introduced
>>> with a decorator[1]. So why insist that this is the *only* use of
>>> annotations, when it's pretty easy to allow others to co-exist?
>>> 
>>> Also, the "nobody is using annotations" argument? Personally, I know
>>> of a few other uses:
>>> 
>>> 1. Argument parsers - at least 3 have been mentioned in the thread.
>>> 2. Structure unpacking - I think there is a library that uses
>>> annotations for this, although I may be wrong.
>>> 3. FFI bindings. I know I've seen this discussed, although I can't
>>> find a reference just now.
>>> 
>>> There are probably other ideas around as well (GUI bindings,
>>> documentation generation, ...) None are particularly mature, and most
>>> are just at the "ideas" stage, but typically the ideas I have seen are
>>> the sort of thing you'd write a library for, and Python 3 only
>>> libraries *really* aren't common yet.
>>> 
>>> The problem for people wanting to experiment with annotations, is that
>>> they need to be writing Python 3 only code. While Python 3 adoption is
>>> growing rapidly, I suspect that large applications are typically still
>>> focused on either going through, or tidying up after, a 2-3 migration.
>>> And new projects, while they may be developed from the ground up using
>>> Python 3, will typically be using programmers skilled in Python 2, to
>>> whom Python 3 features are not yet an "instinctive" part of the
>>> toolset. Apart from large standalone applications, there are smaller
>>> scripts (which are typically going to be too small to need
>>> programming-in-the-large features like annotations) and libraries
>>> (which really aren't yet in a position to drop Python 2.x totally,
>>> unless they have a fairly small user base).
>>> 
>>> So I don't see it as compelling that usage of annotations in the wild
>>> is not yet extensive.
>>> 
>>> Rather than close the door on alternative uses of annotations, can I suggest:
>>> 
>>> 1. By all means bless mypy syntax as the standard static typing
>>> notation - this seems like a good thing.
>>> 2. Clarify that static typing annotations should be introduced with a
>>> decorator. Maybe reserve a decorator name ("@typed"?) that has a dummy
>>> declaration in the stdlib, and have a registration protocol for tools
>>> to hook their own implementation into it.[2]
>>> 3. Leave the door open for other uses of decorators, at least until
>>> some of the more major libraries drop Python 2.x support completely
>>> (and hence can afford to have a dependency on a Python 3 only module
>>> that uses annotations). See at that stage if annotations take off.
>>> 4. If we get to a point where even libraries that *could* use
>>> annotations don't, then revisit the idea of restricting usage to just
>>> type information.
>>> 
>>> Paul
>>> 
>>> [1] Also, a decorator could allow a Python 2 compatible form by using
>>> decorator arguments as an alternative to annotations.
>>> [2] I've yet to see a clear explanation of how "a tool using type
>>> annotations" like a linter, editor, IDE or Python compiler would use
>>> them in such a way that precludes decoration as a means of signalling
>>> the annotation semantics.
>> 
>> A long while back I proposed a mechanism for sharing __annotations__ between multiple, non-cooperating projects.  The basic idea is that each annotation becomes a dictionary.  Each project (and 'project' is a very loosely defined concept here) chooses a UUID that it uses as key into the dictionary.  The value is up to the project.
>> 
>> The advantage to this is manifold:
>> 
>> - Annotations can still have multiple uses by different groups without stepping on each other's toes.
>> - If someone wants to make a standard, all they have to do is publish the UUID associated with their standard.  For example, we might choose UUID('2cca6238-9fca-4053-aa3d-db9050e6b26b') as the official type information UUID.  All projects that want to develop linters, documentation generators, etc., will use that UUID for all annotations, and the PEP will require it.
>> - de facto standards can become de jure standards by blessing a particular UUID.
>> - Guessing if this method is being used is relatively easy; if it's a dictionary, and if every key is a UUID, it probably follows this standard.  We can tighten it further by requiring some key-value pair be in every dictionary (i.e., {UUID('1ad60d50-8237-4b98-b2b1-69fd08ed575c'):"PEPXXXX"} is always in the dictionary).  This makes it fairly simple to add without stomping on what people are already doing.
>> - Finding the standard on the web should also be easy; while you might not find the PEP instantly, you'll probably zoom into it fairly fast.
>> 
>> Disadvantages:
>> 
>> - Typing out UUIDs is PAINFUL.  I highly recommend using decorators instead.
>> - Reading the __annotations__ dictionary will be difficult.  pprint() should make this easier.
>> 
>> I have working proof-of-concept code at https://github.com/oranguman/annotizer that defines a decorator class that handles the UUID for you.  It needs to be extended to parse out information, but it handles the 'other use cases' problem fairly well.
>> 
>> Thanks,
>> Cem Karan

Thanks,
Cem Karan
