Can you explain a bit more what would be the consequence of the type annotations in pandas itself?

We would keep the type annotations in pandas for maintaining the pandas code (i.e., type checking the code that is written by pandas developers), but we would not have to worry about keeping the typing of the public API in step with the internal typing. They could evolve separately, if needed.
 
I suppose we wouldn't remove those? (we also have type annotations for non-public APIs) Or how would those be kept in sync?

That's not entirely clear to me, but I would say that whenever the public API changes, then the pandas-stubs project would get updated.  
 
Another question: what is the main advantage of doing so? I suppose this doesn't necessarily make it easier for the user, but is the goal to make the type stubs more maintainable?


To me, the advantages are:
1. Maintainability - we just have to publish stubs for the public API and not for any internal routines. In some sense, the published stubs are also a check on that API.
2. Tests - we can develop a set of tests for the type stubs that is independent of all the other tests we run.
3. Reconciling issues - with a separate project, any issues with the type stubs for the public API would live in a different GitHub project. People who consume the API could contribute there without having to deal with the full pandas code base, set up a dev environment, etc.
4. Faster release schedule - because the type stubs code base would be small, it could be released on a more regular basis as issues/PRs are reconciled, rather than waiting for a full pandas release.

Regarding my points (3) and (4) - I have been regularly contributing PRs to the Microsoft stubs that are included with Visual Studio Code (https://github.com/microsoft/python-type-stubs/tree/main/pandas) whenever code that I or members of my team write doesn't pass the VS Code pyright basic type checks. Being able to do so without waiting for a full pandas release is very helpful! Since Pylance in VS Code gets updated every week or two, any changes to the type stubs that are approved by the maintainers end up getting released pretty quickly (and are automatically updated).

Would the type-stubs package be for a specific pandas version (and get somewhat synced releases?)

I think we would sync it with minor releases, but not patch releases, since the public API shouldn't change in a patch release.

I'd like to discuss this in the pandas dev meeting. Marco also pointed me to another set of stubs at https://github.com/VirtusLab/pandas-stubs . That project has a nice blog post about how they created their stubs: https://medium.com/virtuslab/pandas-stubs-how-we-enhanced-pandas-with-type-annotations-1f69ecf1519e

There is also https://github.com/predictive-analytics-lab/data-science-types/tree/master/pandas-stubs

-Irv



On Tue, Dec 7, 2021 at 11:30 AM Joris Van den Bossche <jorisvandenbossche@gmail.com> wrote:
Hi Irv,

I am not very familiar with the typing space so some questions below.

Can you explain a bit more what would be the consequence of the type annotations in pandas itself? I suppose we wouldn't remove those? (we also have type annotations for non-public APIs) Or how would those be kept in sync?

Another question: what is the main advantage of doing so? I suppose this doesn't necessarily make it easier for the user, but is the goal to make the type stubs more maintainable?
Would the type-stubs package be for a specific pandas version (and get somewhat synced releases?)

Joris

On Tue, 23 Nov 2021 at 17:22, Irv Lustig <irv@princeton.com> wrote:
I discovered this feature of typing:
https://www.python.org/dev/peps/pep-0561/#stub-only-packages

The idea is that for a package like pandas, we can have a separate package "pandas-stubs" that would contain the type stubs for pandas.  We wouldn't have to worry about including a `py.typed` file or `.pyi` files in our standard pandas distribution - all typing for the public API would be in the separate package.  That would allow pandas typing for the public API to be maintained separately (different GitHub repo).  We could start by just copying over what Microsoft created at https://github.com/microsoft/python-type-stubs/tree/main/pandas and then we maintain it as a separate repo, which could be installed via pip and conda.
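
As a rough illustration of the layout PEP 561 describes, here is a minimal sketch that builds a stub-only package skeleton in a temporary directory. The helper name `build_stub_only_layout` and the stub contents are made up for this example; the `DataFrame` and `read_csv` signatures are heavily simplified and are not the real pandas API or the Microsoft stubs.

```python
import tempfile
from pathlib import Path

# Illustrative stub text only: these signatures are simplified placeholders,
# not the real pandas public API.
EXAMPLE_STUB = """\
from typing import Any

class DataFrame:
    def head(self, n: int = ...) -> DataFrame: ...

def read_csv(filepath: str, **kwargs: Any) -> DataFrame: ...
"""


def build_stub_only_layout(root: Path) -> list[str]:
    """Create a stub-only package skeleton under root and list its files.

    Hypothetical helper for illustration; not part of pandas or any stubs project.
    """
    # PEP 561: a stub-only package for "pandas" is named "pandas-stubs".
    stub_pkg = root / "pandas-stubs"
    stub_pkg.mkdir(parents=True)
    # Stub files use the .pyi extension and contain signatures only, no logic.
    (stub_pkg / "__init__.pyi").write_text(EXAMPLE_STUB)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(build_stub_only_layout(Path(tmp)))
```

Once a package with this layout is installed next to pandas, a PEP 561-aware type checker would resolve `import pandas` against the `.pyi` files, while the runtime would still use the regular pandas distribution.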

Any thoughts on whether we should consider doing this?

-Irv

_______________________________________________
Pandas-dev mailing list
Pandas-dev@python.org
https://mail.python.org/mailman/listinfo/pandas-dev