[Distutils] Indexing modules in Python distributions

Wes Turner wes.turner at gmail.com
Wed Feb 8 18:06:28 EST 2017


On Wednesday, February 8, 2017, Thomas Kluyver <thomas at kluyver.me.uk> wrote:

> Thanks Steve, Chris,
>
> On Tue, Feb 7, 2017, at 04:49 PM, Chris Wilcox wrote:
>
> I may be able to help jump-start this a bit and provide a platform for
> this to run on. I deployed a small service that scans PyPI to gather
> statistics on Python 2 vs Python 3 support using PyPI classifiers. The
> source is on GitHub: https://github.com/crwilcox/PyPI-Gatherer. It
> watches the PyPI updates feed and refreshes entries for packages as they
> show up as modified. It should be possible to add your lib, query, and add
> an additional row or two to the result. I am happy to work together on
> this. Also, the data is stored in Azure Table Storage, which has REST
> endpoints (and a Python SDK) that make getting the published data
> straightforward.
>
>
> I had a quick look through this, and it does look like it should provide a
> useful framework for scanning PyPI and updating the results. :-)
>
> What I'm proposing differs in that it would need to download files from
> PyPI - basically all of them, if we're thorough about it. I imagine that's
> going to involve a lot of data transfer. Do we know what order of magnitude
> we're talking about? Is it so large that we should be thinking of running
> the scanner in the same data centre as the file storage?
>
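For a rough answer to the order-of-magnitude question, one option is to sum the per-file sizes PyPI already reports. This is a minimal sketch, assuming the per-project JSON API at https://pypi.org/pypi/<name>/json, whose "releases" mapping lists the files of each version with a "size" in bytes. The helper names are hypothetical, and getting a PyPI-wide total would still require iterating over the full project list (for example from the simple index), which this does not do.

```python
import json
from urllib.request import urlopen


def project_download_bytes(project_json):
    """Sum the reported 'size' (bytes) of every file in every release of one project."""
    total = 0
    for files in project_json.get("releases", {}).values():
        for file_info in files:
            total += file_info.get("size", 0)
    return total


def fetch_project_json(name):
    # Assumed endpoint: PyPI's per-project JSON API (hypothetical helper).
    with urlopen("https://pypi.org/pypi/%s/json" % name) as resp:
        return json.load(resp)
```

Usage would be something like `project_download_bytes(fetch_project_json("requests"))`, repeated over every project name and summed.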


So, IIUC, you're looking to emit

    ((URL, release, platform), namespaces_odict)

for each new package and for all existing packages, by uncompressing every
package and running every setup.py (hopefully in a container)?
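The static half of that could look roughly like the following minimal sketch. It assumes the input is an already-downloaded wheel (a zip archive) and lists importable module names from the archive contents without running setup.py. `index_wheel_modules` and `index_record` are hypothetical names. Only `.py` files are counted (extension modules such as `.so`/`.pyd` are skipped), and sdists would still need a build step to get an accurate list.

```python
import io
import zipfile
from collections import OrderedDict


def index_wheel_modules(wheel_bytes):
    """Map dotted module names to their paths inside a wheel archive."""
    modules = OrderedDict()
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        for name in sorted(zf.namelist()):
            top = name.split("/", 1)[0]
            # Skip wheel metadata/data dirs, e.g. foo-1.0.dist-info/ and foo-1.0.data/
            if top.endswith((".dist-info", ".data")):
                continue
            # Only pure-Python modules; directory entries and other files are ignored
            if not name.endswith(".py"):
                continue
            parts = name[:-3].split("/")
            if parts[-1] == "__init__":
                parts = parts[:-1]  # pkg/__init__.py is importable as "pkg"
            if parts:
                modules[".".join(parts)] = name
    return modules


def index_record(url, release, platform, wheel_bytes):
    # Hypothetical record shape mirroring ((URL, release, platform), namespaces_odict)
    return ((url, release, platform), index_wheel_modules(wheel_bytes))
```

Sorting the archive listing keeps the OrderedDict deterministic, so re-indexing the same file yields the same record.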

https://github.com/python/pypi-salt/blob/master/provisioning/salt/roots/pillar/top.sls

https://github.com/python/pypi-salt/blob/master/provisioning/salt/roots/pillar/warehouse-deploys/warehouse-dev.sls

https://github.com/python/pypi-salt/blob/master/provisioning/salt/roots/salt/warehouse/web.sls

- https://github.com/pypa/warehouse/blob/master/warehouse/packaging/search.py
  - elasticsearch_dsl
- https://github.com/pypa/warehouse/blob/master/warehouse/packaging/models.py
  - SQLAlchemy
- https://github.com/pypa/warehouse/blob/master/warehouse/celery.py
  - celery

- https://github.com/pypa/warehouse/blob/master/warehouse/legacy/api/json.py
  - namespaces are useful metadata (worth adding to the spec)
    - https://github.com/pypa/interoperability-peps/issues/31
      - JSON-LD

- https://github.com/python/psf-salt/blob/master/pillar/prod/top.sls
- https://github.com/python/psf-salt/blob/master/pillar/prod/roles.sls

- One CI project (container FROM python: (Debian)) per Python package, with
  additional metadata per project?
  - conda-forge solves for this case
    - and then: how to post the extra metadata (build artifact) back from
      the CI build and mark the task as done


Could this (namespace extraction) be added to 'setup.py build' in the
future?


>
> Thomas
>