[Distutils] PEP 503 - Simple Repository API

M.-A. Lemburg mal at egenix.com
Sat Sep 5 11:43:37 CEST 2015


On 05.09.2015 03:17, Donald Stufft wrote:
> You can see this PEP online at https://www.python.org/dev/peps/pep-0503/ or I
> have reproduced it inline below.

Thanks for writing this up, Donald.

Some comments below...

> -----------------------------------------------
> 
> Abstract
> ========
> 
> There are many implementations of a Python package repository and many tools
> that consume them. Of these, the cannonical implementation that defines what

s/cannonical/canonical/

> the "simple" repository API looks like is the implementation that powers
> PyPI. This document will specify that API, documenting what the correct
> behavior for any implementation of the simple repository API.
> 
> Specification
> =============
> 
> A repository that implements the simple API is defined by its base url, this is
> the top level URL that all additional URLS are below. The API is named the
> "simple" repository due to fact that PyPI's base URL is
> ``https://pypi.python.org/simple/``.
> 
> .. note:: All subsequent URLs in this document will be relative to this base
>           URL (so given PyPI's URL, an URL of ``/foo/`` would be
>           ``https://pypi.python.org/simple/foo/``).
> 
> 
> Within a repository, the root URL (``/``) **MUST** be a valid HTML5 page with a
> single anchor element per project in the repository. The text of the anchor tag
> **MUST** be the normalized name of the project and the href attribute **MUST**
> link to the URL for that particular project. As an example::
> 
>    <!DOCTYPE html>
>    <html>
>      <body>
>        <a href="/frob/">frob</a>
>        <a href="/spamspamspam/">spamspamspam</a>
>      </body>
>    </html>
> 
> Below the root URL is another URL for each individual project contained within
> a repository. The format of this URL is ``/<project>/`` where the ``<project>``
> is replaced by the normalized name for that project, so a project named
> "HolyGrail" would have an URL like ``/holygrail/``. 

Hmm, if the installer will build the URL itself, why is there even
a need for a top-level index page ?

I mean for the occasional human reading the page it will certainly
make sense to have such a page, but for the API itself it doesn't
appear to be strictly needed.

Or is the idea to have the package manager scan the index for packages
hosted on that index prior to asking for the package it would like
to install ?
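
To illustrate what I mean by the installer building the URL itself,
here is a minimal sketch. It assumes the normalization rule quoted
further down in the PEP; the helper name project_url() is made up for
this example and is not taken from any existing tool:

```python
import re
from urllib.parse import urljoin


def normalize(name):
    # Normalization rule as quoted in the PEP draft.
    return re.sub(r"[-_.]+", "-", name).lower()


def project_url(base_url, name):
    # Hypothetical helper: derive the project page URL directly from
    # the repository base URL, without reading the root index page.
    if not base_url.endswith("/"):
        base_url += "/"
    return urljoin(base_url, normalize(name) + "/")


print(project_url("https://pypi.python.org/simple/", "HolyGrail"))
```

With this approach the root page is never fetched, which is why I am
asking what role it plays for the API.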

> This URL must response with
> a valid HTML5 page with a single anchor element per file for the project. The
> text of the anchor tag **MUST** be the filename of the file and the href
> attribute **MUST** be an URL that links to the location of the file for
> download. The URL **SHOULD** include a hash in the form of an URL fragment with
> the following syntax: ``#<hashname>=<hashvalue>``, where ``<hashname>`` is the
> lowercase name of the hash function (such as ``sha256``) and ``<hashvalue>`` is
> the hex encoded digest.
> 
> In addition to the above, the following constraints are placed on the API:
> 
> * All URLs **MUST** end with a ``/`` and the repository **SHOULD** redirect the
>   URLs without a ``/`` to add a ``/`` to the end.

I think you only meant this for URLs that point to index pages,
since doing this for filenames would not be such a good idea
(confuses the MIME content type logic).

For site navigation, pages will typically also include relative links
such as "..". The spec should not disallow these.
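
A rough sketch of how a client could cope with both points, assuming
it resolves every href against the URL of the page it was found on
and treats only URLs whose path ends in "/" as index pages. The
function names are invented for this example:

```python
from urllib.parse import urljoin, urlsplit


def resolve(page_url, href):
    # Resolve a (possibly relative) href such as ".." or a bare
    # filename against the URL of the page that contains it.
    return urljoin(page_url, href)


def is_index_url(url):
    # Assumption: index pages end with "/", file URLs do not.
    return urlsplit(url).path.endswith("/")


page = "https://pypi.python.org/simple/frob/"
for href in ("..", "frob-1.0.tar.gz#sha256=abc"):
    url = resolve(page, href)
    print(url, is_index_url(url))
```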

> * There is no constraints on where the files must be hosted relative to the
>   repository.
> 
> * There may be any other HTML elements on the API pages as long as the required
>   anchor elements exist.

Would it help if the package manager could more easily detect the links
that point to distribution files, as opposed to e.g. documentation or
other resources ?

setuptools uses rel="download" for this:

https://pythonhosted.org/setuptools/easy_install.html#package-index-api

The downside here is that a simple web server directory listing would
no longer be compatible with the spec, so perhaps just make this optional
to optimize the link scanning:

* Project pages SHOULD add a rel="download" attribute to link
  elements of distribution files.

The same could then be done for index page links to project pages:

* Index pages SHOULD add a rel="homepage" attribute to link
  elements pointing to project pages.

The rel attributes used here are the ones that setuptools requires,
so that indexes built this way are compatible with setuptools as well.
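
As a sketch of how an optional rel attribute could speed up link
scanning while still accepting plain directory listings: the code
below prefers anchors marked rel="download" and falls back to all
anchors when none are marked. The class and function names are made
up for illustration, and the fallback policy is my assumption, not
something the PEP or setuptools prescribes:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, rel tokens) for every anchor element."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, or be absent.
        rel = set((attrs.get("rel") or "").split())
        self.links.append((href, rel))


def download_links(html):
    collector = LinkCollector()
    collector.feed(html)
    tagged = [href for href, rel in collector.links if "download" in rel]
    # Fall back to every anchor for pages that carry no rel markers.
    return tagged or [href for href, _ in collector.links]
```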

> * Repositories **MAY** redirect unnormalized URLs to the cannonical normalized
>   URL (e.g. ``/Foobar/`` may redirect to ``/foobar/``), however clients
>   **MUST NOT** rely on this redirection and **MUST** request the normalized
>   URL.
> 
> * Repositories **SHOULD** choose a hash function from one of the ones
>   guarenteed to be available via the ``hashlib`` module in the Python standard

s/guarenteed/guaranteed/

>   library (currently ``md5``, ``sha1``, ``sha224``, ``sha256``, ``sha384``,
>   ``sha512``). The current recommendation is to use ``sha256``.
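
For reference, a minimal sketch of how a client might use the
``#<hashname>=<hashvalue>`` fragment together with hashlib. The
function names are invented for this example, and ignoring fragments
with an unknown hash name is my assumption rather than something the
draft states:

```python
import hashlib
from urllib.parse import urldefrag

# Hash functions the draft lists as guaranteed by hashlib.
GUARANTEED = {"md5", "sha1", "sha224", "sha256", "sha384", "sha512"}


def split_hash(url):
    # Return (url without fragment, hashname, hashvalue); the last two
    # are None when no usable hash fragment is present.
    base, fragment = urldefrag(url)
    name, sep, value = fragment.partition("=")
    if not sep or name not in GUARANTEED:
        return base, None, None
    return base, name, value


def matches(data, hashname, hexdigest):
    # Compare the hex encoded digest of the downloaded bytes.
    return hashlib.new(hashname, data).hexdigest() == hexdigest.lower()
```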

Could we perhaps also add optional features like:

* Distribution link elements MAY include a data-gpg-sig="<url-of-gpg-sig>"
  attribute to provide a GPG signature of the linked file

This could later be extended to more metadata, such as platform
tags, distribution file types, license info, mirror locations,
documentation, help strings, etc.
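
To show that such attributes would be cheap for clients to pick up,
here is a sketch that gathers all data-* attributes per distribution
link. Note that data-gpg-sig is only the attribute proposed above and
not part of the draft; the class name is made up as well:

```python
from html.parser import HTMLParser


class DataAttrCollector(HTMLParser):
    """Map each anchor href to its data-* attributes."""

    def __init__(self):
        super().__init__()
        self.files = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href is None:
            return
        # Strip the "data-" prefix; unknown keys are simply carried
        # along, so later extensions do not break older clients.
        self.files[href] = {
            key[len("data-"):]: value
            for key, value in attrs.items()
            if key.startswith("data-")
        }
```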

> Normalized Names
> ----------------
> 
> This PEP references the concept of a "normalized" project name. As per PEP 426
> the only valid characters in a name are the ASCII alphabet, ASCII numbers,
> ``.``, ``-``, and ``_``. The name should be lowercased with all runs of the
> characters ``.``, ``-``, or ``_`` replaced with a single ``-`` character. This
> can be implemented in Python with the ``re`` module::
> 
>    import re
> 
>    def normalize(name):
>        return re.sub(r"[-_.]+", "-", name).lower()

-- 
Marc-Andre Lemburg
eGenix.com