[Catalog-sig] transition to pypi-hosting through server-side changes

holger krekel holger at merlinux.eu
Sat Mar 9 08:22:22 CET 2013

Hi all,

I think Philip Eby brought up a very worthwhile idea to consider:
if we can transition to a no-external-hosting situation by making
pypi-server changes, without requiring client-side installers or
release processes to change, that would be great.  We would
have one place to implement things, and less friction in the probably
millions of places where pip/easy_install and CI/release processes
are used today.

Basically, it all revolves around the question of which links are
served on the simple/* pages.

What about adding a "hosting mode" field to a package which affects
all historic and future releases, i.e. the mode is not specific to a
particular release but applies to all releases.  This field could have
these values and meanings:

- "pypi-only": homepage/download links are not added to simple/ pages
  unless they are #egg ones.  Release registration with a non-empty and
  non-#egg download url is rejected.  Client-side tools will not need to
  crawl or download anything externally unless requiring an #egg
  development tarball.

- "pypi-cache": homepage/download pages are crawled at the pypi server side
  exactly once, at release registration time, or once at "transition" time
  when an author chooses to have their externally hosted release files
  served from pypi.  The release files found are then cached and served
  by pypi itself.

- "pypi-linkext": homepage/download urls are crawled at the pypi server
  side for release files, and the simple/ page serves links to them without
  requiring client-side tools to crawl external sites for determining
  the set of candidate release files.  Legally, this should not pose
  a problem because the files are still hosted externally, so we could
  at some point automatically switch projects to this mode.

- "pypi-ext": like it is today: homepage/download urls are presented in 
  simple/ pages and client-side tools need to crawl them themselves to 
  find release file links.
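
To make the four modes concrete, here is a minimal Python sketch of how
a server could pick the links for a simple/ page and check a release
registration.  It is only an illustration of the scheme above, not
existing pypi code: Project, simple_page_links and accepts_registration
are hypothetical names, and an "#egg link" is assumed to be a url whose
fragment starts with "egg=".  Serving only pypi-held files in
"pypi-cache" mode is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from urllib.parse import urlsplit


class HostingMode(Enum):
    PYPI_ONLY = "pypi-only"
    PYPI_CACHE = "pypi-cache"
    PYPI_LINKEXT = "pypi-linkext"
    PYPI_EXT = "pypi-ext"


@dataclass
class Project:  # hypothetical container, for illustration only
    mode: HostingMode
    hosted_files: list   # release files uploaded to pypi
    cached_files: list   # external files pypi copied once (pypi-cache)
    crawled_files: list  # external file urls found by server-side crawling
    project_urls: list   # homepage/download urls from release metadata


def is_egg_link(url):
    """True for urls carrying an #egg=... fragment (assumed convention)."""
    return urlsplit(url).fragment.startswith("egg=")


def simple_page_links(project):
    """Return the links a simple/ page would serve for this project."""
    links = list(project.hosted_files)
    if project.mode is HostingMode.PYPI_ONLY:
        # only #egg links survive; nothing else external is listed
        links += [u for u in project.project_urls if is_egg_link(u)]
    elif project.mode is HostingMode.PYPI_CACHE:
        # assumption: cached copies are served, no external links
        links += project.cached_files
    elif project.mode is HostingMode.PYPI_LINKEXT:
        # direct links to external files, found by the server
        links += project.crawled_files
    else:
        # pypi-ext: as today, clients crawl these pages themselves
        links += project.project_urls
    return links


def accepts_registration(mode, download_url):
    """pypi-only rejects a non-empty, non-#egg download url."""
    if mode is HostingMode.PYPI_ONLY and download_url:
        return is_egg_link(download_url)
    return True
```

With this shape, the client-side tools stay untouched: whatever the
mode, they only ever read the list that simple_page_links returns.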

Now it is a matter of choosing good defaults and designing friendly
user interactions to allow package maintainers to move to at least
"pypi-cache" or, ideally, "pypi-only" mode.  My current thoughts on this:

- 90% of the projects could directly get the "pypi-only" mode as a default
  according to Donald's statistics.  They'd still receive a mail
  with a link to a page where they can change the mode, if needed,
  and of course the friendly information that "pypi-only" provides
  the fastest and most reliable way for users to install their package.

- 10% of the projects have external release files:
  - if they have their newest release on pypi already, they could get
    "pypi-linkext" mode so that client-side tools need neither crawl
    nor download from external sites, if they only look for the
    newest release
  - if they do not have their newest release on pypi, they could get
    "pypi-ext" mode as default

  In either case, maintainers/authors get a mail with a link to the page
  where they can change the mode, along with information about the time
  frame for phasing out particular modes:

  - pypi-ext: in N months we automatically switch this mode to pypi-linkext
  - in N+M months only "pypi-only" and "pypi-cache" are allowed.
    With the latter you can still host your files externally but you need to
    accept that pypi caches release files at release registration time and
    serves them itself afterwards.
    If you do not agree, your release files will not be automatically
    discoverable anymore and you need to tell your users how to install
    things manually through the description of your package.
  - (and maybe: in N+M+X months only pypi-hosted is allowed as a mode)
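
The defaults and the phase-out schedule above could be sketched roughly
as follows.  This is an illustration under stated assumptions, not a
settled design: default_mode and effective_mode are hypothetical names,
N and M are left as parameters because no values have been chosen, the
second 10% sub-case is read as "newest release not on pypi", and
falling back to "pypi-only" after N+M months is just one reading of
"release files will not be automatically discoverable anymore".  The
optional N+M+X step is left out.

```python
def default_mode(has_external_files, newest_release_on_pypi):
    """Pick the initial hosting mode for an existing project."""
    if not has_external_files:
        return "pypi-only"        # the ~90% case
    if newest_release_on_pypi:
        return "pypi-linkext"     # server links to external files
    return "pypi-ext"             # behaves like today


def effective_mode(mode, months_elapsed, n, m):
    """Apply the phase-out schedule to a project's chosen mode."""
    if months_elapsed >= n and mode == "pypi-ext":
        # after N months pypi-ext is switched automatically
        mode = "pypi-linkext"
    if months_elapsed >= n + m and mode not in ("pypi-only", "pypi-cache"):
        # assumption: external links are simply no longer served
        mode = "pypi-only"
    return mode
```

For example, with hypothetical values N=6 and M=6, a project left in
"pypi-ext" would be served as "pypi-linkext" from month 6 and would
stop having external links served from month 12, unless its maintainer
opted into "pypi-cache" in the meantime.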

I think this (or a variation/refinement of this scheme) would offer a
smooth transition where nobody needs to get upset and people would clearly
see we are doing everything we can to make it easy to transition.

