[Catalog-sig] PyPI enhancement doc

Ian Bicking ianb at colorstudy.com
Wed May 25 08:36:20 CEST 2005


Here's my first go at a document about the enhancements I asked about 
earlier...


Package Types
-------------

PyPI includes this table::

   CREATE TABLE release_urls (
      name TEXT,
      version TEXT,
      url TEXT,
      packagetype TEXT,
      FOREIGN KEY (name, version) REFERENCES releases (name, version)
   );

The ``packagetype`` column has no particular type or constraint.  I
suggest that it contain these types taken from ``distutils``:

sdist:
     a source distribution (tarball, zip file, etc.)
bdist:
     a built (binary) distribution
bdist_dumb:
     a "dumb" built distribution
bdist_rpm:
     an RPM distribution
bdist_wininst:
     an executable installer for MS Windows
bdist_egg:
     a Python Egg

In addition, I would like these possible values:

svn_trunk:
     the URL of a Subversion repository; a setup.py file should be
     found directly under this URL.  This URL will not necessarily
     change for different revisions of the package.
svn_tag:
     the URL of a branch or tag in a Subversion repository that
     corresponds to this specific version.
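
Since the column itself carries no constraint, a client or the upload
code could check values against the proposed list.  A minimal sketch,
assuming the eight values above are the complete set; the names
``VALID_PACKAGETYPES`` and ``check_packagetype`` are hypothetical and
not part of PyPI:

```python
# Hypothetical helper: the constant and function names are illustrative only.
DISTUTILS_TYPES = frozenset(
    ["sdist", "bdist", "bdist_dumb", "bdist_rpm", "bdist_wininst", "bdist_egg"]
)
SVN_TYPES = frozenset(["svn_trunk", "svn_tag"])
VALID_PACKAGETYPES = DISTUTILS_TYPES | SVN_TYPES


def check_packagetype(value):
    """Return value unchanged if it is a proposed packagetype, else raise."""
    if value not in VALID_PACKAGETYPES:
        raise ValueError("unknown packagetype: %r" % (value,))
    return value
```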

Issues
~~~~~~

The binary distributions are all potentially platform specific, but
(except for Windows) the platform isn't encoded into the packagetype.
Should packagetype be extended to include information like Python
version?  Should it have the same fields as the release_files table?
Potentially even more information is required to fully describe the
package type.

Another change would be to refactor the database so that
release_files no longer has python_version, packagetype, or md5_digest
fields; instead, a third table (referenced by both release_files and
release_urls) would hold this package metadata.

Potentially the last portion of the URL is itself self-describing.
The files produced by the bdist* methods generally are, and as long as
the URL doesn't obscure that filename it can be automatically
determined by clients.
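
As a rough illustration of what a client could do with the last
portion of the URL, here is a sketch that extracts the file name and
guesses a type from its suffix.  The suffix table is an assumption for
illustration only; it covers just the cases where the extension alone
is unambiguous and returns None otherwise:

```python
import posixpath
from urllib.parse import unquote, urlsplit

# Assumed mapping; only suffixes that identify a single packagetype.
_SUFFIX_TYPES = {
    ".egg": "bdist_egg",
    ".rpm": "bdist_rpm",
    ".exe": "bdist_wininst",
}


def filename_from_url(url):
    """Return the last path segment of url, ignoring query and fragment."""
    return unquote(posixpath.basename(urlsplit(url).path))


def guess_packagetype(url):
    """Guess a packagetype from the file name alone; None if unclear."""
    name = filename_from_url(url).lower()
    for suffix, packagetype in _SUFFIX_TYPES.items():
        if name.endswith(suffix):
            return packagetype
    return None
```

A .tar.gz or .zip name is deliberately left unguessed here, since both
sdist and bdist_dumb can produce those.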


The sdist type does not indicate what kind of archive is used -- most
notably, both zip and tar are possible.  (Does sdist create zip files
on its own?)  However, I think the client can determine the proper way
to unpack the file after downloading; this aspect of files is
self-describing.
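
One way a client might make that determination, sketched with the
standard library's ``zipfile`` and ``tarfile`` modules; the function
name ``archive_kind`` is hypothetical.  It inspects the file contents
rather than trusting the extension:

```python
import tarfile
import zipfile


def archive_kind(path):
    """Return 'zip', 'tar', or None by inspecting the downloaded file."""
    if zipfile.is_zipfile(path):
        return "zip"
    # is_tarfile also recognizes compressed tarballs such as .tar.gz.
    if tarfile.is_tarfile(path):
        return "tar"
    return None
```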


An svn trunk is not a package file.  However, it is largely equivalent
to an sdist file, though it must be downloaded in a different way;
actually performing the download is a client concern, so it doesn't
really affect this proposal.
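
To make the "client concern" concrete, a client could pick a retrieval
command from the packagetype.  This is only a sketch: it assumes
entries shaped like {'url': ..., 'packagetype': ...}, assumes the
``svn`` and ``curl`` command-line tools, and only builds the argument
list without running anything.  ``fetch_command`` is a hypothetical
name:

```python
def fetch_command(entry, dest):
    """Return an argv list that would retrieve one URL entry into dest.

    Assumes the svn and curl command-line clients; nothing is executed.
    """
    url = entry["url"]
    if entry["packagetype"] in ("svn_trunk", "svn_tag"):
        # A repository location is exported as a directory tree.
        return ["svn", "export", url, dest]
    # Everything else is treated as a plain file download.
    return ["curl", "-L", "-o", dest, url]
```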


XML-RPC methods
---------------

This is an initial list of proposed methods for PyPI to support:

package_releases(package_name):
     returns list of release versions, as strings, e.g., ['0.1', '0.2b',
     '0.2'], in chronological order.

package_stable_version(package_name):
     returns packages.stable_version; the current stable version of the
     package.  E.g., the string '0.3'

package_urls(package_name, version):
     A list of {'url': url, 'packagetype': packagetype}, like [{'url':
     'http://svn.pythonpaste.org/Paste/trunk', 'packagetype':
     'svn_trunk'}, {'url': 'http://pythonpaste.org/Paste-0.1.tar.gz',
     'packagetype': 'sdist'}]

package_data(package_name, version):
     A dictionary that summarizes the releases table, plus
     release_classifiers.  E.g.::

         {'name': 'OpenRelease',
          'version': '0.1.2',
          'author': 'Richard Harris',
          'author_email': 'goosequill at users.sourceforge.net',
          'maintainer': '',
          'maintainer_email': '',
          'homepage': 'http://open-release.sourceforge.net',
          'download_url': 'http://prdownloads.sourceforge.net/projects/open-release/OpenRelease-0.1.2.tar.gz',
          'description': """OpenRelease is a Python module which automates the packaging, release and announcement of Open Source software. The pack class creates packages, which are defined by packer classes, manages versioning, and brings up your notes and changelog in an editor. The release class uploads the package to SourceForge, releases it through QRS, announces it on freshmeat and (if appropriate) on pypi.""",
          'license': 'GNU General Public License',
          'platform': 'any',
          'classifiers': [
              'Development Status :: 4 - Beta',
              'Environment :: Console',
              'Intended Audience :: Developers',
              'License :: OSI Approved :: GNU General Public License (GPL)',
              'Natural Language :: English',
              'Operating System :: OS Independent',
              'Programming Language :: Python',
              'Topic :: Software Development'],
          'summary': '',
          'description_html': '',
          'keywords': '',
          }

     All keys are required.  None/NULL is translated to ''.  Open
     issues: will emails be obscured?  Is keywords turned into a list?

search(field_specifiers, [operator='and']):
     field_specifiers is a dictionary of {fieldname: searchvalue}.
     Returns a list like [(name, version)] of matching non-hidden
     records.  The search values are case-insensitive and match any
     substring.  The second argument indicates if all the field
     specifiers are ANDed or ORed together.  The value defaults to
     'and' and is case-insensitive.
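
The matching rules for search can be sketched over an in-memory list
of records.  This is not the server implementation: ``records`` stands
in for rows of the releases table, and the 'hidden' key is an assumed
stand-in for however PyPI marks hidden releases:

```python
def search(records, field_specifiers, operator="and"):
    """Return [(name, version)] for non-hidden records that match.

    records is a list of dicts; the 'hidden' key is an assumption.
    """
    operator = operator.lower()
    if operator not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    combine = all if operator == "and" else any

    results = []
    for record in records:
        if record.get("hidden"):
            continue
        # Case-insensitive substring match; missing or None fields act as ''.
        matches = (
            needle.lower() in (record.get(field) or "").lower()
            for field, needle in field_specifiers.items()
        )
        if combine(matches):
            results.append((record["name"], record["version"]))
    return results
```

One choice the description leaves open is an empty field_specifiers
dictionary; in this sketch it matches everything under 'and' and
nothing under 'or'.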


I'm a little soft on these, since I don't know if specifiers and the
necessary metadata are really ready:

providing_packages(specifier):
     A list of (name, version) from release_provides.  E.g.,
     providing_packages('PageTemplate>=1.0') == [('zpt', '1.0')].  This
     will only return non-hidden packages.

requiring_packages(specifier):
     A list of (name, version) from release_requires.


Web UI Enhancements
-------------------

Search should be available through a single field that searches the
package name, summary, description/description_html, and keywords.
This field should be on the front page.

There should be a contact form attached to each package that will be
emailed to any people on record as owners or maintainers, for
reporting bad or missing links or other incorrect data.  Anything
submitted through that form will also be stored in the database, and
the owner should mark it "resolved" when the issue is corrected.
Until the issue is resolved, the content will show up on the package
page (so other people can see the comment).  All links in such emails
should have rel="nofollow" (if HTML anchors are created at all).  This
should be clearly marked as being for problem reports only; adding
package comments is another issue.

