[Python-checkins] peps: PEP 426: analyse the PyPI metrics correctly
nick.coghlan
python-checkins at python.org
Thu Feb 21 14:37:39 CET 2013
http://hg.python.org/peps/rev/516b67ed1a2d
changeset: 4757:516b67ed1a2d
user: Nick Coghlan <ncoghlan at gmail.com>
date: Thu Feb 21 23:37:00 2013 +1000
summary:
PEP 426: analyse the PyPI metrics correctly
files:
pep-0426.txt | 145 +++++++++++++-----
pep-0426/pepsort.py | 249 ++++++++++++++++++-------------
2 files changed, 249 insertions(+), 145 deletions(-)
diff --git a/pep-0426.txt b/pep-0426.txt
--- a/pep-0426.txt
+++ b/pep-0426.txt
@@ -44,8 +44,9 @@
distribution.
This format is parseable by the ``email`` module with an appropriate
-``email.policy.Policy()``. When ``metadata`` is a Unicode string,
-```email.parser.Parser().parsestr(metadata)`` is a serviceable parser.
+``email.policy.Policy()`` (see `Appendix A`_). When ``metadata`` is a
+Unicode string, ```email.parser.Parser().parsestr(metadata)`` is a
+serviceable parser.
There are three standard locations for these metadata files:
@@ -1358,25 +1359,41 @@
Finally, as the version scheme in use is dependent on the metadata
version, it was deemed simpler to merge the scheme definition directly into
-this PEP rather than continuing to maintain it as a separate PEP. This will
-also allow all of the distutils-specific elements of PEP 386 to finally be
-formally rejected.
+this PEP rather than continuing to maintain it as a separate PEP.
-The following statistics provide an analysis of the compatibility of existing
-projects on PyPI with the specified versioning scheme (as of 16th February,
-2013).
+`Appendix B` shows detailed results of an analysis of PyPI distribution
+version information, as collected on 19th February, 2013. This analysis
+compares the behaviour of the explicitly ordered version schemes defined in
+this PEP and PEP 386 with the de facto standard defined by the behaviour
+of setuptools. These metrics are useful, as the intent of both PEPs is to
+follow existing setuptools behaviour as closely as is feasible, while
+still throwing exceptions for unorderable versions (rather than trying
+to guess an appropriate order as setuptools does).
-* Total number of distributions analysed: 28088
-* Distributions with no releases: 248 / 28088 (0.88 %)
-* Fully compatible distributions: 24142 / 28088 (85.95 %)
-* Compatible distributions after translation: 2830 / 28088 (10.08 %)
-* Compatible distributions after filtering: 511 / 28088 (1.82 %)
-* Distributions sorted differently after translation: 38 / 28088 (0.14 %)
-* Distributions sorted differently without translation: 2 / 28088 (0.01 %)
-* Distributions with no compatible releases: 317 / 28088 (1.13 %)
+Overall, the percentage of compatible distributions improves from 97.7%
+with PEP 386 to 98.7% with this PEP. While the number of projects affected
+in practice was small, some of the affected projects are in widespread use
+(such as Pinax and selenium). The surprising ordering discrepancy also
+concerned developers and acted as an unnecessary barrier to adoption of
+the new metadata standard.
+
+The data also shows that the pre-release sorting discrepancies are seen
+only when analysing *all* versions from PyPI, rather than when analysing
+public versions. This is largely due to the fact that PyPI normally reports
+only the most recent version for each project (unless the maintainers
+explicitly configure it to display additional versions). However,
+installers that need to satisfy detailed version constraints often need
+to look at all available versions, as they may need to retrieve an older
+release.
+
+Even this PEP doesn't completely eliminate the sorting differences relative
+to setuptools:
+
+* Sorts differently (after translations): 38 / 28194 (0.13 %)
+* Sorts differently (no translations): 2 / 28194 (0.01 %)
The two remaining sort order discrepancies picked up by the analysis are due
-to a pair of projects which have published releases ending with a carriage
+to a pair of projects which have PyPI releases ending with a carriage
return, alongside releases with the same version number, only *without* the
trailing carriage return.
@@ -1390,26 +1407,6 @@
standard scheme will normalize both representations to ".devN" and sort
them by the numeric component.
-For comparison, here are the corresponding analysis results for PEP 386:
-
-* Total number of distributions analysed: 28088
-* Distributions with no releases: 248 / 28088 (0.88 %)
-* Fully compatible distributions: 23874 / 28088 (85.00 %)
-* Compatible distributions after translation: 2786 / 28088 (9.92 %)
-* Compatible distributions after filtering: 527 / 28088 (1.88 %)
-* Distributions sorted differently after translation: 96 / 28088 (0.34 %)
-* Distributions sorted differently without translation: 14 / 28088 (0.05 %)
-* Distributions with no compatible releases: 543 / 28088 (1.93 %)
-
-These figures make it clear that only a relatively small number of current
-projects are affected by these changes. However, some of the affected
-projects are in widespread use (such as Pinax and selenium). The
-changes also serve to bring the standard scheme more into line with
-developer's expectations, which is an important element in encouraging
-adoption of the new metadata version.
-
-The script used for the above analysis is available at [3]_.
-
A more opinionated description of the versioning scheme
-------------------------------------------------------
@@ -1550,8 +1547,10 @@
.. [3] Version compatibility analysis script:
http://hg.python.org/peps/file/default/pep-0426/pepsort.py
-Appendix
-========
+Appendix A
+==========
+
+The script used for this analysis is available at [3]_.
Parsing and generating the Metadata 2.0 serialization format using
Python 3.3::
@@ -1610,6 +1609,74 @@
# Correct if sys.stdout.encoding == 'UTF-8':
Generator(sys.stdout, maxheaderlen=0).flatten(m)
+Appendix B
+==========
+
+Metadata v2.0 guidelines versus setuptools::
+
+ $ ./pepsort.py
+ Comparing PEP 426 version sort to setuptools.
+
+ Analysing release versions
+ Compatible: 24477 / 28194 (86.82 %)
+ Compatible with translation: 247 / 28194 (0.88 %)
+ Compatible with filtering: 84 / 28194 (0.30 %)
+ No compatible versions: 420 / 28194 (1.49 %)
+ Sorts differently (after translations): 0 / 28194 (0.00 %)
+ Sorts differently (no translations): 0 / 28194 (0.00 %)
+ No applicable versions: 2966 / 28194 (10.52 %)
+
+ Analysing public versions
+ Compatible: 25600 / 28194 (90.80 %)
+ Compatible with translation: 1505 / 28194 (5.34 %)
+ Compatible with filtering: 13 / 28194 (0.05 %)
+ No compatible versions: 420 / 28194 (1.49 %)
+ Sorts differently (after translations): 0 / 28194 (0.00 %)
+ Sorts differently (no translations): 0 / 28194 (0.00 %)
+ No applicable versions: 656 / 28194 (2.33 %)
+
+ Analysing all versions
+ Compatible: 24239 / 28194 (85.97 %)
+ Compatible with translation: 2833 / 28194 (10.05 %)
+ Compatible with filtering: 513 / 28194 (1.82 %)
+ No compatible versions: 320 / 28194 (1.13 %)
+ Sorts differently (after translations): 38 / 28194 (0.13 %)
+ Sorts differently (no translations): 2 / 28194 (0.01 %)
+ No applicable versions: 249 / 28194 (0.88 %)
+
+Metadata v1.2 guidelines versus setuptools::
+
+ $ ./pepsort.py 386
+ Comparing PEP 386 version sort to setuptools.
+
+ Analysing release versions
+ Compatible: 24244 / 28194 (85.99 %)
+ Compatible with translation: 247 / 28194 (0.88 %)
+ Compatible with filtering: 84 / 28194 (0.30 %)
+ No compatible versions: 648 / 28194 (2.30 %)
+ Sorts differently (after translations): 0 / 28194 (0.00 %)
+ Sorts differently (no translations): 0 / 28194 (0.00 %)
+ No applicable versions: 2971 / 28194 (10.54 %)
+
+ Analysing public versions
+ Compatible: 25371 / 28194 (89.99 %)
+ Compatible with translation: 1507 / 28194 (5.35 %)
+ Compatible with filtering: 12 / 28194 (0.04 %)
+ No compatible versions: 648 / 28194 (2.30 %)
+ Sorts differently (after translations): 0 / 28194 (0.00 %)
+ Sorts differently (no translations): 0 / 28194 (0.00 %)
+ No applicable versions: 656 / 28194 (2.33 %)
+
+ Analysing all versions
+ Compatible: 23969 / 28194 (85.01 %)
+ Compatible with translation: 2789 / 28194 (9.89 %)
+ Compatible with filtering: 530 / 28194 (1.88 %)
+ No compatible versions: 547 / 28194 (1.94 %)
+ Sorts differently (after translations): 96 / 28194 (0.34 %)
+ Sorts differently (no translations): 14 / 28194 (0.05 %)
+ No applicable versions: 249 / 28194 (0.88 %)
+
+
Copyright
=========
diff --git a/pep-0426/pepsort.py b/pep-0426/pepsort.py
--- a/pep-0426/pepsort.py
+++ b/pep-0426/pepsort.py
@@ -20,6 +20,8 @@
PEP426_VERSION_RE = re.compile('^(\d+(\.\d+)*)((a|b|c|rc)(\d+))?'
'(\.(post)(\d+))?(\.(dev)(\d+))?$')
+PEP426_PRERELEASE_RE = re.compile('(a|b|c|rc|dev)\d+')
+
def pep426_key(s):
s = s.strip()
m = PEP426_VERSION_RE.match(s)
@@ -60,23 +62,28 @@
return nums, pre, post, dev
+def is_release_version(s):
+ return not bool(PEP426_PRERELEASE_RE.search(s))
+
def cache_projects(cache_name):
logger.info("Retrieving package data from PyPI")
client = xmlrpclib.ServerProxy('http://python.org/pypi')
projects = dict.fromkeys(client.list_packages())
+ public = projects.copy()
failed = []
for pname in projects:
- time.sleep(0.1)
+ time.sleep(0.01)
logger.debug("Retrieving versions for %s", pname)
try:
projects[pname] = list(client.package_releases(pname, True))
+ public[pname] = list(client.package_releases(pname))
except:
failed.append(pname)
logger.warn("Error retrieving versions for %s", failed)
with open(cache_name, 'w') as f:
- json.dump(projects, f, sort_keys=True,
+ json.dump([projects, public], f, sort_keys=True,
indent=2, separators=(',', ': '))
- return projects
+ return projects, public
def get_projects(cache_name):
try:
@@ -84,11 +91,11 @@
except IOError as exc:
if exc.errno != errno.ENOENT:
raise
- projects = cache_projects(cache_name);
+ projects, public = cache_projects(cache_name);
else:
with f:
- projects = json.load(f)
- return projects
+ projects, public = json.load(f)
+ return projects, public
VERSION_CACHE = "pepsort_cache.json"
@@ -112,109 +119,139 @@
"426": pep426_key,
}
+class Analysis:
+
+ def __init__(self, title, projects, releases_only=False):
+ self.title = title
+ self.projects = projects
+
+ num_projects = len(projects)
+
+ compatible_projects = Category("Compatible", num_projects)
+ translated_projects = Category("Compatible with translation", num_projects)
+ filtered_projects = Category("Compatible with filtering", num_projects)
+ incompatible_projects = Category("No compatible versions", num_projects)
+ sort_error_translated_projects = Category("Sorts differently (after translations)", num_projects)
+ sort_error_compatible_projects = Category("Sorts differently (no translations)", num_projects)
+ null_projects = Category("No applicable versions", num_projects)
+
+ self.categories = [
+ compatible_projects,
+ translated_projects,
+ filtered_projects,
+ incompatible_projects,
+ sort_error_translated_projects,
+ sort_error_compatible_projects,
+ null_projects,
+ ]
+
+ sort_key = SORT_KEYS[pepno]
+ sort_failures = 0
+ for i, (pname, versions) in enumerate(projects.items()):
+ if i % 100 == 0:
+ sys.stderr.write('%s / %s\r' % (i, num_projects))
+ sys.stderr.flush()
+ if not versions:
+ logger.debug('%-15.15s has no versions', pname)
+ null_projects.add(pname)
+ continue
+ # list_legacy and list_pep will contain 2-tuples
+ # comprising a sortable representation according to either
+ # the setuptools (legacy) algorithm or the PEP algorithm.
+ # followed by the original version string
+ # Go through the PEP 386/426 stuff one by one, since
+ # we might get failures
+ list_pep = []
+ release_versions = set()
+ prerelease_versions = set()
+ excluded_versions = set()
+ translated_versions = set()
+ for v in versions:
+ s = v
+ try:
+ k = sort_key(v)
+ except Exception:
+ s = suggest_normalized_version(v)
+ if not s:
+ good = False
+ logger.debug('%-15.15s failed for %r, no suggestions', pname, v)
+ excluded_versions.add(v)
+ continue
+ else:
+ try:
+ k = sort_key(s)
+ except ValueError:
+ logger.error('%-15.15s failed for %r, with suggestion %r',
+ pname, v, s)
+ excluded_versions.add(v)
+ continue
+ logger.debug('%-15.15s translated %r to %r', pname, v, s)
+ translated_versions.add(v)
+ if is_release_version(s):
+ release_versions.add(v)
+ else:
+ prerelease_versions.add(v)
+ if releases_only:
+ logger.debug('%-15.15s ignoring pre-release %r', pname, s)
+ continue
+ list_pep.append((k, v))
+ if releases_only and prerelease_versions and not release_versions:
+ logger.debug('%-15.15s has no release versions', pname)
+ null_projects.add(pname)
+ continue
+ if not list_pep:
+ logger.debug('%-15.15s has no compatible versions', pname)
+ incompatible_projects.add(pname)
+ continue
+ # The legacy approach doesn't refuse the temptation to guess,
+ # so it *always* gives some kind of answer
+ if releases_only:
+ excluded_versions |= prerelease_versions
+ accepted_versions = set(versions) - excluded_versions
+ list_legacy = [(legacy_key(v), v) for v in accepted_versions]
+ assert len(list_legacy) == len(list_pep)
+ sorted_legacy = sorted(list_legacy)
+ sorted_pep = sorted(list_pep)
+ sv_legacy = [t[1] for t in sorted_legacy]
+ sv_pep = [t[1] for t in sorted_pep]
+ if sv_legacy != sv_pep:
+ if translated_versions:
+ logger.debug('%-15.15s translation creates sort differences', pname)
+ sort_error_translated_projects.add(pname)
+ else:
+ logger.debug('%-15.15s incompatible due to sort errors', pname)
+ sort_error_compatible_projects.add(pname)
+ logger.debug('%-15.15s unequal: legacy: %s', pname, sv_legacy)
+ logger.debug('%-15.15s unequal: pep%s: %s', pname, pepno, sv_pep)
+ continue
+ # The project is compatible to some degree,
+ if excluded_versions:
+ logger.debug('%-15.15s has some compatible versions', pname)
+ filtered_projects.add(pname)
+ continue
+ if translated_versions:
+ logger.debug('%-15.15s is compatible after translation', pname)
+ translated_projects.add(pname)
+ continue
+ logger.debug('%-15.15s is fully compatible', pname)
+ compatible_projects.add(pname)
+
+ def print_report(self):
+ print("Analysing {}".format(self.title))
+ for category in self.categories:
+ print(" ", category)
+
+
def main(pepno = '426'):
- sort_key = SORT_KEYS[pepno]
print('Comparing PEP %s version sort to setuptools.' % pepno)
- projects = get_projects(VERSION_CACHE)
- num_projects = len(projects)
-
- null_projects = Category("No releases", num_projects)
- compatible_projects = Category("Compatible", num_projects)
- translated_projects = Category("Compatible with translation", num_projects)
- filtered_projects = Category("Compatible with filtering", num_projects)
- sort_error_translated_projects = Category("Translations sort differently", num_projects)
- sort_error_compatible_projects = Category("Incompatible due to sorting errors", num_projects)
- incompatible_projects = Category("Incompatible", num_projects)
-
- categories = [
- null_projects,
- compatible_projects,
- translated_projects,
- filtered_projects,
- sort_error_translated_projects,
- sort_error_compatible_projects,
- incompatible_projects,
- ]
-
- sort_failures = 0
- for i, (pname, versions) in enumerate(projects.items()):
- if i % 100 == 0:
- sys.stderr.write('%s / %s\r' % (i, num_projects))
- sys.stderr.flush()
- if not versions:
- logger.debug('%-15.15s has no releases', pname)
- null_projects.add(pname)
- continue
- # list_legacy and list_pep will contain 2-tuples
- # comprising a sortable representation according to either
- # the setuptools (legacy) algorithm or the PEP algorithm.
- # followed by the original version string
- list_legacy = [(legacy_key(v), v) for v in versions]
- # Go through the PEP 386/426 stuff one by one, since
- # we might get failures
- list_pep = []
- excluded_versions = set()
- translated_versions = set()
- for v in versions:
- try:
- k = sort_key(v)
- except Exception:
- s = suggest_normalized_version(v)
- if not s:
- good = False
- logger.debug('%-15.15s failed for %r, no suggestions', pname, v)
- excluded_versions.add(v)
- continue
- else:
- try:
- k = sort_key(s)
- except ValueError:
- logger.error('%-15.15s failed for %r, with suggestion %r',
- pname, v, s)
- excluded_versions.add(v)
- continue
- logger.debug('%-15.15s translated %r to %r', pname, v, s)
- translated_versions.add(v)
- list_pep.append((k, v))
- if not list_pep:
- logger.debug('%-15.15s has no compatible releases', pname)
- incompatible_projects.add(pname)
- continue
- # Now check the versions sort as expected
- if excluded_versions:
- list_legacy = [(k, v) for k, v in list_legacy
- if v not in excluded_versions]
- assert len(list_legacy) == len(list_pep)
- sorted_legacy = sorted(list_legacy)
- sorted_pep = sorted(list_pep)
- sv_legacy = [t[1] for t in sorted_legacy]
- sv_pep = [t[1] for t in sorted_pep]
- if sv_legacy != sv_pep:
- if translated_versions:
- logger.debug('%-15.15s translation creates sort differences', pname)
- sort_error_translated_projects.add(pname)
- else:
- logger.debug('%-15.15s incompatible due to sort errors', pname)
- sort_error_compatible_projects.add(pname)
- logger.debug('%-15.15s unequal: legacy: %s', pname, sv_legacy)
- logger.debug('%-15.15s unequal: pep%s: %s', pname, pepno, sv_pep)
- continue
- # The project is compatible to some degree,
- if excluded_versions:
- logger.debug('%-15.15s has some compatible releases', pname)
- filtered_projects.add(pname)
- continue
- if translated_versions:
- logger.debug('%-15.15s is compatible after translation', pname)
- translated_projects.add(pname)
- continue
- logger.debug('%-15.15s is fully compatible', pname)
- compatible_projects.add(pname)
-
- for category in categories:
- print(category)
-
+ projects, public = get_projects(VERSION_CACHE)
+ print()
+ Analysis("release versions", public, releases_only=True).print_report()
+ print()
+ Analysis("public versions", public).print_report()
+ print()
+ Analysis("all versions", projects).print_report()
# Uncomment the line below to explore differences in details
# import pdb; pdb.set_trace()
# Grepping the log files is also informative
--
Repository URL: http://hg.python.org/peps
More information about the Python-checkins
mailing list