[Distutils] recollections of Pycon distutils versioning discussion (part 2)

Paul Moore p.f.moore at gmail.com
Fri Jun 12 11:49:10 CEST 2009

2009/6/12 Ben Finney <ben+python at benfinney.id.au>:
> I realise now that this has an unintended effect: that version strings
> which have letters in differing case will compare ASCIIbetically, which
> may be non-obvious:
>    1.2.C1
>    1.2.D1
>    1.2.REV876
>    1.2.a1
>    1.2.b1
>    1.2.rev543
> I hereby simplify the above specification and its semantics, by
> declaring upper-case letters outside the scope of a version string. A
> component can have characters from the set [0-9a-z], removing the above
> cases of non-obvious comparison.
>    1.2.a1
>    1.2.b1
>    1.2.c1
>    1.2.d1
>    1.2.rev543
>    1.2.rev876
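
For illustration only, the two orderings quoted above can be reproduced
with plain string sorting in Python. This is a small sketch; the variable
names are made up for the example and are not part of any proposal.

```python
# Plain string comparison works on code points, so every upper-case
# ASCII letter sorts before every lower-case one.
mixed = ["1.2.a1", "1.2.REV876", "1.2.b1", "1.2.C1", "1.2.rev543", "1.2.D1"]
mixed_sorted = sorted(mixed)
print(mixed_sorted)

# Restricting components to [0-9a-z] removes the case surprise.
lower = ["1.2.rev876", "1.2.b1", "1.2.d1", "1.2.a1", "1.2.rev543", "1.2.c1"]
lower_sorted = sorted(lower)
print(lower_sorted)
```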

One other aspect of standard practice that I just realised your rules
don't cover is where version strings differ in length. The normal
lexicographic "shortest is earliest" rule doesn't work properly:

1.2a1 vs 1.2 (I hope everyone agrees that 1.2a1 is earlier)

Even adding a dot, 1.2.a1 vs 1.2 compares wrongly (and gets worse when
you add in 1.2.1...)
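
A quick way to see the problem, as a rough Python sketch: the helper
as_tuple below is hypothetical and simply splits on dots, which is one
plausible reading of "lexicographic" comparison, not something the
earlier proposal spells out.

```python
def as_tuple(version):
    """Split a version string on dots, keeping components as text."""
    return tuple(version.split("."))

# The shorter string is a prefix of the longer one, so it sorts first,
# even though 1.2a1 is meant to come BEFORE 1.2.
assert "1.2" < "1.2a1"
assert as_tuple("1.2") < as_tuple("1.2.a1")

# Adding 1.2.1 makes it worse: the alpha now lands after 1.2.1 as well.
ordered = sorted(["1.2.1", "1.2.a1", "1.2"], key=as_tuple)
print(ordered)
```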

Here's an alternative suggestion:

* Versions are treated as dot-separated tuples
* Comparison is component-by-component, exactly as Python tuples compare
* Components must have the form [a-z]*[0-9]+([a-z][0-9]+)? (i.e.,
optional leading alphas, an integer, and an optional "letter-integer"
suffix)
* Call the 3 parts "prefix" ([a-z]*), "number" ([0-9]+), "suffix" ([a-z][0-9]+)
* Components compare as follows:
  - Components with differing prefixes are incomparable[1]. Otherwise,
ignore the prefix.
  - Within this, sort by the number part (as a number, not as text)
  - Within this, components with a suffix sort BEFORE those without;
suffixed components sort among themselves in the obvious
letter-then-number order.
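
The rules above could be sketched in Python roughly as follows. This is
only an illustration under some assumptions: the names Component and
parse_version are invented for the example, "incomparable" is modelled
by raising ValueError on an ordering comparison, and components with
differing prefixes are simply treated as unequal.

```python
import re
from functools import total_ordering

# prefix, number, and an optional letter-integer suffix
_COMPONENT = re.compile(r"([a-z]*)([0-9]+)(?:([a-z])([0-9]+))?")


@total_ordering
class Component:
    def __init__(self, text):
        m = _COMPONENT.fullmatch(text)
        if m is None:
            raise ValueError(f"invalid version component: {text!r}")
        self.prefix = m.group(1)
        self.number = int(m.group(2))  # compared as a number, not as text
        if m.group(3) is None:
            self.suffix_key = (1,)  # no suffix sorts after any suffix
        else:
            self.suffix_key = (0, m.group(3), int(m.group(4)))

    def __eq__(self, other):
        if not isinstance(other, Component):
            return NotImplemented
        return (self.prefix, self.number, self.suffix_key) == (
            other.prefix, other.number, other.suffix_key)

    def __lt__(self, other):
        if not isinstance(other, Component):
            return NotImplemented
        if self.prefix != other.prefix:
            # Assumption: "incomparable" is reported as an error.
            raise ValueError("components with differing prefixes are incomparable")
        return (self.number, self.suffix_key) < (other.number, other.suffix_key)


def parse_version(text):
    """Dot-separated tuple of components; tuples then compare as usual."""
    return tuple(Component(part) for part in text.split("."))


versions = ["1.2.1", "1.2", "1.2a1", "1.2b1", "1.2a2", "1.10"]
ordered = sorted(versions, key=parse_version)
print(ordered)
```

With this sketch 1.2a1 sorts before 1.2, which in turn sorts before
1.2.1, and 1.10 sorts after all of them because the number part is
compared numerically.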

That's a little messy, but I think it follows people's intuition,
allows for most of the variations people want, and most importantly
(to my mind) isolates the complexity to how *components* sort against
each other (the high-level rule is "like tuples", which is simple).

[1] Note that I see the "prefix" as cosmetic. I would expect real
projects to use a fixed prefix on a component-by-component basis -
1.2.r34567 or 1.2.dev5 or whatever, but never a mix of 1.2.3,
1.2.r1234 and 1.2.dev5. Hence, I have said that mixed prefixes are
incomparable. If this causes an outcry, the following rule could be
used instead:

  - Components with a prefix sort before components without, in
alphabetic order of prefix

but in my view it adds unnecessary complexity (and hence I'd like to
see real-world, justified use cases).
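
For comparison, the fallback rule could be expressed as a plain sort
key, since nothing is incomparable any more. Again this is just a
sketch: component_key and version_key are hypothetical names, and
ordering by prefix first and number second is one reading of the rule.

```python
import re

_COMPONENT = re.compile(r"([a-z]*)([0-9]+)(?:([a-z])([0-9]+))?")


def component_key(text):
    m = _COMPONENT.fullmatch(text)
    if m is None:
        raise ValueError(f"invalid version component: {text!r}")
    prefix, number, letter, suffix_number = m.groups()
    # Prefixed components first, in alphabetic order of prefix.
    prefix_key = (0, prefix) if prefix else (1, "")
    # Suffixed components before unsuffixed ones, letter then number.
    suffix_key = (0, letter, int(suffix_number)) if letter else (1,)
    return (prefix_key, int(number), suffix_key)


def version_key(version):
    return tuple(component_key(part) for part in version.split("."))


fallback_sorted = sorted(["1.2.3", "1.2.r1234", "1.2.dev5"], key=version_key)
print(fallback_sorted)
```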

Hmm, this doesn't allow for a component which is a SHA ID (something
like a Mercurial revision ID). Given that these aren't ordered, I
think that's OK as they don't make usable version numbers in any case.

