recollections of Pycon distutils versioning discussion (part 2)
Herein is the second part of my recollections of the Pycon distutils versioning discussions. The first part is here:

http://mail.python.org/pipermail/distutils-sig/2009-June/012143.html

The first part left us with this pseudo-pattern:

    N.N[.N]*[(abc)N[.N]*]

Some examples to help illustrate:

    1.0
    1.4.2.45.2.2.5
    1.2.3a4
    1.34.5.6c1
    3.0b2.1

That leaves the ".devN" and ".postN" business of the current version proposal (PEP 386).

# A brief setuptools lesson

(Mainly this is because I needed it -- I'm not intimately familiar with setuptools. Skip this section if you know this already.)

Setuptools defines (http://peak.telecommunity.com/DevCenter/setuptools#specifying-your-project-s...) "pre-release tags" and "post-release tags" on a version number. In setuptools, one part of the "pre-release tags" is the "a", "b", "c" (for alpha, beta, candidate) part of the proposed pattern that we've already discussed. The remaining part of "pre-release tags" is the use of "dev" or "pre", e.g.:

    1.0dev
    2.5.2pre
    3.1dev459259

(Strictly speaking also "preview" and any string that sorts less than "final", but "dev" and "pre" are the common ones.)

"Post-release tags" are any non-numeric part of the version string that sorts greater than "final", or the special case "-". Practically speaking it is almost always the latter, because setuptools provides command-line options (http://peak.telecommunity.com/DevCenter/setuptools#egg-info) to automatically append an svn rev ("-rNNNNN") or a date ("-YYYYMMDD"). Examples:

    1.0-r2345
    5.4.3-20090107

# ".devNNNN" in the new versioning proposal

Reminder: I'm just trying to describe my recollections of the Pycon discussions here. I'll post my current opinions in a separate email. I.e., don't shoot the messenger. :)

In general, people tended to agree that development builds are a reasonable thing. Your project is in the run-up to 1.0 and you'll be doing regular builds at incremental revisions of your source code control system.
Setuptools encourages (by offering certain command-line options) spelling this kind of thing this way (though, of course, there are other spellings of the equivalent):

    1.0dev-r342
    1.0dev-r343
    1.0dev-r350
    1.0dev-r355

At this point in the Pycon discussions I vaguely recall that there was some opposition to having support for spelling development builds in the versioning scheme at all. However, I don't fully recall. If so, they were won over (or drowned out) by (a) others giving examples of doing daily/nightly builds of product X -- including me describing nightlies of the Komodo product (not a Python project) I work on -- and (b) the start of the debate on how to spell "1.0dev-r342", with those debating presuming that its inclusion was a foregone conclusion. I'll touch more on this in a followup email.

Ensuing debate, pleading, fisticuffs, cajoling ... leading to

    N.N[.N]*[(abc)N[.N]*][.devN]

    1.0.dev456
    2.5.0b2.dev523

Why "dev" at all and not "-", "_", or "~" or just another "."? None of the alternatives were explicit that this is a release that should sort *earlier* than the version without it. Also, '-' is used by setuptools to mean a *post*-release; '_' causes problems for Debian packaging; '~' is used in Debian package versioning (IIRC, best to avoid using it); just '.' (e.g. '1.0a1.456') can be ambiguous.

Why "dev"? Because it is by far the most common string used for this in current PyPI versions. "pre" is a distant second.

Why ".devN" (1.0a1.dev456) instead of just "devN" (1.0a1dev456)? Partly a feeling that the '.' helped separate it from the leading part -- if "1.0.a1" didn't look ridiculous we probably would have chosen that too. :) Also partly because there is justifiably little point in quibbling for hours over a single '.' here.

# ".postNNNN" in the new versioning proposal

Amid fading sounds of hands being dusted off, backs being patted and rising discussions of what's for dinner ("Salad bar or steak?" "What's the special tonight?") the question:

"What about post-releases?"
-- Zooko

Groans.

"Who uses that?"
"Twisted, for example."
"Really? Hrm, okay, let's tack on a '.postN' (like the '.devN') and call it a night."

The discussion *was* longer than that, but not a whole lot longer. Mostly the justification for ".postN" in the proposal rested on the example of at least a couple of significant Python projects using this (e.g. Twisted).

An example of the resulting sorting order:

    1.0a1
    1.0a2.dev456
    1.0a2
    1.0a2.1.dev456
    1.0a2.1
    1.0b1.dev456
    1.0b2
    1.0c1.dev456
    1.0c1
    1.0.dev456
    1.0
    1.0.post456

and the final pseudo-pattern:

    N.N[.N]*[(a|b|c)N[.N]*][(.devN|.postN)]

Those are my recollections of the Pycon versioning discussions. I want to follow up (in a separate email) with my current opinions.

Cheers,
Trent

--
Trent Mick
trentm@gmail.com
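The final pseudo-pattern and the sorting order listed above can be sketched in Python roughly as follows. This is only an illustration, not the PEP 386 implementation: the regular expression, the names `PATTERN` and `sort_key`, and the tuple encoding of "dev before untagged before post" are all assumptions made here to reproduce the example ordering.

```python
import re

# Hypothetical regex for N.N[.N]*[(a|b|c)N[.N]*][(.devN|.postN)]
PATTERN = re.compile(
    r"(?P<release>\d+\.\d+(?:\.\d+)*)"
    r"(?:(?P<stage>[abc])(?P<stage_nums>\d+(?:\.\d+)*))?"
    r"(?:\.(?P<tag>dev|post)(?P<tag_num>\d+))?"
)


def _ints(dotted):
    """Turn '2.1' into (2, 1) so numbers compare numerically."""
    return tuple(int(n) for n in dotted.split("."))


def sort_key(version):
    """Build a tuple that sorts in the order given in the example list."""
    m = PATTERN.fullmatch(version)
    if m is None:
        raise ValueError(f"not in the pseudo-pattern: {version!r}")
    release = _ints(m["release"])
    # Pre-releases (a < b < c) sort before the final release.
    if m["stage"]:
        stage = (0, m["stage"], _ints(m["stage_nums"]))
    else:
        stage = (1, "", ())
    # .devN sorts before the untagged version, .postN after it.
    if m["tag"] == "dev":
        tag = (0, int(m["tag_num"]))
    elif m["tag"] == "post":
        tag = (2, int(m["tag_num"]))
    else:
        tag = (1, 0)
    return (release, stage, tag)


expected = [
    "1.0a1", "1.0a2.dev456", "1.0a2", "1.0a2.1.dev456", "1.0a2.1",
    "1.0b1.dev456", "1.0b2", "1.0c1.dev456", "1.0c1",
    "1.0.dev456", "1.0", "1.0.post456",
]
print(sorted(reversed(expected), key=sort_key) == expected)
```

Note that this sketch treats "1.0" and "1.0.0" as different release tuples; whether they should compare equal is not something the recollections above settle.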
2009/6/11 Trent Mick <trentm@gmail.com>:
Those are my recollections of the Pycon versioning discussions. I want to follow up (in a separate email) with my current opinions.
Thank you for posting this. It helps *a lot*.

Some questions that come immediately to mind:

1. From your description of events, and from discussion here, the "post" tag is clearly the least thought through aspect of all this. So a proponent of this should step up to justify why 1.0.5.post3 is required, when 1.0.5.3 is available already.

2. It sounds like all of the discussions were focused on formalising a *definition* of version string syntax, with no mention of what use would be made of the definition (i.e., what software benefits from a well-defined version-parsing API). Can someone please enumerate the expected use cases for the new version parsing API? The obvious one is dependency handling - I'll add my thoughts on this separately. I can't think of any other compelling use cases.

Paul.
Trent Mick <trentm@gmail.com> writes:
Herein is the second part of my recollections of the Pycon distutils versioning discussions. The first part is here: http://mail.python.org/pipermail/distutils-sig/2009-June/012143.html
Thank you for this series, it is good to have these summaries as a focus for discussion.
The first part left us with this pseudo-pattern:
N.N[.N]*[(abc)N[.N]*]
Some examples to help illustrate:
1.0 1.4.2.45.2.2.5 1.2.3a4 1.34.5.6c1 3.0b2.1
An important part of the specification here is how versions compare sequentially. My understanding is the above version strings should be compared by the following rules:

* A version string is interpreted as a tuple of components. Each component is a sequence of characters from the set [0-9A-Za-z]. Each component is separated from others by a single full-stop (‘.’) character.

* Two version strings are compared for sequence by comparing their component tuples.

* Each pair of components (from the two version tuples) is compared from left to right.

* Contiguous sequences of digits are interpreted as integers and compared numerically; other characters compare as per the ASCII character set sequence.

* The first component in turn that differs between the two tuples thereby determines the sequencing of the two version strings; if all corresponding components compare equal then the two strings represent the same version.

My experience with version string interpretations suggests the above rules formally describe a fairly solid consensus about how version strings should compare.
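One possible reading of the rules above, as a small Python sketch. The names `component_key` and `version_key` are made up for illustration, and two details are assumptions the rules do not spell out: a run of digits sorts before a run of letters at the same position (following ASCII order), and a shorter tuple sorts before a longer one that it prefixes (as Python tuples do).

```python
import re

_RUN = re.compile(r"[0-9]+|[A-Za-z]+")
_COMPONENT = re.compile(r"[0-9A-Za-z]+")


def component_key(component):
    """Split one component into digit runs and letter runs."""
    if not _COMPONENT.fullmatch(component):
        raise ValueError(f"invalid component: {component!r}")
    key = []
    for run in _RUN.findall(component):
        if run.isdigit():
            # Digit runs compare numerically; tag 0 puts them before letters.
            key.append((0, int(run), ""))
        else:
            # Letter runs compare by ASCII order.
            key.append((1, 0, run))
    return tuple(key)


def version_key(version):
    """A version string becomes a tuple of component keys."""
    return tuple(component_key(c) for c in version.split("."))


print(sorted(["1.10", "1.9", "1.0", "1.0.1"], key=version_key))
# ['1.0', '1.0.1', '1.9', '1.10']
```

Under this reading `version_key("1.2") < version_key("1.2a1")`, because the shorter sequence sorts first.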
That leaves the ".devN" and ".postN" business of the current version proposal (PEP 386).
Right, thank you for treating this as an addendum.
# A brief setuptools lesson
(Mainly this is because I needed it -- I'm not intimately familiar with setuptools. Skip this section if you know this already.)
I think this description of setuptools's specifics is interesting only in that it describes one particular implementation; I do *not* think any authority should be vested in this implementation merely because it's setuptools.
# ".devNNNN" in the new versioning proposal
In general, people tended to agree that development builds are a reasonable thing.
No disagreement here.
Your project is in the run-up to 1.0 and you'll be doing regular builds at incremental revisions of your source code control system. Setuptools encourages (by offering certain command-line options) spelling this kind of thing this way (though, of course there are other spellings of the equivalent):
1.0dev-r342 1.0dev-r343 1.0dev-r350 1.0dev-r355
My beef comes in at this point. Why on earth should we have a specification that enshrines the above, which is only one of many incompatible special-case extensions to an otherwise simple comparison algorithm?

It's especially puzzling why we would choose something more complex when “the run-up to 1.0” is easily versioned as version numbers that will compare as previous to version 1.0.

    0.0.5
    0.18.3
    0.23
    0.999.dev-r342
    0.999.dev-r343
    0.999.dev-r350
    0.999.dev-r355
    1.0
At this point in the Pycon discussions I vaguely recall that there was some opposition to having support for spelling development builds in the versioning scheme at all. However, I don't fully recall.
It's precisely the fact that it's (AFAICT) far more controversial than the solid consensus on the version comparisons without special cases that makes me think it has no place in a specification that we hope to be broadly accepted.
If so, they were won over (or drowned out) by (a) others giving examples of doing daily/nightly builds of product X -- including me describing nightlies of the Komodo product (not a Python project) I work on -- and (b) the start of the debate on how to spell "1.0dev-r342", those presuming that its inclusion was a foregone conclusion. I'll touch more on this in a followup email.
If that's an accurate representation of the discussion, I want to point out the huge non sequitur that has occurred. I've demonstrated above that one can easily have nightly builds of product X *and* have a simple version comparison algorithm that has no special-cased words. So it's a false dilemma to present it as “simple version comparison versus versions for nightly builds”. We *can* have both, so anyone who wants to take one of those away still has all their persuasion work ahead of them.
Why "dev" at all and not "-", "_", or "~" or just another "."? […]
Why "dev"? […]
Why ".devN" (1.0a1.dev456) instead of just "devN" (1.0a1dev456)? […]
All of these are secondary to “Why special-case any word at all, instead of just using versions that *already* compare in the right sequence by a simple uncontroversial algorithm with no special cases?”
# ".postNNNN" in the new versioning proposal
Amid fading sounds of hands being dusted off, backs being patted and rising discussions of what's for dinner ("Salad bar or steak?" "What's the special tonight?") the question:
"What about post-releases?" -- Zooko
What's wrong with appending more components so that these post-releases will compare in the right sequence anyway? Another demonstration:

    1.0      # release of 1.0
    1.0.0.1  # first post-1.0 release
    1.0.0.2  # second post-1.0 release
    1.0.1    # working on the next release
    1.0.2
    1.0.3

The “work leading up to version 1.1” has already been tagged version 1.0.1, 1.0.2 etc. Then an additional release of the version 1.0 code is needed; that's tagged version 1.0.0.1, and the next such release can be 1.0.0.2, etc. while 1.0.3 is developed.

This *works*, is easily understood, and IME makes it obvious where each version falls in the sequence.

So again, anyone who wants to take away one of “simple version comparison” and “ability to tag post-release versions in correct sequence” has all their persuasive work ahead of them.

So I think both of “.devXXX needs to be special-cased” and “.postXXX needs to be special-cased” are demonstrably false, and as a result the inclusion of these into the specification is not sufficiently justified.
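For purely numeric version strings like the ones in this demonstration, the ordering can be checked with ordinary Python tuples. This is a minimal sketch; the helper name `numeric_key` is made up here, and it only handles components that are plain integers.

```python
def numeric_key(version):
    """Split on '.' and compare the parts as integers."""
    return tuple(int(part) for part in version.split("."))


releases = ["1.0.2", "1.0.0.2", "1.0", "1.0.3", "1.0.0.1", "1.0.1"]
ordered = sorted(releases, key=numeric_key)
print(ordered)
# ['1.0', '1.0.0.1', '1.0.0.2', '1.0.1', '1.0.2', '1.0.3']
```

The post-1.0 releases land between 1.0 and 1.0.1 because (1, 0) is a prefix of (1, 0, 0, 1), and (1, 0, 0, ...) is less than (1, 0, 1).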
Those are my recollections of the Pycon versioning discussions. I want to follow up (in a separate email) with my current opinions.
Thanks again.

--
“Truth would quickly cease to become stranger than fiction, once we got as used to it.” -- Henry L. Mencken
Ben Finney
On Jun 11, 2009, at 2:37 AM, Paul Moore wrote:
1. From your description of events, and from discussion here, the "post" tag is clearly the least thought through aspect of all this. So a proponent of this should step up to justify why 1.0.5.post3 is required, when 1.0.5.3 is available already.
My motivation is that the leading sequence of numbers is chosen by a human to communicate some information such as major rewrite (major), feature addition (minor), or bugfix (micro), while the numbers after the "-r" or the ".post" are chosen by the version control system to simply count patches or give a secure hash of the current tree state or whatever. Another bit of information that we thus encode into the version number is whether it is a stable release or a snapshot -- stable releases don't have a -r$COUNT in their version number.

The current stable release of Tahoe is v1.4.1, as visible on PyPI: http://pypi.python.org/pypi/allmydata-tahoe . The current snapshot is v1.4.1-r3908, as visible on our web server: http://allmydata.org/source/tahoe/tarballs/?C=M;O=D .

If the new "rational version number" definition excludes ".post", and if I choose to make Tahoe snapshot version numbers be rational version numbers, then I could make snapshots be named e.g. v1.4.1.3908. Then I would have v1.4.1.3909, etc. until one day I would have v1.4.1.3948 and then v1.5.0. The next snapshot would be numbered v1.5.0.3949. I would hope that people who are looking for stable releases don't find the v1.5.0.3949 tarball (since it isn't on PyPI), or if they do find it that they realize from the extra long version number that it is a snapshot instead of a stable release.

I'm willing to change my build system to produce $MAJ.$MIN.$MIC.post$COUNT instead of $MAJ.$MIN.$MIC-r$COUNT, in order to achieve rationality (i.e., in order to make my versions look more like other people's versions and in order to be compatible with some hypothetical far-future tool which is picky and refuses to use software with irrational version numbers). I'm not yet sure whether I'm willing to change it to $MAJ.$MIN.$MIC.$COUNT.

Regards,
Zooko
On Thu, 11 Jun 2009 08:15:30 -0600, Zooko Wilcox-O'Hearn <zooko@zooko.com> wrote:
On Jun 11, 2009, at 2:37 AM, Paul Moore wrote:
1. From your description of events, and from discussion here, the "post" tag is clearly the least thought through aspect of all this. So a proponent of this should step up to justify why 1.0.5.post3 is required, when 1.0.5.3 is available already.
My motivation is that the leading sequence of numbers is chosen by a human to communicate some information such as major rewrite (major), feature addition (minor), or bugfix (micro), while the numbers after the "-r" or the ".post" are chosen by the version control system to simply count patches or give a secure hash of the current tree state or whatever. Another bit of information that we thus encode into the version number is whether it is a stable release or a snapshot -- stable releases don't have a -r$COUNT in their version number.
The current stable release of Tahoe is v1.4.1, as visible on PyPI: http://pypi.python.org/pypi/allmydata-tahoe . The current snapshot is v1.4.1-r3908, as visible on our web server: http://allmydata.org/source/tahoe/tarballs/?C=M;O=D .
If the new "rational version number" definition excludes ".post", and if I choose to make Tahoe snapshot version numbers be rational version numbers, then I could make snapshots be named e.g. v1.4.1.3908. Then I would have v1.4.1.3909, etc. until one day I would have v1.4.1.3948 and then v1.5.0. The next snapshot would be numbered v1.5.0.3949. I would hope that people who are looking for stable releases don't find the v1.5.0.3949 tarball (since it isn't on PyPI), or if they do find it that they realize from the extra long version number that it is a snapshot instead of a stable release.
I'm willing to change my build system to produce $MAJ.$MIN.$MIC.post$COUNT instead of $MAJ.$MIN.$MIC-r$COUNT, in order to achieve rationality (i.e., in order to make my versions look more like other people's versions and in order to be compatible with some hypothetical far-future tool which is picky and refuses to use software with irrational version numbers). I'm not yet sure whether I'm willing to change it to $MAJ.$MIN.$MIC.$COUNT.
This is basically how I feel about Twisted (and Nevow, Axiom, Mantissa, Epsilon, Sine, Quotient, etc) right now.

Jean-Paul
2009/6/11 Zooko Wilcox-O'Hearn <zooko@zooko.com>:
If the new "rational version number" definition excludes ".post", and if I choose to make Tahoe snapshot version numbers be rational version numbers, then I could make snapshots be named e.g. v1.4.1.3908. Then I would have v1.4.1.3909, etc. until one day I would have v1.4.1.3948 and then v1.5.0. The next snapshot would be numbered v1.5.0.3949. I would hope that people who are looking for stable releases don't find the v1.5.0.3949 tarball (since it isn't on PyPI), or if they do find it that they realize from the extra long version number that it is a snapshot instead of a stable release.
I'm willing to change my build system to produce $MAJ.$MIN.$MIC.post$COUNT instead of $MAJ.$MIN.$MIC-r$COUNT, in order to achieve rationality (i.e., in order to make my versions look more like other people's versions and in order to be compatible with some hypothetical far-future tool which is picky and refuses to use software with irrational version numbers). I'm not yet sure whether I'm willing to change it to $MAJ.$MIN.$MIC.$COUNT.
I really don't fully understand your motivation here (how "stable" a release is, is more about where you get it from and how it is advertised, than about the version number) but if it's going to stop the arguments, let's go with Ben's definition, specified in a previous email (dot-separated components, each component sorted alphanumerically, with sequences of digits treated as numbers). Then you can use .r12345, or .post12345, or even .helloguysIcanabusethesystem12345. But please don't enshrine specific fixed strings in the spec. Paul
If the "rational version number" spec allows arbitrary alphanumeric components, then I can write something like $MAJ.$MIN.$MIC.snapshot$COUNT. Any tool which wants to compare my version numbers to each other will do the right thing -- newer snapshots are higher than older snapshots, snapshots are higher than the release (i.e. the same version string with the ".snapshot$COUNT" part truncated off), and newer releases are higher than any snapshot of an older release.

So far so good. Now, should the "rational version number spec" *also* encourage those of us who use this technique to use the same spelling for ".snapshot" / "-r" / ".post"? This would not affect any version comparison, but it would be nice for us humans if everyone chose the same word when they mean the same thing.

Regards,
Zooko
2009/6/11 Zooko Wilcox-O'Hearn <zooko@zooko.com>:
So far so good. Now, should the "rational version number spec" *also* encourage those of us who use this technique to use the same spelling for ".snapshot" / "-r" / ".post"? This would not affect any version comparison, but it would be nice for us humans if everyone chose the same word when they mean the same thing.
Not if it perpetuates this endless discussion, IMHO :-) Paul.
Zooko Wilcox-O'Hearn <zooko@zooko.com> writes:
So far so good. Now, should the "rational version number spec"
I would like to avoid this term, since version strings do not represent rational numbers (nor, indeed, should they be interpreted as any kind of single number). If we must have a name for it, I propose “consistent version comparison spec” as less confusing than the above term.
*also* encourage those of us who use this technique to use the same spelling for ".snapshot" / "-r" / ".post"?
This is where I don't think the specification should express an opinion. Keep it simple and declarative.
This would not affect any version comparison, but it would be nice for us humans if everyone chose the same word when they mean the same thing.
It might be nice, but it's not necessary to the specification since version comparison only needs to be consistent within versions of *the same thing*, so for the sake of specification it doesn't matter if different projects choose different tokens.

The specification should provide *examples*, and make those examples sensible; but don't favour anything that's not meant to be normative.

--
“I cannot conceive that anybody will require multiplications at the rate of 40,000 or even 4,000 per hour …” -- F. H. Wales, 1936
Ben Finney
Ben Finney <ben+python@benfinney.id.au> writes:
Note that the specification I propose allows any alphanumeric character [0-9A-Za-z], so you could also make the version string ‘8.2.0.r27002’ if you want the “revision number” component to be visibly different while still having an obvious sequencing semantic.
Ben Finney <ben+python@benfinney.id.au> writes:
My understanding is the above version strings should be compared by the following rules:
* A version string is interpreted as a tuple of components. Each component is a sequence of characters from the set [0-9A-Za-z]. Each component is separated from others by a single full-stop (‘.’) character. […]
* Contiguous sequences of digits are interpreted as integers and compared numerically; other characters compare as per the ASCII character set sequence.
I realise now that this has an unintended effect: that version strings which have letters in differing case will compare ASCIIbetically, which may be non-obvious:

    1.2.C1
    1.2.D1
    1.2.REV876
    1.2.a1
    1.2.b1
    1.2.rev543

I hereby simplify the above specification and its semantics, by declaring upper-case letters outside the scope of a version string. A component can have characters from the set [0-9a-z], removing the above cases of non-obvious comparison.

    1.2.a1
    1.2.b1
    1.2.c1
    1.2.d1
    1.2.rev543
    1.2.rev876

--
“The process by which banks create money is so simple that the mind is repelled.” -- John Kenneth Galbraith, _Money: Whence It Came, Where It Went_, 1975
Ben Finney
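The mixed-case effect is easy to reproduce, since Python compares strings by code point and every ASCII upper-case letter sorts before every lower-case one. The sketch below uses a plain string sort only to show the letter-case ordering (it does not implement the numeric rule for digit runs), and the `is_in_scope` check is a hypothetical helper for the restricted [0-9a-z] character set.

```python
import re

mixed = ["1.2.a1", "1.2.REV876", "1.2.b1", "1.2.C1", "1.2.rev543", "1.2.D1"]
print(sorted(mixed))
# ['1.2.C1', '1.2.D1', '1.2.REV876', '1.2.a1', '1.2.b1', '1.2.rev543']

# Dot-separated components drawn only from [0-9a-z].
LOWERCASE_VERSION = re.compile(r"[0-9a-z]+(?:\.[0-9a-z]+)*")


def is_in_scope(version):
    """True if the string uses only lower-case letters and digits."""
    return LOWERCASE_VERSION.fullmatch(version) is not None


print(is_in_scope("1.2.rev543"))  # True
print(is_in_scope("1.2.REV876"))  # False
```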
2009/6/12 Ben Finney <ben+python@benfinney.id.au>:
I realise now that this has an unintended effect: that version strings which have letters in differing case will compare ASCIIbetically, which may be non-obvious:
1.2.C1 1.2.D1 1.2.REV876 1.2.a1 1.2.b1 1.2.rev543
I hereby simplify the above specification and its semantics, by declaring upper-case letters outside the scope of a version string. A component can have characters from the set [0-9a-z], removing the above cases of non-obvious comparison.
1.2.a1 1.2.b1 1.2.c1 1.2.d1 1.2.rev543 1.2.rev876
One other aspect of standard practice that I just realised your rules don't cover is where version strings differ in length. The normal lexicographic "shortest is earliest" rule doesn't work properly:

    1.2a1 vs 1.2 (I hope everyone agrees that 1.2a1 is earlier)

Even adding a dot, 1.2.a1 vs 1.2 compares wrongly (and gets worse when you add in 1.2.1...)

Here's an alternative suggestion:

* Versions are treated as dot-separated tuples
* Comparison is component-by-component, exactly as Python tuples compare
* Components must have the form [a-z]*[0-9]+([a-z][0-9]+)? (ie, optional leading alphas, an integer, and an optional "letter-integer" suffix)
* Call the 3 parts "prefix" ([a-z]*), "number" ([0-9]+), "suffix" ([a-z][0-9]+)
* Components compare as follows:
  - Components with differing prefixes are incomparable[1]. Otherwise, ignore the prefix.
  - Within this, sort by the number part (as a number, not as text)
  - Within this, components with a suffix sort BEFORE those without, in the obvious letter-then-number order.

That's a little messy, but I think it follows people's intuition, allows for most of the variations people want, and most importantly (to my mind) isolates the complexity to how *components* sort against each other (the high-level rule is "like tuples", which is simple).

[1] Note that I see the "prefix" as cosmetic. I would expect real projects to use a fixed prefix on a component-by-component basis - 1.2.r34567 or 1.2.dev5 or whatever, but never a mix of 1.2.3, 1.2.r1234 and 1.2.dev5. Hence, I have said that mixed prefixes are incomparable. If this causes an outcry, the following rule could be used instead:

  - Components with a prefix sort before components without, in alphabetic order of prefix

but in my view it adds unnecessary complexity (and hence I'd like to see real-world, justified use cases).

Hmm, this doesn't allow for a component which is a SHA ID (something like a Mercurial revision ID).
Given that these aren't ordered, I think that's OK as they don't make usable version numbers in any case. Paul.
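The alternative suggestion above might look something like this in Python. It is a rough sketch only: the function names, the `Incomparable` exception, and the choice to raise rather than return a special value for differing prefixes are assumptions made for illustration, not part of the proposal.

```python
import re

# prefix [a-z]*, number [0-9]+, optional suffix [a-z][0-9]+
COMPONENT = re.compile(r"([a-z]*)([0-9]+)(?:([a-z])([0-9]+))?")


class Incomparable(ValueError):
    """Raised when two components carry different prefixes."""


def parse_component(text):
    m = COMPONENT.fullmatch(text)
    if m is None:
        raise ValueError(f"malformed component: {text!r}")
    prefix, number, letter, suffix_number = m.groups()
    suffix = None if letter is None else (letter, int(suffix_number))
    return prefix, int(number), suffix


def _cmp(a, b):
    return (a > b) - (a < b)


def compare_components(left, right):
    l_prefix, l_num, l_suffix = parse_component(left)
    r_prefix, r_num, r_suffix = parse_component(right)
    if l_prefix != r_prefix:
        raise Incomparable(f"{left!r} vs {right!r}")
    if l_num != r_num:
        return _cmp(l_num, r_num)
    if l_suffix is None or r_suffix is None:
        # A component with a suffix sorts BEFORE one without.
        return _cmp(l_suffix is None, r_suffix is None)
    return _cmp(l_suffix, r_suffix)  # letter first, then number


def compare_versions(left, right):
    """Return -1, 0 or 1, comparing component by component like tuples."""
    l_parts, r_parts = left.split("."), right.split(".")
    for l, r in zip(l_parts, r_parts):
        result = compare_components(l, r)
        if result:
            return result
    return _cmp(len(l_parts), len(r_parts))


print(compare_versions("1.2a1", "1.2"))        # -1
print(compare_versions("1.2.r100", "1.2.r99")) # 1
```

Comparing "1.2.r1234" with "1.2.dev5" raises `Incomparable` in this sketch, which is one way of making the "what should I do with incomparable versions?" question explicit to the caller.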
Paul Moore <p.f.moore@gmail.com> writes:
Here's an alternative suggestion:
* Versions are treated as dot-separated tuples * Comparison is component-by-component, exactly as Python tuples compare
Agreed so far (unsurprisingly, because so far it matches the algorithm I outlined).
* Components must have the form [a-z]*[0-9]+([a-z][0-9]+)? (ie, optional leading alphas, an integer, and an optional "letter-integer" suffix)
* Call the 3 parts "prefix" ([a-z]*), "number" ([0-9]+), "suffix" ([a-z][0-9]+)
* Components compare as follows:
  - Components with differing prefixes are incomparable[1]. Otherwise, ignore the prefix.
  - Within this, sort by the number part (as a number, not as text)
  - Within this, components with a suffix sort BEFORE those without, in the obvious letter-then-number order.
That's a little messy
More than a little. That's not something I'd expect people to keep in their head without needing to look at the specification frequently; or, worse, make a guess and often get it wrong.
but I think it follows people's intuition,
I think it's far from obvious that this represents an intuitive comparison scheme. It's yet another set of special cases for certain tokens, as far as I can see; which leads us back to the point that as soon as we get into those, there's far less consensus about how they should work.
allows for most of the variations people want, and most importantly (to my mind) isolates the complexity to how *components* sort against each other (the high-level rule is "like tuples", which is simple).
Yes, I've no disagreement about version strings being sorted like tuples at the component level. I don't see how you claim that as a distinguishing characteristic of this suggestion.
[1] Note that I see the "prefix" as cosmetic. I would expect real projects to use a fixed prefix on a component-by-component basis - 1.2.r34567 or 1.2.dev5 or whatever, but never a mix of 1.2.3, 1.2.r1234 and 1.2.dev5.
Why not? We've already had people talking about a mix of ‘a123’, ‘post123’, ‘b123’, ‘r123’, ‘dev123’, etc. I think any version comparison scheme needs to allow a definite statement to be made about the sequencing of *any* possible version strings, with only the answers {equal, less-than, greater-than} possible.
Hence, I have said that mixed prefixes are incomparable. If this causes an outcry,
The cry is a simple “What does that mean I should do with two version strings that are incomparable?”.
the following rule could be used instead:
- Components with a prefix sort before components without, in alphabetic order of prefix
but in my view it adds unnecessary complexity (and hence I'd like to see real-world, justified use cases).
Actually, that seems *simpler*. It's such a simplification, in fact, that it makes this rule redundant: it's covered already by the existing rules (AFAICT).

You've essentially got components within components, but some components sort differently from others by non-obvious rules, and worst of all some components “are incomparable”; what does *that* mean when I need to compare them for sequence?

I think “can keep the whole specification in one's head easily” is an important criterion for any version comparison scheme that we promote for the standard library. People should be able to learn it once, then be able to look at any two version strings in the future and quickly know what sequence Python will put them in, without going back to the specification again for special cases and differing comparison rules.

--
“I have never imputed to Nature a purpose or a goal, or anything that could be understood as anthropomorphic.” -- Albert Einstein, unsent letter, 1955
Ben Finney
participants (5):
- Ben Finney
- Jean-Paul Calderone
- Paul Moore
- Trent Mick
- Zooko Wilcox-O'Hearn