The Breaking of distutils and PyPI for Python 3000?

As I'm digging into packaging issues here at PyCon, a couple of Python 3000-related matters occur to me. As I'm new to Python 3000 development, I apologize for taking your time if these have already been addressed in prior discussions.
1. What is the plan for PyPI when Python 3.0 comes out and dependencies start getting satisfied from distribution across the great divide, e.g. a 3.0-specific package pulls from PyPI a 2.x-specific package to meet some need? Are there plans to fork PyPI, apply special tags to uploads, or what? While binary distributions are tagged with the Python version, source distributions are not. And of course, a dependency expression as it stands today, such as "SomePackage > 2.4", may pull in 3.0 to satisfy it.
2. There have been attempts over the years to fix distutils, the last one in 2006 by Anthony Baxter. He stated that a major hurdle was the strong demand to respect backward compatibility, and he finally gave up. One of the purposes of Python 3.0 was the freedom to break backward compatibility for the sake of "doing the right thing". So is it now permissible to give distutils a good reworking and stop letting compatibility issues hold us back?
-Jeff
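To make the "SomePackage > 2.4" concern concrete, here is a minimal sketch (not setuptools' actual resolver) that compares dotted version strings as tuples of integers. The release list is hypothetical. An unbounded lower-bound requirement happily selects a 3.0 release, while adding an explicit upper bound keeps the choice on the 2.x line.

```python
def parse_version(text):
    """Turn a dotted version string like '2.4.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

def satisfies(version, lower=None, upper=None):
    """True if lower < version < upper; either bound may be omitted."""
    v = parse_version(version)
    if lower is not None and not v > parse_version(lower):
        return False
    if upper is not None and not v < parse_version(upper):
        return False
    return True

# Hypothetical releases of SomePackage available on an index.
available = ["2.3", "2.5", "2.6", "3.0"]

# "SomePackage > 2.4": nothing stops the 3.0 release from being chosen.
unbounded = [v for v in available if satisfies(v, lower="2.4")]
# "SomePackage > 2.4, < 3.0": the 3.0 release is excluded.
bounded = [v for v in available if satisfies(v, lower="2.4", upper="3.0")]

print(max(unbounded, key=parse_version))  # 3.0
print(max(bounded, key=parse_version))    # 2.6
```

Comparing integer tuples rather than raw strings keeps "2.10" ordered after "2.9"; real version schemes with suffixes such as "b1" need a proper parser.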

On 19/03/2008, Jeff Rush <jeff@taupro.com> wrote:
1. What is the plan for PyPI when Python 3.0 comes out and dependencies start getting satisfied from distribution across the great divide, e.g. a 3.0-specific package pulls from PyPI a 2.x-specific package to meet some need?
As distutils (and core Python) doesn't do any automatic dependency management, this is a setuptools issue. As such, it's up to setuptools to deal with it. There may be infrastructure changes that would be generally useful, but there's nothing *needed* for the core.
2. There have been attempts over the years to fix distutils, with the last one being in 2006 by Anthony Baxter. He stated that a major hurdle was the strong demand to respect backward compatibility and he finally gave up. One of the purposes of Python 3.0 was the freedom to break backward compatibility for the sake of "doing the right thing". So is it now permissible to give distutils a good reworking and stop letting compatibility issues hold us back?
Sounds reasonable. I'm sure patches would be considered, but past discussions around "including setuptools" have been controversial and generally not reached consensus (for reasons other than pure backward compatibility). Also, while compatibility isn't as important for 3.0, smooth migration *is* - so any incompatible proposal must include some consideration of how to assist people with huge, complex setup.py files which use distutils internals in complex ways. So be prepared to do some work :-) (But I'd be happy to see distutils improved. I just don't have any need for such improvement personally.) Paul.

1. What is the plan for PyPI when Python 3.0 comes out and dependencies start getting satisfied from distribution across the great divide, e.g. a 3.0-specific package pulls from PyPI a 2.x-specific package to meet some need? Are there plans to fork PyPI, apply special tags to uploads or what?
I don't see the need to fork PyPI. For packages (or "distributions", to avoid confusion with Python packages), I see two options:
a) provide a single release that supports both 2.x and 3.x. The precise strategy to do so might vary. If one is going for a single source version, have setup.py run 2to3 (or perhaps 3to2). For dual-source packages, have setup.py just install the one for the right version; setup.py itself needs to be written so it runs on both versions (which is easy to do).
b) switch to Python 3 at some point (i.e. burn your bridges).
You seem to be implying that some projects may release separate source distributions. I cannot imagine why somebody would want to do that.
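Option (a) for dual-source packages can be sketched as a setup.py helper that picks the source tree for the running interpreter. This is a minimal sketch under stated assumptions: the directory names `src2` and `src3` and the helper name are hypothetical, and the `setup()` call appears only as a comment so the fragment runs anywhere. The fragment sticks to syntax that both 2.x and 3.x accept.

```python
import sys

def pick_source_dir(version_info=None):
    """Return the (hypothetical) source directory for the interpreter."""
    if version_info is None:
        version_info = sys.version_info
    # Only the major version matters: 2.4, 2.5, ... all use the 2.x tree.
    if version_info[0] >= 3:
        return "src3"
    return "src2"

# In a real setup.py the result would feed package_dir (not run here):
#     setup(name="example", packages=["example"],
#           package_dir={"example": pick_source_dir()})
print(pick_source_dir())
```

The single-source variant would instead run 2to3 over one tree at build time; that depends on build-command hooks not shown here.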
2. There have been attempts over the years to fix distutils, with the last one being in 2006 by Anthony Baxter. He stated that a major hurdle was the strong demand to respect backward compatibility and he finally gave up.
Can you kindly refer to some archived discussion for that?
One of the purposes of Python 3.0 was the freedom to break backward compatibility for the sake of "doing the right thing". So is it now permissible to give distutils a good reworking and stop letting compatibility issues hold us back?
I don't know what the proposed changes are, but for some changes it would be. In general, I feel that the need for backwards compatibility is exaggerated.
Regards, Martin

Martin v. Löwis wrote:
1. What is the plan for PyPI when Python 3.0 comes out and dependencies start getting satisfied from distribution across the great divide, e.g. a 3.0-specific package pulls from PyPI a 2.x-specific package to meet some need? Are there plans to fork PyPI, apply special tags to uploads or what?
I don't see the need to fork PyPI. For packages (or "distributions", to avoid confusion with Python packages), I see two options:
a) provide a single release that supports both 2.x and 3.x. The precise strategy to do so might vary. If one is going for a single source version, have setup.py run 2to3 (or perhaps 3to2). For dual-source packages, have setup.py just install the one for the right version; setup.py itself needs to be written so it runs on both versions (which is easy to do).
b) switch to Python 3 at some point (i.e. burn your bridges).
You seem to be implying that some projects may release separate source distributions. I cannot imagine why somebody would want to do that.
While not quite on the same scale as the 2-to-3 transition, this problem seems like one that would already exist in general. If one writes code that uses newer 2.4/2.5 features (decorators, for example), it won't build or run on 2.3 or earlier installs. How have people been handling that sort of situation? Is it only by not using the newer features, or is there some approach I'm just not seeing in a brief review of a few projects' pages on PyPI, where there is only one source tarball?
-- Dave

While not quite on the same scale as the 2-to-3 transition, this problem seems like one that would already exist in general. If one writes code that uses newer 2.4/2.5 features (decorators, for example), it won't build or run on 2.3 or earlier installs. How have people been handling that sort of situation? Is it only by not using the newer features, or is there some approach I'm just not seeing in a brief review of a few projects' pages on PyPI, where there is only one source tarball?
I think packages have taken all sorts of responses to this issue. Some will list the minimum required Python version in their README, some might put a test in setup.py that aborts installation if the Python version is too old, some may just install and let the user find out at runtime. Typically, packages try to support all the Python versions that their users still use. If a user of an older Python version comes along, they'll just need to fetch the older release (which hopefully is still online, or can be extracted from the source repository). Regards, Martin
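The "test in setup.py that aborts installation if the Python version is too old" can be sketched as follows. The function name and the 2.4 minimum are assumptions for illustration (2.4 being the release that introduced decorators, per Dave's example). Raising `SystemExit` with a message stops setup.py and prints the message instead of a traceback.

```python
import sys

MINIMUM = (2, 4)  # hypothetical minimum, e.g. because the code uses decorators

def check_python_version(minimum=MINIMUM, current=None):
    """Abort setup.py with a readable message if Python is too old."""
    if current is None:
        current = sys.version_info
    if tuple(current[:2]) < tuple(minimum):
        raise SystemExit("This package requires Python %d.%d or newer; "
                         "you are running %d.%d."
                         % (tuple(minimum) + tuple(current[:2])))

# Called at the top of setup.py, before setup() is reached.
check_python_version()
```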

Martin v. Löwis wrote:
I don't see the need to fork PyPI. For packages (or "distributions", to avoid confusion with Python packages), I see two options:
a) provide a single release that supports both 2.x and 3.x. b) switch to Python 3 at some point (i.e. burn your bridges).
You seem to be implying that some projects may release separate source distributions. I cannot imagine why somebody would want to do that.
Yes, I am assuming that existing projects would at some point introduce a 3.x version and maybe continue a 2.x version as separate distros, similar to Python itself. Then the large number of existing unqualified dependencies on, say, SQLObject would pull in the higher 3.x version and crash. It's the older projects that don't get updated often that are at risk of being destabilized by the arrival of 3.x-specific code in PyPI. Are developers for Python 3.x encouraged in 3.x guidelines to release 'fat' distributions that combine 2.x and 3.x usable versions? There is also some hassle with 2.x programmers browsing PyPI for useful modules to incorporate in their programs, downloading them (with easy_install, so they don't see the project website) and getting streams of errors because they unknowingly hit a 3.x-specific version. Perhaps a convention of a keyword, or more likely a new trove classifier that spells out 3.x stuff, with indicators on package info pages and query filters on PyPI against that?
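The trove-classifier idea amounts to a filter over package metadata. This is a minimal sketch with assumed classifier strings and hypothetical package records; it is not PyPI's actual query interface.

```python
# Assumed classifier strings; the exact spelling is an illustration only.
PY2_CLASSIFIER = "Programming Language :: Python :: 2"
PY3_CLASSIFIER = "Programming Language :: Python :: 3"

# Hypothetical metadata records, shaped like what a PyPI search might return.
packages = [
    {"name": "OldLib", "classifiers": [PY2_CLASSIFIER]},
    {"name": "NewLib", "classifiers": [PY3_CLASSIFIER]},
    {"name": "BothLib", "classifiers": [PY2_CLASSIFIER, PY3_CLASSIFIER]},
    {"name": "Unlabelled", "classifiers": []},
]

def declares(package, classifier):
    """True if the package's metadata lists the given trove classifier."""
    return classifier in package.get("classifiers", [])

def filter_by_classifier(records, classifier):
    """Names of the packages that declare `classifier`, in input order."""
    return [p["name"] for p in records if declares(p, classifier)]

print(filter_by_classifier(packages, PY3_CLASSIFIER))  # ['NewLib', 'BothLib']
print(filter_by_classifier(packages, PY2_CLASSIFIER))  # ['OldLib', 'BothLib']
```

Unlabelled releases fall through both filters, so such a filter is only as useful as the convention's adoption.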
2. There have been attempts over the years to fix distutils, with the last one being in 2006 by Anthony Baxter. He stated that a major hurdle was the strong demand to respect backward compatibility and he finally gave up.
Can you kindly refer to some archived discussion for that?
http://mail.python.org/pipermail/python-dev/2006-April/063943.html "I started looking at this. The number of complaints I got when I started on this that it would break the existing distutils based installers totally discouraged me. In addition, the existing distutils codebase is ... not good. It is flatly not possible to "fix" distutils and preserve backwards compatibility." -Anthony Baxter
One of the purposes of Python 3.0 was the freedom to break backward compatibility for the sake of "doing the right thing". So is it now permissible to give distutils a good reworking and stop letting compatibility issues hold us back?
I don't know what the proposed changes are, but for some changes it would be. In general, I feel that the need for backwards compatibility is exaggerated.
A controversial point, I'm afraid. Perhaps it is time for a parallel rewrite, so that setup.py files that import distutils get the old behavior, those that import distutils2 get the new, and we let attrition and the community decide which becomes standard. -Jeff
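Under the parallel-rewrite proposal, a setup.py could opt in to the new package while still building where only the old one is installed. This is a minimal sketch of that fallback. `distutils2` is the name from the proposal, `import_first` is a hypothetical helper, and the sketch uses `importlib`, which postdates this thread.

```python
import importlib

def import_first(*names):
    """Import and return the first module in `names` that is available."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # not installed; try the next candidate
    raise ImportError("none of %s could be imported" % ", ".join(names))

# A setup.py opting in to the proposed rewrite might do (not run here):
#     core = import_first("distutils2.core", "distutils.core")
#     core.setup(name="example", ...)
```

Listing the new name first lets projects adopt it gradually, which fits the "let attrition decide" idea.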

Jeff Rush wrote:
Perhaps it is time for a parallel rewrite, so that setup.py files that import distutils get the old behavior, and those that import distutils2 get the new,
That sounds good to me. If anyone wants to have a go at this, I have some ideas on how to structure it that I'd be happy to discuss. -- Greg

Yes, I am assuming that existing projects would at some point introduce a 3.x version and maybe continue a 2.x version as separate distros, similar to Python itself. Then the large number of existing unqualified dependencies on, say, SQLObject would pull in the higher 3.x version and crash. It's the older projects that don't get updated often that are at risk of being destabilized by the arrival of 3.x-specific code in PyPI. Are developers for Python 3.x encouraged in 3.x guidelines to release 'fat' distributions that combine 2.x and 3.x usable versions?
Passive voice is misleading here: encouraged by whom? *I* encourage people to consider that option, rather than assuming it couldn't possibly work. Whether it actually works, I don't know. I hope it would work, and I hope it would not be fat at all.
Perhaps a convention of a keyword, or more likely a new trove classifier that spells out 3.x stuff, with indicators on package info pages and query filters on PyPI against that?
I'm fine with adding more trove classifiers if that solves the problem (although I still assume the problem doesn't actually exist). As always, a classifier should not be added until there actually are two packages that want to use it.
Can you kindly refer to some archived discussion for that?
http://mail.python.org/pipermail/python-dev/2006-April/063943.html
"I started looking at this. The number of complaints I got when I started on this that it would break the existing distutils based installers totally discouraged me. In addition, the existing distutils codebase is ... not good.
It is flatly not possible to "fix" distutils and preserve backwards compatibility." -Anthony Baxter
Thanks. I still have the same position as I had then - if distutils is broken, it should be fixed, not ignored.
A controversial point, I'm afraid. Perhaps it is time for a parallel rewrite, so that setup.py files that import distutils get the old behavior, those that import distutils2 get the new, and we let attrition and the community decide which becomes standard.
Is there a list of the problems with distutils somewhere? It always worked fine for me, so I see no reason to fix it in the first place. Regards, Martin

Martin v. Löwis wrote:
Are developers for Python 3.x encouraged in 3.x guidelines to release 'fat' distributions that combine 2.x and 3.x usable versions?
Passive voice is misleading here: encouraged by whom?
"... encouraged in __3.x guidelines__ to ...": I presume, although I've not found them yet, that there is some kind of document for developers titled something like "How to migrate your Python code from 2.x to 3.x". That document would be a logical place for advice on, and consideration of, the tradeoffs of jumping to 3.x, maintaining two synced versions using 2to3 or 3to2, and the risks of keeping two independent releases. Identifying best practices would help them make good choices for the community.
*I* encourage people to consider that option, rather than assuming it couldn't possibly work. Whether it actually works, I don't know. I hope it would work, and I hope it would not be fat at all.
So we don't have an actual success story of a dual-version distribution, even as a prototype, using 2to3 within a distutils package? I would not encourage a practice without at least one such example.
Can you kindly refer to some archived discussion for that?
http://mail.python.org/pipermail/python-dev/2006-April/063943.html
Thanks. I still have the same position as I had then - if distutils is broken, it should be fixed, not ignored.
Since the precise API was not documented well and many people began to make use of ambiguous internal interfaces, such fixes would indeed break them. So your vote would be to do the right thing, even if it results in some breakage. I can respect that philosophy.
A controversial point, I'm afraid. Perhaps it is time for a parallel rewrite, so that setup.py files that import distutils get the old behavior, those that import distutils2 get the new, and we let attrition and the community decide which becomes standard.
Is there a list of the problems with distutils somewhere?
Unfortunately no. Much of it is anecdotal; much of it occurs on lists outside the Python community, among those attempting to package things. And some of it consists of comments by developers who peeked into the distutils source and blanched. Some of the problems are not bugs per se, but disagreement on the scope of functionality and a lack of well-known alternatives. So just "fix it if broken" doesn't work when there is no agreement on how to expand that scope. I am working on pulling together such a list, however, and getting it into the tracker, so that debate with a record of decisions can occur. I agree that it is hard to fix problems if no one is _clearly_ reporting them to us. Too much smoke, not enough light.
It always worked fine for me, so I see no reason to fix it in the first place.
Pardon my lack of knowledge of your background; when you say it always worked fine for you, are you referring to personal experiences using it to _install_ software or to experiences as a packager in actually distributing complex collections of modules on different platforms? -Jeff

"... encouraged in __3.x guidelines__ to ...": I presume, although I've not found them yet, that there is some kind of document for developers titled something like "How to migrate your Python code from 2.x to 3.x". That document would be a logical place for advice on, and consideration of, the tradeoffs of jumping to 3.x, maintaining two synced versions using 2to3 or 3to2, and the risks of keeping two independent releases. Identifying best practices would help them make good choices for the community.
I don't think any of the core committers is qualified to write such a document. Instead, it would have to be written by people who *actually* ported a project from 2 to 3 in some form. That is not to say that such a document couldn't be part of the 3k release, or shouldn't be reviewed by core committers. [Also, it might turn out that one of the core committers writes such a document, with the theoretical background of what *could* work for projects. That would be a lot like all those books giving advice written by people who never followed their own advice because they never had a chance to.]
So we don't have an actual success story of a dual-version distribution, even as a prototype, using 2to3 within a distutils package? I would not encourage a practice without at least one such example.
We don't have any success story for Python 3, period. Nobody has ever attempted to run a significant code base in Python 3, other than the test suite, AFAIK.
It always worked fine for me, so I see no reason to fix it in the first place.
Pardon my lack of knowledge of your background; when you say it always worked fine for you, are you referring to personal experiences using it to _install_ software or to experiences as a packager in actually distributing complex collections of modules on different platforms?
I've been maintaining a larger project (PyXML) for several years, and have written/maintained a few smaller projects (iconv, partial, python-fam), which all used distutils. I have also extended distutils in the core, with the upload and bdist_msi commands. And then there is the experience with installing distutils-based packages, which is usually pleasant (although I prefer to use the Debian package where available).
Regards, Martin

"Martin v. Löwis" writes:
I don't think any of the core committers is qualified to write such a document. Instead, it would have to be written by people who *actually* ported a project from 2 to 3 in some form.
Note that this is precisely the kind of experience Skip Montanaro is talking about trying to generate in the context of SpamBayes in a thread on the python-3000 list entitled "Strategy for porting to 3.0?" I don't know if he plans to write a HOWTO himself, but he certainly intends to keep a lab notebook that will be of help in writing such a document.

>> I don't think any of the core committers is qualified to write such a
>> document. Instead, it would have to be written by people who
>> *actually* ported a project from 2 to 3 in some form.
Stephen> Note that this is precisely the kind of experience Skip
Stephen> Montanaro is talking about trying to generate in the context of
Stephen> SpamBayes ...
Correctamundo. Give that man a cigar.
Skip
Participants (7):
- Martin v. Löwis
- Dave Peterson
- Greg Ewing
- Jeff Rush
- Paul Moore
- skip@pobox.com
- Stephen J. Turnbull