Request for Input re Packaging

In researching the state of packaging, I've been reading the archives and all the bug reports filed against distutils. I'd like though to get some examples of particularly troublesome uses of setup.py, to pull together and propose some changes to make their use case a bit easier. So far such cases I've been made aware of are Twisted, numpy and SciPy. If you know of a tough case where the developer had to jump through hoops to make it work, please point me to it.

I'd also like to get suggestions of improvements to PyPI, which I've not seen much discussion about. A few I've collected are:

- move to https/ssl
- improvements re package signing
- internal parsing/aggregation of metadata for better queries, and to stop using the filename for version/platform/etc. information.
- moving of requirements logic from client into PyPI, where it has db access to the dependency, resolves what packages are needed and delivers a list back to the client for prompting the user for permission, similar to how "yum" interacts today.
- a db lint-picking walker that looks for problems on PyPI, such as binary distros w/o a source distro, lack of binaries for those platforms often without compilers, failure to provide a link to a version repo for use with "project==dev".
- some auto-generated reports of access statistics and the mix of distutils vs setuptools, those who "registered" w/o "uploading", and perhaps if we get a new classifier assigned, some idea of Python 2.x vs 3.x packages.

Last, some of the issues with distutils/setuptools can be solved with zc.buildout. If you have found zc.buildout lacking, please tell me where it fell short so we can see if anything can be done.

Thanks for your involvement,

-Jeff
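As an illustration of the "lint-picking walker" item in the list above, here is a minimal sketch of one such check: finding releases that ship binary distributions without a source distribution. The record layout (`project`, `version`, `packagetype`) and the function name `binary_only_releases` are assumptions made for this example; they are not PyPI's actual schema or API.

```python
from collections import defaultdict

# Hypothetical upload records; the field names are assumptions for this sketch.
RELEASES = [
    {"project": "foo", "version": "1.0", "packagetype": "sdist"},
    {"project": "foo", "version": "1.0", "packagetype": "bdist_egg"},
    {"project": "bar", "version": "0.3", "packagetype": "bdist_wininst"},
]


def binary_only_releases(records):
    """Return sorted (project, version) pairs that have binaries but no sdist."""
    kinds = defaultdict(set)
    for rec in records:
        # Group the uploaded package types by release.
        kinds[(rec["project"], rec["version"])].add(rec["packagetype"])
    return sorted(
        release
        for release, types in kinds.items()
        if "sdist" not in types and any(t.startswith("bdist") for t in types)
    )


if __name__ == "__main__":
    for project, version in binary_only_releases(RELEASES):
        print("%s %s: binary distribution without a source distribution" % (project, version))
```

The check reads the package type from explicit metadata rather than from the filename, in line with the metadata item in the same list.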

Jeff Rush <jeff@taupro.com> writes:
In researching the state of packaging, I've been reading the archives and all the bug reports filed against distutils.
I'd like though to get some examples of particularly troublesome uses of setup.py, to pull together and propose some changes to make their use case a bit easier. So far such cases I've been made aware of are Twisted, numpy and SciPy. If you know of a tough case where the developer had to jump through hoops to make it work, please point me to it.
I'd also like to get suggestions of improvements to PyPI, which I've not seen much discussion about.
I know that those in Debian who are involved with packaging Python modules and applications often complain about the state of distutils and setuptools. I don't know what the specific complaints are, though.

I'll forward this request to the debian-python list in an attempt to rouse them for feedback.

--
 \     "Why was I with her? She reminds me of you. In fact, she |
  `\    reminds me more of you than you do!" -- Groucho Marx    |
_o__)                                                           |
Ben Finney

Jeff Rush wrote:
In researching the state of packaging, I've been reading the archives and all the bug reports filed against distutils.
I'd like though to get some examples of particularly troublesome uses of setup.py, to pull together and propose some changes to make their use case a bit easier. So far such cases I've been made aware of are Twisted, numpy and SciPy. If you know of a tough case where the developer had to jump through hoops to make it work, please point me to it.
Hi,

My name is David Cournapeau, and I am one of the developers of numpy (I am not one of the core developers, but I have been heavily involved with a new build system for both numpy and scipy in the last few months, so I think I have one or two things to say in this respect).

My first contact with distutils was because I wanted to add some functionality to numpy.distutils, which is numpy's own set of extensions to distutils for numpy's needs (things like Fortran support, etc.). I wanted to add support for building ctypes extensions (.so on Linux, .dylib on Mac OS X, .dll on Windows, etc.). I quickly gave up because of the complexity of distutils, and took a different approach (using scons within distutils to build all our compiled code, with distutils still doing the packaging).

Here are some things which I find frustrating with distutils:

1 Extending distutils is not documented at all. Sure, you have a few words on distutils commands, but once you want to use compilers in your own commands, you are on your own. For example, a working example of how to extend distutils with a new command to build something from C would be a really good addition. The relationship between Distribution classes, Command classes and Compiler classes should be documented somewhere. The relationship between the different distutils commands should be documented somewhere too: I wanted to do something as simple as adding a distutils command to add a whole directory of files. Doing it in such a way that it works with sdist, install, distutils and setuptools turned out to be impossible, and I found it easier to regenerate MANIFEST.in with a shell script. That's something that should be doable in an hour or two by anyone who does not know anything about distutils; today, I am not sure it is doable by anyone without a deep knowledge of distutils.

2 The only way to understand how distutils works is to run code, because a lot of code is based on adding attributes at run-time, etc. Basically, a lot of distutils feels like magic to me. For example:

- In numpy, we want to have tight control of compiler flags: this is extremely complicated to do with distutils, because flags are added from everywhere in the code, and understanding it well enough to change it without breaking anything is nearly impossible. Removing the magic would be great (all the configuration in some separate configuration files, for example, and the customization at runtime in one clearly separated module). But this is a difficult problem: I don't see how to change this (in distutils) without breaking someone else's code. Ideally, it should be easy to customize compiler flags from the command line (I bet this is one of the rpm/deb maintainers' complaints); every few days, some people complain on the numpy/scipy ML because they use CFLAGS, it does not work as they expect it to work, and it breaks the build.
- Compiler usage is not documented. Some functions (initialized) have to be called in some order with compiler instances to get some of their characteristics; of course, neither the order nor which function to call for which characteristic is documented anywhere; worse, it depends on the compiler (unix vs windows). I don't understand the point of adding attributes at runtime, differently, in different cases. Maybe I am missing something here: why is msvc different from everything else? In particular, why is it not possible to have access to msvc flags in the same manner as on all other platforms? Instead, it is buried in the MSVCCompiler code...
- Generally, it is not specified what is public interface and what is not. Everything is leaking everywhere, there is no specification.

3 Some code to detect libraries would be good. For example, you write code which depends on libfoo: we have our own code in numpy.distutils, but that's something which I think many people would like to be able to do. A helper tool to parse pkg-config would be good, too.

The magic behavior plus the lack of documentation really is the main problem for me: if there was a small core of functionality that we could extend, the situation would be better. It is difficult to say one particular thing is broken, because almost any distutils functionality is linked to something else; I cannot find a more precise description than magic, and the above points are the first which come to my mind (I can find other ones if necessary, but they are all linked to this magic thing and the lack of precise interfaces). But changing this in a backward-compatible manner may be extremely difficult, maybe even impossible. To be frank, I was secretly hoping something would be done on this front for python 3k... I would certainly be happy to help if there was some work on a distutils2.

cheers,

David
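To make the pkg-config helper wished for in point 3 concrete, here is a minimal sketch of what such a parser might look like. The function name `parse_pkg_config` is hypothetical; it is not part of distutils or numpy.distutils. It only splits a flags string of the kind that `pkg-config --cflags --libs` prints, and it assumes the common `-I`, `-L` and `-l` prefixes with the value attached to the flag.

```python
import shlex


def parse_pkg_config(output):
    """Sort pkg-config style flags into build keyword arguments.

    Hypothetical helper for illustration; flags that are not -I, -L or -l
    are kept unchanged under "extra".
    """
    info = {"include_dirs": [], "library_dirs": [], "libraries": [], "extra": []}
    prefixes = {"-I": "include_dirs", "-L": "library_dirs", "-l": "libraries"}
    for token in shlex.split(output):
        key = prefixes.get(token[:2])
        if key is not None and len(token) > 2:
            # Strip the two-character prefix and keep the value.
            info[key].append(token[2:])
        else:
            info["extra"].append(token)
    return info


if __name__ == "__main__":
    # Sample flags string, written by hand for the example.
    print(parse_pkg_config("-I/usr/include/foo -L/usr/lib -lfoo -lm -pthread"))
```

The first three keys are named after the `include_dirs`, `library_dirs` and `libraries` arguments that distutils extensions accept, so the result could be passed along with little glue. In real use the input string would come from running pkg-config itself, for example through the subprocess module.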

On Thu, Mar 20, 2008 at 12:17 AM, Jeff Rush <jeff@taupro.com> wrote: [cut]
- move to https/ssl
Semi-related: there are a few problems in this area, also related to indexing, that we need to work out imho.

When a package defines an https://... link in the url metadata, the link will be added to the Simple index alongside other links. For instance, people who use SourceForge can have such urls. Even if the package egg or tarball is available at PyPI, the home page url will appear at #1 on the index page. This will make tools like easy_install read this link before it reaches the egg/tarball.

This is OK as long as users behind firewalls are allowed to make https calls... so that's a PyPI server *and/or* setuptools issue.
[cut] - some auto-generated reports of access statistics and the mix of distutils vs setuptools, those who "registered" w/o "uploading", and perhaps if we get a new classifier assigned, some idea of Python 2.x vs 3.x packages.
In this area, I proposed a few months ago to make the classifiers permissive; see http://wiki.python.org/moin/EnhancedPyPI. I just wanted to say again that this work is in my pile: turning PyPI errors into warnings when a classifier is unknown.

The last thing I wanted to express about the state of packaging is that it took me a long time to understand where all the pieces of code belong, between distutils and setuptools. I found some bugs in distutils, filed issues and patches, and discovered afterwards that setuptools had already resolved them in its own code. I feel very frustrated about this because the boundary between the two packages is very fuzzy. Maybe this is just a doc issue.

In the meantime, I might be naive, but I don't really understand why some parts of setuptools are not merged into distutils, when they do the same things in a more robust way.

Cheers

Tarek
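The "errors into warnings" idea for unknown classifiers could look roughly like the sketch below. It assumes nothing about PyPI's server code: `KNOWN_CLASSIFIERS` is a tiny stand-in for the real classifier list and `check_classifiers` is a hypothetical name chosen for the example.

```python
import warnings

# Stand-in for the server's list of accepted classifiers (assumption).
KNOWN_CLASSIFIERS = {
    "Programming Language :: Python",
    "Development Status :: 4 - Beta",
}


def check_classifiers(classifiers, known=KNOWN_CLASSIFIERS):
    """Warn about unknown classifiers instead of rejecting the upload.

    Returns the list of unknown classifiers so the caller can report them.
    """
    unknown = [c for c in classifiers if c not in known]
    for classifier in unknown:
        warnings.warn("unknown classifier: %r" % classifier, UserWarning)
    return unknown


if __name__ == "__main__":
    # The second classifier is made up to trigger the warning path.
    check_classifiers(["Programming Language :: Python", "Made Up :: Classifier"])
```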

Tarek Ziadé wrote:
On Thu, Mar 20, 2008 at 12:17 AM, Jeff Rush <jeff@taupro.com> wrote:
- move to https/ssl
There are a few problems in this area, also related to indexing we need to work out imho:
When a package defines an https://... link in the url metadata, the link will be added to the Simple index alongside other links. For instance, people who use SourceForge can have such urls. Even if the package egg or tarball is available at PyPI, the home page url will appear at #1 on the index page.
This will make tools like easy_install read this link before it reaches the egg/tarball.
This is OK as long as users behind firewalls are allowed to make https calls...
The correct behavior is not clear to me, so help me understand:

1. Are there firewall policies that block *all* https access? I've only encountered more fine-grained firewalls because, to me, use of https for _some_ sites is a necessary and expected behavior.

2. If we moved PyPI to serve exclusively over https, for integrity reasons, would this have a major negative impact?

3. Would it be better to sort the URLs, to place the https ones at the end and allow a fetch error to occur, or provide a .distutils config option to just quietly skip https sites?

4. Is it not a problem that, when checking for newer versions, setuptools would be unable to access a newer version on an https site and would have to settle for an older version on a non-https site, leading to stale packages?

-Jeff

Jeff Rush wrote:
2. If we moved PyPI to serve exclusively over https, for integrity reasons, would this have a major negative impact?
Given that urllib2 doesn't support https through a proxy, it would probably cause a problem for easy_install, etc. :-)

We had to create a custom urllib2 Handler for a couple of our applications. I just looked to see if I should pursue re-working our solution as a patch and saw that Christopher Li has already posted a patch:

http://bugs.python.org/issue1424152

And there is also this outstanding ticket:

http://bugs.python.org/issue1448934

-- Dave

Oops, forgot to cc the list.

On Fri, Mar 21, 2008 at 12:28 AM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
On Thu, Mar 20, 2008 at 9:42 PM, Jeff Rush <jeff@taupro.com> wrote:
Tarek Ziadé wrote:
On Thu, Mar 20, 2008 at 12:17 AM, Jeff Rush <jeff@taupro.com> wrote:
- move to https/ssl
There are a few problems in this area, also related to indexing we need to work out imho:
When a package defines a https://... link into the url meta-data, the link will be added in the Simple index besides other links. For instance, people that uses sourceforge can have such urls. Even if the package egg or tarball is available at PyPI, the home page url will appear at #1 on the index page.
This will make tools like easy_install read this link before it reaches the egg/tarball.
This is OK as long as users behind firewalls are allowed to make https calls...
It's not clear to me the correct behavior - help me understand:
1. Are there firewall policies that block *all* https access? I've only encountered more fine-grained firewalls because, to me, use of https for _some_ sites is a necessary and expected behavior.
That happened last week to a developer on one project at a customer site. I am not saying it is the right behavior, but that's how I found the problem.
Then again, maybe such a firewall is too restrictive anyway to allow the use of a web-based repository such as PyPI.
2. If we moved PyPI to serve exclusively over https, for integrity reasons, would this have a major negative impact?
Related to 1. I guess it is a choice, as long as it is easy to create mirrors of PyPI. That's what we do in some projects.
Now for https, like Dave says, we cannot at this time create a robust auth handler for it, and our PyPI implementation uses http auth.
So if this patch is pushed, that would be very cool :)
3. Would it be better to sort the URLs, to place the https ones at the end and allow a fetch error to occur, or provide a .distutils config option to just quietly skip https sites?
I think ordering the URLs and putting the *.egg, *.tar.gz, etc. first would be good, yes, as easy_install fetches them in order.
I think it would also make the system quicker if easy_install did not fetch external home URLs when the right packages are available on the page.
Maybe those could be dropped when the dists are uploaded. That's what I am doing on the PyPI server I work on.
4. Is it not a problem that, when checking for newer versions, setuptools would be unable to access a newer version on an https site and would have to settle for an older version on a non-https site, leading to stale packages?
Good point. But I guess that as long as the system allows external urls, we can't prevent such failures.
We have some mirrors for that as a matter of fact, not to rely on third party servers that are sometimes down or moving things around.
-Jeff
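The suggestion above to put the *.egg and *.tar.gz links first on an index page could be sketched as a stable sort, as below. This is only an illustration under stated assumptions: `order_links` and `ARCHIVE_SUFFIXES` are hypothetical names, the suffix list is a guess at what counts as a distribution file, and the URLs in the demo are made up. It is not how PyPI or easy_install orders links.

```python
# Suffixes treated as distribution files (an assumption for this sketch).
ARCHIVE_SUFFIXES = (".egg", ".tar.gz", ".tgz", ".zip")


def order_links(links):
    """Return links with distribution files first, other URLs after.

    sorted() is stable, so the original order is kept within each group.
    """
    def rank(url):
        # Ignore any "#md5=..." fragment or query string when checking the suffix.
        path = url.split("#", 1)[0].split("?", 1)[0].lower()
        return 0 if path.endswith(ARCHIVE_SUFFIXES) else 1

    return sorted(links, key=rank)


if __name__ == "__main__":
    page_links = [
        "https://example.org/foo/",  # external home page (made-up URL)
        "http://pypi.example/packages/foo-1.0.tar.gz#md5=abc",
        "http://pypi.example/packages/foo-1.0-py2.5.egg",
    ]
    for link in order_links(page_links):
        print(link)
```

With this ordering a client that reads links in sequence reaches the hosted files before any external home page, which is the effect described in the reply to question 3.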
participants (5): Ben Finney, Dave Peterson, David Cournapeau, Jeff Rush, Tarek Ziadé