You guys are fairly far into your debate, so hopefully I don't interject something that's already been gone over :-)
Chris Withers wrote:
Matthias Klose wrote:
Install debian and get back to productive tasks.
This is an almost troll-like answer. See page 35 of the presentation.
I disagree. You could think of "Packages are Pythons Plugins" (taken from page 35) as a troll-like statement as well.
You're welcome to your (incorrect) opinion ;-) Debian packages could just as easily be seen as Debian's plugins.
For a *very* loose definition of plugin, perhaps. But if you look at: http://en.wikipedia.org/wiki/Plugin
the idea of Debian packages being plugins is a pretty far stretch. The idea of packages being Python plugins is less of a stretch, but I'd call it an analogy. It's useful for looking at things in a new light, but if we start designing a plugin interface and only view packages through that definition, I think we'll be hindering ourselves.
- all the package management systems behave differently and expect packages to be set up differently for them
correct, but again they share common requirements.
...but all have different implementations.
The common requirements are more important than the varying implementations when thinking about the metadata and how flexible it needs to be. When justifying the need for a separate Python build tool and distribution format, recognizing that there are different implementations is useful. On the one hand, we need to expose package naming, versioning, and dependencies to outside tools because they all have a common need for that information. On the other hand, we have to realize that there's a need for both run-from-egg and run-from-FHS-locations.
some people prefer to name this "stable releases" instead of "bitrot".
I'll call bullshit on this one. The most common problem I have as a happy Debian user and advocate, when I go to try and get help for a packaged application such as postfix, postgres, or Zope (if I was foolish enough to take that path), is "why are you using that ancient and buggy version of the software?!", shortly followed by someone pointing out how all the issues I'm facing are solved in newer (stable) releases. (I use packages because I, perhaps mistakenly, assume this is the best way to get security-fixed software.)
The problem is that first the application needs to be tested and released by its community, then Debian needs to re-package, patch, and generally mess around with it before it eventually gets a "Debian release". It's bad enough with apps with huge support bases like postgres; imagine trying to do this "properly" for the 4000-odd packages on PyPI...
You're correct about the results you're seeing but not about the reason for them. There are many Linux distributions, and each has a different policy on how to update packages. The reason for the variety is that there's demand for both fast package updates and slow package updates. The aim of Debian Stable, Red Hat Enterprise Linux, and other stable, enterprise-oriented distributions is to provide a stable base on which people can build their applications and processes. A common misperception among developers who want faster cycles is that the base system is just a core of packages, while things closer to the leaves of the dependency tree could be updated (i.e., don't update the kernel; do update the python-sqlalchemy package). What's not seen is that these distributions provide the base for so many people that any update that changes the API/ABI/on-disk format/etc. is likely to break *someone* out there. You want to be using one of these systems if you have deployed a major application that serves thousands of people and can afford little to no downtime, because you can be more assured that any change to the system is either overwhelmingly necessary (with API/ABI breakage reduced as much as possible) or one that you yourself have introduced.
For system administrators it can also be frustrating to know that newer upstream releases contain bug fixes that are not supposed to break backwards compatibility. The problem here is that we all know that all software has bugs. The risk with an update to a newer stable version is that the new software has bugs that are as bad as or worse than the old one's. The package maintainers have to evaluate how many changes have gone into the new version and how big the current problem is, and then apply the distribution's update policy to that. For a stable, enterprise-oriented distro, it's often a case of "better the devil you know than the devil you don't".
For a developer of software or someone deploying a new system (as opposed to someone who's had one deployed for several years before hitting a certain bug), this can be quite frustrating, as you know there are fixes and features in newer versions of the software. When you have the choice, then, you should use a different Linux distribution. That could be either one whose focus is on staying close to what upstream is shipping (I'd recommend this for developers), or one that has a stable policy but was released more recently and so has newer packages. When you don't have a choice, you have to be prepared for the possibility that you will need to install the requirements for your app from another source. This could mean using another release channel the distribution supports (such as Debian backports), building from source, or installing an egg. Remember, though, that sometimes the distribution will update a package for you if you just request it. It depends on the severity of what's currently broken, the risks involved with updating, and the distribution's (and maintainers') policies and perceptions of the risk vs. reward.
Speaking of extensions "maintained by the entity originating the python package": this much too often is a way of bitrot. is the shipped library up to date? does it have security fixes? how many duplicates are shipped in different extensions? does the library need to be shipped at all (because some os does ship it)?
So what do you propose doing when projectA depends on version 1.0 of libC and projectB depends on version 2.0 of libC?
This is a problem that is not new for distributions. Each one handles it slightly differently. For Fedora, we've decided the best course is to help upstream port to newer versions of the library. However, since this isn't always practical, we sometimes introduce compatibility packages which have the old version of libraries so older programs will continue to work.
Having multiple versions is not ideal, as this is where bitrot sets in in earnest. If upstream for libC only supports version 2.0 and a security flaw comes out that affects both libC-1.0 and libC-2.0, then we have to fix libC-1.0 at the distribution level. This is more work for us to support something outdated. We'd much rather do work that has a future upstream by porting the application to the newer version. And the time to do that is *not* when there's a security flaw that has to be fixed yesterday.
Exactly the wrong thing to do (and prohibited by policy in most distributions) is for applications to have their own copies of the libraries. When a security flaw comes out in that case, we'd have to:
1) hunt through all the packages we ship to find any that are affected.
2) update the various versions in all of those packages, which might mean we have to generate multiple different fixes.
3) rebuild those packages and force our users to redownload all of them.
If we had separate library packages for the separate versions, we'd:
1) know exactly which packages had to be fixed.
2) only have to apply fixes once, to the versions that we were shipping.
3) have our users download only the library packages, as the applications will load the fixed version from the system.
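To make the difference concrete, here's a toy model in Python. The inventory, the app names, and the packages_to_rebuild() helper are all hypothetical; real distributions track this in their package databases, not in a dict. It only counts which packages have to be rebuilt and pushed to users when a flaw hits libC.

```python
def packages_to_rebuild(apps, lib, vulnerable):
    """Return the packages that must be patched, rebuilt and shipped to users.

    `apps` maps an application name to a list of (library, version, bundled)
    tuples. A bundled copy lives inside the application package, so the
    application itself has to be rebuilt. A shared copy lives in its own
    library package, which is rebuilt once however many apps use it.
    """
    to_rebuild = set()
    for app, deps in apps.items():
        for name, version, bundled in deps:
            if name == lib and version in vulnerable:
                to_rebuild.add(app if bundled else f"{name}-{version}")
    return sorted(to_rebuild)


# Hypothetical inventory: four apps that each carry a private copy of libC.
bundled = {
    "app1": [("libC", "1.0", True)],
    "app2": [("libC", "2.0", True)],
    "app3": [("libC", "2.0", True)],
    "app4": [("libC", "2.0", True)],
}
# The same apps, but linking against system libC packages instead.
shared = {
    app: [(name, version, False) for name, version, _ in deps]
    for app, deps in bundled.items()
}

flaw = {"1.0", "2.0"}  # a flaw affecting both libC-1.0 and libC-2.0
print(packages_to_rebuild(bundled, "libC", flaw))  # ['app1', 'app2', 'app3', 'app4']
print(packages_to_rebuild(shared, "libC", flaw))   # ['libC-1.0', 'libC-2.0']
```

With bundled copies every app becomes a rebuild (and you first need an inventory like this just to find them); with shared library packages the work stays at one package per libC version, no matter how many apps use it.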
I can go on with other reasons why this is a bad idea and how to mitigate problems but if you're convinced already, I'll surrender the soapbox to someone else :-)
Considering an extension interfacing a library shipped with the os, you do want to use this library, not add another copy.
libxml2 seems to be a good example to use here...
I guess on debian I'd need to likely install libxml2-dev before I could install the lxml package...
Note: I'm a Fedora dev, not a Debian dev but the packaging techniques are similar in generalities. You should just be able to request that lxml be installed and it will automatically pull in libxml2. libxml2-dev shouldn't enter the picture as a python program that imports lxml won't need the C headers.
(Unless you're talking about *building* lxml which is a separate problem.)
...what about MacOS X?
...what about Windows?
Are you going to be distributing a separate version for MacOS X and Windows anyway, since the norm is not to compile from source on those platforms? Then you're already at the point where you have multiple packages for different OSes: a source tarball for Unix distributors and a binary zip/binhex/what-have-you for MacOS X and Windows.
An upstream extension maintainer cannot provide this unless he builds this extension for every (os) distribution and maintains it during the os' lifecycle.
...or just says in the docs "hey, you need libxml2 for this, unless you're on Windows, in which case the binary includes it".
- os distributors usually try to minimize the versions they include, trying to just ship one version.
...which is fair enough for the "system python", but many of us have a collection of apps, some of which require Python 2.4, some Python 2.5, and on top of each of those, different versions of different packages for each app.
In my case, I do source (alt-)installs of Python rather than trusting the broken stuff that ships with Debian, and use buildout to make sure I get the right versions of the right packages for each project.
So this is fine to a certain extent.
Pros:
* Allows you to develop new applications using known-good or latest versions of other software.
* Allows you to deploy an app using newer-than-system libraries on an otherwise stable-class distribution.
Cons:
* You become responsible for the code of all the components you're installing. If there's a bug in your alt-install of lxml, you're the one who has to fix it rather than the Linux distribution.
* If you're distributing this so that everyone can use it, the OS packagers are going to have to make sure that the code works with their versions and might have to do porting work.
The first Con is the more important one for me.
- setuptools has the narrow minded view of a python package being contained in a single directory, which doesn't fit well when you do have common locations for include or doc files.
Python packages have no idea of "docs" or "includes", which is certainly a deficiency.
I know I've mentioned paver before, but one of the things it does right is making the declarative metadata extensible. Whereas you can't simply add a new piece of metadata to setup.py's setup(), you can add a new Bunch() of metadata in a paver pavement.py file without any other code. This makes it easy to do the right thing and write code to operate on "docs", "includes", "locales", etc. that you've defined declaratively in the metadata section.
way packaging the python module with rpm or dpkg. E.g. namespace packages are a consequence how setuptools distributes and installs things. Why force this on everybody?
being able to break a large lump (say zope.*) into separate distributions is a good idea, which setuptools implements very badly using namespace packages...
A big win could be a modularized setuptools where you are able to only use the things you do want to use, e.g.
- version specifications (not just the heuristics shipped with setuptools).
not sure what you mean by this.
I'm not 100% certain what Matthias means, but there are several problems with setuptools' usage of versions:
1) The heuristic encourages bad practices. Versions need to be parsed by computer programs (package managers, scripts that maintain repositories, etc.), and not all of those are written in Python. Having things other than digits and dots in version strings is problematic for these programs. For instance, here's a sequence of releases, in upgrade order, that setuptools' versioning heuristics allow:
foo-1.0rc1
foo-1.0
foo-1.0post1

But here's how rpm would order them:
foo-1.0
foo-1.0post1
foo-1.0rc1
In Fedora we have rules for putting non-numeric things in our release tag to work around this:
version: 1.0, release: 0.1.rc1
version: 1.0, release: 1
version: 1.0, release: 2.post1
This list is not all-inclusive, but as you can see, we have to move the alphabetic portion of the version into the release to ensure that the upgrade path moves forward sensibly.
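Here's a rough Python sketch of why the workaround is needed. rpmvercmp() below is a simplified re-implementation of rpm's segment-by-segment comparison, written for illustration only. It ignores epochs and the newer '~'/'^' operators, and it is not rpm's actual code or its Python bindings.

```python
import re
from functools import cmp_to_key

# rpm splits versions into runs of digits or runs of letters; dots and
# other separators only delimit segments.
_SEGMENT = re.compile(r"[0-9]+|[A-Za-z]+")


def rpmvercmp(a, b):
    """Return -1, 0 or 1, approximating rpm's version comparison."""
    segs_a, segs_b = _SEGMENT.findall(a), _SEGMENT.findall(b)
    for x, y in zip(segs_a, segs_b):
        if x.isdigit() and y.isdigit():
            if int(x) != int(y):
                return 1 if int(x) > int(y) else -1
        elif x.isdigit() != y.isdigit():
            # A numeric segment counts as newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        elif x != y:
            # Alphabetic segments compare as plain strings, so "post" < "rc".
            return 1 if x > y else -1
    # Every shared segment matched: whoever has segments left over is newer.
    return (len(segs_a) > len(segs_b)) - (len(segs_a) < len(segs_b))


def compare_version_release(a, b):
    """Compare (version, release) pairs: version first, release as tie-breaker."""
    return rpmvercmp(a[0], b[0]) or rpmvercmp(a[1], b[1])


# Upstream strings in the order setuptools intends: rc1, final, post1.
upstream = ["1.0rc1", "1.0", "1.0post1"]
print(sorted(upstream, key=cmp_to_key(rpmvercmp)))
# ['1.0', '1.0post1', '1.0rc1']: the release candidate sorts as newest

# The same releases after moving the alphabetic part into the release tag.
fedora = [("1.0", "2.post1"), ("1.0", "0.1.rc1"), ("1.0", "1")]
print(sorted(fedora, key=cmp_to_key(compare_version_release)))
# [('1.0', '0.1.rc1'), ('1.0', '1'), ('1.0', '2.post1')]
```

Note that rpm never interprets the words: "post" only loses to "rc" because 'p' sorts before 'r', so you can't count on the order coming out right by accident.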
2) This one is more important but much harder. Something that would really help everyone is a way of versioning the API/ABI. Right now you can specify that you depend on Foo >= 1.0, Foo <= 2.0, but the version numbers don't have meaning until the actual packages are released. If Foo-1.0 and Foo-1.1 don't have a compatible API, your numbers are wrong. If Foo-1.0 is succeeded by Foo-2.0 with the same API, your numbers are too restrictive. If you lock the versions to only what you've tested (Foo = 1.0), then you're going to have people and distributions that want to use the new version but can't. Some places have good versioning rules: https://svn.enthought.com/enthought/wiki/EnthoughtVersionNumbers
Other places say they have marketing departments that prevent that. One possibility would be to have MyLib1-1.0, MyLib2-1.0, MyLib2-2.0, etc., with the version for marketing included in the package name.
Another idea would be to have API information stored in metadata but not in the package name. That way marketing can have a big party for MyLib-2.0 but the API metadata has API_Revision: 32.
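Here's a rough sketch of that second idea in Python. API_Revision isn't an existing metadata field, and Release and satisfies() are hypothetical names. I'm also assuming the revision is bumped only on incompatible API changes, which is just one possible rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """Hypothetical release metadata: marketing version plus API revision."""

    name: str
    version: str       # what marketing announces, e.g. "2.0"
    api_revision: int  # hypothetical API_Revision; bumped on incompatible changes


def satisfies(release, name, api_revision):
    """True if `release` provides the API revision a dependency asks for."""
    return release.name == name and release.api_revision == api_revision


releases = [
    Release("MyLib", "1.0", 31),
    Release("MyLib", "1.1", 32),  # incompatible API change in a point release
    Release("MyLib", "2.0", 32),  # big marketing release, same API as 1.1
]

# A dependency written against API revision 32 picks up 1.1 and 2.0 but not
# 1.0, whatever the marketing version numbers say.
print([r.version for r in releases if satisfies(r, "MyLib", 32)])  # ['1.1', '2.0']
```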
specification of dependencies.
I have no love for how pkg_resources implements this (including the API), but the idea of retrieving data files, locales, config files, etc. through an API is good. For packages to be coded so that they conform to the Filesystem Hierarchy Standard on Linux, the API (and metadata) needs to be more flexible. We need to be able to mark locale, config, and data files in the metadata. The build/install tool needs to be able to install those into the proper places in the filesystem for a Linux distro, an egg, etc., and then we need to be able to call an API to retrieve a specific class of resources or the directory associated with them.
* config files go to /etc on Linux, and we'd want to retrieve the contents of /etc/configfile.
* generic, architecture-independent data files go under /usr/share/. We'd want to place them in or under /usr/share/$PACKAGENAME. Mostly we're going to want to retrieve the contents of a specific data file.
* locale files go under /usr/share/locale/ (ex: /usr/share/locale/en_US/LC_MESSAGES/compiz.mo). We'll want to retrieve the directory '/usr/share/locale' for feeding to gettext.
- a module system independent from any distribution specific stuff.
I read this as "entry_points is a good feature".
- any other distribution specific stuff.
I think Matthias is trying to separate out the different services that setuptools provides so that they can be decoupled and worked on separately. So "other distribution specific stuff" would be things to do with distributing the results of your labors; eggs and PyPI would fall under this.
Matthias, if I'm wrong in any of this, please correct me :-). These are my perceptions, since they're the issues I have as a packager for a different distribution.