Re: [Distutils] "Python Package Management Sucks"
(I'll be CC'ing the distutils sig in on these replies as this discussion probably belongs there...)

Nicolas Chauvat wrote:
The slides for my two talks can be found here:
"Python Package Management Sucks"
Install debian and get back to productive tasks.
This is an almost troll-like answer. See page 35 of the presentation. If you're too lazy, here's a re-hash:

- there are lots of different operating systems
- even with Linux, there are many different package management systems
- all the package management systems behave differently and expect packages to be set up differently for them
- expecting package developers to shoulder this burden is unfair and results in various bitrotting repackaged versions of python packages rather than just one up-to-date version maintained by the entity originating the python package
- Adobe Photoshop Plugins, Firefox Add-ons, etc do not delegate their management to an OS package manager. Packages are Python's "plugins" and so should get the same type of consistent, cross-platform package management targeted at the application in question, which is Python in this case.

cheers,
Chris

-- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
On Tue, Sep 23, 2008 at 12:37 PM, Chris Withers wrote:
(I'll be CC'ing the distutils sig in to these replies as this discussion probably belongs there...)
Nicolas Chauvat wrote:
The slides for my two talks can be found here:
"Python Package Management Sucks"
Install debian and get back to productive tasks.
haha. :) I just wanted to react to that: I have created a setuptools-enabled package for Pylint (logilab.pylintinstaller, see http://pypi.python.org/pypi/logilab.pylintinstaller) with the blessing of Logilab, so people under Windows could use the tool with a one-liner installation process... So if you like Pylint and Python but you don't use Debian, you can use that.
This is an almost troll-like answer. See page 35 of the presentation.
If you're too lazy, here's a re-hash:
- there are lots of different operating systems
- even with Linux, there are many different package management systems
- all the package management systems behave differently and expect packages to be set up differently for them
- expecting package developers to shoulder this burden is unfair and results in various bitrotting repackaged versions of python packages rather than just one up-to-date version maintained by the entity originating the python package
- Adobe Photoshop Plugins, Firefox Add-ons, etc do not delegate their management to an OS package manager. Packages are Python's "plugins" and so should get the same type of consistent, cross-platform package management targeted at the application in question, which is Python in this case.
cheers,
Chris
-- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
-- Tarek Ziadé | Association AfPy | www.afpy.org Blog FR | http://programmation-python.org Blog EN | http://tarekziade.wordpress.com/
Hi Tarek,

On Tue, Sep 23, 2008 at 01:38:00PM +0200, Tarek Ziadé wrote:
I just wanted to react to that: I have created a setuptools-enabled package for Pylint (logilab.pylintinstaller, see http://pypi.python.org/pypi/logilab.pylintinstaller) with the blessing of Logilab, so people under Windows could use the tool with a one-liner installation process...
Thanks again for doing it. We provide Debian packages for the free software we make and we will support anyone wanting to package our free software for targets other than Debian. -- Nicolas Chauvat logilab.fr - services in scientific computing and knowledge management
I have to say, as a developer and a system administrator, I like setuptools. It does what I need. Could it be better? Sure. For what I use python for on a day-to-day basis it makes my life a thousand times better than it was before setuptools. Nothing ruins your day more than spending *hours* tracing down package dependencies just to get the *one* package you need to allow you to perform some crucial task. It's even worse when you have to do it on multiple architectures.

Perl's package location and installation system (CPAN) is one of the primary factors contributing to its success. Perl is a pig. It's a charming pig that can do lots of tricks, but a pig nonetheless. What makes it shine is CPAN. And here's the catch: CPAN isn't really any better than setuptools. It's got warts and nuts all over the place, but it works. Without setuptools a lot of people wouldn't be using python. Easily installing packages is critical to the success of a language, and setuptools fills that role admirably.

[ I think what we really need to focus on as a community is binary generation. There are several tools out there that work for different platforms, but nothing that, well, just works everywhere. I'd far rather have one of those than bicker over a perfectly functional setuptools. ]

-jeff
Jeff Younker wrote:
I have to say, as a developer, and a system administrator, I like setuptools. It does what I need. Could it be better? Sure. For what I use python for on a day-to-day basis it makes my life a thousand times better than it was before setuptools. Nothing ruins your day more than spending *hours* tracing down package dependencies just to get the *one* package you need to allow you to perform some crucial task. It's even worse when you have to do it on multiple architectures.
Perl's package location and installation system (CPAN) is one of the primary factors contributing to its success. Perl is a pig. It's a charming pig that can do lots of tricks, but a pig nonetheless. What makes it shine is CPAN. And here's the catch: CPAN isn't really any better than setuptools. It's got warts and nuts all over the place, but it works.
And CPAN has some HUGE advantages over setuptools: it is designed as a repository, and it is replicated. Which means it is dependable. Anyone who suffered through the multiple outages of PyPI (which is not replicated) over the past year or so, or the ongoing outages of the many repositories across the web to which PyPI directs users/processes, can understand why this is important. - rick
On Sep 23, 2008, at 5:44 PM, Rick Warner wrote:
Jeff Younker wrote:
I have to say, as a developer, and a system administrator, I like setuptools. It does what I need. Could it be better? Sure. For what I use python for on a day-to-day basis it makes my life a thousand times better than it was before setuptools. Nothing ruins your day more than spending *hours* tracing down package dependencies just to get the *one* package you need to allow you to perform some crucial task. It's even worse when you have to do it on multiple architectures.
Perl's package location and installation system (CPAN) is one of the primary factors contributing to its success. Perl is a pig. It's a charming pig that can do lots of tricks, but a pig nonetheless. What makes it shine is CPAN. And here's the catch: CPAN isn't really any better than setuptools. It's got warts and nuts all over the place, but it works.
And CPAN has some HUGE advantages over setuptools: it is designed as a repository, and it is replicated. Which means it is dependable. Anyone who suffered through the multiple outages of PyPI (which is not replicated) over the past year or so, or the ongoing outages of the many repositories across the web to which PyPI directs users/processes, can understand why this is important.
Actually, PyPI is replicated. See, for example, http://download.zope.org/simple/ . It may be that some of the mirrors should be better advertised.

Jim

-- Jim Fulton Zope Corporation
Jim Fulton wrote:
On Sep 23, 2008, at 5:44 PM, Rick Warner wrote:
Jeff Younker wrote:
I have to say, as a developer, and a system administrator, I like setuptools. It does what I need. Could it be better? Sure. For what I use python for on a day-to-day basis it makes my life a thousand times better than it was before setuptools. Nothing ruins your day more than spending *hours* tracing down package dependencies just to get the *one* package you need to allow you to perform some crucial task. It's even worse when you have to do it on multiple architectures.
Perl's package location and installation system (CPAN) is one of the primary factors contributing to its success. Perl is a pig. It's a charming pig that can do lots of tricks, but a pig nonetheless. What makes it shine is CPAN. And here's the catch: CPAN isn't really any better than setuptools. It's got warts and nuts all over the place, but it works.
And CPAN has some HUGE advantages over setuptools: it is designed as a repository, and it is replicated. Which means it is dependable. Anyone who suffered through the multiple outages of PyPI (which is not replicated) over the past year or so, or the ongoing outages of the many repositories across the web to which PyPI directs users/processes, can understand why this is important.
Actually, PyPI is replicated. See, for example, http://download.zope.org/simple/.
It may be that some of the mirrors should be better advertised.
A half-hearted effort, at best, after the problems last year. When I configure a CPAN client (once per user) I create a list of replicas I want to search for any query, from a list of hundreds of replicas distributed around the world. From then on the client automatically switches to one of my selected replicas when one does not respond in a timely manner. The minimal set of recent PyPI replicas is neither well advertised nor automatically searched, and is therefore ineffective. And that is a mere tip of the iceberg, since PyPI is just the index, and the repositories are for the most part not replicated. CPAN sites are both index and repository. Setuptools and PyPI are light years behind CPAN in regards to creating a usable, reliable method of package deployment. - rick
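To make the failover behaviour concrete, here is a minimal sketch (in the Python 2 of the period; the mirror list, function name, and error handling are illustrative, not taken from any tool in this thread) of a client that walks a user-configured replica list and falls through to the next mirror on an error or timeout:

    import socket
    import urllib2

    # Hypothetical replica list; any simple-index style mirrors would do.
    MIRRORS = [
        "http://pypi.python.org/simple/",
        "http://download.zope.org/simple/",
    ]

    def fetch_index_page(package, mirrors=MIRRORS, timeout=10):
        """Return the index page for `package` from the first mirror that answers."""
        socket.setdefaulttimeout(timeout)
        for base in mirrors:
            try:
                return urllib2.urlopen(base + package + "/").read()
            except (urllib2.URLError, socket.timeout), exc:
                print "mirror %s failed (%s); trying the next one" % (base, exc)
        raise IOError("all mirrors failed for %r" % package)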
On Sep 23, 2008, at 6:42 PM, Rick Warner wrote:
Jim Fulton wrote:
On Sep 23, 2008, at 5:44 PM, Rick Warner wrote:
Jeff Younker wrote:
I have to say, as a developer, and a system administrator, I like setuptools. It does what I need. Could it be better? Sure. For what I use python for on a day-to-day basis it makes my life a thousand times better than it was before setuptools. Nothing ruins your day more than spending *hours* tracing down package dependencies just to get the *one* package you need to allow you to perform some crucial task. It's even worse when you have to do it on multiple architectures.
Perl's package location and installation system (CPAN) is one of the primary factors contributing to its success. Perl is a pig. It's a charming pig that can do lots of tricks, but a pig nonetheless. What makes it shine is CPAN. And here's the catch: CPAN isn't really any better than setuptools. It's got warts and nuts all over the place, but it works.
And CPAN has some HUGE advantages over setuptools: it is designed as a repository, and it is replicated. Which means it is dependable. Anyone who suffered through the multiple outages of PyPI (which is not replicated) over the past year or so, or the ongoing outages of the many repositories across the web to which PyPI directs users/processes, can understand why this is important.
Actually, PyPI is replicated. See, for example, http://download.zope.org/simple/ .
It may be that some of the mirrors should be better advertised.
A half-hearted effort, at best,
Hardly, but there's always room for improvement.
after the problems last year. When I configure a CPAN client (once per user) I create a list of replicas I want to search for any query from a list of hundreds of replicas distributed around the world. From then on the client automatically switches to one of my selected replicas when one does not respond in a timely manner.
That's good. That would be nice to add to setuptools.
The minimal set of recent PyPI replicas is neither well advertised
Yes
nor automatically searched,
Setuptools and many tools built on it let you configure the index to use. It's true that you can configure only one, and I've found that to be sufficient for my needs, but I agree, being able to search many would be nice.
and is therefore ineffective.
Lots of people have found them to be very effective.
And that is a mere tip of the iceberg, since PyPI is just the index, and the repositories are for the most part not replicated. CPAN sites are both index and repository.
My mirror is just an index, yes. I've found PyPI's repository to be reliable enough for my needs. I've had the most trouble when distributions aren't stored in the PyPI repository. People are building mirrors that mirror both index and repository.
Setuptools and PyPI are light years behind CPAN in regards to creating a usable, reliable method of package deployment.
Behind, yes. Light years? I don't think so. Jim -- Jim Fulton Zope Corporation
On Wed, Sep 24, 2008 at 2:24 AM, Jim Fulton wrote:
On Sep 23, 2008, at 6:42 PM, Rick Warner wrote:
Jim Fulton wrote:
On Sep 23, 2008, at 5:44 PM, Rick Warner wrote:
Jeff Younker wrote:
I have to say, as a developer, and a system administrator, I like setuptools. It does what I need. Could it be better? Sure. For what I use python for on a day-to-day basis it makes my life a thousand times better than it was before setuptools. Nothing ruins your day more than spending *hours* tracing down package dependencies just to get the *one* package you need to allow you to perform some crucial task. It's even worse when you have to do it on multiple architectures.
Perl's package location and installation system (CPAN) is one of the primary factors contributing to its success. Perl is a pig. It's a charming pig that can do lots of tricks, but a pig nonetheless. What makes it shine is CPAN. And here's the catch: CPAN isn't really any better than setuptools. It's got warts and nuts all over the place, but it works.
And CPAN has some HUGE advantages over setuptools: it is designed as a repository, and it is replicated. Which means it is dependable. Anyone who suffered through the multiple outages of PyPI (which is not replicated) over the past year or so, or the ongoing outages of the many repositories across the web to which PyPI directs users/processes, can understand why this is important.
Actually, PyPI is replicated. See, for example, http://download.zope.org/simple/.
It may be that some of the mirrors should be better advertised.
A half-hearted effort, at best,
Hardly, but there's always room for improvement.
after the problems last year. When I configure a CPAN client (once per user) I create a list of replicas I want to search for any query from a list of hundreds of replicas distributed around the world. From then on the client automatically switches to one of my selected replicas when one does not respond in a timely manner.
That's good. That would be nice to add to setuptools.
Well, that is the patch I have proposed right here: let setuptools deal with several indexes, http://bugs.python.org/setuptools/issue32

Tarek
On Wednesday 24 September 2008 01:24:51 Jim Fulton wrote:
Setuptools and PyPI are light years behind CPAN in regards to creating a usable, reliable method of package deployment.
Behind, yes. Light years? I don't think so.
People have been complaining about python package management for the entire time I've used python. Now, I'm no old-timer with python - only 6 and a bit years or so, after having rejected it originally 10 years ago - but people have definitely been complaining for 6 years at least. Before that I'd been writing perl code for many years, and they'd had this system stolen from the TeX community (CTAN), which they'd called CPAN, and that had been actively useful to everyone in the Perl community from the first time I came across perl (which was 10 years ago).

Now, with python there's the general ethos: There should be one-- and preferably only one --obvious way to do it. And with perl there's the general ethos: There's more than one way to do it. Anyone who's written extensive amounts of code in both languages will know that the latter ethos does cause major problems in practice. However, for packaging, with python the rule is:

* "There's more than one way to do it"

And for perl the rule is:

* Use CPAN

I've always found this difference amusing, and never said much about it, because one of my earliest experiences of the python "community" was seeing a poor perl developer at Europython in 2005 ripped to shreds by (someone I would expect nicer behaviour from) for merely suggesting "hey, the perl people actually have figured this out and have one obvious way to do it".

Now, personally, given that CPAN (which I think borrowed the idea from CTAN - I think - I definitely heard of CTAN first) has worked very happily for the perl community for the past 10 years (the time I've used perl for), and given that the python community still has problems (pyinstall being the latest in the saga), maybe the time has come to recognise that perl maybe isn't "light years" ahead on this, but actually is a decade ahead at least. After all, the question:

* What do I use to package my code?

*IS* a discussion point for python. For Perl, it's a FAQ with the answer being "CPAN". Sure, CPAN has had (and probably still has, occasionally) problems, and it is far from perfect, but at least the perl people DO have one obvious way to do it.

Heck, in python it's a discussion so old that people become INCREDIBLY ratty about it, and on that note - since I'll probably get flamed to hell and back for the above (given what happened at Europython a few years back) - I'll back out quietly now.

Regards,
Michael.
Michael wrote:
Now, with python there's the general ethos: There should be one-- and preferably only one --obvious way to do it.
And with perl there's the general ethos: There's more than one way to do it
Anyone who's written extensive amounts of code in both languages will know that the latter ethos does cause major problems in practice.
However for packaging, with python the rule is * "There's more than one way to do it"
And for perl the rule is: * Use CPAN
I've always found this difference amusing,
Me too... Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
Rick Warner wrote:
Actually, PyPI is replicated. See, for example, http://download.zope.org/simple/.
It may be that some of the mirrors should be better advertised.
A half-hearted effort, at best, after the problems last year. When I configure a CPAN client (once per user) I create a list of replicas I want to search for any query from a list of hundreds of replicas distributed around the world.
Can someone suggest the best way to search among repositories? For instance, try to connect to one, then stop if it gives Connection Refused? If it gives any unexpected error (5xx)? Timing out is a common failure, and a pain in the butt, but I guess there's that too. What does the CPAN client do? -- Ian Bicking : ianb@colorstudy.com : http://blog.ianbicking.org
Ian Bicking wrote:
Rick Warner wrote:
Actually, PyPI is replicated. See, for example, http://download.zope.org/simple/.
It may be that some of the mirrors should be better advertised.
A half-hearted effort, at best, after the problems last year. When I configure a CPAN client (once per user) I create a list of replicas I want to search for any query from a list of hundreds of replicas distributed around the world.
Can someone suggest the best way to search among repositories? For instance, try to connect to one, then stop if it gives Connection Refused? If it gives any unexpected error (5xx)? Timing out is a common failure, and a pain in the butt, but I guess there's that too. What does the CPAN client do?
I don't know what CPAN does, but Linux distributions have also solved this problem. We send out massive numbers of updates and new packages to users every day, so we need a mirror network that works well.

In Fedora we have a server that gives out a list of mirrors, with GeoIP data used to try and assemble a list of mirrors near you (country, then continent (with special cases, for instance, for certain middle eastern countries that connect better to Europe than to Asia), and then global). This server gives the mirror list out (randomized among the close mirrors) and the client goes through the list, trying to retrieve package metadata. If it times out or otherwise fails, then it goes on to the next mirror until it gets data. (Note, some alternate clients are able to download from multiple servers at the same time if multiple packages are needed.)

The mirrorlist server is a pretty neat application (https://fedorahosted.org/mirrormanager). It has a TurboGears front end that allows people to add a new mirror (https://admin.fedoraproject.org/mirrormanager) for public availability or restricted to a subset of IPs. It allows you to mirror only a subset of the whole content. And it has several methods of telling whether the mirror is in sync or outdated. The latter is important to us for making sure we're giving our users the latest updates that we've shipped, and ranges from a script that the mirror admin can run from their cron job to check the data available and report back, to a process run on our servers to check that the mirrors have up-to-date content. The mirrorlist itself is cached and served from a mod_python script (soon to be mod_wsgi) for speed.

You might also be interested in the way that we work with package metadata. In Fedora and many other rpm-based distributions (some Debian-based distros talked about this as well but I don't know if it was ever implemented there) we create static xml files (and recently, sqlite dbs as well) that live on the mirrors. The client hits the mirror and downloads at least two of these files. The repomd.xml file describes the other files with checksums and is used to verify that the other metadata is up to date and whether anything has changed. The primary.xml file stores the information that is generally needed for doing depsolving on the packages. Then we have several other xml files that collectively contain the complete metadata for the packages but are usually overkill... by separating this stuff out, we save clients from having to download it in the common case. This stuff could provide some design ideas for constructing a pypi metadata repository and is documented here: http://createrepo.baseurl.org/

Note: the reason we went with static metadata rather than some sort of cgi script is that static data can be mirrored without the mirror being required to run anything beyond a simple rsync cron job. This makes finding mirrors much easier.

-Toshio
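To illustrate the metadata check Toshio describes, a rough sketch that fetches repomd.xml from a (hypothetical) mirror, reads the checksum it records for the primary metadata, and verifies the download. The element names and namespace follow the createrepo format documented at http://createrepo.baseurl.org/ as I understand it; treat the details as an assumption rather than a spec.

    import hashlib
    import urllib2
    import xml.etree.ElementTree as ET

    REPO = "http://mirror.example.org/fedora/"       # hypothetical mirror
    NS = "{http://linux.duke.edu/metadata/repo}"     # repomd.xml namespace

    repomd = ET.fromstring(urllib2.urlopen(REPO + "repodata/repomd.xml").read())
    for data in repomd.findall(NS + "data"):
        if data.get("type") != "primary":
            continue
        href = data.find(NS + "location").get("href")   # e.g. repodata/primary.xml.gz
        checksum = data.find(NS + "checksum")
        payload = urllib2.urlopen(REPO + href).read()
        # Older repodata says "sha" where it means sha1.
        algo = {"sha": "sha1"}.get(checksum.get("type"), checksum.get("type"))
        if hashlib.new(algo, payload).hexdigest() == checksum.text.strip():
            print "primary metadata verified (%s)" % href
        else:
            print "checksum mismatch; fall back to the next mirror"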
--On 23 September 2008 18:20:36 -0400 Jim Fulton wrote:
On Sep 23, 2008, at 5:44 PM, Rick Warner wrote:
Jeff Younker wrote:
I have to say, as a developer, and a system administrator, I like setuptools. It does what I need. Could it be better? Sure. For what I use python for on a day-to-day basis it makes my life a thousand times better than it was before setuptools. Nothing ruins your day more than spending *hours* tracing down package dependencies just to get the *one* package you need to allow you to perform some crucial task. It's even worse when you have to do it on multiple architectures.
Perl's package location and installation system (CPAN) is one of the primary factors contributing to its success. Perl is a pig. It's a charming pig that can do lots of tricks, but a pig nonetheless. What makes it shine is CPAN. And here's the catch: CPAN isn't really any better than setuptools. It's got warts and nuts all over the place, but it works.
And CPAN has some HUGE advantages over setuptools: it is designed as a repository, and it is replicated. Which means it is dependable. Anyone who suffered through the multiple outages of PyPI (which is not replicated) over the past year or so, or the ongoing outages of the many repositories across the web to which PyPI directs users/processes, can understand why this is important.
Actually, PyPI is replicated. See, for example, http://download.zope.org/simple/.
It may be that some of the mirrors should be better advertised.
For the logs: we are currently working on a mirroring infrastructure with a set of several full PyPI mirrors (based on z3c.pypimirror)... more to be announced and worked on in the middle of October. Andreas
Chris Withers writes:
(I'll be CC'ing the distutils sig in to these replies as this discussion probably belongs there...)
Nicolas Chauvat wrote:
The slides for my two talks can be found here:
"Python Package Management Sucks"
Install debian and get back to productive tasks.
This is an almost troll-like answer. See page 35 of the presentation.
I disagree. You could think of "Packages are Python's Plugins" (taken from page 35) as a troll-like statement as well. Apparently this user is happy with how Debian and Ubuntu distribute python modules, and it's worth evaluating why.
If you're too lazy, here's a re-hash:
- there are lots of different operating systems
- even with Linux, there are many different package management systems
and a subset of all these operating systems (linux or other) do have the need to distribute python and a set of python modules and extensions. they cannot rely on a plugin system outside the (os) distribution.
- all the package management systems behave differently and expect packages to be set up differently for them
correct, but again they share common requirements.
- expecting package developers to shoulder this burden is unfair and
results in various bitrotting repackaged versions of python packages
rather than just one up-to-date version maintained by the entity originating the python package
some people prefer to name this "stable releases" instead of "bitrot". usually you can be assured that the set of python modules delivered with an (os) distribution is tested and works. I doubt you do achieve the same with an untested set of up-to-date versions. Speaking of extensions "maintained by the entity originating the python package": this much too often is a way of bitrot. is the shipped library up to date? does it have security fixes? how many duplicates are shipped in different extensions? does the library need to be shipped at all (because some os does ship it)?
- Adobe Photoshop Plugins, Firefox Add-ons, etc do not delegate their management to an OS package manager.
this is known trouble for os distributors, and your statement is generally wrong. firefox plugins are packaged in distributions and the plugin system is able to cope with packaged plugins.
Packages are Python's "plugins" and so should get the same type of consistent, cross-platform package management targeted at the application in question, which is Python in this case.
No, as explained above. Considering an extension interfacing a library shipped with the os, you do want to use this library, not add another copy. And you do want to interface exactly this version. An upstream extension maintainer cannot provide this unless he builds this extension for every (os) distribution and maintains it during the os' lifecycle. Your view is very common (and legitimate) in the Python world (distributing a batteries-all-included application), but not everybody does need this (like os distributors, using the reliable-power-plug-where-available), and in some areas it starts to hurt.

Your paper gives a nice overview of the shortcomings of each of the build/distribution systems. From an (os) distributor's point of view I would add some things (also posted in [1]).

- os distributors usually try to minimize the versions they include, trying to just ship one version. This single version is installed with --single-version-externally-managed, so that it can be imported without any pkg_resources magic and fiddling with pth files. Unfortunately it is then not possible to use/import another version using pkg_resources. As discussed at PyCon such a setup is very common for os distributors. None of the tools supports this.

- setuptools has the narrow-minded view of a python package being contained in a single directory, which doesn't fit well when you do have common locations for include or doc files. A linux distribution is supposed to follow the Filesystem Hierarchy Standard (FHS). None of the tools supports this.

- setuptools (sharing this with distutils) doesn't allow splitting a python package into several binary distribution packages, e.g. having a separate package with large docs, allowing installation on a server without having to install X, and so on. But maybe this is something only an (os) distributor cares about.

You will see that most packages included in (os) distributions still use plain distutils, even if the setup.py does support a setuptools-based install. That's simply because distutils doesn't get in the way of packaging the python module with rpm or dpkg. E.g. namespace packages are a consequence of how setuptools distributes and installs things. Why force this on everybody?

A big win could be a modularized setuptools where you are able to use only the things you do want to use, e.g.:

- version specifications (not just the heuristics shipped with setuptools).
- specification of dependencies.
- resource management.
- a module system independent from any distribution-specific stuff.
- any other distribution-specific stuff.

The "Wouldn't it be nice if?" pacman (page 55) sounds like a nice idea, if it could just handle multiple repositories, one of them being the archive of the os distributor (iirc the java jsr277 proposed the use of multiple repositories, even in different formats).

Matthias

[1] http://mail.python.org/pipermail/distutils-sig/2008-September/010045.html
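For reference, the distro-style flat install Matthias mentions looks roughly like this; the package itself is hypothetical, but --single-version-externally-managed and --record are real setuptools install options:

    # setup.py -- a minimal, hypothetical package
    from setuptools import setup, find_packages

    setup(
        name="mypkg",
        version="1.0",
        packages=find_packages(),
    )

    # A distribution packager then installs it flat (no .egg directory,
    # no .pth fiddling), roughly like this:
    #
    #   python setup.py install --single-version-externally-managed \
    #       --root=/tmp/buildroot --record=INSTALLED_FILES
    #
    # --single-version-externally-managed requires --record, which hands
    # the system package manager the list of files that were installed.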
Matthias Klose wrote:
Your paper gives a nice overview of the shortcomings of each of the build/distribution systems. From an (os) distributor's point of view I would add some things (also posted in [1]). [snip]
I think it would be worthwhile to have one location to put those requirements. IMHO, taking into account OS packagers (Linux and other) in the development of a package tool for python is essential.

Tarek, what is the best place to put this? Somewhere on launchpad? Or somewhere else?

cheers,
David
On Thu, Sep 25, 2008 at 4:13 PM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
Matthias Klose wrote:
Your paper gives a nice overview of the shortcomings of each of the build/distribution systems. From an (os) distributor's point of view I would add some things (also posted in [1]). [snip]
I think it would be worthwhile to have one location to put those requirements. IMHO, taking into account OS packagers (Linux and other) in the development of a package tool for python is essential.
Tarek, what is the best place to put this? Somewhere on launchpad? Or somewhere else?
cheers,
David
I think the best place would be in the python.org wiki itself. As Jeff Rush mentioned earlier, the best way to think about all the requirements would be to write a series of PEPs. So I think this would be a great starting point on the python.org wiki. I am preparing right now a few pages so we can gather all the info; I'll give you the link when it is ready.

Tarek
-- Tarek Ziadé | Association AfPy | www.afpy.org Blog FR | http://programmation-python.org Blog EN | http://tarekziade.wordpress.com/
On Thu, Sep 25, 2008 at 4:52 PM, Tarek Ziadé wrote:
On Thu, Sep 25, 2008 at 4:13 PM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
Matthias Klose wrote:
Your paper gives a nice overview of the shortcomings of each of the build/distribution systems. From an (os) distributor's point of view I would add some things (also posted in [1]). [snip]
I think it would be worthwhile to have one location to put those requirements. IMHO, taking into account OS packagers (Linux and other) in the development of a package tool for python is essential.
Tarek, what is the best place to put this? Somewhere on launchpad? Or somewhere else?
cheers,
David
I think the best place would be in the python.org wiki itself.
As Jeff Rush mentioned earlier, the best way to think about all the requirements would be to write a series of PEPs.
So I think this would be a great starting point in Python.org wiki,
I am preparing right now a few pages so we can gather all the info; I'll give you the link when it is ready.
Tarek
Alright, I have just started it here: http://wiki.python.org/moin/Distribute

I guess we can gather all documentation there. On my side I will write down a wrap-up of what has been said, what various people are expecting from such a project, how it can be organized, etc.

Tarek

-- Tarek Ziadé | Association AfPy | www.afpy.org Blog FR | http://programmation-python.org Blog EN | http://tarekziade.wordpress.com/
Matthias Klose wrote:
Install debian and get back to productive tasks. This is an almost troll-like answer. See page 35 of the presentation.
I disagree. You could think of "Packages are Python's Plugins" (taken from page 35) as a troll-like statement as well.
You're welcome to your (incorrect) opinion ;-) Debian packages could just as easily be seen as Debian's plugins.
and a subset of all these operating systems (linux or other) do have the need to distribute python and a set of python modules and extensions. they cannot rely on a plugin system outside the (os) distribution.
OK, you guys have persuaded me of this at least...
- all the package management systems behave differently and expect packages to be set up differently for them
correct, but again they share common requirements.
...but all have different implementations.
some people prefer to name this "stable releases" instead of "bitrot".
I'll call bullshit on this one. The most common problem I have as a happy Debian user and advocate, when I go to try and get help for a packaged application (I use packages because I perhaps mistakenly assume this is the best way to get security-fixed software), such as postfix, postgres, and Zope if I was foolish enough to take that path, is "why are you using that ancient and buggy version of the software?!" shortly before pointing out how all the issues I'm facing are solved in newer (stable) releases.

The problem is that first the application needs to be tested and released by its community, then Debian needs to re-package, patch, and generally mess around with it before it eventually gets a "Debian release". It's bad enough with apps with huge support bases like postgres; imagine trying to do this "properly" for the 4000-odd packages on PyPI...
Speaking of extensions "maintained by the entity originating the python package": this much too often is a way of bitrot. is the shipped library up to date? does it have security fixes? how many duplicates are shipped in different extensions? does the library need to be shipped at all (because some os does ship it)?
So what do you propose doing when projectA depends on version 1.0 of libC and projectB depends on version 2.0 of libC?
this is known trouble for os distributors, and your statement is generally wrong. firefox plugins are packaged in distributions and the plugin system is able to cope with packaged plugins.
I guess since my desktop OS is still windows, this is not something I've had to fight with ;-)
Packages are Python's "plugins" and so should get the same type of consistent, cross-platform package management targetted at the application in question, which is Python in this case.
No, as explained above.
While I'll buy the argument that python packaging tools should make life easier for production of os-specific packages, I still don't think you're correct ;-)
Considering an extension interfacing a library shipped with the os, you do want to use this library, not add another copy.
libxml2 seems to be a good example to use here... I guess on debian I'd likely need to install libxml2-dev before I could install the lxml package... ...what about MacOS X? ...what about Windows?
An upstream extension maintainer cannot provide this unless he builds this extension for every (os) distribution and maintains it during the os' lifecycle.
...or just says in the docs "hey, you need libxml2 for this, unless you're on Windows, in which case the binary includes it".
- os distributors usually try to minimize the versions they include, trying to just ship one version.
...which is fair enough for the "system python", but many of us have a collection of apps, some of which require Python 2.4, some Python 2.5, and on top of each of those, different versions of different packages for each app. In my case, I do source (alt-)installs of python rather than trusting the broken stuff that ships with Debian, and use buildout to make sure I get the right versions of the right packages for each project.
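For readers who haven't seen buildout, a minimal sketch of a buildout.cfg that pins versions per project, along the lines Chris describes (the package names and version pins are illustrative):

    [buildout]
    parts = py
    versions = versions

    [versions]
    lxml = 2.1.1
    zope.interface = 3.4.1

    [py]
    recipe = zc.recipe.egg
    interpreter = py
    eggs =
        lxml
        zope.interface

Running bin/buildout then builds a bin/py interpreter with exactly the pinned eggs on its path, independent of the system python's packages.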
- setuptools has the narrow minded view of a python package being contained in a single directory, which doesn't fit well when you do have common locations for include or doc files.
Python packages have no idea of "docs" or "includes", which is certainly a deficiency.
way packaging the python module with rpm or dpkg. E.g. namespace packages are a consequence of how setuptools distributes and installs things. Why force this on everybody?
being able to break a large lump (say zope.*) into separate distributions is a good idea, which setuptools implements very badly using namespace packages...
A big win could be a modularized setuptools where you are able to only use the things you do want to use, e.g.
- version specifications (not just the heuristics shipped with setuptools).
not sure what you mean by this.
- specification of dependencies.
- resource management
?
- a module system independent from any distribution specific stuff.
?
- any other distribution specific stuff.
?
The "Wouldn't it be nice if?" pacman (page 55) sounds like a nice idea, if it could just handle multiple repositories and one of them being the archive of the os distributor (iirc the java jsr277 proposed the use of multiple repositories, even in different formats).
Where does it say it doesn't support multiple repositories? ;-) Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
You guys are fairly far into your debate, so hopefully I don't interject something that's already been gone over :-)

Chris Withers wrote:
Matthias Klose wrote:
Install debian and get back to productive tasks. This is an almost troll-like answer. See page 35 of the presentation.
I disagree. You could think of "Packages are Pythons Plugins" (taken from page 35) as a troll-like statement as well.
You're welcome to your (incorrect) opinion ;-) Debian packages could just as easily be seen as Debian's plugins.
For a *very* loose definition of plugin, perhaps. But if you look at: http://en.wikipedia.org/wiki/Plugin the idea of Debian packages being plugins is a pretty far stretch. The idea of Packages being python plugins is less of a stretch, but I'd call it an analogy. It's useful for looking at things in a new light, but if we start designing a plugin interface and only viewing packages through that definition I think we'll be hindering ourselves.
- all the package management systems behave differently and expect packages to be set up differently for them
correct, but again they share common requirements.
...but all have different implementations.
The common requirements are more important than the varying implementations when thinking about the metadata and how flexible things need to be. When justifying the need for a separate python build tool and distribution format, realizing that there are different implementations is good. I.e.: we need to expose package naming, versioning, and dependencies to outside tools because they have a common need for that information, on the one hand. We have to realize that there's a need for both run-from-egg and run-from-FHS-locations, on the other.
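Concretely, the kind of metadata exposure being asked for already has a python-side API today; a small sketch (the package name is just an example):

    # Query an installed distribution's metadata via pkg_resources.
    import pkg_resources

    dist = pkg_resources.get_distribution("lxml")
    print dist.project_name        # canonical name
    print dist.version             # version string
    for req in dist.requires():    # declared dependencies
        print req

The point in the thread is that non-python tools (rpm, dpkg) need the same information without having to run python, which argues for it also living in static, documented metadata files.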
some people prefer to name this "stable releases" instead of "bitrot".
I'll call bullshit on this one. The most common problem I have as a happy Debian user and advocate, when I go to try and get help for a packaged application (I use packages because I perhaps mistakenly assume this is the best way to get security-fixed software), such as postfix, postgres, and Zope if I was foolish enough to take that path, is "why are you using that ancient and buggy version of the software?!" shortly before pointing out how all the issues I'm facing are solved in newer (stable) releases.
The problem is that first the application needs to be tested and released by its community, then Debian needs to re-package, patch, and generally mess around with it before it eventually gets a "Debian release". It's bad enough with apps with huge support bases like postgres; imagine trying to do this "properly" for the 4000-odd packages on PyPI...
You're correct in the results you're seeing but not in the reason that it exists. There are many linux distributions and each has a different policy of how to update packages. The reason for the variety is that there's demand for both fast package updates and slow package updates.

The Debian Stable, Red Hat Enterprise Linux, and other stable, enterprise-oriented distributions' aim is to provide a stable base on which people can build their applications and processes. A common misperception among developers who want faster cycles is that the base system is just a core of packages while things closer to the leaves of the dependency tree could be updated (ie: don't update the kernel; do update the python-sqlalchemy package). What's not seen is that these distributions are providing the base for so many people that updates that change the API/ABI/on-disk format/etc are likely to break *someone* out there. You want to be using one of these systems if you have deployed a major application that serves thousands of people and can afford little to no downtime, because you can be more assured that any changes to the system are either changes that are overwhelmingly necessary, with API/ABI breakage reduced as much as possible, or changes that you yourself have introduced.

For system administrators it can also be frustrating, knowing that there have been bug fixes in newer upstream packages that are not supposed to change backwards compatibility. The problem here is that we all know that all software has bugs. The risk with an update to a newer stable version of software is that the new software has bugs that are as bad or worse than the old one. The package maintainers have to evaluate how many changes have gone into the new version of the software and how big the current problem is, and then apply the distribution's policy on updates to that. For a stable enterprise-oriented distro, it's often a case of "better the devil you know than the devil you don't". For a developer of software or someone deploying a new system (as opposed to someone who's had one deployed for several years before they hit a certain bug), this can be quite frustrating as you know that there are fixes and features in newer versions of the software.

When you have the choice, then, you should use one of the other Linux distributions: either one whose focus is on staying closer to what upstream is shipping (I'd recommend this for developers) or one which has a stable policy but has released closer to the current date with newer packages. When you don't have a choice, you have to be prepared for the possibility that you will need to install the requirements for your app from another resource (this could be from another version that the distribution supports, like installing debian backports, or installing from source, or installing an egg). Remember though, that sometimes the distribution will update a package for you if you just request it. It depends on the severity of what's broken currently, the risks involved with updating, and the distribution's (and maintainer's) policies/perceptions of the risk vs reward.
Speaking of extensions "maintained by the entity originating the python package": this much too often is a way of bitrot. is the shipped library up to date? does it have security fixes? how many duplicates are shipped in different extensions? does the library need to be shipped at all (because some os does ship it)?
So what do you propose doing when projectA depends on version 1.0 of libC and projectB depends on version 2.0 of libC?
This is a problem that is not new for distributions. Each one handles it slightly differently. For Fedora, we've decided the best course is to help upstream port to newer versions of the library. However, since this isn't always practical, we sometimes introduce compatibility packages which have the old version of libraries so older programs will continue to work.

Having multiple versions is not ideal, as this is where bitrot sets in in earnest. If upstream for libC only supports version 2.0 and a security flaw comes out that affects both libC-1.0 and libC-2.0, then we have to fix libC-1.0 at the distribution level. This is more work for us, to support something outdated. We'd much rather do work that has a future upstream by porting the application to the newer version. And the time to do that is *not* when there's a security flaw that has to be fixed yesterday.

The exact wrong thing to do (and prohibited in policy in most distributions) is for the applications to have their own copies of the libraries. When a security flaw comes out in that case, we'd have to:

1) hunt through all the packages we ship to find any that are affected.
2) update the various versions in all of those packages, which might mean we have to generate multiple different fixes.
3) rebuild those packages and force our users to redownload all of them.

If we had separate library packages for the separate versions we'd:

1) know exactly which packages had to be fixed
2) only have to apply fixes once, to the versions that we were shipping
3) have our users only download the library packages, as the applications will load the fixed version from the system.

I can go on with other reasons why this is a bad idea and how to mitigate problems but if you're convinced already, I'll surrender the soapbox to someone else :-)
Considering an extension interfacing a library shipped with the os, you do want to use this library, not add another copy.
libxml2 seems to be a good example to use here...
I guess on debian I'd need to likely install libxml2-dev before I could install the lxml package...
Note: I'm a Fedora dev, not a Debian dev but the packaging techniques are similar in generalities. You should just be able to request that lxml be installed and it will automatically pull in libxml2. libxml2-dev shouldn't enter the picture as a python program that imports lxml won't need the C headers. (Unless you're talking about *building* lxml which is a separate problem.)
...what about MacOS X?
...what about Windows?
Are you going to be distributing a separate version for MacOS X and Windows anyway since the norm is not to compile from source on those platforms? Then you're already at the point where you have multiple packages for different OS's. A source tarball for unix distributors and a binary zip/binhex/what have you for MacOSX and Windows.
An upstream extension maintainer cannot provide this unless he builds this extension for every (os) distribution and maintains it during the os' lifecycle.
...or just says in the docs "hey, you need libxml2 for this, unless you're on Windows, in which case the binary includes it".
- os distributors usually try to minimize the versions they include, trying to just ship one version.
...which is fair enough for the "system python", but many of us have a collection of apps, some of which require Python 2.4, some Python 2.5, and on top of each of those, different versions of different packages for each app.
In my case, I do source (alt-)installs of python rather than trusting the broken stuff that ships with Debian, and use buildout to make sure I get the right versions of the right packages for each project.
So this is fine to a certain extent. Pros: * Allows you to develop new applications using known good or latest versions of other software. * Allows you to deploy an app using newer-than system libraries on an otherwise stable-class distribution. Cons: * You become responsible for the code of all the components your installing. If there's a bug in your alt-install of lxml, you're the one that has to fix it rather than the linux distribution. * If you're distributing this so that everyone can use it, the os packagers are going to have to make sure that the code works with their versions and might have to do porting work. The first Con is the more important one for me.
- setuptools has the narrow minded view of a python package being contained in a single directory, which doesn't fit well when you do have common locations for include or doc files.
Python packages have no idea of "docs" or "includes", which is certainly a deficiency.
+1. I know I've mentioned paver before, but one of the things that it does right is making the declarative metadata extensible. Whereas you can't simply add a new piece of metadata to setup.py's setup(), you can add a new Bunch() of metadata in a paver pavement.py file without any other code. This makes it easy to do the right thing and write code to operate on "docs", "includes", "locales", etc. that you've defined declaratively in the metadata section.
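A sketch of the pavement.py pattern being described; the docs section and its keys are arbitrary names invented for illustration, not part of paver:

    # pavement.py -- declarative, extensible metadata via paver's Bunch
    from paver.easy import Bunch, options

    options(
        setup=Bunch(
            name="mypkg",
            version="1.0",
            packages=["mypkg"],
        ),
        # An extra metadata section needs no changes to paver itself;
        # task code can read it back as options.docs.source_dir etc.
        docs=Bunch(
            source_dir="docs",
            build_dir="docs/_build",
        ),
    )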
way packaging the python module with rpm or dpkg. E.g. namespace packages are a consequence of how setuptools distributes and installs things. Why force this on everybody?
being able to break a large lump (say zope.*) into separate distributions is a good idea, which setuptools implements very badly using namespace packages...
A big win could be a modularized setuptools where you are able to only use the things you do want to use, e.g.
- version specifications (not just the heuristics shipped with setuptools).
not sure what you mean by this.
I'm not 100% certain of what Matthias means, but there are several problems with setuptools' usage of versions:

1) The heuristic encourages bad practices. Versions need to be parsed by computer programs (package managers, scripts that maintain repositories, etc). Not all of those are written in python. Having things other than numbers and dots in version strings is problematic for these programs. For instance, here's something that setuptools versioning heuristics allow you to do:

  foo-1.0rc1
  foo-1.0
  foo-1.0post1

But here's how rpm would order it:

  foo-1.0
  foo-1.0post1
  foo-1.0rc1

In Fedora we have rules for putting non-numeric things in our release tag to work around this:

  version: 1.0, release: 0.1.rc1
  version: 1.0, release: 1
  version: 1.0, release: 2.post1

This is not all-inclusive, but you can see we have to move the alpha portion of the version to the release tag to ensure that the upgrade path will move forward sensibly.

2) This is more important but much harder. Something that would really help everyone is having a way of versioning API/ABI. Right now you can specify that you depend on Foo >= 1.0, Foo <= 2.0. But the version numbers don't have meaning until the actual packages are released. If Foo-1.0 and Foo-1.1 don't have compatible APIs, your numbers are wrong. If Foo-1.0 is succeeded by Foo-2.0 with the same API, your numbers are too restrictive. If you lock the versions to only what you've tested, Foo == 1.0, then you're going to have people and distributions that want to use the new version but can't.

Some places have good versioning rules: https://svn.enthought.com/enthought/wiki/EnthoughtVersionNumbers

Other places say they have marketing departments that prevent that. One possibility would be to have MyLib1-1.0, MyLib2-1.0, MyLib2-2.0, etc. with the version for marketing included in the package name. Another idea would be to have API information stored in metadata but not in the package name. That way marketing can have a big party for MyLib-2.0 but the API metadata has API_Revision: 32.
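To make the ordering problem in point 1 concrete, a small sketch; the plain string sort approximates a tool that treats versions as opaque strings (rpm's real comparison is segment-based, so this is only an approximation), and exact parse_version behaviour can vary across setuptools releases:

    import pkg_resources

    versions = ["1.0rc1", "1.0", "1.0post1"]

    # Naive string comparison: 1.0 < 1.0post1 < 1.0rc1 -- the broken order.
    print sorted(versions)

    # setuptools' heuristic: 1.0rc1 < 1.0 < 1.0post1 -- the intended order.
    print sorted(versions, key=pkg_resources.parse_version)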
- specification of dependencies.
- resource management
?
http://peak.telecommunity.com/DevCenter/PythonEggs#accessing-package-resourc...

I have no love for how pkg_resources implements this (including the API) but the idea of retrieving data files, locales, config files, etc from an API is good. For packages to be coded so that they conform to the Filesystem Hierarchy Standard on Linux, the API (and metadata) needs to be more flexible. We need to be able to mark locale, config, and data files in the metadata. The build/install tool needs to be able to install those into the filesystem in the proper places for a Linux distro, an egg, etc. and then we need to be able to call an API to retrieve the specific class of resources or a directory associated with them. Use cases:

* config files go to /etc on Linux, and we'd want to retrieve the contents of /etc/configfile

* generic, architecture-independent data files go under /usr/share/. We'd want to place them in or under /usr/share/$PACKAGENAME. Mostly we're going to want to retrieve the contents of a specific data file.

* locale files go under /usr/share/locale/ (ex: /usr/share/locale/en_US/LC_MESSAGES/compiz.mo). We'll want to retrieve the directory '/usr/share/locale' for feeding to gettext.
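For reference, the existing pkg_resources calls under discussion look like this (the package and resource names are illustrative):

    import pkg_resources

    # Read an internal data file shipped inside the "mypkg" package.
    text = pkg_resources.resource_string("mypkg", "data/defaults.cfg")

    # Get a real filesystem path, e.g. a locale directory to hand to gettext.
    locale_dir = pkg_resources.resource_filename("mypkg", "locale")

The requests above amount to asking that calls like these be able to resolve to FHS locations (/etc, /usr/share/...) when a distribution installs the files there, rather than always inside the package directory.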
- a module system independent from any distribution specific stuff.
?

I read this as "entry_points is a good feature".
- any other distribution specific stuff.
?
I think Matthias is trying to separate out the different services that setuptools provides so that they can be decoupled and worked on separately. So "other distribution specific stuff" would be things to do with distributing the results of your labors. Eggs and pypi would fall under this. Matthias, if I'm wrong in any of this, please correct me :-). These are my perceptions, as these are the issues I have as a packager for a different distribution. -Toshio
At 11:00 AM 10/1/2008 -0700, Toshio Kuratomi wrote:
I have no love for how pkg_resources implements this (including the API) but the idea of retrieving data files, locales, config files, etc from an API is good. For packages to be coded so that they conform to the Filesystem Hierarchy Standard on Linux, the API (and metadata) needs to be more flexible.
There's some confusion here. pkg_resources implements *resource* management and *metadata* management... NOT "file management". Resource files and metadata are no more "data" in the FHS sense than static data segments in a .so file are; they are simply a more convenient way of including such data than having a giant base64 string or something like that hardcoded into the program itself. There is thus no relevance to the FHS and absolutely no reason for them to live anywhere except within the Python packages they are a part of.
We need to be able to mark locale, config, and data files in the metadata.
Sure... and having a standard for specifying that kind of application/system-level install stuff is great; it's just entirely outside the scope of what eggs are for. To be clear, I mean here that a "file" (as opposed to a resource) is something that the user is expected to be able to read or copy, or modify. (Whereas a resource is something that is entirely internal to a library, and metadata is information *about* the library itself.)
The build/install tool needs to be able to install those into the filesystem in the proper places for a Linux distro, an egg, etc. and then we need to be able to call an API to retrieve the specific class of resources or a directory associated with them.
Agreed... assuming of course that we're keeping a clear distinction between static resources+metadata and actual "data" (e.g. configuration) files.
On Wednesday 01 October 2008 at 14:39 -0400, Phillip J. Eby wrote:
We need to be able to mark locale, config, and data files in the metadata.
Sure... and having a standard for specifying that kind of application/system-level install stuff is great; it's just entirely outside the scope of what eggs are for.
I don't follow you. If the library needs these files to work, you definitely want to ship them, whether it is at their FHS locations in a package, or in the egg.
To be clear, I mean here that a "file" (as opposed to a resource) is something that the user is expected to be able to read or copy, or modify. (Whereas a resource is something that is entirely internal to a library, and metadata is information *about* the library itself.)
It’s not as simple as that. Python is not the only thing out there, and there are many times where your resources need to be shipped in existing formats, in files that land at specific places. For example icons go in /usr/share/icons, locale files in .mo format in /usr/share/locale, etc. -- .''`. : :' : We are debian.org. Lower your prices, surrender your code. `. `' We will add your hardware and software distinctiveness to `- our own. Resistance is futile.
At 09:40 PM 10/1/2008 +0200, Josselin Mouette wrote:
On Wednesday 01 October 2008 at 14:39 -0400, Phillip J. Eby wrote:
We need to be able to mark locale, config, and data files in the metadata.
Sure... and having a standard for specifying that kind of application/system-level install stuff is great; it's just entirely outside the scope of what eggs are for.
I don't follow you. If the library needs these files to work, you definitely want to ship them, whether it is at their FHS locations in a package, or in the egg.
Egg files aren't an all-purpose distribution format; they were designed for application plugins, and for libraries needed to support application plugins. As such, they're self-contained and weren't designed for application-level installation support, such as documentation, configuration or data files, icons, etc.

As has been pointed out, these are deficiencies of .egg files wrt the full spectrum of library and application installation needs, which is why I'm pushing for us to work on an installation metadata standard that can accommodate these other needs that the .egg layout isn't really suited for.

My main point about the resources is simply that it's a needless complication to physically separate static data needed by a library at runtime, based solely on its file extension, in cases where only that library will be reading that file, and the file's contents are constant for that version of the library.

To put it another way, if some interpretation of the FHS makes a distinction between two files encoding the same data, one named foo.bar and one named foo.py, where the only difference between the two is the internal encoding of the data, then that interpretation of the FHS is not based on any real requirement, AFAICT.

Of course, for documentation, application icons, and suchlike, the data *will* be read by things other than the library itself, and so a standardized location is appropriate. The .egg format was designed primarily to support resources read only by the package in question, and secondarily to support metadata needed by applications or libraries that the package "plugs in" to. It was not originally intended to be a general-purpose system package installation format.
To be clear, I mean here that a "file" (as opposed to a resource) is something that the user is expected to be able to read or copy, or modify. (Whereas a resource is something that is entirely internal to a library, and metadata is information *about* the library itself.)
It's not as simple as that. Python is not the only thing out there, and there are many times where your resources need to be shipped in existing formats, in files that land at specific places. For example icons go in /usr/share/icons, locale files in .mo format in /usr/share/locale, etc.
And docs need to go in /usr/share/doc, I presume. But these aren't necessarily "resources" in the way I'm defining the term. Some of them *could* be, perhaps. Others aren't. To be clear, what I'm trying to say is that it is a perfectly valid use case for a Python package author to have static data contained within their Python package directory layout for purposes of accessing that data as if it were code, but without having to go to the trouble of converting it to a .py file (and possibly having to extract it back out at runtime). This usage of "data" files isn't in conflict with the FHS, as I understand it. But I also understand that there are other kinds of "data" files which *don't* fall under that use case, and which it is desirable to install to shared locations. We need to support both.
Phillip J. Eby wrote:
At 09:40 PM 10/1/2008 +0200, Josselin Mouette wrote:
On Wednesday 01 October 2008 at 14:39 -0400, Phillip J. Eby wrote:
We need to be able to mark locale, config, and data files in the metadata.
Sure... and having a standard for specifying that kind of application/system-level install stuff is great; it's just entirely outside the scope of what eggs are for.
I don't follow you. If the library needs these files to work, you definitely want to ship them, whether it is at their FHS locations in a package, or in the egg.
Egg files aren't an all-purpose distribution format; they were designed for application plugins, and for libraries needed to support application plugins. As such, they're self-contained and weren't designed for application-level installation support, such as documentation, configuration or data files, icons, etc.
As has been pointed out, these are deficiencies of .egg files wrt the full spectrum of library and application installation needs, which is why I'm pushing for us to work on an installation metadata standard that can accommodate these other needs that the .egg layout isn't really suited for.
We need to get the list of problems up somewhere on the wiki so that people can check that the evolving standard doesn't fall into the same pitfalls. After all, people are using the egg and pkg_resources API for just this purpose today with some happy about it and others not so much.
My main point about the resources is simply that it's a needless complication to physically separate static data needed by a library at runtime, based solely on its file extension, in cases where only that library will be reading that file, and the file's contents are constant for that version of the library.
To put it another way, if some interpretation of the FHS makes a distinction between two files encoding the same data, one named foo.bar and one named foo.py, where the only difference between the two is the internal encoding of the data, then that interpretation of the FHS is not based on any real requirement, AFAICT.
Actually, file encoding is one major criterion in the FHS. However, it's probably not in the manner you're thinking of :-)

Files which are architecture dependent generally need to be separated from files which are architecture independent. Since text files and binary data with a standard byte-oriented format are generally what's used as data these days, that's the major reason data files usually go in /usr/share while libraries/binaries go in /usr/lib and /usr/bin. This is due to the range of computers that architecture independent data can be shared with, compared with architecture dependent data.

Of course, part of python's site-packages on Linux systems violates this rule, as python can split architecture dependent and architecture independent packages from one another. I know that some distributions have debated moving the architecture independent portion of site-packages to /usr/share, although I don't know if any have (Josselin, has Debian done this?)

The idea of moving is not straightforward because of 1) compatibility with unpackaged software and 2) /usr/share is seen in two lights: the place for architecture independent files and the place for data; /usr/lib is seen in two lights: the place for architecture dependent non-executables and the place for code whose instructions are run by executables.
Of course, for documentation, application icons, and suchlike, the data *will* be read by things other than the library itself, and so a standardized location is appropriate. The .egg format was designed primarily to support resources read only by the package in question, and secondarily to support metadata needed by applications or libraries that the package "plugs in" to. It was not originally intended to be a general-purpose system package installation format.
<nod>. Despite this design, it's presently being used for that. So we need to figure out what to do about it.
To be clear, I mean here that a "file" (as opposed to a resource) is something that the user is expected to be able to read or copy, or modify. (Whereas a resource is something that is entirely internal to a library, and metadata is information *about* the library itself.)
It’s not as simple as that. Python is not the only thing out there, and there are many times where your resources need to be shipped in existing formats, in files that land at specific places. For example icons go in /usr/share/icons, locale files in .mo format in /usr/share/locale, etc.
And docs need to go in /usr/share/doc, I presume.
docs are special in the packaging world on several accounts. Generally the packager has to collect at least some of the docs themselves (as things like LICENSE.txt aren't normally included in a doc install but are important for distributions to package). rpm, at least, provides a macro to make it easy for the packager to mark files and directories from the source tree as documentation, which rpm will put in the appropriate directory itself. So packagers often use an upstream's build scripts to build the docs, but usually install the docs using the package tool's facilities.

Additionally, there's a difference between docs which the program uses (for instance for online help) and docs which the end user would have to navigate the filesystem and invoke a viewer themselves to read. The former is data, the latter is docs.
But these aren't necessarily "resources" in the way I'm defining the term. Some of them *could* be, perhaps. Others aren't.
To be clear, what I'm trying to say is that it is a perfectly valid use case for a Python package author to have static data contained within their Python package directory layout for purposes of accessing that data as if it were code, but without having to go to the trouble of converting it to a .py file (and possibly having to extract it back out at runtime). This usage of "data" files isn't in conflict with the FHS, as I understand it.
But I also understand that there are other kinds of "data" files which *don't* fall under that use case, and which it is desirable to install to shared locations. We need to support both.
Possibly. We could definitely throw out the first case (resources) and just have a data category and the FHS would be fine. Whether there's a case for resources depends on their definition. The test of "Could be put in a python file and extracted" doesn't fly. I could convert all my images to .xpm and put them in python files. But that's a lot of work. And the moment I take them back out to separate .xpm files, they would definitely belong in /usr/share. -Toshio
Josselin Mouette wrote:
On Wednesday 01 October 2008 at 14:39 -0400, Phillip J. Eby wrote:
To be clear, I mean here that a "file" (as opposed to a resource) is something that the user is expected to be able to read or copy, or modify. (Whereas a resource is something that is entirely internal to a library, and metadata is information *about* the library itself.)
It’s not as simple as that. Python is not the only thing out there, and there are many times where your resources need to be shipped in existing formats, in files that land at specific places. For example icons go in /usr/share/icons, locale files in .mo format in /usr/share/locale, etc.
Perhaps I'm putting words into PJE's mouth, but it seems to me that if distributions want to put things in widely differing places from where the developer had them (in a single tree), then we're going to need a Python standard library API, implemented per platform/OS, that defines those places and lets the code find these resources/files. Certainly the expectation shouldn't be on developers to handle all the possible different locations OSes are going to put things? -- Dave
Dave Peterson wrote:
Josselin Mouette wrote:
On Wednesday 01 October 2008 at 14:39 -0400, Phillip J. Eby wrote:
To be clear, I mean here that a "file" (as opposed to a resource) is something that the user is expected to be able to read or copy, or modify. (Whereas a resource is something that is entirely internal to a library, and metadata is information *about* the library itself.)
It’s not as simple as that. Python is not the only thing out there, and there are many times where your resources need to be shipped in existing formats, in files that land at specific places. For example icons go in /usr/share/icons, locale files in .mo format in /usr/share/locale, etc.
Perhaps I'm putting words into PJE's mouth, but it seems to me that if distributions want to put things in widely differing places from where the developer had them (in a single tree), then we're going to need a Python standard library API, implemented per platform/OS, that defines those places and lets the code find these resources/files. Certainly the expectation shouldn't be on developers to handle all the possible different locations OSes are going to put things?
The code should be general. We just need a configuration file that holds the defaults for the OS/architecture the library is installed on. This shouldn't be much of a problem; distutils already holds information like that (for instance the values of distutils.sysconfig.get_python_lib() and get_python_lib(1)). -Toshio
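(For reference, both calls Toshio mentions are existing distutils APIs; the paths in the comments are just typical examples.)

    from distutils.sysconfig import get_python_lib

    # Architecture-independent site-packages,
    # e.g. /usr/lib/python2.5/site-packages on many Linux systems.
    print get_python_lib()

    # Architecture-dependent site-packages (the first argument is
    # plat_specific); on some 64-bit distros this differs, e.g.
    # /usr/lib64/python2.5/site-packages.
    print get_python_lib(1)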
Phillip J. Eby wrote:
At 11:00 AM 10/1/2008 -0700, Toshio Kuratomi wrote:
I have no love for how pkg_resources implements this (including the API) but the idea of retrieving data files, locales, config files, etc. from an API is good. For packages to be coded that conform to the Filesystem Hierarchy Standard on Linux, the API (and metadata) needs to be more flexible.
There's some confusion here. pkg_resources implements *resource* management and *metadata* management... NOT "file management".
Resource files and metadata are no more "data" in the FHS sense than static data segments in a .so file are; they are simply a more convenient way of including such data than having a giant base64 string or something like that hardcoded into the program itself. There is thus no relevance to the FHS and absolutely no reason for them to live anywhere except within the Python packages they are a part of.
If we can agree on a definition of resource files there's a case to be made here. One of the problems, though, is that people use pkg_resources for things that are data. Now there could be two reasons for that:

1) Developers are abusing pkg_resources.
2) Linux distributions disagree with you on what constitutes data vs a resource.

Let's discuss the definition of resource vs data below (since you made a good start at it) and we can see which of these it is.
We need to be able to mark locale, config, and data files in the metadata.
Sure... and having a standard for specifying that kind of application/system-level install stuff is great; it's just entirely outside the scope of what eggs are for.
To be clear, I mean here that a "file" (as opposed to a resource) is something that the user is expected to be able to read or copy, or modify. (Whereas a resource is something that is entirely internal to a library, and metadata is information *about* the library itself.)
metadata, I haven't even begun to think about yet. I personally don't see a huge need to shift it around on the filesystem but someone who's thought about it longer might find reasons that it belongs in some other place.

resources, as I said needs to be defined. You're saying here that a resource is something internal to the library. A "file" is something that a user can read, copy, or modify. In a typical TurboGears app, there's the following things to be found inside of the app's directory in site-packages:

config/{app.cfg,__init__.py,log.cfg} - These could go in /etc/ as their configuration. However, I've tried to stress to upstream that only things that configure the TurboGears framework for use with their app should go in these files (default templating language, identity controller). When those things are true, I can see this as being an "internal resource". If upstream can't get their act together, it's config.

locale/{message catalogs for various languages} -- These are binary files that contain strings that the user may see when a message is given. These, I think are data.

templates/*html -- These are templates that the application fills in with variables intermixed with short bits of code. These are on the border between code and data. The user sees them in a modified form. The app sometimes executes pieces of them before the user sees them. Some template languages create python byte code from the templates, others load them and write into them every time. None of them can be executed on their own. All of them have to be loaded by a call to parse them from a piece of python code in another file. None of them are directly called or invoked. My leaning is that these are data.

static/{javascript,css,images} -- These are things that are definitely never executed. They are served by the webserver verbatim when their URL is called. These are certainly data. (Note: I don't believe these are referenced using the resources API, just via URL.)

So... do you agree on which of these are data and which are resources? Do you have an idea on how we can prevent application and framework writers from misusing the resources API to load things that are data?
The build/install tool needs to be able to install those into the filesystem in the proper places for a Linux distro, an egg, etc. and then we need to be able to call an API to retrieve the specific class of resources or a directory associated with them.
Agreed... assuming of course that we're keeping a clear distinction between static resources+metadata and actual "data" (e.g. configuration) files.
<nod>. The definition and distinction are important. -Toshio
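(The closest existing mechanism for the "install it elsewhere" half of this distinction is the distutils data_files keyword, shown below. It installs files outside the package directory but has no notion of categories like "config" or "locale"; the project name and paths are made up.)

    from distutils.core import setup

    setup(
        name='myapp',
        version='1.0',
        packages=['myapp'],
        # Relative target directories are interpreted under sys.prefix,
        # so 'share/...' becomes e.g. /usr/share/... on a typical install.
        data_files=[
            ('share/locale/fr/LC_MESSAGES', ['locale/fr/LC_MESSAGES/myapp.mo']),
            ('share/myapp/templates',       ['templates/welcome.html']),
        ],
    )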
At 03:14 PM 10/1/2008 -0700, Toshio Kuratomi wrote:
resources, as I said needs to be defined. You're saying here that a resource is something internal to the library. A "file" is something that a user can read, copy, or modify.
I should probably clarify that I mean "unmediated by the program"... which is why I disagree regarding message catalogs. They're not user-modifiable and there's nothing you can usefully do with them outside the program that uses them. Of course, a default message catalog for someone to use to *create* translations from might be another story...
In a typical TurboGears app, there's the following things to be found inside of the app's directory in site-packages:
config/{app.cfg,__init__.py,log.cfg} - These could go in /etc/ as their configuration. However, I've tried to stress to upstream that only things that configure the TurboGears framework for use with their app should go in these files (default templating language, identity controller). When those things are true, I can see this as being an "internal resource". If upstream can't get their act together, it's config.
Agreed.
locale/{message catalogs for various languages} -- These are binary files that contain strings that the user may see when a message is given. These, I think are data.
I lean the other way, since they're not editable. Keep in mind that some platforms have no "locale" directories as such, and thus trying to support multiple locations thereof is a burden for cross-platform app developers, vs. simply treating them as internal resources. This is especially true for plugin-oriented architectures that want to distribute localizations as plugins, with one plugin being able to supply localization for another.
templates/*html -- These are templates that the application fills in with variables intermixed with short bits of code. These are on the border between code and data. The user sees them in a modified form. The app sometimes executes pieces of them before the user sees them. Some template languages create python byte code from the templates, others load them and write into them every time. None of them can be executed on their own. All of them have to be loaded by a call to parse them from a piece of python code in another file. None of them are directly called or invoked. My leaning is that these are data.
If you follow this logic (create bytecode from it, can't run w/out interp or compiler), then any .py file is *also* data; i.e., this Does Not Follow.
static/{javascript,css,images} -- These are things that are definitely never executed. They are served by the webserver verbatim when their URL is called. These are certainly data. (Note: I don't believe these are referenced using the resources API, just via URL.)
Uh... that would depend entirely on the library or application. But if they're static and the user's got no business tinkering with them, then it's a resource.
So... do you agree on which of these are data and which are resources? Do you have an idea on how we can prevent application and framework writers from misusing the resources API to load things that are data?
Apparently not. The alternative I would suggest is that under the new standard, an install tool should be allowed to relocate any non-Python files, and all access has to go through a resource API. The install tool would then have to be responsible for putting some kind of forwarding information in the package directory to tell the resource API where it squirrelled the file(s) off to. Then we can avoid all this angels-on-a-pin argument and the distros can Have It Their Way[tm].

I'd have preferred to avoid that complexity, but if the two of us can't agree then there's no way on earth to get a community consensus.

Btw, pkg_resources' concept of "metadata" would also need to be relocatable, since e.g. the "EggTranslations" package uses that metadata to store localizations of image resources and message catalogs. (Other uses of the metadata files also include scripts, dependencies, version info, etc.)

Hopefully, those folks who want relocation ability will chip in with code, docs, testing, etc. for the feature at some point.
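(A rough sketch of the forwarding idea, assuming an invented RELOCATED file format; no such file or helper exists in pkg_resources.)

    import os

    def resolve_resource(package_dir, relpath):
        """Follow install-tool forwarding info, if any, for a resource.

        The RELOCATED file format ("old -> new", one mapping per line)
        is hypothetical.
        """
        forward = os.path.join(package_dir, 'RELOCATED')
        if os.path.exists(forward):
            for line in open(forward):
                old, new = line.rstrip('\n').split(' -> ')
                if relpath == old or relpath.startswith(old + '/'):
                    return new + relpath[len(old):]
        # Default: the resource is still next to the code.
        return os.path.join(package_dir, relpath)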
Phillip J. Eby wrote:
At 03:14 PM 10/1/2008 -0700, Toshio Kuratomi wrote:
resources, as I said needs to be defined. You're saying here that a resource is something internal to the library. A "file" is something that a user can read, copy, or modify.
I should probably clarify that I mean "unmediated by the program"... which is why I disagree regarding message catalogs. They're not user-modifiable and there's nothing you can usefully do with them outside the program that uses them. Of course, a default message catalog for someone to use to *create* translations from might be another story...
<nod> this is what I was afraid of. This is definitely not a definition of resource-only that has meaning for Linux distributions. None of the data in /usr/share is user-modifiable (a tiny bit of it is copiable for the user to then edit the copy) and although a good fraction of it is usable outside the program that uses it, a much larger portion is taken up with things that are used by one program. I could go through the examples below and tell why Linux distributions feel the way they do but I don't think it's necessary. Whether they're data or resources, the files need to be relocatable. And they need to be accessed via an API for that to work. So as long as we're agreed that these have to be included in the egg on some platforms and in the filesystem on others then I think we know what needs to be done. [...]
So... do you agree on which of these are data and which are resources? Do you have an idea on how we can prevent application and framework writers from misusing the resources API to load things that are data?
Apparently not. The alternative I would suggest is that under the new standard, an install tool should be allowed to relocate any non-Python files, and all access has to go through a resource API. The install tool would then have to be responsible for putting some kind of forwarding information in the package directory to tell the resource API where it squirrelled the file(s) off to. Then we can avoid all this angels-on-a-pin argument and the distros can Have It Their Way[tm].
In terms of implementation I'd much rather see something less centered on the egg being the right way and the filesystem being a secondary concern. We should have metadata that tells us where the types of resources come from. When a package is installed on Linux the metadata could point locales at file:///usr/share/locale. When on Windows egg:locale (Perhaps the uninstalled case would use this too... that depends on how the egg structure and metadata evolves.) A question we'd have to decide is whether this particular metadata is something that should be defined globally or per package. Or globally with a chance for packages to override it.
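(To illustrate the counter-proposal just made: category locations would live in metadata rather than in forwarding files. The metadata file name, its section layout, and resolve() are all invented here.)

    # Hypothetical per-package metadata, e.g. in RESOURCES.cfg:
    #
    #   [resource-dirs]
    #   locale = file:///usr/share/locale
    #   ; on Windows / in-egg installs instead: locale = egg:locale

    from ConfigParser import ConfigParser

    def resolve(metadata_path, category):
        """Return the URL where files of `category` were installed."""
        cp = ConfigParser()
        cp.read(metadata_path)
        return cp.get('resource-dirs', category)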
I'd have preferred to avoid that complexity, but if the two of us can't agree then there's no way on earth to get a community consensus.
Btw, pkg_resources' concept of "metadata" would also need to be relocatable, since e.g. the "EggTranslations" package uses that metadata to store localizations of image resources and message catalogs. (Other uses of the metadata files also include scripts, dependencies, version info, etc.)
Actually, we should decide whether we want to support that kind of thing within the egg metadata at all. The other things we've been talking about belonging in the metadata are simple key value pairs. EggTranslations uses the metadata area as a data store. (Or in your definition, a resource store). This breaks with the definition of what metadata is. Translations don't store information about a package, they store alternate views of data within the package. While the simple key value pairings can be located in either setuptools .egg-info directories or python-2.5+ distutils .egg-info files, the data store in EggTranslations can only be placed in directories. Having a data store/resource store API would be more appropriate for the kinds of things that EggTranslation is doing. -Toshio
Toshio Kuratomi wrote:
<nod> this is what I was afraid of. This is definitely not a definition of resource-only that has meaning for Linux distributions. None of the data in /usr/share is user-modifiable
In that case it must be there because it's architecture-independent, right? But by that criterion, all .py files should be in /usr/share, too. Also all shell scripts, Perl code, awk/sed scripts, etc, etc. Does the FHS specify that? -- Greg
Greg Ewing wrote:
Toshio Kuratomi wrote:
<nod> this is what I was afraid of. This is definitely not a definition of resource-only that has meaning for Linux distributions. None of the data in /usr/share is user-modifiable
In that case it must be there because it's architecture-independent, right?
...That doesn't follow from what I said, but it's true :-)
But by that criterion, all .py files should be in /usr/share, too.
I mentioned in a different post that this has been considered by several distributions. Note that not all .py files can be shifted due to the way python parses modules. But certainly modules which are pure python could be moved. Reasons that Fedora hasn't done this are:

1) Historical: .py files have been in /usr/lib/python2.5/site-packages for a long time.

2) Compatibility with third parties: Unfortunately not everyone uses distutils. If we shifted the location to /usr/share and users installed those packages into /usr/lib it would fail.

3) /usr/share has two purposes/criteria[1]_: architecture independent and datafiles. /usr/lib has two criteria[2]_: architecture independent and libraries. With .py{,c,o} we have both architecture independence and a library. So the criteria are in conflict with each other.

There may be more reasons; I'm in the /usr/share camp but not so much that I'll keep bringing it up when there are no new arguments to give. Note that Debian has done a lot of neat things with python source recent(ish). Josselin, Matthias, and some of the other Debian devs could tell us if .py files get installed to /usr/share there.

.. _[1]: http://www.pathname.com/fhs/pub/fhs-2.3.html#USRSHAREARCHITECTUREINDEPENDENT...
.. [2]_: http://www.pathname.com/fhs/pub/fhs-2.3.html#USRLIBLIBRARIESFORPROGRAMMINGAN...
Also all shell scripts, Perl code, awk/sed scripts, etc, etc.
Things that are directly executable belong in a bin directory. There are next to no shell script libraries, just scripts. Perl, awk, sed, etc. *scripts* end up in /bin as well. To my knowledge perl doesn't support the split architecture-independent/architecture-dependent library locations that python does, so everything goes into /usr/lib. Mono assemblies don't go in /usr/share because of a pair of limitations of the mono vm; java jars do go in /usr/share. The m4 macros that autoconf/automake use go there as well. Programs that are written in python but don't want to expose their internals to the outside world have their code under /usr/share. We make php apps do the same. Perl is probably the same, although I haven't looked at an actual multi-file perl program in... well, I don't remember when, so I don't know.
Does the FHS specify that?
The FHS sets out certain rules and criteria. Linux vendors have interpreted them and sometimes the standard is updated due to either current practice or clarification of former practice. I don't believe that FHS "specifies" that .py files go in /usr/lib or /usr/share. The rules state things like "architecture independent data file" which is why there's some grey area for /usr/lib/python's .py files. Note that although I'm happy to talk about the FHS here, I'm not involved with creating the standard. I'm also only one packager from one distro. So I'm happy to help answer questions about the FHS and how Fedora interprets it but am not in any better position to change it than any of you. -Toshio
Toshio Kuratomi
3) /usr/share has two purposes/criteria[1]_: architecture independent and datafiles. /usr/lib has two criteria[2]_: architecture independent and libraries. With .py{,c,o} we have both architecture independence and a library. So the criteria are in conflict with each other.
I think you mean: * /usr/share has two criteria: architecture independent and data files * /usr/lib has two criteria: architecture dependent and libraries and that Python library files are both architecture independent and libraries, thus neither of those fit perfectly. -- \ “A free press is one where it's okay to state the conclusion | `\ you're led to by the evidence.” —Bill Moyers | _o__) | Ben Finney
On Wed, Oct 01, 2008 at 09:33:52PM -0700, Toshio Kuratomi wrote:
Note that Debian has done a lot of neat things with python source recent(ish). Josselin, Matthias, and some of the other Debian devs could tell us if .py files get installed to /usr/share there.
Currently the two helper tools install the .py (and .egg-info) files somewhere into /usr/share. One tool places the .pyc files in /usr/lib while the other places them in /var/lib (and uses a .pth file to get that directory on sys.path); both have symlinks to the real .py next to the .pyc files. This way the same .py file can be shared between more than one version of python. Regards Floris -- Debian GNU/Linux -- The Power of Freedom www.debian.org | www.gnu.org | www.kernel.org
On Wed, Oct 01, 2008 at 09:33:52PM -0700, Toshio Kuratomi wrote:
Note that Debian has done a lot of neat things with python source recent(ish). Josselin, Matthias, and some of the other Debian devs could tell us if .py files get installed to /usr/share there.
One of the current options in Debian (python-central) involves installing the .py files to /usr/share/pycentral/package_name and symlinking them in the site-packages of each Python version the package is known to work with. You thus get a separation of byte-compiled files and source code. Gaël
On Thursday 02 October 2008 at 15:33 +1200, Greg Ewing wrote:
Toshio Kuratomi wrote:
<nod> this is what I was afraid of. This is definitely not a definition of resource-only that has meaning for Linux distributions. None of the data in /usr/share is user-modifiable
In that case it must be there because it's architecture-independent, right?
But by that criterion, all .py files should be in /usr/share, too.
Also all shell scripts, Perl code, awk/sed scripts, etc, etc. Does the FHS specify that?
The FHS is not that precise, but in Debian, we do:

* move architecture-independent perl modules to /usr/share/perl5
* move architecture-independent python modules to /usr/share/{python-support,pyshared}
* when possible, move shell scripts (except those in /usr/bin) from /usr/libexec or /usr/lib to /usr/share/$package

There's no rocket science in that, but I wish we could make things as simple for Python modules as they are for Perl ones. Perl does not allow messing with module namespaces with .pth files, nor shipping resource files and whatnot in the perl module directory; only perl modules are allowed there. Cheers, -- .''`. : :' : We are debian.org. Lower your prices, surrender your code. `. `' We will add your hardware and software distinctiveness to `- our own. Resistance is futile.
At 07:14 PM 10/1/2008 -0700, Toshio Kuratomi wrote:
In terms of implementation I'd much rather see something less centered on the egg being the right way and the filesystem being a secondary concern.
Eggs don't have anything to do with it; in Python, it's simply common sense to put static resources next to the code that uses them, if you want to "write once, run anywhere". And given Python's strength as an interactive development language with no "build" step, having to *install* your data files somewhere else on the system to use them isn't a *feature* -- not for a developer, anyway. And our hypothetical de-jure standard won't replace the de-facto standard unless it's adopted by developers... and it won't be adopted if it makes their lives harder without a compensating benefit. For the developer, FHS support is a cost, not a benefit, and only relevant to a subset of platforms, so the spec should make it as transparent for them as possible, if they don't have an interest in explicit support for it. By the STASCTAP principle (Simple Things Are Simple, Complex Things Are Possible), it should be possible for distros to relocate, and simple for developers not to care about it.
We should have metadata that tells us where the types of resources come from. When a package is installed on Linux the metadata could point locales at file:///usr/share/locale. When on Windows egg:locale (Perhaps the uninstalled case would use this too... that depends on how the egg structure and metadata evolves.)
A question we'd have to decide is whether this particular metadata is something that should be defined globally or per package. Or globally with a chance for packages to override it.
I think install tools should handle it and keep it out of developers' hair. We should of course distinguish configuration and other writable data from static data, not to mention documentation. Any other file-related info is going to have to be optional, if that. I don't really think it's a good idea to ask developers to fill in information they don't understand. A developer who works entirely on Windows, for example, is not going to have a clue what to specify for FHS stuff, and they absolutely shouldn't have to if all they're doing is including some static data. Even today, there exist Python developers who don't use the distutils to distribute their packages, so anything that makes it even more difficult than it is today, isn't going to be a viable standard. The closer we can get in ease of use to just tarring up a directory, the more viable it'll be. (That's one reason, btw, why setuptools offers revision control support and find_packages() for automating discovery of what to include.)
I'd have preferred to avoid that complexity, but if the two of us can't agree then there's no way on earth to get a community consensus.
Btw, pkg_resources' concept of "metadata" would also need to be relocatable, since e.g. the "EggTranslations" package uses that metadata to store localizations of image resources and message catalogs. (Other uses of the metadata files also inlcude scripts, dependencies, version info, etc.)
Actually, we should decide whether we want to support that kind of thing within the egg metadata at all. The other things we've been talking about belonging in the metadata are simple key value pairs. EggTranslations uses the metadata area as a data store. (Or in your definition, a resource store). This breaks with the definition of what metadata is. Translations don't store information about a package, they store alternate views of data within the package.
I was actually somewhat incorrect in my statement about the distinction between pkg_resources "metadata" and "resources"; "metadata" is really "data that goes with the distribution, not with a specific package within the distribution". Only some of this data is "about" the distribution; the rest is data "with" or "of" the distribution. (This is a slight API wart, but the use case exists nonetheless.)

Meanwhile, regarding the proposed key-value pairs system, I don't see how that works; "extras" dependency information and entry points are a bit more structured than just key-value pairs; both are currently represented as .ini-like files with arbitrary section names. I suppose you could squash those entire files into values in some sort of key-value system, but that seems a bit hairy to me. In particular, setuptools' design choice of separate metadata files reflects the fact that many of these things don't need to be loaded at the same time. Also, PKG-INFO-style metadata can contain rather large blobs of text that aren't needed or useful at runtime. Entry points and extras are mostly runtime metadata, with the occasional bit of build or install usage.
Phillip J. Eby wrote:
I think install tools should handle it and keep it out of developers' hair. We should of course distinguish configuration and other writable data from static data, not to mention documentation. Any other file-related info is going to have to be optional, if that. I don't really think it's a good idea to ask developers to fill in information they don't understand. A developer who works entirely on Windows, for example, is not going to have a clue what to specify for FHS stuff, and they absolutely shouldn't have to if all they're doing is including some static data.
I may be missing something, but why should the developer even care about the FHS? We should not standardize what goes where, but the kinds of data that need to be installed (doc, etc...), and then have different (pluggable) implementations to put those where they should go.

Autotools does it this way, for example: you have pkg_data, etc... and every one of them can be changed. The proof that this is flexible is the fact that something like GoboLinux (which handles files a bit like Mac OS X does) is possible at all from the same sources.

http://www.gobolinux.org/?page=doc/articles/compile

I don't see the need for reinventing anything here: just start from the same concepts as autotools, modify them to handle non-Unix software (if it is even needed), and that's it. cheers, David
At 07:40 PM 10/2/2008 +0900, David Cournapeau wrote:
Phillip J. Eby wrote:
I think install tools should handle it and keep it out of developers' hair. We should of course distinguish configuration and other writable data from static data, not to mention documentation. Any other file-related info is going to have to be optional, if that. I don't really think it's a good idea to ask developers to fill in information they don't understand. A developer who works entirely on Windows, for example, is not going to have a clue what to specify for FHS stuff, and they absolutely shouldn't have to if all they're doing is including some static data.
I may be missing something, but why should the developer even care about the FHS? We should not standardize what goes where, but the kinds of data that need to be installed (doc, etc...), and then have different (pluggable) implementations to put those where they should go.
Yep - that's precisely what I've proposed. We just need to define those categories in such a way as to minimize the burden on a Python developer. Where data files are concerned, however, a developer should only need to distinguish between read-only, read-write, and samples, because any finer-grained distinction that relies on platform-specific concepts (like locale directories) is probably going to be error-prone.
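(What that three-way distinction might look like to a developer; the data_categories keyword below is invented and not accepted by any current tool.)

    from distutils.core import setup

    # Hypothetical: the developer states only how each file is used;
    # the install tool decides where each category lands per platform.
    setup(
        name='myapp',
        version='1.0',
        packages=['myapp'],
        data_categories={
            'read-only':  ['myapp/templates/welcome.html'],
            'read-write': ['var/myapp.db'],
            'samples':    ['examples/sample.cfg'],
        },
    )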
Phillip J. Eby wrote:
Where data files are concerned, however, a developer should only need to distinguish between read-only, read-write, and samples, because any finer-grained distinction that relies on platform-specific concepts (like locale directories) is probably going to be error-prone.
However, if we don't allow for such distinctions, then people who e.g. are passionate about their locale files ending up in the right place on Linux aren't going to be happy. -- Greg
At 03:25 PM 10/3/2008 +1300, Greg Ewing wrote:
Phillip J. Eby wrote:
Where data files are concerned, however, a developer should only need to distinguish between read-only, read-write, and samples, because any finer-grained distinction that relies on platform-specific concepts (like locale directories) is probably going to be error-prone.
However, if we don't allow for such distinctions, then people who e.g. are passionate about their locale files ending up in the right place on Linux aren't going to be happy.
Yep. As I've said, it should be possible to define optional extensions that define these sorts of things. They're needed for e.g. icons, Windows registry entries, etc. anyway.
Phillip J. Eby wrote:
At 07:14 PM 10/1/2008 -0700, Toshio Kuratomi wrote:
In terms of implementation I'd much rather see something less centered on the egg being the right way and the filesystem being a secondary concern.
Eggs don't have anything to do with it; in Python, it's simply common sense to put static resources next to the code that uses them, if you want to "write once, run anywhere". And given Python's strength as an interactive development language with no "build" step, having to *install* your data files somewhere else on the system to use them isn't a *feature* -- not for a developer, anyway.
You're arguing about the developer's point of view on something that's hidden behind an API. You've already made it so that the developer cannot just reference the file on the filesystem, because the egg may be zipped. So for the developer there's no change here. I'm saying that there's no need to have a hardcoded path to look up the information at and then make the install tool place "forwarding information" there to send the package somewhere else. We have metadata. We should use it.
And our hypothetical de-jure standard won't replace the de-facto standard unless it's adopted by developers... and it won't be adopted if it makes their lives harder without a compensating benefit. For the developer, FHS support is a cost, not a benefit, and only relevant to a subset of platforms, so the spec should make it as transparent for them as possible, if they don't have an interest in explicit support for it. By the STASCTAP principle (Simple Things Are Simple, Complex Things Are Possible), it should be possible for distros to relocate, and simple for developers not to care about it.
It's both a cost and a benefit. The cost is having to use an API which they have to use anyway due to eggs possibly being zip files. The benefit is getting their code packaged by Linux distributors quicker and getting more contributors as a result of the exposure.
We should have metadata that tells us where the types of resources come from. When a package is installed on Linux the metadata could point locales at file:///usr/share/locale. When on Windows egg:locale (Perhaps the uninstalled case would use this too... that depends on how the egg structure and metadata evolves.)
A question we'd have to decide is whether this particular metadata is something that should be defined globally or per package. Or globally with a chance for packages to override it.
I think install tools should handle it and keep it out of developers' hair. We should of course distinguish configuration and other writable data from static data, not to mention documentation. Any other file-related info is going to have to be optional, if that. I don't really think it's a good idea to ask developers to fill in information they don't understand. A developer who works entirely on Windows, for example, is not going to have a clue what to specify for FHS stuff, and they absolutely shouldn't have to if all they're doing is including some static data.
Needing to have some information about the files you ship is inevitable. Documentation is a good example: man pages, License.txt, gnome help files, windows help files, API docs, sphinx docs, etc. each have to be installed in different places, some with requirements to register the files so the system knows they exist. All the knowledge about what to do with these files should be placed in the tool. But the knowledge of what type to mark a given file with will have to lie with the developer.
Even today, there exist Python developers who don't use the distutils to distribute their packages, so anything that makes it even more difficult than it is today, isn't going to be a viable standard. The closer we can get in ease of use to just tarring up a directory, the more viable it'll be. (That's one reason, btw, why setuptools offers revision control support and find_packages() for automating discovery of what to include.)
Actually, as a person who distributes upstream packages which don't use distutils, and who is exposed to others like them, I'd say that the shortcomings in where to install files and how to reference them after install are one of the reasons that distutils is not used. Are there other reasons? Sure. But this is definitely one of them.
I'd have preferred to avoid that complexity, but if the two of us can't agree then there's no way on earth to get a community consensus.
Btw, pkg_resources' concept of "metadata" would also need to be relocatable, since e.g. the "EggTranslations" package uses that metadata to store localizations of image resources and message catalogs. (Other uses of the metadata files also include scripts, dependencies, version info, etc.)
Actually, we should decide whether we want to support that kind of thing within the egg metadata at all. The other things we've been talking about belonging in the metadata are simple key value pairs. EggTranslations uses the metadata area as a data store. (Or in your definition, a resource store). This breaks with the definition of what metadata is. Translations don't store information about a package, they store alternate views of data within the package.
I was actually somewhat incorrect in my statement about the distinction between pkg_resources "metadata" and "resources"; "metadata" is really "data that goes with the distribution, not with a specific package within the distribution". Only some of this data is "about" the distribution; the rest is data "with" or "of" the distribution. (This is a slight API wart, but the use case exists nonetheless.)
Meanwhile, regarding the proposed key-value pairs system, I don't see how that works; "extras" dependency information and entry points are a bit more structured than just key-value pairs; both are currently represented as .ini-like files with arbitrary section names. I suppose you could squash those entire files into values in some sort of key-value system, but that seems a bit hairy to me. In particular, setuptools' design choice of separate metadata files reflects the fact that many of these things don't need to be loaded at the same time. Also, PKG-INFO-style metadata can contain rather large blobs of text that aren't needed or useful at runtime. Entry points and extras are mostly runtime metadata, with the occasional bit of build or install usage.
Structured, yes. Structure and optimizations to how you look up the data are good. But there is a difference between using metadata to save and look up configuration and using metadata to save and look up data (like locale files). You wouldn't save data into gconf or the Windows Registry, for instance (at least, not if you don't expect people to make fun of you *cough*evolution*cough*). OTOH if it's not really a metadata store vs a resource store but instead a package store vs a distribution store, we need to decide if we really want to have both. Someone pointed out earlier that

Side note: the fact that someone wrote EggTranslations speaks of a need for people to be able to access the per-package data store across packages. Let's fix that and work with EggTranslations to rewrite its backend to use a proper storage. (Looking at the EggTranslations documentation, it might even be a proper place for getting ideas and help with designing the API for a public data store.) -Toshio
At 10:33 AM 10/2/2008 -0700, Toshio Kuratomi wrote:
OTOH if it's not really a metadata store vs a resource store but instead a package store vs a distribution store, we need to decide if we really want to have both. Someone pointed out earlier that
Side note: the fact that someone wrote EggTranslations speaks of a need for people to be able to access the per-package data store across packages. Let's fix that and work with EggTranslations to rewrite its backend to use a proper storage. (Looking at the EggTranslations documentation, it might even be a proper place for getting ideas and help with designing the API for a public data store.)
If we want to have something that can be adopted with any speed, we're going to have to rule out of scope anything that forces people to change their *runtime* code (vs. packaging code). Even setuptools doesn't require that people use the API; it detects when a program is probably reading stuff directly and installs packages unzipped in that case. So, let's focus the discussion towards ways to make it possible for people to declare what they're *already* doing; otherwise, we are just adding to the switching cost for the new system, which needs to be kept as low as possible. The new system has to be more attractive to developers in the general case, in order to overcome the cost of switching, so adding *new* switching costs is to be avoided at all... costs. :)
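(The existing escape hatch referred to here is setuptools' zip-safety handling, which a project can also override explicitly; zip_safe is a real setuptools keyword, while the project name is made up.)

    from setuptools import setup, find_packages

    setup(
        name='legacy-app',
        version='1.0',
        packages=find_packages(),
        # Force an unzipped install so code that opens files relative to
        # __file__ keeps working without switching to a resource API.
        zip_safe=False,
    )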
On Wednesday 01 October 2008 at 19:59 -0400, Phillip J. Eby wrote:
locale/{message catalogs for various languages} -- These are binary files that contain strings that the user may see when a message is given. These, I think are data.
I lean the other way, since they're not editable.
Locale data should be shipped in standard form, in /usr/share/locale. Again, Python is not the only thing you need to think of. Not only are you breaking translation tools if you don't use standard gettext catalogs, but if you don't ship them at the standard location you are also breaking tools like localepurge (or anything that cleans up the filesystem based on standard file locations).
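(This is what consuming catalogs from the standard location looks like with nothing but the stdlib; gettext.translation() is a real API, and 'myapp' is a hypothetical domain name.)

    import gettext

    # Looks up /usr/share/locale/<lang>/LC_MESSAGES/myapp.mo; with
    # fallback=True a NullTranslations object is returned if no catalog
    # is found, so untranslated strings pass through unchanged.
    t = gettext.translation('myapp', localedir='/usr/share/locale',
                            languages=['fr'], fallback=True)
    _ = t.gettext
    print _('Hello, world!')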
Keep in mind that some platforms have no "locale" directories as such, and thus trying to support multiple locations thereof is a burden for cross-platform app developers, vs. simply treating them as internal resources.
No, you are making the burden heavier for Linux platforms by trying to be more portable. There is no reason why you can't define a locale directory on platforms where it does not exist, but when there is a standards document defining where files go, you must follow it. Any time you don't follow it, we consider it a serious breakage and we have to patch the code anyway.
templates/*html -- These are templates that the application fills in with variables intermixed with short bits of code. [snip] My leaning is that these are data.
If you follow this logic (create bytecode from it, can't run w/out interp or compiler), then any .py file is *also* data; i.e., this Does Not Follow.
Sorry, but things don’t work this way. Anything that is *not* a .py file should not land in the python module directories. This is utter abuse of a loophole of the implementation, and I can’t think of another language that allows that. You will not find anything else than perl modules in the perl modules directories. You will not find anything else than C# modules in the Mono modules directory. And so on.
static/{javascript,css,images} -- These are things that are definitely never executed. They are served by the webserver verbatim when their URL is called. These are certainly data. (Note: I don't believe these are referenced using the resources API, just via URL.)
Uh... that would depend entirely on the library or application. But if they're static and the user's got no business tinkering with them, then it's a resource.
No, again it is insane to ship them as resources. These are files that need to be accessed directly by the webserver, and as such they need to be shipped in a data directory, not in a python modules directory.
So... do you agree on which of these are data and which are resources? Do you have an idea on how we can prevent application and framework writers from misusing the resources API to load things that are data?
Apparently not. The alternative I would suggest is that under the new standard, an install tool should be allowed to relocate any non-Python files, and all access has to go through a resource API. The install tool would then have to be responsible for putting some kind of forwarding information in the package directory to tell the resource API where it squirrelled the file(s) off to. Then we can avoid all this angels-on-a-pin argument and the distros can Have It Their Way[tm].
I don’t understand why you want to make it so complicated. All you need is a way to specify directories where different kinds of files land and a simple API to retrieve the file names/contents. Then, you can ship the files at the place you like in eggs, and we can ship the files at the standard places in our packages. Cheers, -- .''`. : :' : We are debian.org. Lower your prices, surrender your code. `. `' We will add your hardware and software distinctiveness to `- our own. Resistance is futile.
Josselin Mouette wrote:
I don’t understand why you want to make it so complicated. All you need is a way to specify directories where different kinds of files land and a simple API to retrieve the file names/contents. Then, you can ship the files at the place you like in eggs, and we can ship the files at the standard places in our packages.
Yes, I don't understand all this complication and these concepts either. I have not seen any reason not to do like autoconf. It is simple, obvious, has no burden for the developer, has been used by thousands of software projects for more than a decade, and is extremely flexible. The only thing needed in addition to autoconf is a python API to retrieve the location of each kind of file (python, data, doc, etc...) so that the package itself can find it independently of the way it was installed. cheers, David
David Cournapeau
Yes, I don't understand all this complication and these concepts either. I have not seen any reason not to do like autoconf. It is simple, obvious, has no burden for the developer, has been used by thousands of software projects for more than a decade, and is extremely flexible.
I'm not sure exactly what you mean by “do like autoconf”. Can you please describe exactly what behaviour you are envisaging, so that we don't all have a different interpretation of what “like autoconf” means? -- \ “Broken promises don't upset me. I just think, why did they | `\ believe me?” —Jack Handey | _o__) | Ben Finney
Ben Finney wrote:
I'm not sure exactly what you mean by “do like autoconf”. Can you please describe exactly what behaviour you are envisaging, so that we don't all have a different interpretation of what “like autoconf” means?
Not autoconf, but the whole autotools suite (I think you need to use automake at least to do what I have in mind). The idea is deceptively simple: when you use ./configure, it gives you many options wrt paths:

  --bindir=DIR            user executables [EPREFIX/bin]
  --sbindir=DIR           system admin executables [EPREFIX/sbin]
  --libexecdir=DIR        program executables [EPREFIX/libexec]
  --sysconfdir=DIR        read-only single-machine data [PREFIX/etc]
  --sharedstatedir=DIR    modifiable architecture-independent data [PREFIX/com]
  --localstatedir=DIR     modifiable single-machine data [PREFIX/var]
  --libdir=DIR            object code libraries [EPREFIX/lib]
  --includedir=DIR        C header files [PREFIX/include]
  --oldincludedir=DIR     C header files for non-gcc [/usr/include]
  --datarootdir=DIR       read-only arch.-independent data root [PREFIX/share]
  --datadir=DIR           read-only architecture-independent data [DATAROOTDIR]
  --infodir=DIR           info documentation [DATAROOTDIR/info]
  --localedir=DIR         locale-dependent data [DATAROOTDIR/locale]
  --mandir=DIR            man documentation [DATAROOTDIR/man]
  --docdir=DIR            documentation root [DATAROOTDIR/doc/libsndfile]
  --htmldir=DIR           html documentation [DOCDIR]
  --dvidir=DIR            dvi documentation [DOCDIR]
  --pdfdir=DIR            pdf documentation [DOCDIR]
  --psdir=DIR             ps documentation [DOCDIR]

By default, they are laid out more or less like the FHS (and again, the details do not matter: you can change them), BUT, it is extremely flexible. The only burden for the developer is to say which files go in which category, which he has to do anyway. You might think that because it is autoconf, it is not flexible for other OSes (Windows), but with this you can have extremely different ways of packaging, ways that are 100% against the FHS spirit. I gave the GoboLinux example, which packages every piece of software in its own directory, a bit like a super-stow if you know stow. The thing is, I am sure the majority of software developers never think about this for their own software, yet it is still possible: it really comes for free for the developer. And for the distributor, it is extremely easy (and self-documented). You could have different defaults on different platforms (Windows comes to mind). cheers, David
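(A rough Python transliteration of those directory variables, to show how little the developer-facing side would need to know; everything here is illustrative, not an existing API.)

    def install_dirs(prefix='/usr/local', eprefix=None):
        """Compute per-category directories from a prefix, configure-style."""
        eprefix = eprefix or prefix
        datarootdir = prefix + '/share'
        return {
            'bindir':     eprefix + '/bin',
            'libdir':     eprefix + '/lib',
            'sysconfdir': prefix + '/etc',
            'datadir':    datarootdir,
            'localedir':  datarootdir + '/locale',
            'mandir':     datarootdir + '/man',
            'docdir':     datarootdir + '/doc',
        }

    # A distro build might call install_dirs('/usr'); GoboLinux could
    # swap in a completely different layout without touching the package.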
-On [20081002 13:20], Josselin Mouette (joss@debian.org) wrote:
Locale data should be shipped in standard form, in /usr/share/locale. Again, Python is not the only thing you need to think of.
Linux is not the only thing to think of by stating /usr/share/locale. :P
I guess ${PREFIX}/share/locale would be better. But still, for .mo files I
seriously do not see a problem if they're within an egg or something
similar. Imagine also the possibility of providing specific language
translations for a project as an egg (kind of like a plugin).
--
Jeroen Ruigrok van der Werven
On Thursday 2 October 2008 at 13:25 +0200, Jeroen Ruigrok van der Werven wrote:
-On [20081002 13:20], Josselin Mouette (joss@debian.org) wrote:
Locale data should be shipped in standard form, in /usr/share/locale. Again, Python is not the only thing you need to think of.
Linux is not the only thing to think of by stating /usr/share/locale. :P I guess ${PREFIX}/share/locale would be better. But still, for .mo files I seriously do not see a problem if they're within an egg or something similar. Imagine also the possibility of providing specific language translations for a project as an egg (kind of like a plugin).
There’s definitely no problem with shipping them in an egg, as long as it is possible to ship them at standard locations. Cheers, -- .''`. : :' : We are debian.org. Lower your prices, surrender your code. `. `' We will add your hardware and software distinctiveness to `- our own. Resistance is futile.
-On [20081002 13:29], Josselin Mouette (joss@debian.org) wrote:
There’s definitely no problem with shipping them in an egg, as long as it is possible to ship them at standard locations.
Standard according to whom though?
Contrary to many Linux systems, for example, the BSDs tend to install Python
files to /usr/local or /usr/pkg (if using pkgsrc) and not /usr. I'm sure
/opt is also still widely used.
Furthermore, if by standard you mean */share/locale/XX/LC_MESSAGES/, what
good would an egg do at that place?
--
Jeroen Ruigrok van der Werven
On Thursday 2 October 2008 at 13:35 +0200, Jeroen Ruigrok van der Werven wrote:
-On [20081002 13:29], Josselin Mouette (joss@debian.org) wrote:
There’s definitely no problem with shipping them in an egg, as long as it is possible to ship them at standard locations.
Standard according to whom though?
Contrary to many Linux systems, for example, the BSDs tend to install Python files to /usr/local or /usr/pkg (if using pkgsrc) and not /usr. I'm sure /opt is also still widely used.
That’s not a problem. The only thing that is needed is to be able to select the path at installation time, just like we do with autoconf. BTW, I’m all for making /usr/local the default, since /usr being the default leads to manual installation overwriting packages instead of going to /usr/local.
Furthermore, if by standard you mean */share/locale/XX/LC_MESSAGES/, what good would an egg do at that place?
In the egg distribution, they can go in another place. Again, not a problem as long as we have a flag somewhere that can install them where we like. Cheers, -- .''`. : :' : We are debian.org. Lower your prices, surrender your code. `. `' We will add your hardware and software distinctiveness to `- our own. Resistance is futile.
Jeroen Ruigrok van der Werven wrote:
-On [20081002 13:29], Josselin Mouette (joss@debian.org) wrote:
There’s definitely no problem with shipping them in an egg, as long as it is possible to ship them at standard locations.
Standard according to whom though?
I think the idea is to *define* a standard location within the egg. An install tool is then free to extract them out of the egg and put them somewhere else if it wants. There should also be an API for finding them (rather than assuming they're still inside the egg) so that the code that uses them can still find them if they've been moved somewhere platform-specific. -- Greg
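A sketch of what Greg describes might look like the following. The function name and the RESOURCES.txt forwarding file are invented for illustration; no tool or spec defines them:

    import os

    def find_resource(pkg_dir, relpath):
        """Locate a resource shipped with a package, inside the egg or not."""
        candidate = os.path.join(pkg_dir, relpath)
        if os.path.exists(candidate):
            return candidate  # still at the standard in-egg location
        # Otherwise consult a forwarding map the install tool left behind,
        # mapping each relocated file to its platform-specific destination.
        forwarding = os.path.join(pkg_dir, "RESOURCES.txt")
        for line in open(forwarding):
            name, sep, target = line.strip().partition(" -> ")
            if sep and name == relpath:
                return target  # e.g. /usr/share/locale/fr/LC_MESSAGES/foo.mo
        raise IOError("resource not found: %r" % relpath)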
At 01:20 PM 10/2/2008 +0200, Josselin Mouette wrote:
On Wednesday 1 October 2008 at 19:59 -0400, Phillip J. Eby wrote:
locale/{message catalogs for various languages} -- These are binary files that contain strings that the user may see when a message is given. These, I think, are data.
I lean the other way, since they're not editable.
Locale data should be shipped in standard form, in /usr/share/locale. Again, Python is not the only thing you need to think of. Not only are you breaking translation tools if you don't use standard gettext catalogs, but if you don't ship them at the standard location, you are breaking tools like localepurge (or anything that cleans up the filesystem based on standard file locations).
Keep in mind that some platforms have no "locale" directories as such, and thus trying to support multiple locations thereof is a burden for cross-platform app developers, vs. simply treating them as internal resources.
No, you are making the burden heavier for Linux platforms by trying to be more portable. There is no reason why you can't define a locale directory on platforms where none exists, but when a standards document defining where files go does exist, you must follow it. Any time you don't follow it, we consider it a serious breakage and we have to patch the code anyway.
We are defining an *extensible* standard by which developers and build tools will be able to tell what files in a Python project's distribution directories are what "kind" of files and how they should be handled on a given platform, and a way for installation tools to let the installed project access any files that the tool relocates. That means that:

1. An FHS-friendly installation tool will be able to put data files with .mo/.po extensions wherever it pleases, and

2. An optional "locale info" metadata namespace could be defined by and for such tools, to allow more precise specification regarding these files, if they can't be identified automatically. (Similar to how there will need to be optional metadata namespaces for things like icons and menus for Windows, Mac, and other desktops.)

Is this "making the burden heavier for Linux platforms"? Well, it is for creating the installation tool, I suppose. And if you can't detect what needs to be done automatically, you'll have to encourage upstream to add the information to their specs. However, that will be true whether the information is "optional" or "required". And non-Linux platforms will certainly have their own extensions to deal with as well (e.g. the registry on Windows, "system" DLLs, exe manifests, etc.). So I think it's reasonable to put platform-specific burdens on the installation tools for those platforms - the whole point being to not have *one* installation tool team that's got to mentally juggle all platforms' requirements. Defining the *spec* is going to be tough enough as it is.
Sorry, but things don't work this way. Anything that is *not* a .py file should not land in the python module directories. This is utter abuse of a loophole in the implementation, and I can't think of another language that allows it. You will not find anything other than Perl modules in the Perl module directories. You will not find anything other than C# modules in the Mono module directory. And so on.
I don't see how this is remotely relevant to how Python works or how people use (and have used) Python over the decades. Data files in package directories is something I've seen in Python since, oh, 1997 or so. If it were a mistake for Python, I think somebody would've noticed by now. In any case, "should" is irrelevant; these packages exist in the field and if the only reason not to have them is to not annoy Linux packagers, there's not much that can be done about it. I could even agree with you, and it would still make no difference: you'd have to get basically *everyone* to agree with you, and it's just not gonna happen.
Apparently not. The alternative I would suggest is that under the new standard, an install tool should be allowed to relocate any non-Python files, and all access has to go through a resource API. The install tool would then have to be responsible for putting some kind of forwarding information in the package directory to tell the resource API where it squirrelled the file(s) off to. Then we can avoid all this angels-on-a-pin argument and the distros can Have It Their Way[tm].
I don't understand why you want to make it so complicated. All you need is a way to specify directories where different kinds of files land, and a simple API to retrieve the file names/contents. Then you can ship the files wherever you like in eggs, and we can ship them in the standard places in our packages.
How is that not *exactly* what I said in the paragraph above yours? All I added was that "some kind of forwarding information" be used to implement the "simple API to retrieve the file names/contents". In the case of Linux, of course, symlinks would be an ideal "kind of forwarding information" (at least from a Python developer POV), since it would allow most existing programs to work with no changes -- even ones that don't use the pkg_resources API.
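For illustration, the symlink variant could be as small as this. It is a sketch only; relocate() and its calling convention are assumptions, not part of any spec or tool:

    import os, shutil

    def relocate(pkg_path, fhs_path):
        """Move a data file to its FHS home, leaving a symlink behind so
        that code opening the file relative to __file__ still works."""
        dest_dir = os.path.dirname(fhs_path)
        if not os.path.isdir(dest_dir):
            os.makedirs(dest_dir)
        shutil.move(pkg_path, fhs_path)  # e.g. into /usr/share/locale/...
        os.symlink(fhs_path, pkg_path)   # the old in-package path still resolves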
On Thu, Oct 02, 2008 at 10:05:49AM -0400, Phillip J. Eby wrote:
I don't see how this is remotely relevant to how Python works or how people use (and have used) Python over the decades. Data files in package directories is something I've seen in Python since, oh, 1997 or so. If it were a mistake for Python, I think somebody would've noticed by now.
Well, maybe someone just did notice! I don't see why you disagree so strongly with this, as it is entirely in line with providing separate source tree locations for different types of data, which you are in favour of. Specifying that this would be the recommended way, and providing an API to deal with it, doesn't mean you forbid the old way. You'd still be able to dig around with __file__, even if GNU/Linux packagers will try to convince upstreams not to do so.

Regards
Floris

-- Debian GNU/Linux -- The Power of Freedom www.debian.org | www.gnu.org | www.kernel.org
At 08:08 PM 10/2/2008 +0100, Floris Bruynooghe wrote:
On Thu, Oct 02, 2008 at 10:05:49AM -0400, Phillip J. Eby wrote:
I don't see how this is remotely relevant to how Python works or how people use (and have used) Python over the decades. Data files in package directories is something I've seen in Python since, oh, 1997 or so. If it were a mistake for Python, I think somebody would've noticed by now.
Well, maybe someone just did notice! I don't see why you disagree so strongly with this, as it is entirely in line with providing separate source tree locations for different types of data, which you are in favour of.
Specifying that this would be the recommended way, and providing an API to deal with it, doesn't mean you forbid the old way. You'd still be able to dig around with __file__, even if GNU/Linux packagers will try to convince upstreams not to do so.
We're actually in "violent agreement" here regarding what is to be done at a practical level. Linux packagers want it one way, developers want it another, and both can have what they want... unless they want the other guys to agree their way is the "right" way of doing it. ;-)

That is, the Linux packagers appear to be upset and offended by the mere idea that anyone should consider the Python way to be correct, but it's not reasonable to expect people to change their minds, any more than it's reasonable to expect the Linux packagers to agree the Python way is the right way and that everything they've been doing is wrong. (And if you respond to this idea by thinking that it would be because the Linux packagers are *right*, then you're simply illustrating my point.)

Now, on a practical level, if what you are trying to say is that everybody should access all data strictly through some new API created for that purpose, then that's a non-starter for practical reasons, rather than ideological ones. Even setuptools doesn't *require* that people use an API to access their static data, and does its darnedest to support that use case. And, if you want a new spec to actually *replace* setuptools, then you're going to have to be very careful about what setuptools features are dropped, or the spec is unlikely to go from "de jure" to "de facto".
On Thu, Oct 02, 2008 at 01:20:45PM +0200, Josselin Mouette wrote:
Sorry, but things don't work this way. Anything that is *not* a .py file should not land in the python module directories. This is utter abuse of a loophole in the implementation, and I can't think of another language that allows it. You will not find anything other than Perl modules in the Perl module directories. You will not find anything other than C# modules in the Mono module directory. And so on.
What about the .so files of compiled modules?

Gaël
On Thu, Oct 02, 2008 at 01:20:45PM +0200, Josselin Mouette wrote:
Sorry, but things don’t work this way. Anything that is *not* a .py file should not land in the python module directories.
So where *should* they go, on platforms where there is no defined place for such things? And how can the code that uses them find them without platform-specific knowledge? -- Greg
On Friday 3 October 2008 at 15:57 +1300, Greg Ewing wrote:
On Thu, Oct 02, 2008 at 01:20:45PM +0200, Josselin Mouette wrote:
Sorry, but things don’t work this way. Anything that is *not* a .py file should not land in the python module directories.
So where *should* they go, on platforms where there is no defined place for such things?
If there is no standard place, you can simply define one.
And how can the code that uses them find them without platform-specific knowledge?
With a simple API to locate them. In fact, you don't even need an API. If you distribute a python program using autoconf, you can simply write:

    mydata = open("@datadir@/blah/blahblah.dat", 'r')

call the file blah.py.in, and add it to AC_CONFIG_FILES. Autoconf will substitute the correct directory for the platform. Cheers, -- .''`. : :' : We are debian.org. Lower your prices, surrender your code. `. `' We will add your hardware and software distinctiveness to `- our own. Resistance is futile.
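Spelled out with an assumed path, the template and the file configure generates from it would look like:

    # blah.py.in -- the template; configure substitutes @datadir@ at build time
    mydata = open("@datadir@/blah/blahblah.dat", 'r')

    # blah.py -- what ./configure --datadir=/usr/share would generate
    mydata = open("/usr/share/blah/blahblah.dat", 'r')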
On Wednesday 1 October 2008 at 11:00 -0700, Toshio Kuratomi wrote:
1) The heuristic encourages bad practices. Versions need to be parsed by computer programs (package managers, scripts that maintain repositories, etc.). Not all of those are written in Python. Having things other than numbers and dots in version strings is problematic for these programs. For instance, here's something that the setuptools versioning heuristics allow you to do:

    foo-1.0rc1
    foo-1.0
    foo-1.0post1

But here's how rpm would order it:

    foo-1.0
    foo-1.0post1
    foo-1.0rc1

In Fedora we have rules for putting non-numeric things in our release tag to work around this:

    version: 1.0, release: 0.1.rc1
    version: 1.0, release: 1
    version: 1.0, release: 2.post1

This is not all-inclusive, but you can see we have to move the alpha portion of the version into the release to ensure that the upgrade path moves forward sensibly.
In Debian, we have the ~ character to handle such versions easily:

    foo_1.0~rc1 << foo_1.0 << foo_1.0post1

In any case, this is not something that matters much to us, since the version of the tarball in Debian does not need to be the same as the upstream one; for example, 1.0rc1 will always be renamed to 1.0~rc1.
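The setuptools side of this ordering can be checked directly. This sketch uses pkg_resources.parse_version (which ships with setuptools) to confirm that the release candidate sorts below the final release and the post-release above it; the Debian ~ trick has no direct Python equivalent here:

    from pkg_resources import parse_version

    versions = ["1.0post1", "1.0rc1", "1.0"]
    # parse_version returns keys that sort in setuptools' intended order;
    # rpm, comparing the suffixes as plain strings, would disagree.
    assert sorted(versions, key=parse_version) == ["1.0rc1", "1.0", "1.0post1"]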
2) This is more important but much harder. Something that would really help everyone is having a way of versioning the API/ABI. Right now you can specify that you depend on Foo >= 1.0, Foo <= 2.0. But the version numbers don't have meaning until the actual packages are released. If Foo-1.0 and Foo-1.1 don't have compatible APIs, your numbers are wrong. If Foo-1.0 is succeeded by Foo-2.0 with the same API, your numbers are too restrictive. If you lock the versions to only what you've tested (Foo == 1.0), then you're going to have people and distributions that want to use the new version but can't. Some places have good versioning rules: https://svn.enthought.com/enthought/wiki/EnthoughtVersionNumbers
Other places say they have marketing departments that prevent that. One possibility would be to have MyLib1-1.0, MyLib2-1.0, MyLib2-2.0, etc., with the version for marketing included in the package name.
Another idea would be to have API information stored in metadata but not in the package name. That way marketing can have a big party for MyLib-2.0 but the API metadata has API_Revision: 32.
Fully agreed. We need to be able to specify an API version one way or another. The simple way is to set naming rules that postfix the marketing name with such an API version, but I don't think developers are going to follow that, so we need a more formalized way to specify it.
I have no love for how pkg_resources implements this (including the API), but the idea of retrieving data files, locales, config files, etc. from an API is good. For packages to conform to the Filesystem Hierarchy Standard on Linux, the API (and metadata) needs to be more flexible. We need to be able to mark locale, config, and data files in the metadata. The build/install tool needs to be able to install those into the filesystem in the proper places for a Linux distro, an egg, etc., and then we need to be able to call an API to retrieve the specific class of resources or a directory associated with them.
Amen to that. Currently, we find all kinds of data and configuration files right inside the Python package directories, and this really must stop. The Python-specific package management tools wouldn't need to be so complex if you only found Python modules in the Python module directories. Cheers, -- .''`. : :' : We are debian.org. Lower your prices, surrender your code. `. `' We will add your hardware and software distinctiveness to `- our own. Resistance is futile.
Toshio Kuratomi wrote:
Another idea would be to have API information stored in metadata but not in the package name. That way marketing can have a big party for MyLib-2.0 but the API metadata has API_Revision: 32.
+1. (I think this idea has been mentioned before too.) We should definitely put a requirement for an API version number in the meta-data PEP. Or should it be an ABI number? -- Dave
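As a purely hypothetical illustration of the idea (the API_Revision field does not exist in any metadata PEP; the name is taken from Toshio's example above):

    from email.parser import Parser

    def api_revision(pkg_info_text):
        """Read the hypothetical API_Revision field from PKG-INFO-style
        (RFC 822) metadata; return None if the field is absent."""
        headers = Parser().parsestr(pkg_info_text)
        value = headers.get("API_Revision")
        return int(value) if value is not None else None

    meta = ("Metadata-Version: 1.0\n"
            "Name: MyLib\n"
            "Version: 2.0\n"          # the marketing version
            "API_Revision: 32\n")     # the stable API contract
    assert api_revision(meta) == 32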
On Tue, Sep 23, 2008 at 11:37:53AM +0100, Chris Withers wrote:
Install debian and get back to productive tasks.
This is an almost troll-like answer.
True. But it worked, didn't it? Hopefully the thread will end up with some positive output. :)
See page 35 of the presentation.
If you're too lazy, here's a re-hash:
- there are lots of different operating systems
Agreed.
- even with Linux, there are many different package management systems
I would not link package management systems that strongly to operating systems (or kernels): http://www.debian.org/ports/index#nonlinux
- all the package management systems behave differently and expect packages to be set up differently for them
But there is the Linux Standard Base to bind them.
- expecting package developers to shoulder this burden is unfair and results in various bitrotting repackaged versions of python packages rather than just one up-to-date version maintained by the entity originating the python package
PyPI says it holds 4818 packages. I doubt all of these are actually worth packaging. Please note that my point of view is not that of the developer or single user who wants to try something out easily and quickly, but that of the developer who has to deploy his software many times in many places and maintain it in production for a long time. I understand easy_install and similar tools make it easy to try something out in one's home directory, and I have nothing to say against that. I complain that some advocate this as the One True Way to distribute their code and end up with code that depends on it, thus complicating matters for those who have been happily using tools that manage entire systems for years.
- Adobe Photoshop Plugins, Firefox Add-ons, etc do not delegate their management to an OS package manager. Packages are Python's "plugins" and so should get the same type of consistent, cross-platform package management targetted at the application in question, which is Python in this case.
I strongly disagree with this. I guess this is why you may like a Python-specific package management system whereas I never will. To me, there is no such thing as a clear boundary between a "Python subsystem" and the rest of a computing system. I have only one (complex) system, in my case Debian, and I want only one tool to manage it. -- Nicolas Chauvat logilab.fr - services en informatique scientifique et gestion de connaissances
On Fri, Sep 26, 2008 at 10:33:22PM +0200, Nicolas Chauvat wrote:
On Tue, Sep 23, 2008 at 11:37:53AM +0100, Chris Withers wrote:
- all the package management systems behave differently and expect packages to be set up differently for them
But there is the Linux Standard Base to bind them.
Hmm, I've never read the packaging sections of the LSB (and maybe I should before making a comment like this), but I've always heard complaints that it is very RPM-centric and ignores DEBs etc. So maybe even the LSB is not quite up to the task of binding them all yet... Regards Floris -- Debian GNU/Linux -- The Power of Freedom www.debian.org | www.gnu.org | www.kernel.org
Floris Bruynooghe
Hmm, I've never read the packaging sections of the LSB (and maybe I should before making a comment like this)
URL:http://refspecs.linux-foundation.org/LSB_3.2.0/LSB-Core-generic/LSB-Core...
but I've always heard complaints that it is very RPM-centric and ignores DEBs etc.
The LSB core specification does recommend (and specify the format of) RPM packages for distribution; but it doesn't require the system to use RPMs natively, only that such packages should be installable. Note: Supplying an RPM format package is encouraged because it makes systems easier to manage. This specification does not require the implementation to use RPM as the package manager; it only specifies the format of the package file. URL:http://refspecs.linux-foundation.org/LSB_3.2.0/LSB-Core-generic/LSB-Core... So, a Debian system is LSB-compliant if it enables the installation of externally-produced LSB-compliant RPM packages. This is easily done by making an appropriate tool available; I think 'alien' is the one used on a Debian system. The LSB folks have (IMHO as an outside observer) worked quite well with the Debian project and vice versa. A Debian system now has various 'lsb-*' packages available that make it simple to bring a Debian system into compliance with the various specifications of the LSB (note: I don't know what gaps in compliance may remain even with those packages installed). -- \ “Everything is futile.” —Marvin of Borg | `\ | _o__) | Ben Finney
participants (20):
- Andreas Jung
- Ben Finney
- Chris Withers
- Dave Peterson
- David Cournapeau
- Floris Bruynooghe
- Gael Varoquaux
- Greg Ewing
- Ian Bicking
- Jeff Younker
- Jeroen Ruigrok van der Werven
- Jim Fulton
- Josselin Mouette
- Matthias Klose
- Michael
- Nicolas Chauvat
- Phillip J. Eby
- Rick Warner
- Tarek Ziadé
- Toshio Kuratomi