Re: [Distutils] PyPI - Evolve our own or reuse existing package systems?
(Since my email was a bit long and wide I'm trying to update the subject when the response is rather focused.) On Aug 15, 2007, at 06:37, Paul Boddie wrote:
Bjørn Stabell wrote: [...] I've been moderately negative about evolving a parallel infrastructure to other package and dependency management systems in the past, and I'm not enthusiastic about things like CPAN or language-specific equivalents. The first thing most people using a GNU/Linux or *BSD distribution are likely to wonder is, "Where are the Python packages in my package selector?"
There are exceptions, of course. Some people may be sufficiently indoctrinated in the ways of Python, which I doubt is the case for a lot of people looking for packages. Others may be working in restricted environments where system package management tools don't really help. And people coming from Perl might wonder where the CPAN equivalent is, but they should also remind themselves what the system provides - they have manpages for Perl, after all.

[...]

I've read through the text that I've mercilessly cut from this response, and I admire the scope of this effort, but I do wonder whether we couldn't make use of existing projects (as others have noted), and not only at the Python-specific level, especially since the user interface to the "egg" tool seems to strongly resemble other established tools - as you seem to admit in this and later messages, Bjørn.

[...]

I was thinking of re-using the Debian indexing strategy. It's very simple, perhaps almost quaintly so, but a lot of the problems revealed with the current strategies around PyPI (not exactly mitigated by bizarre tool-related constraints) could be solved by adopting existing well-worn techniques.

[...]

If I recall correctly, the PEP concerned just "bailed" on the version numbering and dependency management issue, despite seeming to be inspired by Debian or RPM-style syntax.

[...]

As I've said before, it's arguably best to work with whatever is already there, particularly because of the "interface" issue you mention with non-Python packages. I suppose the apparent lack of an open and widespread package/dependency management system on Windows (and some UNIX flavours) can be used as a justification to write something entirely new, but I imagine that only very specific tools need writing in order to make existing distribution mechanisms work with Windows - there's no need to duplicate existing work from end to end "just because".

[...]

Agreed.
And by adopting existing mechanisms, we can hopefully avoid having to reinvent their feature sets, too.
P.S. Sorry if this sounds a bit negative, but I've been reading the archives of the catalog-sig for a while now, and it's a bit painful reading about how sensitive various projects are to downtime in PyPI, how various workarounds have been devised with accompanying whisper campaigns to tell people where unofficial mirrors are, all whilst the business of package distribution continues uninterrupted in numerous other communities.
If I had a critical need to get Python packages directly from their authors to run on a Windows machine, for example, I'd want to know how to do so via a Debian package channel or something like that. This isn't original thought: I'm sure that Ximian Red Carpet and Red Hat Network address many related issues.
There seem to be two issues:

1) Should Python have its own package management system (with dependencies etc.) in parallel with what's already on many platforms (at least Linux and OS X)? Anyone who has worked with two parallel package management systems knows that dependencies are hellish.

* If you mix and match, you often end up with two of everything.
* It'll be incomplete because you can't easily specify dependencies on non-Python packages.

2) If we agree Python should have a package management system, should we build our own or repurpose an existing one?

* I think it's a matter of pride and proof of concept to have one written in Python. That doesn't mean we can't get ideas from others.
* It's also not that hard to do. The prototype I threw together took one weekend plus half a day, and consists of about 500 lines of new code. It could be refactored and made smaller, but even if a complete version is ten times that size, it's still not a huge undertaking.
* With a Python version we could relatively easily innovate beyond what traditional packaging systems do; ports and apt have pretty much stagnated. RubyGems seems to have some cool features, features that probably wouldn't have happened if they were using ports or apt-get (though then they could have piggybacked on innovations in those tools, I guess). If it works for them, why shouldn't it work for us?
* It would have to be as portable as Python is; many packaging systems are by nature relatively platform-specific.
* If we don't build our own, doesn't that mean we throw out eggs?
* Packaging systems are useful for mega-frameworks like Zope, TurboGears, and Django (and slightly less so for projects you roll on your own) to manage distribution and installation of plugins and add-ons. Relying on platform-specific packaging systems for these may not work that well. (But I could be wrong about that.)
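To make the dependency point concrete, here is a toy sketch of the kind of version check any such system has to get right; the function names are made up for illustration, and real schemes (alphas, betas, Debian epochs) are considerably subtler than this:

```python
def parse_version(s):
    # "1.2.10" -> (1, 2, 10), so versions compare numerically,
    # not lexically ("1.10" must sort after "1.2")
    return tuple(int(part) for part in s.split("."))

def satisfies(installed, minimum):
    # True if the installed version meets the minimum requirement;
    # tuples compare element by element, which gives the right order
    return parse_version(installed) >= parse_version(minimum)
```

A naive string comparison would put "1.10" before "1.2"; the tuple form does not, which is exactly the sort of detail a 500-line prototype has to grow to handle.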
That said, it might be possible to do some kind of hybrid: PyPI could be a "meta-package" repository that can easily feed into platform-specific packaging systems, perhaps paired with a client-side "meta package manager" that calls upon the platform-specific package manager to install things.

It looks like, for example, ports has targets to build for other systems, e.g., pkg, mpkg, dmg, rpm, srpm, dpkg. So maintaining package information in (or compatible with) ports could make it easy to feed packages into other package systems.

* Benefit: We're working with other package systems, just making it easier to get Python packages into them.
* Drawback: They may not want to include all packages, at the speed we want, or in the way we want. (I.e., there may still be packages you'd want that are only available on PyPI.)
* Drawback: Some platforms don't have package systems.

Which brings me to: if we're just distributing source files, why don't we use a source control system such as svn, bzr, or hg? The package developers have trunk, PyPI is a branch, the platform-specific package maintainers have a branch, and what's installed onto your system is, in the end, a branch (serially connected). Some systems, like Subversion, can also include externals, like I did with cliutils on the egg package. Just a thought.

Rgds, Bjorn
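As a concrete sketch of the "feed into platform-specific systems" idea above: a translator from PyPI-style metadata to a Debian control stanza might start out like this. The field names in the input dict and the `python-` naming convention are illustrative assumptions, not any existing spec:

```python
def to_debian_control(meta):
    # meta: a PyPI-style metadata dict (hypothetical field names)
    depends = ", ".join("python-" + d.lower() for d in meta.get("requires", []))
    lines = [
        "Package: python-%s" % meta["name"].lower(),
        "Version: %s" % meta["version"],
    ]
    if depends:
        lines.append("Depends: %s" % depends)
    lines.append("Description: %s" % meta["summary"])
    return "\n".join(lines)

stanza = to_debian_control({
    "name": "CliUtils",
    "version": "0.3",
    "requires": ["setuptools"],
    "summary": "Command-line helpers",
})
```

The hard part, as the drawbacks above suggest, is not the format translation but the policy questions: which packages, how fast, and who maintains the mapping.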
Hei Bjørn :)
These are some interesting points you are making.
I have in fact been developing a general software deployment system,
Conduit (http://conduit.simula.no), in Python for some time, that is
capable of supporting several major platforms (at the moment: Linux,
Windows and OS X). It's not reached any widespread use, but we (at the
Simula Research Laboratory) are using it to distribute
software to students attending some courses at the University of Oslo.
Right now we are in the middle of preparing for the semester start, which is
next week.
The system is designed to be general, both with regard to the target
platform and the deployable software. I've solved the distribution problem
by constructing an XML-RPC web service that serves information about
software projects in RDF (based on the DOAP format). This distribution
service is general and independent of the installation system, which
acts as a client of the distribution service.
If this sounds interesting to you, I'd love it if you checked it out and gave me
some feedback. It is an experimental project, and as such we are definitely
interested in ideas/help from others.
Arve
On Aug 15, 2007, at 21:34, Arve Knudsen wrote:
These are some interesting points you are making. I have in fact been developing a general software deployment system, Conduit, in Python for some time, that is capable of supporting several major platforms (at the moment: Linux, Windows and OS X). It's not reached any widespread use, but we (at the Simula Research Laboratory) are using it to distribute software to students attending some courses at the University of Oslo. Right now we are in the middle of preparing for the semester start, which is next week.
The system is designed to be general, both with regard to the target platform and the deployable software. I've solved the distribution problem by constructing an XML-RPC web service that serves information about software projects in RDF (based on the DOAP format). This distribution service is general and independent of the installation system, which acts as a client of the distribution service.
If this sounds interesting to you I'd love if you checked it out and gave me some feedback. It is an experimental project, and as such we are definitely interested in ideas/help from others.
Hi Arve,

That's an interesting coincidence! :)

Without turning it into a big research project, it would be interesting to hear what you (honestly) thought were the strengths and weaknesses of Conduit compared to, say, deb/rpm/ports/emerge, whichever you have experience with. I did download and look at Conduit, but haven't tried it yet.

There are so many ways to take this, and so many "strategic" decisions that I'd hope people on the list could help out with. Personally I think it would be great if we had a strong Python-based central package system, perhaps based on Conduit. I'm pretty sure Conduit would have to have the client- and server-side components even more clearly separated, though, and the interface between them open and clearly defined (which I think it is, but it would have to be discussed).

I see Conduit (and PyPI) supports DOAP, and looking around I also found http://python.codezoo.com/ by O'Reilly; it also seems to have a few good ideas, for example voting and some quality control (although that's a very difficult decision, I guess).

Rgds, Bjorn
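For what it's worth, PyPI already exposes part of its catalog over XML-RPC, so a "strong central package system" client could start from queries like the sketch below. The method name follows the PyPI XML-RPC interface as I understand it; the helper function is made up for illustration and is written so it can be exercised without a network connection:

```python
import xmlrpc.client  # xmlrpclib in the Python of the day

def latest_release(server, package):
    # `server` is any object with a package_releases(name) method,
    # e.g. an xmlrpc.client.ServerProxy for the PyPI endpoint
    releases = server.package_releases(package)
    return releases[0] if releases else None

# Typical (network) use, assuming PyPI's XML-RPC endpoint:
# server = xmlrpc.client.ServerProxy("http://pypi.python.org/pypi")
# print(latest_release(server, "lxml"))
```

Keeping the transport object separate from the query logic is also one way to get the client/server separation discussed above.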
On 8/14/07, Bjørn Stabell
There seem to be two issues:
1) Should Python have its own package management system (with dependencies etc.) in parallel with what's already on many platforms (at least Linux and OS X)? Anyone who has worked with two parallel package management systems knows that dependencies are hellish.
* If you mix and match you often end up with two of everything.
* It'll be incomplete because you can't easily specify dependencies on non-Python packages.
On that second bullet, tools like 'buildout' seem better equipped for handling those situations. Yesterday I saw a `buildout.cfg` for building and testing `lxml` against fresh downloads and builds of libxml2 and libxslt. It downloaded and built those two things before getting, building, and installing the `lxml` egg locally. Platform package management terrifies me. I work in Python far more than I work in a particular operating system (even though our office is pretty much Mac OS X and FreeBSD-based). It's very easy for our servers to get stuck at a particular FreeBSD version, while the ports move on. Eventually they get so out of date that ports are pretty much unusable.
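For readers who haven't seen buildout, a minimal `buildout.cfg` along those lines might look roughly like this. This is a sketch using the common `zc.recipe.egg` recipe; the configuration Jeff saw, which also built libxml2 and libxslt from source, would need additional parts and recipes:

```ini
[buildout]
parts = lxml

[lxml]
recipe = zc.recipe.egg
eggs = lxml
```

Each `[part]` is handled by a recipe, which is itself just a Python egg, so the whole build stays inside the Python toolchain rather than the platform's.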
2) If we agree Python should have a package management system, should we build or repurpose some other one?
[snip]
* With a Python version we could relatively easily innovate beyond what traditional packaging systems do; ports and apt have pretty much stagnated. RubyGems seems to have some cool features, features that probably wouldn't have happened if they were using ports or apt-get (though then they could have piggybacked on innovations in those tools, I guess). If it works for them, why shouldn't it work for us?
I agree.
* It would have to be as portable as Python is; many packaging systems are by nature relatively platform-specific.
You could change "have to" to "gets to" there. :). This is a big plus -- I know how easy_install and `gem` work as I use them far more frequently on both my desktop and various servers than any platform-specific packaging system.
* Packaging systems are useful for mega frameworks like Zope, TurboGears, and Django, and slightly less so for projects you roll on your own, to manage distribution and installation of plugins and addons. Relying on platform-specific packaging systems for these may not work that well. (But I could be wrong about that.)
Personally, I think packaging systems are worse here. But I just may be a control freak... And I've had the luxury of Zope being a big self-contained package for quite some time. Now that it's breaking into smaller pieces, it gets a bit more complex, but the combination of `setuptools` and `buildout` seems to be doing its job admirably. Relatively admirably.

Once you have Ruby and Gems, Ruby on Rails installs with just one line::

    gem install rails --include-dependencies

I think Pylons and/or TurboGears do just about the same..? It's been a while since I looked at either of them. But that one line is a lot easier to work with than::

    If running Debian, run ``apt-get ....``
    If running RedHat or another RPM system, ....
    If running Mac OS X with MacPorts, run ...
    If running ... then ...
That said, it might be possible to do some kind of hybrid, for PyPI to be a "meta package" repository that can easily feed into platform specific packaging systems. And to perhaps also have a client-side "meta package manager" that will call upon the platform-specific package manager to install stuff.
For my own experience, that sounds worse. However, it would be nice if 'egg' could detect that certain things were installed by a non-egg system (e.g., having `py-sqlite` from MacPorts) and not install them.

This goes into a deeper frustration I've had in the past: I installed MySQL on my desktop (Mac OS X) using a disk image / .pkg installer downloaded from MySQL's web site. Then I think I tried installing a Python package from MacPorts (maybe just the MySQL bindings?) that had a MySQL dependency. It didn't detect that I already had MySQL installed, and MacPorts then tried installing it on its own. At that point, I stopped using ports for just about anything Python-related, aside from getting Python and py-readline. It was easier to use easy_install or regular distutils and the like. The dependencies were met, but not advertised in a way that was friendly to the packaging system in question.
It looks like, for example, ports have targets to build to other systems, e.g., pkg, mpkg, dmg, rpm, srpm, dpkg. So maintaining package information in (or compatible with) ports could make it easy to feed packages into other package systems.
* Benefit: We're working with other package systems, just making it easier to get Python packages into them.
* Drawback: They may not want to include all packages, at the speed at which we want, or the way we want to. (I.e., there may still be packages you'd want that are only available on PyPI.)
Or packages you only want internally. Or packages you don't want available on PyPI because they're very specific to a large framework/toolkit like Zope 3.
* Drawback: Some systems don't have package systems.
And some administrators don't use them beyond (maybe) initially setting up the system.

I also don't know how well those package systems deal with concepts like local installs. Not just local to a single user, but local to a single package. `zc.buildout` is good about this, almost to a fault. There is a rough balance there between the global-install ease of use of a desktop or personal machine and being able to set up fine-tuned, self-contained setups.

Anyways, I'd vote pure Python. Even on the most barren of machines, it's relatively easy to build and install Python from source. Even on a fairly old installation, it's easy to build and install a new version of Python from source - probably far easier than wrestling with the package manager about updating its database and then updating package after package that one doesn't want.

I think that Python should be all that you need in order to get other Python packages. `easy_install` pretty much gives us this today. There are improvements I'd love to see - reports of what I have installed, what's active, what's taking precedence in my environment, etc. Your tool may do this, I haven't had time to look yet. Ruby's 'gem' command does this beautifully. And I hardly ever touch Ruby or gems; it was just very easy to use for the few things I've wanted to try.

-- Jeff Shell
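The "report what I have installed" wish is partly within reach already: `pkg_resources`, which ships with setuptools, can enumerate the distributions visible on the current path. A sketch (the helper name is made up; this won't show the precedence conflicts Jeff asks about):

```python
import pkg_resources

def installed_distributions():
    # (name, version) pairs for every distribution in the
    # active working set, i.e. what's importable via sys.path
    return sorted((d.project_name, d.version)
                  for d in pkg_resources.working_set)

for name, version in installed_distributions():
    print("%s %s" % (name, version))
```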
Hello there,

Jeff Shell wrote:
This goes into a deeper frustration I've had in the past: I installed MySQL on my desktop (Mac OS X) using a disk image / .pkg installer downloaded from MySQL's web site. Then I think I tried installing a python package from MacPorts (maybe just the mysql bindings?) that had a MySQL dependency. It didn't detect that I already had MySQL installed, and MacPorts then tried installing it on its own.
IIRC there are variants of some MacPorts ports that removed dependencies from their setup.py-like file and used the system ones. It can also be dealt with by creating phantom packages which provide the virtual name "mysql-client", for example (which is how it's been done in apt-get repositories, etc.).
Even on the most barren of machines, it's relatively easy to build and install Python from source.
I can agree with that.
Very glad to hear you're interested in my system, Bjørn.
On 8/16/07, Bjørn Stabell
Hi Arve,
That's an interesting coincidence! :)
Without turning it into a big research project, it would be interesting to hear what you (honestly) thought were the strengths and weaknesses of Conduit compared to, say, deb/rpm/ports/emerge, whichever you have experience with. I did download and look at Conduit, but haven't tried it yet.
I would say the main difference lies in how Conduit is designed to be a completely general solution for distributing software and deploying it on users' systems, with as loose a coupling as possible. You could say that what I am trying to achieve is closer to Macrovision's InstallAnywhere / FLEXnet Connect than to monolithic package managers such as APT, Emerge, etc. The former offers independent providers a complete solution for delivering software and maintaining it (with updates) over time, while the latter are tightly integrated services that are even used to implement operating systems (e.g., Debian, Gentoo).

Conduit tries to offer the best of both worlds by building a central software portal from independent project representations. The idea is that software providers maintain their own profile within the portal service, and associate with it a number of projects which are described in RDF (an extension of the DOAP vocabulary). The portal service accumulates these data and exposes them to installation agents via a public XML-RPC API.

I've written a framework for Conduit agents that currently supports installing on Linux, Windows (XP/Vista), and OS X. I find it a great strength to be able to offer a common installation system for all three platforms, but the weakness is that it generally doesn't integrate as well with the operating systems as native installers do. On Windows, at least, I plan to piggyback on the native installation service (Windows Installer) to achieve better integration without having to reinvent the wheel. On Linux it is worse, since there is no well-defined native installation service, but instead a bunch of different packaging systems which overlap with my own deployment model (specification of dependencies etc.).

There are so many ways to take this, and so many "strategic" decisions that I'd hope people on the list could help out with.
Personally I think it would be great if we had a strong Python-based central package system, perhaps based on Conduit. I'm pretty sure Conduit would have to have the client and server-side components even more clearly separated, though, and the interface between them open and clearly defined (which I think it is, but it would have to be discussed).
The client and server components should be clearly separated as-is, but the server API should definitely be reviewed and properly defined. Conduit-specific support exists on the server as extensions (namespace "conduit") of the RDF vocabulary.

I see Conduit (and PyPI) supports DOAP, and looking around I also found http://python.codezoo.com/ by O'Reilly; it also seems to have a few good ideas, for example voting and some quality control (although that's a very difficult decision, I guess).
CodeZoo is a very interesting initiative. I let myself be inspired in part by CodeZoo when I started designing Conduit, but mostly by SWED (http://swed.org.uk), which has a similar model of accumulating decentralized information in RDF for centralized access (via a Web interface).

I would actually like Conduit's distribution service to evolve into something with similar functionality to CodeZoo. A rich Web interface for navigating the catalog of software would be awesome (an alternative to SourceForge?). I've also pondered the possibility of user profiles in the portal, so that one can keep preferences centrally, for instance as a way to define personal "installation sets" (e.g., after installing a new Linux, restore your previous installations).

Arve
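To make the DOAP idea discussed in this thread concrete: a project record is just RDF/XML using the DOAP vocabulary, so standard XML tooling can already pick it apart. The record below is a made-up minimal example, not actual Conduit or PyPI output, and a real DOAP document would carry RDF attributes this sketch omits:

```python
import xml.etree.ElementTree as ET

# The DOAP namespace, as published by the DOAP project
DOAP = "http://usefulinc.com/ns/doap#"

record = """\
<Project xmlns="http://usefulinc.com/ns/doap#">
  <name>example-package</name>
  <shortdesc>A made-up project record</shortdesc>
</Project>"""

root = ET.fromstring(record)
# Element lookups must use the namespace-qualified tag form
name = root.findtext("{%s}name" % DOAP)
```

Because the vocabulary is shared, a portal like Conduit's and an index like PyPI can in principle consume each other's records with nothing more exotic than this.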
participants (4)
- Arve Knudsen
- Bjørn Stabell
- Jeff Shell
- Luis Bruno