Announcing toydist: improving the distribution and packaging situation
(warning, long post)

Hi there,

As some of you already know, the packaging and distribution of scientific python packages has been a constant source of frustration. Open source is about making it easy for anyone to use software how they see fit, and I think the python packaging infrastructure has not been very successful for people not intimately familiar with python. A few weeks ago, after Guido visited Berkeley and was told how those issues were still there for the scientific community, he wrote an email asking whether current efforts on distutils-sig will be enough (see http://aspn.activestate.com/ASPN/Mail/Message/distutils-sig/3775972).

Several of us have been participating in this discussion, but I feel like the divide between current efforts on distutils-sig and us (the SciPy community) is not getting smaller. At best, their efforts will be more work for us to track the new distribute fork, and more likely, it will be all for nothing as it won't solve any deep issue. To be honest, most of what is considered on distutils-sig sounds like anti-goals to me.

Instead of keeping up with the frustrating process of "improving" distutils, I think we have enough smart people and manpower in the scientific community to go with our own solution. I am convinced it is doable because R and haskell, with much smaller communities than python, managed to pull off something which is miles ahead of pypi. The SciPy community is hopefully big enough that a SciPy-specific solution may reach critical mass. Ideally, I wish we had something with the following capabilities:
 - easy to understand tools
 - an http-based package repository ala CRAN, which would be easy to mirror and back up (through rsync-like tools)
 - decoupling the building, packaging and distribution of code and data
 - reliable install/uninstall/query of what is installed locally
 - facilities for building windows/mac os x binaries
 - making the life of OS vendors (Linux, *BSD, etc...) easier

The packaging part
==================

Speaking is easy, so I started coding part of this toolset, called toydist (temporary name), which I presented at Scipy India a few days ago:

http://github.com/cournape/toydist/

Toydist is more or less a rip off of cabal (http://www.haskell.org/cabal/), and consists of three parts:
 - a core which builds a package description from a declarative file similar to cabal files. The file is almost purely declarative, and can be parsed so that no arbitrary code is executed, thus making it easy to sandbox package builds (e.g. on a build farm).
 - a set of command line tools to configure, build, install, and build installers (egg only for now) etc... from the declarative file
 - backward compatibility tools: a tool to convert an existing setup.py to the new format has been written, and a tool to use distutils through the new format, for backward compatibility with complex distutils extensions, should be relatively easy.

The core idea is to make the format just rich enough to describe most packages out there, but simple enough that interfacing it with external tools is possible and reliable. As a regular contributor to scons, I am all too aware that a build tool is a very complex beast to get right, and repeating their efforts does not make sense. Typically, I envision that complex packages such as numpy, scipy or matplotlib would use make/waf/scons for the build - in a sense, toydist is written so that writing something like numscons would be easier. OTOH, most if not all scikits should be buildable from a purely declarative file.
To give you a feel of the format, here is a snippet for the grin package from Robert K. (automatically converted):

Name: grin
Version: 1.1.1
Summary: A grep program configured the way I like it.
Description:
    ==== grin ====
    I wrote grin to help me search directories full of source code. The
    venerable GNU grep_ and find_ are great tools, but they fall just a
    little short for my normal use cases.
    <snip>
License: BSD
Platforms: UNKNOWN
Classifiers:
    License :: OSI Approved :: BSD License,
    Development Status :: 5 - Production/Stable,
    Environment :: Console,
    Intended Audience :: Developers,
    Operating System :: OS Independent,
    Programming Language :: Python,
    Topic :: Utilities,
ExtraSourceFiles:
    README.txt,
    setup.cfg,
    setup.py,

Library:
    InstallDepends:
        argparse,
    Modules:
        grin,

Executable: grin
    module: grin
    function: grin_main

Executable: grind
    module: grin
    function: grind_main

Although still very much experimental at this point, toydist already makes some things much easier than with distutils/setuptools:
 - path customization for any target can be done easily: you can easily add an option in the file so that configure --mynewdir=value works and is accessible at every step.
 - making packages FHS compliant is not a PITA anymore, and the scheme can be adapted to any OS, be it traditional FHS-like unix, mac os x, windows, etc...
 - all the options are accessible at every step (no more distutils commands nonsense)
 - data files can finally be handled correctly and consistently, instead of the 5 or 6 magic methods currently available in distutils/setuptools/numpy.distutils
 - building eggs does not involve setuptools anymore
 - not much coupling between package description and build infrastructure (building extensions is actually done through distutils ATM).

Repository
==========

The goal here is to have something like CRAN (http://cran.r-project.org/web/views/), ideally with a build farm so that whenever anyone submits a package to our repository, it would automatically be checked, and built for windows/mac os x and maybe a few major linux distributions. One could investigate the build service from open suse to that end (http://en.opensuse.org/Build_Service), which is based on xen VMs to build installers in a reproducible way.

Installed package db
====================

I believe that the current open source enstaller package from Enthought can be a good starting point. It is based on eggs, but eggs are only used as a distribution format (eggs are never installed as eggs AFAIK). You can easily remove packages, query installed versions, etc... Since toydist produces eggs, interoperation between toydist and enstaller should not be too difficult.

What's next ?
=============

At this point, I would like to ask for help and comments, in particular:
 - Does all this make sense, or is it hopelessly intractable ?
 - Besides the points I have mentioned, what else do you think is needed ?
 - There has already been some work for the scikits webportal, but I think we should bypass pypi entirely (the current philosophy of not enforcing consistent metadata does not make much sense to me, and is the opposite of most other similar systems out there).
 - I think a build farm for at least windows packages would be a killer feature, and enough incentive to push some people to use our new infrastructure. It would be good to have a windows guy familiar with windows sandboxing/virtualization to do something there.
(The people working on the opensuse build service have started working on windows support.)
 - I think being able to automatically convert most scientific packages is a significant feature, and the conversion needs to be more robust - so anyone is welcome to try converting an existing setup.py with toydist (see the toydist readme).

thanks,

David
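To make the "parsed so that no arbitrary code is executed" point concrete, here is a minimal sketch of reading top-level fields from a file like the grin snippet above. It is only an illustration - it is not toydist's actual parser, and it ignores the nested sections (Library, Executable) that the real format supports:

def parse_fields(text):
    """Collect top-level 'Key: value' fields; nothing is exec'd or imported."""
    fields = {}
    for line in text.splitlines():
        if line[:1].isspace() or ":" not in line:
            continue  # skip blank, continuation and nested lines
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields

snippet = "Name: grin\nVersion: 1.1.1\nLicense: BSD\n"
print(parse_fields(snippet))  # {'Name': 'grin', 'Version': '1.1.1', 'License': 'BSD'}

The whole file stays data: a buggy or malicious toysetup.info can at worst produce a wrong package description, not run code on the build machine.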
David wrote:
Repository
==========
The goal here is to have something like CRAN (http://cran.r-project.org/web/views/), ideally with a build farm so that whenever anyone submits a package to our repository, it would automatically be checked, and built for windows/mac os x and maybe a few major linux distributions. One could investigate the build service from open suse to that end (http://en.opensuse.org/Build_Service), which is based on xen VMs to build installers in a reproducible way.
Do you mean here automatic generation of Ubuntu debs, Debian debs, Windows MSI installers, Windows EXE installers, and so on? (If so then great!) If this is the goal, I wonder if, when one looks outside of Python-land, one might find something that already does this -- there are a lot of different package formats, "Linux meta-distributions", "install everywhere packages" and so on. Of course, toydist could have any such tool as a backend/in a pipeline.
What's next ?
=============
At this point, I would like to ask for help and comments, in particular:
 - Does all this make sense, or is it hopelessly intractable ?
 - Besides the points I have mentioned, what else do you think is needed ?
Hmm. What I miss is a discussion of the other native libraries which the Python libraries need to bundle. Is it assumed that one wants to continue linking C and Fortran code directly into Python .so modules, like the scipy library currently does?

Let me take CHOLMOD (sparse Cholesky) as an example.
 - The Python package cvxopt uses it, simply by linking about 20 C files directly into the Python-loadable module (.so) which goes into the Python site-packages (or wherever). This makes sure it just works, but it doesn't feel like the right way at all.
 - scikits.sparse.cholmod OTOH simply specifies libraries=["cholmod"], and leaves it up to the end-user to make sure it is installed. Linux users with root access can simply apt-get it, but it is a pain for everybody else (Windows, Mac, non-root Linux).
 - Currently I'm making a Sage SPKG for CHOLMOD. This essentially gets the job done by not bothering about the problem, not even using the OS-installed Python.

Something that would spit out Sage SPKGs, Ubuntu debs, Windows installers, and so on, with Python code, C/Fortran code or a mix (and put everything in the place preferred by the system in question), seems ideal. Of course one would still need to make sure that the code builds properly everywhere, but just solving the distribution part of this would be a huge step ahead.

What I'm saying is that this is a software distribution problem in general, and I'm afraid that Python-specific solutions are too narrow.

Dag Sverre
On Tue, Dec 29, 2009 at 3:49 AM, Dag Sverre Seljebotn
Do you mean here automatic generation of Ubuntu debs, Debian debs, Windows MSI installers, Windows EXE installers, and so on? (If so then great!)
Yes (although this is not yet implemented). In particular on windows, I want to implement a scheme so that you can convert from eggs to .exe and vice versa, so people can still install as exe (or msi), even though the method would default to eggs.
If this is the goal, I wonder if, when one looks outside of Python-land, one might find something that already does this -- there are a lot of different package formats, "Linux meta-distributions", "install everywhere packages" and so on.
Yes, there are things like 0install or autopackage. I think those are doomed to fail as long as they are not supported thoroughly by the distributions. Instead, my goal here is much simpler: producing rpm/deb. It does not solve every issue (install by non-root, multiple parallel versions), but one has to be realistic :) I think automatically built rpm/deb, plus easy integration with native methods, can solve a lot of issues already.
- Currently I'm making a Sage SPKG for CHOLMOD. This essentially gets the job done by not bothering about the problem, not even using the OS-installed Python.
Something that would spit out Sage SPKGs, Ubuntu debs, Windows installers, and so on, with Python code, C/Fortran code or a mix (and put everything in the place preferred by the system in question), seems ideal. Of course one would still need to make sure that the code builds properly everywhere, but just solving the distribution part of this would be a huge step ahead.
On windows, this issue may be solved using eggs: enstaller has a feature where DLLs put in a special location of an egg are installed in python such that they are found by the OS loader. One could have mechanisms based on $ORIGIN + rpath on linux to solve this issue for local installs on Linux, etc...

But again, one has to be realistic about the goals. With toydist, I want to remove the whole pile of magic and hacks built on top of distutils so that people can again hack their own solutions, as it should have been from the start (that's a big plus of python in general). It won't magically solve every issue out there, but it would hopefully help people to make their own. Bundling solutions like SAGE, EPD, etc... are still the most robust ways to deal with those issues in general, and I do not intend to replace those.
What I'm saying is that this is a software distribution problem in general, and I'm afraid that Python-specific solutions are too narrow.
Distribution is a hard problem. Instead of pushing a very narrow (and mostly ill-founded) view of how people should do things, like distutils/setuptools/pip/buildout do, I want people to be able to build their own solutions. No more "use this magic stick v 4.0.3.3.14svn1234, trust me, it works, you don't have to understand", which is too prevalent with those tools and has always felt deeply unpythonic to me.

David
On Tue, Dec 29, 2009 at 3:49 AM, Dag Sverre Seljebotn
wrote: Do you mean here automatic generation of Ubuntu debs, Debian debs, Windows MSI installers, Windows EXE installers, and so on? (If so then great!)
Yes (although this is not yet implemented). In particular on windows, I want to implement a scheme so that you can convert from eggs to .exe and vice versa, so people can still install as exe (or msi), even though the method would default to eggs.
If this is the goal, I wonder if, when one looks outside of Python-land, one might find something that already does this -- there are a lot of different package formats, "Linux meta-distributions", "install everywhere packages" and so on.
Yes, there are things like 0install or autopackage. I think those are doomed to fail as long as they are not supported thoroughly by the distributions. Instead, my goal here is much simpler: producing rpm/deb. It does not solve every issue (install by non-root, multiple parallel versions), but one has to be realistic :)
I think automatically built rpm/deb, plus easy integration with native methods, can solve a lot of issues already.
- Currently I'm making a Sage SPKG for CHOLMOD. This essentially gets the job done by not bothering about the problem, not even using the OS-installed Python.
Something that would spit out Sage SPKGs, Ubuntu debs, Windows installers, and so on, with Python code, C/Fortran code or a mix (and put everything in the place preferred by the system in question), seems ideal. Of course one would still need to make sure that the code builds properly everywhere, but just solving the distribution part of this would be a huge step ahead.
On windows, this issue may be solved using eggs: enstaller has a feature where DLLs put in a special location of an egg are installed in python such that they are found by the OS loader. One could have mechanisms based on $ORIGIN + rpath on linux to solve this issue for local installs on Linux, etc...
But again, one has to be realistic about the goals. With toydist, I want to remove the whole pile of magic and hacks built on top of distutils so that people can again hack their own solutions, as it should have been from the start (that's a big plus of python in general). It won't magically solve every issue out there, but it would hopefully help people to make their own.
Bundling solutions like SAGE, EPD, etc... are still the most robust ways to deal with those issues in general, and I do not intend to replace those.
What I'm saying is that this is a software distribution problem in general, and I'm afraid that Python-specific solutions are too narrow.
Distribution is a hard problem. Instead of pushing a very narrow (and mostly ill-founded) view of how people should do things, like distutils/setuptools/pip/buildout do, I want people to be able to build their own solutions. No more "use this magic stick v 4.0.3.3.14svn1234, trust me, it works, you don't have to understand", which is too prevalent with those tools and has always felt deeply unpythonic to me.
Thanks, this cleared things up, and I like the direction this is heading. Thanks a lot for doing this! Dag Sverre
Hi,

In the toydist proposal/release notes, I would address 'what does toydist do better' more explicitly.

**** A big problem for science users is that numpy does not work with pypi + (easy_install, buildout or pip) and python 2.6. ****

Working with the rest of the python community as much as possible is likely a good goal. At least getting numpy to work with the latest tools would be great. An interesting read is the history of python packaging here: http://faassen.n--tree.net/blog/view/weblog/2009/11/09/0

Buildout is what a lot of the python community are using now. Getting numpy to work nicely with buildout and pip would be a good start. numpy used to work with buildout in python2.5, but not with 2.6. buildout lets other team members get up to speed with a project by running one command. It installs things in the local directory, not system wide, so you can have different dependencies per project.

Plenty of good work is going on with python packaging. Lots of the python community are not using compiled packages however, so the requirements are different. There are a lot of people (thousands) working with the python packaging system, improving it, and building tools around it. Distribute for example has many committers, as do buildout/pip. eg, there are fifty or so buildout plugins, which people use to customise their builds ( see the buildout recipe list on pypi at http://pypi.python.org/pypi?:action=browse&show=all&c=512 ).

There are build farms for windows and OSX packages uploaded to pypi. Start uploading pre-releases to pypi, and you get these for free (once you make numpy compile out of the box on those compile farms). There are compile farms for other OSes too... like ubuntu/debian, macports etc. Some distributions even automatically download, compile and package new releases once they spot a new file on your ftp/web site. Speeding up the release cycle to be continuous can let people take advantage of all these tools together. If you get your tests running after the build step, all of these distributions also turn into test farms :)
 - pypm: http://pypm.activestate.com/list-n.html#numpy
 - ubuntu PPA: https://launchpad.net/ubuntu/+ppas
 - the snakebite project: http://www.snakebite.org/ (seems mostly dead... but they have a lot of hardware)
 - suse build service: https://build.opensuse.org/
 - pony-build: http://wiki.github.com/ctb/pony-build

zope and pygame also have their own build/test farms - they are two other compiled python package projects - as do a number of other python projects (eg twisted...). Projects like pony-build should hopefully make it easier for people to run their own build farms, independently of the main projects. You just really need a script to (download, build, test, post results), and then post a link to your mailing list... and someone will be able to run a build farm (a rough sketch of such a script follows below).

Documentation projects are being worked on to document, give tutorials, and make python packaging easier all round. As witnessed by the 20 or so releases on pypi every day (and growing), lots of people are using the python packaging tools successfully.

Documenting how people can make numpy addon libraries (plugins) would encourage people to do so. Currently there is no documentation from the numpy community, or encouragement to do so. This, combined with numpy being broken with python2.6+pypi, will result in fewer science related packages.
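Something like this 2009-era Python 2 sketch is all that (download, build, test, post results) script needs - the checkout and report URLs below are placeholders, not real services:

import subprocess, urllib

def run(cmd, cwd=None):
    p = subprocess.Popen(cmd, cwd=cwd, stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT)
    out, _ = p.communicate()
    return p.returncode, out

log = []
for cmd, cwd in [
    (["svn", "co", "http://example.org/project/trunk", "project"], None),  # download
    (["python", "setup.py", "build"], "project"),                          # build
    (["python", "setup.py", "test"], "project"),                           # test
]:
    code, out = run(cmd, cwd)
    log.append("%s -> exit %d\n%s" % (" ".join(cmd), code, out))
    if code != 0:
        break

# post the results where others can see them (placeholder URL)
urllib.urlopen("http://example.org/report", urllib.urlencode({"log": "".join(log)}))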
There are still a whole lot of people not releasing on pypi though; there are thousands of projects on the pygame.org website that are not on pypi, for example, and likely many hundreds or thousands of scientific projects not listed on there either. Given all of these projects not on pypi, obviously things could be improved. The pygame.org website also shows that community specific websites are very helpful. A science view of pypi would make it much more useful - so people don't have to look through web/game/database etc packages. Here is a view of 535 science/engineering related packages on pypi now: http://pypi.python.org/pypi?:action=browse&c=385 and 458 science/research packages: http://pypi.python.org/pypi?:action=browse&show=all&c=40 So there are already hundreds of science related packages, and hundreds of people making them, for pypi. Not too bad.

Distribution of applications is another issue that needs improving: letting people share applications without needing to install a whole bunch of things. Think about sending applications to your grandma. Do you ask her to download python, grab these libraries, do this... do that? It would be much better if you could give her a url, and away you go!

Bug tracking, and diff tracking between distributions, is an area where many projects can improve. Searching through the distributions' bug trackers, and the diffs they apply to the core, dramatically helps packages get updated. So does maintaining good communication with the different distributions' packagers.

I'm not sure making a separate build tool is a good idea. I think going with the rest of the python community, and improving the tools there, is a better idea.

cheers,

pps. some notes on toydist itself:
 - toydist convert is cool for people converting a setup.py. This means that most people can try out toydist right away. But what does it gain the people who convert their setup.py files?
 - a toydist convert that generates a setup.py file might be cool :) It could also generate a Makefile and a configure script :)
 - arbitrary code execution happens when building or testing with toydist; the source packaging part, however, does not. Compiling, running and testing the code happens most of the time anyway, so moving the sandboxing to the OS is more useful, as are reviews, trust and reputation of different packages.
 - it should be possible to build this toydist functionality as a distutils/distribute/buildout extension.
 - extending toydist? How are extensions made? There are 175 buildout packages which extend buildout, and many that extend distutils/setuptools - so extension of build tools is a necessary thing.
 - scripting builds in python for python developers is easier than scripting in a different new language.
On Tue, Dec 29, 2009 at 10:27 PM, René Dudfield
Hi,
In the toydist proposal/release notes, I would address 'what does toydist do better' more explicitly.
**** A big problem for science users is that numpy does not work with pypi + (easy_install, buildout or pip) and python 2.6. ****
Working with the rest of the python community as much as possible is likely a good goal.
Yes, but it is hopeless. Most of what is being discussed on distutils-sig is useless for us, and what matters is ignored at best. I think most people on distutils-sig are misguided, and I don't think the community is representative of people concerned with packaging anyway - most of the participants seem to be around web development, and are mostly dismissive of others' concerns (OS packagers, etc...).

I want to note that I am not starting this out of thin air - I know most of the distutils code very well, and I have been mostly the sole maintainer of numpy.distutils for 2 years now. I have written extensive distutils extensions, in particular numscons, which is able to fully build numpy, scipy and matplotlib on every platform that matters.

Simply put, distutils code is horrible (this is an objective fact) and flawed beyond repair (this is more controversial). IMHO, it has almost no useful feature, except being standard. If you want a more detailed explanation of why I think distutils and all tools on top of it are deeply flawed, you can look here: http://cournape.wordpress.com/2009/04/01/python-packaging-a-few-observations...
numpy used to work with buildout in python2.5, but not with 2.6. buildout lets other team members get up to speed with a project by running one command. It installs things in the local directory, not system wide. So you can have different dependencies per project.
I don't think it is a very useful feature, honestly. It seems to me that they created a huge infrastructure to split packages into tiny pieces, and then try to get them back together, imagining that multiple installed versions are a replacement for backward compatibility. Anyone with extensive packaging experience knows that's a deeply flawed model in general.
Plenty of good work is going on with python packaging.
That's the opposite of my experience. What I care about is:
 - tools which are hackable and easily extensible
 - robust install/uninstall
 - a real, DAG-based build system
 - explicitness and repeatability

None of this is supported by the tools, and the current directions go even further away. When I have to explain at length why the command-based design of distutils is a nightmare to work with, I don't feel very confident that the current maintainers are aware of the issues, for example. It shows that they never had to extend distutils much.
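To make the "DAG-based build system" item concrete: targets and their prerequisites form a directed acyclic graph, and the tool derives the build order from that graph instead of running a fixed sequence of commands. A toy sketch (the file names are made up, and there is no staleness checking or cycle detection):

def topo_order(deps):
    """deps maps target -> list of prerequisites; returns a valid build order."""
    order, seen = [], set()
    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for d in deps.get(node, []):
            visit(d)          # prerequisites are ordered first
        order.append(node)
    for node in deps:
        visit(node)
    return order

deps = {"grin.egg": ["grin.pyc"], "grin.pyc": ["grin.py"], "grin.py": []}
print(topo_order(deps))  # ['grin.py', 'grin.pyc', 'grin.egg']

A real tool hangs rebuild decisions (timestamps, content hashes) and parallelism off this graph; distutils commands, by contrast, hard-code their ordering.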
There are build farms for windows and OSX packages uploaded to pypi. Start uploading pre-releases to pypi, and you get these for free (once you make numpy compile out of the box on those compile farms). There are compile farms for other OSes too... like ubuntu/debian, macports etc. Some distributions even automatically download, compile and package new releases once they spot a new file on your ftp/web site.
I am familiar with some of those systems (PPA and opensuse build service in particular). One of the goals of my proposal is to make it easier to interoperate with those tools.

I think Pypi is mostly useless. The lack of enforced metadata is a big no-no IMHO. The fact that Pypi is miles behind CRAN, for example, is quite significant. I want CRAN for scientific python, and I don't see Pypi becoming it in the near future. The point of having our own Pypi-like server is that we could do the following:
 - enforcing metadata
 - making it easy to extend the service to support our needs
It is interesting to note that one of the maintainers of pypm has recently quit the discussion about Pypi, most likely out of frustration with the other participants.
Documentation projects are being worked on to document, give tutorials, and make python packaging easier all round. As witnessed by the 20 or so releases on pypi every day (and growing), lots of people are using the python packaging tools successfully.
This does not mean much IMO. Uploading on Pypi is almost required to use virtualenv, buildout, etc... An interesting metric is not how many packages are uploaded, but how much it is used by people other than developers.
I'm not sure making a separate build tool is a good idea. I think going with the rest of the python community, and improving the tools there is a better idea.
It has been tried, and IMHO it has proved to be a failure. You can look at the recent discussion (the one started by Guido in particular).
pps. some notes on toydist itself.
 - toydist convert is cool for people converting a setup.py. This means that most people can try out toydist right away. But what does it gain the people who convert their setup.py files?
Not much ATM, except that it is easier to write a toysetup.info compared to a setup.py IMO, and that it supports a simple way to include data files (something which is currently *impossible* to do without writing your own distutils extensions). It also has the ability to build eggs without using setuptools (I consider not using setuptools a feature, given the too many failure modes of this package). The main goals though are to make it easier to build your own tools on top of it, and to integrate with real build systems.
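For illustration, here is a minimal sketch of the "eggs without setuptools" idea: an egg is essentially a zip archive of the package contents plus metadata under EGG-INFO/. The helper below is made up (not toydist's API), the metadata is stripped down to two fields, and the python version tag is hard-coded:

import zipfile

def build_egg(name, version, py_files):
    pkg_info = "Metadata-Version: 1.0\nName: %s\nVersion: %s\n" % (name, version)
    egg_path = "%s-%s-py2.6.egg" % (name, version)
    z = zipfile.ZipFile(egg_path, "w", zipfile.ZIP_DEFLATED)
    try:
        for f in py_files:
            z.write(f)                             # pure-python modules go in as-is
        z.writestr("EGG-INFO/PKG-INFO", pkg_info)  # minimal metadata
    finally:
        z.close()
    return egg_path

# e.g. build_egg("grin", "1.1.1", ["grin.py"])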
- a toydist convert that generates a setup.py file might be cool :)
toydist started like this, actually: you would write a setup.py file which loads the package from toysetup.info, and can be converted to a dict argument to distutils.core.setup. I have not updated it recently, but that's definitely on the TODO list for a first alpha, as it would enable people to benefit from the format, with 100 % backward compatibility with distutils.
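Such a bridge setup.py could look roughly like this - PackageDescription and its attributes are placeholder names for illustration, not toydist's actual API:

from distutils.core import setup
from toydist.core import PackageDescription  # assumed import path

pkg = PackageDescription.from_file("toysetup.info")  # parsed, not executed
setup(name=pkg.name,
      version=pkg.version,
      description=pkg.summary,
      py_modules=list(pkg.py_modules))

Distutils then handles the actual build/install, so "python setup.py install" keeps working for users who have never heard of toydist.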
- arbitrary code execution happens when building or testing with toydist.
You are right for testing, but wrong for building. As long as the build is entirely driven by toysetup.info, you only have to trust toydist (which is not safe ATM, but that's an implementation detail), and your build tools of course. Obviously, if you have a package which uses an external build tool on top of toysetup.info (as will be required for numpy itself, for example), all bets are off. But I think that's a tiny fraction of the interesting packages for scientific computing. Sandboxing is particularly an issue on windows - I don't know a good solution for windows sandboxing, outside of full vms, which are heavyweight.
- it should be possible to build this toydist functionality as a distutils/distribute/buildout extension.
No, it cannot, at least as far as distutils/distribute are concerned (I know nothing about buildout). Extending distutils is horrible, and fragile in general. Even autotools with its mix of generated sh scripts through m4 and perl is a breeze compared to distutils.
- extending toydist? How are extensions made? There are 175 buildout packages which extend buildout, and many that extend distutils/setuptools - so extension of build tools is a necessary thing.
See my answer earlier about interoperation with build tools. cheers, David
hello again,
On Tue, Dec 29, 2009 at 2:22 PM, David Cournapeau
On Tue, Dec 29, 2009 at 10:27 PM, René Dudfield
wrote: Hi,
In the toydist proposal/release notes, I would address 'what does toydist do better' more explicitly.
**** A big problem for science users is that numpy does not work with pypi + (easy_install, buildout or pip) and python 2.6. ****
Working with the rest of the python community as much as possible is likely a good goal.
Yes, but it is hopeless. Most of what is being discussed on distutils-sig is useless for us, and what matters is ignored at best. I think most people on distutils-sig are misguided, and I don't think the community is representative of people concerned with packaging anyway - most of the participants seem to be around web development, and are mostly dismissive of other's concerns (OS packagers, etc...).
Sitting down with Tarek (who is one of the current distutils maintainers) in Berlin, we had a little discussion about packaging over pizza and beer... and he was quite mindful of OS packagers' problems and issues. He was also interested to hear about game developers' issues with packaging (which are different again from scientific users'... but similar in many ways). However these systems were developed by the zope/plone/web crowd, so they are naturally going to be thinking a lot about zope/plone/web issues.

Debian and ubuntu packages are mostly useless for them because of their age. Waiting a couple of years for your package to be released is just not an option (waiting even an hour for bug fixes is sometimes not an option). Also isolation of packages is needed for machines that have 100s of different applications running, written by different people, each with dozens of packages used by each application.

Tools like checkinstall and stdeb ( http://pypi.python.org/pypi/stdeb/ ) can help with older style packaging systems like deb/rpm. I think perhaps if toydist included something like stdeb - not as an extension to distutils, but as a standalone tool (like toydist) - there would be fewer problems with it.

One thing the various zope related communities do is make sure all the relevant and needed packages are built/tested by their compile farms. This makes pypi work for them a lot better than a non-coordinated effort does. There are also lots of people trying out new versions all of the time.
I want to note that I am not starting this out of thin air - I know most of the distutils code very well, and I have been mostly the sole maintainer of numpy.distutils for 2 years now. I have written extensive distutils extensions, in particular numscons, which is able to fully build numpy, scipy and matplotlib on every platform that matters.
Simply put, distutils code is horrible (this is an objective fact) and flawed beyond repair (this is more controversial). IMHO, it has almost no useful feature, except being standard.
yes, I have also battled with distutils over the years. However it is simpler than autotools (for me... maybe distutils has perverted my fragile mind), and works on more platforms for python than any other current system. It is much worse for C/C++ modules though. It needs dependency and configuration tools for it to work better (like what many C/C++ projects hack into distutils themselves). Monkey patching and extensions are especially a problem... as is the horrible code quality of distutils by modern standards. However distutils has had more tests and testing systems added, so that refactoring/cleaning up of distutils can happen more easily.
If you want a more detailed explanation of why I think distutils and all tools on top are deeply flawed, you can look here:
http://cournape.wordpress.com/2009/04/01/python-packaging-a-few-observations...
I agree with many things in that post. Except your conclusion on multiple versions of packages in isolation. Package isolation is like processes, and package sharing is like threads - and threads are evil! Leave my python site-packages directory alone, I say... especially don't let setuptools infect it :) Many people currently find that the multiple-versions-in-isolation approach works well for them - so for some use cases the tools are working wonderfully.
numpy used to work with buildout in python2.5, but not with 2.6. buildout lets other team members get up to speed with a project by running one command. It installs things in the local directory, not system wide. So you can have different dependencies per project.
I don't think it is a very useful feature, honestly. It seems to me that they created a huge infrastructure to split packages into tiny pieces, and then try to get them back together, imagining that multiple installed versions are a replacement for backward compatibility. Anyone with extensive packaging experience knows that's a deeply flawed model in general.
Science is supposed to allow repeatability. Without the same versions of packages, repeating experiments is harder. This is a big problem in science, and multiple versions of packages in _isolation_ can help get to a solution to the repeatability problem.

Just pick some random paper and try to reproduce their results. It's generally very hard, unless the software is quite well packaged. Especially for graphics related papers, there are often many different types of environments, so setting up the environments to try out their techniques, and verify results quickly, is difficult.

Multiple versions are not a replacement for backwards compatibility, just a way to avoid the problem in the short term to avoid being blocked. If a new package version breaks your app, then you can either pin it to an old version, fix your app, or fix the package. It is also not a replacement for building on stable high quality components, but it helps you work with less stable, and less high quality, components - at a much faster rate of change, with a much larger dependency list.
Plenty of good work is going on with python packaging.
That's the opposite of my experience. What I care about is:
 - tools which are hackable and easily extensible
 - robust install/uninstall
 - a real, DAG-based build system
 - explicitness and repeatability
None of this is supported by the tools, and the current directions go even further away. When I have to explain at length why the command-based design of distutils is a nightmare to work with, I don't feel very confident that the current maintainers are aware of the issues, for example. It shows that they never had to extend distutils much.
All agreed! I'd add to the list parallel builds/tests (make -j 16), and outputting to native build systems, eg xcode and msvc projects, and makefiles.

It would be interesting to know your thoughts on buildout recipes ( see creating recipes http://www.buildout.org/docs/recipe.html ). They seem to work better from my perspective. However, that is probably because of isolation: the recipes are only used by those projects that require them, so the chance of them interacting is lower, as they are not installed in the main python. How will you handle toydist extensions so that multiple extensions do not have problems with each other? I don't think this is possible without isolation, and even then it's still a problem.

Note, the section in the distutils docs on creating command extensions is only around three paragraphs. There is also no central place to go looking for extra commands (that I know of), or a place to document and share each other's command extensions. Many of the methods for extending distutils are not very well documented either. For example, 'how do you change compiler command line arguments for certain source files?' Basic things like that are possible with distutils, but not documented (very well).
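For comparison, this is roughly the boilerplate distutils demands for even a do-nothing custom command (the command name is made up; this much is in the docs, while anything further - like per-source compiler flags - you reverse-engineer from the distutils source):

from distutils.core import setup, Command

class print_name(Command):
    description = "print the distribution name"
    user_options = []              # required attribute, even when empty

    def initialize_options(self):
        pass                       # required hook

    def finalize_options(self):
        pass                       # required hook

    def run(self):
        print(self.distribution.get_name())

setup(name="example", version="0.1",
      cmdclass={"print_name": print_name})
# usage: python setup.py print_name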
There are build farms for windows and OSX packages uploaded to pypi. Start uploading pre-releases to pypi, and you get these for free (once you make numpy compile out of the box on those compile farms). There are compile farms for other OSes too... like ubuntu/debian, macports etc. Some distributions even automatically download, compile and package new releases once they spot a new file on your ftp/web site.
I am familiar with some of those systems (PPA and opensuse build service in particular). One of the goal of my proposal is to make it easier to interoperate with those tools.
yeah, cool.
I think Pypi is mostly useless. The lack of enforced metadata is a big no-no IMHO. The fact that Pypi is miles behind CRAN, for example, is quite significant. I want CRAN for scientific python, and I don't see Pypi becoming it in the near future.
The point of having our own Pypi-like server is that we could do the following:
 - enforcing metadata
 - making it easy to extend the service to support our needs
Yeah, cool. Many other projects have their own servers too - pygame.org, plone, etc etc - which meet their own needs. Patches are accepted for pypi btw.

What type of enforcement of metadata, and how would it help? I imagine this could be done in a number of ways with pypi:
 - a distutils command extension that people could use.
 - changing the pypi source code.
 - checking the metadata for certain packages, then emailing their authors about issues (a minimal checker sketch follows below).
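As a sketch of that last option: the PKG-INFO file that sdists and eggs already carry uses RFC-822 style headers, so a checker only needs the stdlib email parser. The required-field list here is just an example, not a proposed policy:

from email.parser import Parser

REQUIRED = ["Name", "Version", "Summary", "License", "Platform"]

def check_pkg_info(path):
    msg = Parser().parse(open(path))
    problems = []
    for field in REQUIRED:
        value = msg.get(field)
        if not value or value == "UNKNOWN":  # distutils' default placeholder
            problems.append("missing or UNKNOWN: %s" % field)
    return problems

# e.g. for problem in check_pkg_info("PKG-INFO"): print(problem)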
It is interesting to note that one of the maintainers of pypm has recently quit the discussion about Pypi, most likely out of frustration with the other participants.
yeah, big mailing list discussions hardly ever help I think :) oops, this is turning into one.
Documentation projects are being worked on to document, give tutorials, and make python packaging easier all round. As witnessed by the 20 or so releases on pypi every day (and growing), lots of people are using the python packaging tools successfully.
This does not mean much IMO. Uploading on Pypi is almost required to use virtualenv, buildout, etc... An interesting metric is not how many packages are uploaded, but how much it is used by people other than developers.
Yeah, it only means that there are lots of developers able to use the packaging system to put their own packages up there. However there are over 500 science related packages on there now - which is pretty cool. A way to measure packages being used would be by downloads, and by which packages depend on which other packages. I think the science ones would be reused less than normal, since a much higher percentage are C/C++ based, and are likely to be more fragile packages.
I'm not sure making a separate build tool is a good idea. I think going with the rest of the python community, and improving the tools there is a better idea.
It has been tried, and IMHO it has proved to be a failure. You can look at the recent discussion (the one started by Guido in particular).
I don't think 500+ science related packages is a total failure really.
pps. some notes on toydist itself.
 - toydist convert is cool for people converting a setup.py. This means that most people can try out toydist right away. But what does it gain the people who convert their setup.py files?
Not much ATM, except that it is easier to write a toysetup.info compared to a setup.py IMO, and that it supports a simple way to include data files (something which is currently *impossible* to do without writing your own distutils extensions). It also has the ability to build eggs without using setuptools (I consider not using setuptools a feature, given the too many failure modes of this package).
yeah, I always make my packages avoid setuptools by default. However I use command line arguments to enable the setuptools features required (eggs, bdist_mpkg etc etc). Having a tool to create eggs without setuptools would be great in itself. Definitely list this in the feature list :)
The main goals though are to make it easier to build your own tools on top of it, and to integrate with real build systems.
yeah, cool.
- a toydist convert that generates a setup.py file might be cool :)
toydist started like this, actually: you would write a setup.py file which loads the package from toysetup.info, and can be converted to a dict argument to distutils.core.setup. I have not updated it recently, but that's definitely on the TODO list for a first alpha, as it would enable people to benefit from the format, with 100 % backward compatibility with distutils.
yeah, cool. That would let you develop things incrementally too, and still have toydist be useful for the whole development period until it catches up with the features of distutils needed.
- arbitrary code execution happens when building or testing with toydist.
You are right for testing, but wrong for building. As long as the build is entirely driven by toysetup.info, you only have to trust toydist (which is not safe ATM, but that's an implementation detail), and your build tools of course.
If you execute build tools on arbitrary code, then arbitrary code execution is easy for someone who wants to do bad things. Trust, and secondarily sandboxing, are the best ways to solve these problems imho.
Obviously, if you have a package which uses an external build tool on top of toysetup.info (as will be required for numpy itself for example), all bets are off. But I think that's a tiny fraction of the interesting packages for scientific computing.
yeah, currently about 1/5th of science packages use C/C++/fortran/cython etc (see http://pypi.python.org/pypi?:action=browse&c=40 - 110/458 on that page). There seem to be a lot more using C/C++ compared to other types of packages on there (eg zope3 packages list 0 out of 900 packages using C/C++).

So the high number of C/C++ science related packages on pypi demonstrates that better C/C++ tools for scientific packages are a big need. Especially getting compile/testing farms for all these packages. Compile farms are a bigger need here than for pure python packages - since C/C++ is MUCH harder to write/test in a portable way. I would say it is close to impossible to get code to work without errors on multiple platforms without quite good knowledge of them. There are many times with pygame development that I make changes on an osx, windows or linux box, commit the change, then wait for the compile/tests to run on the build farm ( http://thorbrian.com/pygame/builds.php ). Releasing packages otherwise makes the process *heaps* longer... and many times I still get errors on different platforms, despite many years of multi platform coding.
Sandboxing is particularly an issue on windows - I don't know a good solution for windows sandboxing, outside of full vms, which are heavyweight.
yeah, VMs are the way to go - if only to make each build start from a fresh install. However I think automated distributed building, and trust, are more useful: ie, only build those packages where you trust the authors, and let anyone download, build and then post their build/test results. MS have given out copies of windows to some members of the python community in the past to set up VMs for building.

By automated distributed building, I mean what happens with mailing lists usually - where people post their test results when they have a problem - except in a more automated manner. Adding a 'Do you want to upload your build/test results?' at the end of a setup.py for subversion builds would give you dozens or hundreds of test results daily from all sorts of machines. Making it easy for people to set up package builders which also upload their packages somewhere gives you distributed package building, in a fairly safe automated manner. (more details here: http://renesd.blogspot.com/2009/09/python-build-bots-down-maybe-they-need.ht... )
- it should be possible to build this toydist functionality as a distutils/distribute/buildout extension.
No, it cannot, at least as far as distutils/distribute are concerned (I know nothing about buildout). Extending distutils is horrible, and fragile in general. Even autotools with its mix of generated sh scripts through m4 and perl is a breeze compared to distutils.
- extending toydist? How are extensions made? There are 175 buildout packages which extend buildout, and many that extend distutils/setuptools - so extension of build tools is a necessary thing.
See my answer earlier about interoperation with build tools.
I'm still not clear on how toydist will be extended. I am however, a lot clearer about its goals. cheers,
On Wednesday 30 December 2009 06:15:45 René Dudfield wrote:
I agree with many things in that post. Except your conclusion on multiple versions of packages in isolation. Package isolation is like processes, and package sharing is like threads - and threads are evil!
You have stated this several times, but is there any evidence that this is the desire of the majority of users? In the scientific community, interactive experimentation is critical and users are typically not seasoned systems administrators. For such users, almost all packages installed after installing python itself are packages they use. In particular, all I want to do is to use apt/yum to get the packages (or ask my sysadmin, who rightfully has no interest in learning the intricacies of python package installation, to do so) and continue with my work. "Packages-in-isolation" is for people whose job is to run server farms, not interactive experimenters.
Leave my python site-packages directory alone, I say... especially don't let setuptools infect it :) Many people currently find that the multiple-versions-in-isolation approach works well for them - so for some use cases the tools are working wonderfully.
More power to them. But for the rest of us, that approach is too much hassle.
Science is supposed to allow repeatability. Without the same versions of packages, repeating experiments is harder.
Really? IME, this is not the case. Simulations in signal processing are typically run with two different kinds of data sets:
 - random data for Monte Carlo simulations
 - well-known and widely available test streams

For both kinds of data sets, reimplementation of the same algorithms is rarely, if ever, affected by the versions of packages, primarily because of the wide variety of tool sets (and even more versions) that are in use.
This is a big problem in science, and multiple versions of packages in _isolation_ can help get to a solution to the repeatability problem.
Package versions are, at worst, a very minor distraction in solving the repeatability problem. Usually, the main issues are unclear descriptions of the algorithms and unstated assumptions.
Just pick some random paper and try to reproduce their results. It's generally very hard, unless the software is quite well packaged.
In scientific experimentation, it is folly to rely on software from the author of some random paper. In signal processing, almost every critical algorithm is re-implemented, and usually in a different language. The only exceptions are when the software can be validated with a large amount of test data, but this is very rare. Usually, you use some package to get started in your current environment. If it works (i.e., results meet your quality metric), you then build on it. If it does not work (even if only due to version incompatibility), you usually jettison it and either find an alternative or rewrite it.
Multiple versions are not a replacement for backwards compatibility, just a way to avoid the problem in the short term to avoid being blocked. If a new package version breaks your app, then you can either pin it to an old version, fix your app, or fix the package. It is also not a replacement for building on stable high quality components, but helps you work with less stable, and less high quality components - at a much faster rate of change, with a much larger dependency list.
This is a software engineer + systems administrator solution. In larger institutions, this is absolutely unworkable if you rely on IT for package management/installation.
Plenty of good work is going on with python packaging.
That's the opposite of my experience. What I care about is:
 - tools which are hackable and easily extensible
 - robust install/uninstall
 - a real, DAG-based build system
 - explicitness and repeatability
None of this is supported by the tools, and the current directions go even further away. When I have to explain at length why the command-based design of distutils is a nightmare to work with, I don't feel very confident that the current maintainers are aware of the issues, for example. It shows that they never had to extend distutils much.
All agreed! I'd add to the list parallel builds/tests (make -j 16), and outputting to native build systems. eg, xcode, msvc projects, and makefiles.
Essentially out of frustration with distutils and setuptools, I have migrated to CMake for pretty much all my build systems (except for a few scons ones I haven't had to touch for a while) since it supports all the features mentioned above. Even dealing with CMake's god-awful "scripting language" is better than dealing with distutils. I am very happy to see David C's efforts to finally get away from distutils, but I am worried that a cross-platform build system that has all the features that he wants is simply beyond the scope of 1-2 people unless they work on it full time for a year or two.
yeah, currently about 1/5th of science packages use C/C++/fortran/cython etc (see http://pypi.python.org/pypi?:action=browse&c=40 - 110/458 on that page). There seem to be a lot more using C/C++ compared to other types of packages on there (eg zope3 packages list 0 out of 900 packages using C/C++).
So the high number of C/C++ science related packages on pypi demonstrates that better C/C++ tools for scientific packages are a big need. Especially getting compile/testing farms for all these packages. Compile farms are a bigger need here than for pure python packages - since C/C++ is MUCH harder to write/test in a portable way. I would say it is close to impossible to get code to work without errors on multiple platforms without quite good knowledge of them.
Not sure that that is quite true. C++ is not a very popular language around here, but the combination of boost+Qt+python+scipy+hdf5+h5py has made virtually all of my platform-specific code vanish (with the exception of some platform-specific stuff in my CMake scripts). Regards, Ravi
On Wed, Dec 30, 2009 at 9:26 AM, Ravi
On Wednesday 30 December 2009 06:15:45 René Dudfield wrote:
I agree with many things in that post. Except your conclusion on multiple versions of packages in isolation. Package isolation is like processes, and package sharing is like threads - and threads are evil!
I don't think this is an appropriate analogy, and hyperbolic statements like "threads are evil!" are unlikely to persuade a scientific audience.
You have stated this several times, but is there any evidence that this is the desire of the majority of users? In the scientific community, interactive experimentation is critical and users are typically not seasoned systems administrators. For such users, almost all packages installed after installing python itself are packages they use. In particular, all I want to do is to use apt/yum to get the packages (or ask my sysadmin, who rightfully has no interest in learning the intricacies of python package installation, to do so) and continue with my work. "Packages-in-isolation" is for people whose job is to run server farms, not interactive experimenters.
I agree.
Leave my python site-packages directory alone I say... especially don't let setuptools infect it :)
There are already mechanisms in place for this. "python setup.py install --user" or "easy_install --prefix=/usr/local" for example. Darren
On Wed, Dec 30, 2009 at 2:26 PM, Ravi
On Wednesday 30 December 2009 06:15:45 René Dudfield wrote:
I agree with many things in that post. Except your conclusion on multiple versions of packages in isolation. Package isolation is like processes, and package sharing is like threads - and threads are evil!
You have stated this several times, but is there any evidence that this is the desire of the majority of users? In the scientific community, interactive experimentation is critical and users are typically not seasoned systems administrators. For such users, almost all packages installed after installing python itself are packages they use. In particular, all I want to do is to use apt/yum to get the packages (or ask my sysadmin, who rightfully has no interest in learning the intricacies of python package installation, to do so) and continue with my work. "Packages-in-isolation" is for people whose job is to run server farms, not interactive experimenters.
500+ packages on pypi. Provide a counter point, otherwise the evidence is against your position - overwhelmingly.
René Dudfield wrote:
On Wed, Dec 30, 2009 at 2:26 PM, Ravi
wrote: On Wednesday 30 December 2009 06:15:45 René Dudfield wrote:
I agree with many things in that post. Except your conclusion on multiple versions of packages in isolation. Package isolation is like processes, and package sharing is like threads - and threads are evil! You have stated this several times, but is there any evidence that this is the desire of the majority of users? In the scientific community, interactive experimentation is critical and users are typically not seasoned systems administrators. For such users, almost all packages installed after installing python itself are packages they use. In particular, all I want to do is to use apt/yum to get the packages (or ask my sysadmin, who rightfully has no interest in learning the intricacies of python package installation, to do so) and continue with my work. "Packages-in-isolation" is for people whose job is to run server farms, not interactive experimenters.
500+ packages on pypi. Provide a counter point, otherwise the evidence is against your position - overwhelmingly.
?!? Wouldn't you need to measure the number of downloads (and also figure out something else to measure that relative to)? Uploading something to PyPI is easy enough to do and probably done by default by a lot of package authors -- that doesn't mean that it is the main distribution method. -- Dag Sverre
On Wed, Dec 30, 2009 at 1:47 PM, René Dudfield
On Wed, Dec 30, 2009 at 2:26 PM, Ravi
wrote: On Wednesday 30 December 2009 06:15:45 René Dudfield wrote:
I agree with many things in that post. Except your conclusion on multiple versions of packages in isolation. Package isolation is like processes, and package sharing is like threads - and threads are evil!
You have stated this several times, but is there any evidence that this is the desire of the majority of users? In the scientific community, interactive experimentation is critical and users are typically not seasoned systems administrators. For such users, almost all packages installed after installing python itself are packages they use. In particular, all I want to do is to use apt/yum to get the packages (or ask my sysadmin, who rightfully has no interest in learning the intricacies of python package installation, to do so) and continue with my work. "Packages-in-isolation" is for people whose job is to run server farms, not interactive experimenters.
500+ packages on pypi. Provide a counter point, otherwise the evidence is against your position - overwhelmingly.
The number of packages on pypi has no implication for whether I have only one version of each or several in different isolated environments. Actually, I have only one version of almost all the packages that I use, and those are on the python path (and many of them are easy_installed from pypi, or installed from exes). The only ones I have in several versions are the ones I work on myself, or where I track a repository.

Josef
On Wed, Dec 30, 2009 at 12:47, René Dudfield
wrote: 500+ packages on pypi. Provide a counterpoint; otherwise the evidence is against your position - overwhelmingly.
Linux distributions, which are much, much more popular than any collection of packages on PyPI you might care to name. Isolated environments have their uses, but they are the exception, not the rule. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Wed, Dec 30, 2009 at 7:08 PM, Robert Kern
wrote: Linux distributions, which are much, much more popular than any collection of packages on PyPI you might care to name. Isolated environments have their uses, but they are the exception, not the rule.
Wrong. pypi has way more python packages than any linux distribution - 8500+ listed, compared to how many in debian?
On Wed, Dec 30, 2009 at 11:10 AM, René Dudfield
wrote: Wrong. pypi has way more python packages than any linux distribution - 8500+ listed, compared to how many in debian?
Debian has over 30k packages. But I think he was talking about popularity, not the number of packages.
On Wed, Dec 30, 2009 at 11:13 AM, Keith Goodman
wrote: Debian has over 30k packages. But I think he was talking about popularity, not the number of packages.
Oh, 30k is all packages, not just python.
On Wed, Dec 30, 2009 at 13:10, René Dudfield
wrote: Wrong. pypi has way more python packages than any linux distribution - 8500+ listed, compared to how many in debian?
I said "more popular". As in "more users", not "more packages". But if you insist, Debian has ~30000 or so, depending on the architecture and release and how you count. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Thu, Dec 31, 2009 at 3:47 AM, René Dudfield
wrote: 500+ packages on pypi. Provide a counterpoint; otherwise the evidence is against your position - overwhelmingly.
Number of packages is a useless metric for measuring the success of something like pypi. I don't even know why someone would think it is an interesting number. Note that CRAN has several times more packages, and the R community is much smaller than python's, if you care about that number. Haskell has ~2000 packages, and hackageDB ("haskell's pypi") is much smaller than python's. You really should not try to make the point that Pypi is working for the scipy community: I know there is a bias in conferences and mailing lists, but the consensus is overwhelmingly "pypi does not work very well for us". The issue keeps coming up. I think trying to convince us otherwise is counterproductive at best. David
David Cournapeau wrote:
You really should not try to make the point that Pypi is working for the scipy community:
I think the evidence supports that pypi is useful, and therefore better than nothing -- which in no way means it couldn't be much better. My personal experience is that I always try:

    easy_install whatever

first, and when it works, I'm happy and a bit surprised. It virtually never works for more complex packages, particularly on OS X:
- PIL
- scipy
- matplotlib
- netcdf4
- gdal
I don't know if wxPython is on PyPi; I've never even tried. Many of these fail because of other non-python dependencies. So yes -- we really could use something better! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On Thu, Dec 31, 2009 at 10:50 AM, Christopher Barker
I think the evidence supports that pypi is useful
Yes - I stressed that it was not working well for the scipy community, not that it was not working at all for python.
, and therefore better than nothing -- which in no way means it couldn't be much better.
My personal experience is that I always try:
easy_install whatever
first, and when it works, I'm happy and a bit surprised.
To say that you are happy when it works says a lot about your low expectations, no? My main point of disagreement with most discussions on distutils-sig is that I think the lack of robustness is rooted in the way the tools are conceived and used, whereas others think it can be fixed by adding more features. This, and the refusal to learn how other communities do it, is why I am considering starting with my own solution. David
On Wed, Dec 30, 2009 at 8:15 PM, René Dudfield
Sitting down with Tarek (who is one of the current distutils maintainers) in Berlin, we had a little discussion about packaging over pizza and beer... and he was quite mindful of OS packagers' problems and issues.
This has been said many times on distutils-sig, but no concrete action has ever been taken in that direction. For example, toydist already supports the FHS better than distutils, and is more flexible. I have tried several times to explain why this matters on distutils-sig, but then you have the peanut gallery interfering with unrelated nonsense (like claims that it would break windows, as if it could not be implemented independently). Also, retrofitting support for --*dir options in distutils would be *very* difficult, unless you are ready to break backward compatibility (there are 6 ways to install data files, for example, and each of them has corner cases - it is a real pain to support this correctly in the convert command of toydist, and you simply cannot recover the missing information needed to comply with the FHS in every case).
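For context, the --*dir granularity in question is the autoconf convention, where every install location can be overridden independently; a typical invocation looks like this (the paths are purely illustrative):

    ./configure --prefix=/usr \
                --libdir=/usr/lib64 \
                --sysconfdir=/etc \
                --datadir=/usr/share \
                --mandir=/usr/share/man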
However these systems were developed by the zope/plone/web crowd, so they are naturally going to be thinking a lot about zope/plone/web issues.
Agreed - it is natural that they care about their problems first; that's how it works in open source. What I find difficult is when our concerns are constantly dismissed by people who have no clue about our issues - and who later claim we are not cooperative.
Debian and ubuntu packages for them are mostly useless because of their age.
That's where a build farm comes in. This is a known issue; that's why the build service and PPAs exist in the first place.
I think perhaps if toydist included something like stdeb as not an extension to distutils, but a standalone tool (like toydist) there would be less problems with it.
That's pretty much how I intend to do things. Currently, in toydist, you can do something like:

    from toydist.core import PackageDescription

    pkg = PackageDescription.from_file("toysetup.info")
    # pkg now gives you access to metadata, as well as extensions,
    # python modules, etc...

I think this gives almost everything that is needed to implement a sdist_dsc command. Contrary to the Distribution class in distutils, this class would not need to be subclassed/monkey-patched by extensions, as it only cares about the description, and is 100% uncoupled from the build part.
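To make that concrete, a standalone sdist_dsc-like tool could be a short script on top of this class. A rough sketch - only PackageDescription.from_file comes from the snippet above; the name and summary attributes are assumed here for illustration:

    from toydist.core import PackageDescription

    def debian_control(info_file):
        # build a minimal debian/control stanza from the static
        # metadata, without ever executing a setup.py
        pkg = PackageDescription.from_file(info_file)
        return "\n".join(["Source: python-%s" % pkg.name,
                          "Section: python",
                          "Description: %s" % pkg.summary])

    print(debian_control("toysetup.info"))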
yes, I have also battled with distutils over the years. However it is simpler than autotools (for me... maybe distutils has perverted my fragile mind), and works on more platforms for python than any other current system.
Autotools certainly works on more platforms (windows notwithstanding), if only because python itself is built with autoconf. Distutils' simplicity is a trap: it is simpler only if you restrict yourself to what distutils gives you. Don't get me wrong, autotools are horrible, but I have never encountered cases where I had to spend hours to do trivial tasks, as has been the case with distutils. Numpy's build system would be much, much easier to implement through autotools, and would be much more reliable.
However distutils has had more tests and testing systems added, so that refactoring/cleaning up of distutils can happen more easily.
You can't refactor distutils without breaking backward compatibility, because distutils has no API - the whole implementation is the API. That's one of the fundamental disagreements I and other scipy devs have with the current contributors on distutils-sig: the starting point (distutils) and the goal are so far away from each other that getting there step by step is hopeless.
I agree with many things in that post. Except your conclusion on multiple versions of packages in isolation. Package isolation is like processes, and package sharing is like threads - and threads are evil!
I don't find the comparison very helpful (for one, you can share data between processes, whereas virtualenvs cannot see each other AFAIK).
Science is supposed to allow repeatability. Without the same versions of packages, repeating experiments is harder. Repeatability is a big problem in science, and multiple versions of packages in _isolation_ can help get to a solution.
I don't think that's true - at least it does not reflect my experience at all. But then, I don't pretend to have an extensive experience either. From most of my discussions at scipy conferences, I know most people are dissatisfied with the current python solutions.
Plenty of good work is going on with python packaging.
That's the opposite of my experience. What I care about is:
- tools which are hackable and easily extensible
- robust install/uninstall
- a real, DAG-based build system
- explicitness and repeatability
None of this is supported by the current tools, and the current directions go even further away. When I have to explain at length why the command-based design of distutils is a nightmare to work with, I don't feel very confident that the current maintainers are aware of the issues. It shows that they have never had to extend distutils much.
All agreed! I'd add to the list parallel builds/tests (make -j 16), and outputting to native build systems, e.g. xcode, msvc projects, and makefiles.
Yep - I got quite far with numscons already. It cannot be used as a general solution, but as a dev tool for my own work on numpy/scipy it has been a huge time saver, especially given its top-notch dependency tracking system. It supports parallel builds, and I can build full debug builds of scipy in under a minute on a fast machine. That's a real productivity booster.
How will you handle toydist extensions so that multiple extensions do not have problems with each other? I don't think this is possible without isolation, and even then it's still a problem.
By doing it mostly the Unix way, through protocols and file formats, not through APIs. A good API is hard, but for build tools it is much, much harder. When talking about extensions, I mostly think about the following:
- adding a new compiler/new platform
- adding a new installer format
- adding a new kind of source file/target (say ctypes extensions, cython compilation, etc...)
Instead of using classes for compilers/tools, I am considering using python modules for each tool, with each tool registered through a source file extension (associate a function to ".c", for example). Actual compilation steps would be done through strings ("$CC ...."). The system would be kept simple, because for complex projects one should forward all this to a real build system (like waf or scons). There is also the problem of post/pre hooks, adding new steps to toymaker: I have not thought much about this, but I like waf's way of doing it, and it may be applicable. In waf, the main script (called wscript) defines a function for each build step:

    def configure():
        pass

    def build():
        pass

    ...

and undefined functions are considered unmodified. What I know for sure is that the distutils way of extending through inheritance does not work at all. As soon as two extensions subclass the same base class, you're done.
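As an illustration of the module-per-tool idea, the registry could be as small as the sketch below (all names hypothetical, not toydist's actual API; the command template is a plain string, as described above):

    import os

    TOOLS = {}

    def register_tool(ext, func):
        # associate a handler function with a source file extension
        TOOLS[ext] = func

    def c_compile(source, env):
        # expand a command template string; no class hierarchy involved
        return "%(CC)s %(CFLAGS)s -c %(source)s" % dict(env, source=source)

    register_tool(".c", c_compile)

    env = {"CC": "gcc", "CFLAGS": "-O2"}
    print(TOOLS[os.path.splitext("foo.c")[1]]("foo.c", env))
    # -> gcc -O2 -c foo.c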
Yeah, cool. Many other projects have their own servers too. pygame.org, plone, etc etc, which meet their own needs. Patches are accepted for pypi btw.
Yes, but how long before the patch is accepted and deployed?
What type of enforcements of metadata, and how would they help? I imagine this could be done in a number of ways to pypi:
- a distutils command extension that people could use
- change pypi source code
- check the metadata for certain packages, then email their authors telling them about issues
First, packages with malformed metadata would be rejected, and it would not be possible to register a package without uploading the sources. I simply do not want to publish a package which does not even have a name or a version, for example. The current way of doing things in pypi is insane if you ask me. For example, if you want to install a package with its dependencies, you need to download the package, which may be on another website, and you need to execute setup.py just to know its dependencies. This has so many failure modes, I don't understand how this can seriously be considered, really. Every other system has an index to do this kind of thing (curiously, both EPD and pypm have an index as well AFAIK). Again, a typical example of NIH, with inferior solutions implemented in the case of python.
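For example, with mandated metadata, resolving dependencies becomes a pure lookup in a locally mirrored index - no code execution involved. A sketch, with an index format invented purely for illustration (one "name;version;comma-separated-deps" line per package):

    def load_index(path):
        index = {}
        with open(path) as f:
            for line in f:
                name, version, deps = line.strip().split(";")
                index[name] = (version, deps.split(",") if deps else [])
        return index

    def closure(index, name, seen=None):
        # the full dependency set of a package, from the index alone
        seen = seen if seen is not None else set()
        if name not in seen:
            seen.add(name)
            for dep in index[name][1]:
                closure(index, dep, seen)
        return seen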
yeah, cool. That would let you develop things incrementally too, and still have toydist be useful for the whole development period until it catches up with the features of distutils needed.
Initially, toydist was started to show that writing something compatible with distutils without being tied to distutils was possible.
If you execute build tools on arbitrary code, then arbitrary code execution is easy for someone who wants to do bad things.
Well, you could surely exploit build tool bugs. But at least I can query metadata and package features in a safe way - and this is very useful already (cf. my point about being able to query package metadata in one "query").
and many times I still get errors on different platforms, despite many years of multi platform coding.
Yes, that's a difficult process. We cannot fix this - but having automatically built (and hopefully tested) installers on major platforms would be a significant step in the right direction. That's one of the killer features of CRAN (whenever you submit a package to CRAN, a windows installer is built and tested). cheers, David
On Tue, Dec 29, 2009 at 10:27 PM, René Dudfield
Buildout is what a lot of the python community are using now.
I would like to note that buildout is a solution to a problem that I don't care to solve. This issue is particularly difficult to explain to people accustomed to buildout, in my experience - I have not found a way to explain it very well yet. Buildout and virtualenv all work by sandboxing from the system python: the sandboxes do not see each other, which may be useful for development, but as a deployment solution for the casual user who may not be familiar with python, it is useless. A scientist who installs numpy, scipy, etc... to try things out wants to have everything available in one python interpreter, and does not want to jump between different virtualenvs and whatnot to try different packages. This has strong consequences on how you look at things from a packaging POV:
- uninstall is crucial
- a package bringing down python is a big no-no (this happens way too often when you install things through setuptools)
- if something fails, the recovery should be trivial
- the person doing the installation may not know much about python
- you cannot use sandboxing as a replacement for backward compatibility (that's why I don't care much about all the discussion about versioning - I don't think it is very useful as long as python itself does not support it natively)
In the context of ruby, this article makes a similar point: http://www.madstop.com/ruby/ruby_has_a_distribution_problem.html David
On Tue, Dec 29, 2009 at 11:34:44PM +0900, David Cournapeau wrote:
Buildout and virtualenv all work by sandboxing from the system python: the sandboxes do not see each other, which may be useful for development, but as a deployment solution for the casual user who may not be familiar with python, it is useless. A scientist who installs numpy, scipy, etc... to try things out wants to have everything available in one python interpreter, and does not want to jump between different virtualenvs and whatnot to try different packages.
I think that you are pointing out a large source of misunderstanding in packaging discussions. People behind setuptools, pip or buildout care about having a working ensemble of packages that delivers an application (often a web application)[1]. You and I, and many scientific developers, see libraries as building blocks that need to be assembled by the user, the scientist using them to do new science. Thus the idea of isolation is not something that we can accept, because it means that we are restricting the user to a set of libraries. Our definition of user is not the same as the user targeted by buildout. Our user does not push buttons, but writes code. However, unlike the developer targeted by buildout and distutils, our user does not want or need to learn about packaging. Trying to make the debate clearer... Gaël [1] I know your position on why simply focusing on sandboxing working ensembles of libraries is not a replacement for backward compatibility, and will only create impossible problems in the long run. While I agree with you, this is not my point here.
On Tue, Dec 29, 2009 at 10:55 AM, Gael Varoquaux
I wanted to say the same thing. Pylons, during its active development time, required a different combination of versions of several different packages almost every month; virtualenv and pip are the only solutions if you don't want to spend all your time updating. In the last half year, I started to have a similar problem with numpy trunk and scipy and the rest, but I hope this will be only temporary, and might not really be a problem for the end user. Additionally, for obtaining packages from pypi, I never had problems with pure python packages, or packages that had complete binary installers (e.g. wxpython or matplotlib). However, the standard case for scientific packages, with different build dependencies, is often a pain. (A nice example that I never tried is http://fenics.org/wiki/Installing_DOLFIN_on_Windows - the website doesn't respond, but it looks like it takes a week to install all the required source packages). On pypm.activestate.com scipy, matplotlib and mayavi all fail, scipy because of missing lapack/blas. That's also a reason why CRAN is nice: it has automatic platform-specific binary installation. Any improvement will be very welcome, especially if we start with a more widespread use of cython. I'm reluctant to use cython in statsmodels, exactly to avoid any build and distribution problems, even though it would be very useful. Josef
On Tue, Dec 29, 2009 at 2:34 PM, David Cournapeau
wrote: I would like to note that buildout is a solution to a problem that I don't care to solve. This issue is particularly difficult to explain to people accustomed to buildout, in my experience - I have not found a way to explain it very well yet.
Hello, The main problem buildout solves is getting developers up to speed very quickly on a project. They should be able to call one command and get dozens of packages, and everything else needed, ready to go, completely isolated from the rest of the system. If a project does not want to upgrade to the latest versions of packages, they do not have to. This reduces the dependency problem a lot, as one package does not have to block waiting on 20 other packages. It makes iterating packages daily, or even hourly, unproblematic - even with dozens of different packages used. This is not theoretical; many projects iterate this quickly and do not have problems. Backwards compatibility is of course a great thing to keep up... but harder to do with dozens of packages, some of which are third-party ones. For example, some people are running pygame applications written 8 years ago that are still running today on the latest versions of pygame. I don't think people in the python world understand API and ABI compatibility as much as those in the C world. However, buildout is a solution to their problem, and allows them to iterate quickly with many participants on many different projects. Many of these people work on maybe 20-100 different projects at once, and some machines may be running that many applications at once too. So using the system python's packages is completely out of the question for them.
A scientist who installs numpy, scipy, etc... to try things out wants to have everything available in one python interpreter, and does not want to jump between different virtualenvs and whatnot to try different packages.
It is very easy to include a dozen packages in a buildout, so that you have all the packages required. Anyway... here is a skeleton buildout project that uses numpy if anyone wants to have a play. http://renesd.blogspot.com/2009/12/buildout-project-that-uses-numpy.html cheers,
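For readers who have not seen one, the heart of such a skeleton is a small buildout.cfg; the common zc.recipe.egg pattern looks something like this (a sketch, adapt to taste):

    [buildout]
    parts = py

    [py]
    recipe = zc.recipe.egg
    interpreter = py
    eggs =
        numpy

Running ./bin/buildout then generates a ./bin/py interpreter with the listed eggs on its path, isolated from the system python.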
On Wed, Dec 30, 2009 at 3:36 AM, René Dudfield
wrote: The main problem buildout solves is getting developers up to speed very quickly on a project. They should be able to call one command and get dozens of packages, and everything else needed, ready to go, completely isolated from the rest of the system.
This is all great, but I don't care about solving this issue - it is a *developer* issue. I don't mean it is not important, it is just totally out of scope. The developer issues I care about are much more fine-grained (correct dependency handling between targets, toolchain customization, etc...). Note however that hopefully, by simplifying the packaging tools, the problems you see with numpy on 2.6 would be less common. The whole distutils/setuptools/distribute stack is hopelessly intractable, given how messy the code is.
It is very easy to include a dozen packages in a buildout, so that you have all the packages required.
I think there is a confusion - I mostly care about *end users*. People who may not have compilers, who want to be able to easily upgrade one package, etc... David
On Tue, Dec 29, 2009 at 11:20 PM, David Cournapeau
wrote: Note however that hopefully, by simplifying the packaging tools, the problems you see with numpy on 2.6 would be less common. The whole distutils/setuptools/distribute stack is hopelessly intractable, given how messy the code is.
The numpy issue is because of the change in package handling for 2.6, which numpy 1.3 was not developed for.
I think there is a confusion - I mostly care about *end users*. People who may not have compilers, who want to be able to easily upgrade one package, etc...
I was just describing the problems that buildout solves (for others). If I have a project that depends on numpy and 12 other packages, I can send it to other people who can get their project up and running fairly quickly (assuming everything installs ok). btw, numpy 1.4 works with buildout! (at least on my ubuntu box) sweet :)

    cd /tmp/
    bzr branch lp:numpybuildout
    cd numpybuildout/trunk/
    python bootstrap.py -d
    ./bin/buildout
    ./bin/py
    >>> import numpy
    >>> numpy.__file__
    '/tmp/numpybuildout/trunk/eggs/numpy-1.4.0-py2.6-linux-i686.egg/numpy/__init__.pyc'
David Cournapeau wrote:
Buildout and virtualenv all work by sandboxing from the system python: the sandboxes do not see each other, which may be useful for development,
And certain kinds of deployment, like web servers or installed tools.
but as a deployment solution for the casual user who may not be familiar with python, it is useless. A scientist who installs numpy, scipy, etc... to try things out wants to have everything available in one python interpreter, and does not want to jump between different virtualenvs and whatnot to try different packages.
Absolutely true -- which is why Python desperately needs package version selection of some sort. I've been tooting this horn on and off for years, but never got any interest at all from the core python developers. I see putting packages in with no version like having non-versioned dynamic libraries in a system -- i.e. dll hell. If I have a bunch of stuff running just fine with the various package versions I've installed, but then I start working on something (maybe just testing, maybe something more real) that requires the latest version of a package, I have a few choices:
- install the new package and hope I don't break too much
- use something like virtualenv, which requires a lot of overhead to set up and use (my evidence is personal: despite working with a team that uses it, somehow I've never gotten around to using it for my dev work, even though, in theory, it should be a good solution)
- setuptools does supposedly support multiple version installs and selection, but it's ugly and poorly documented enough that I've never figured out how to use it
This has been addressed with a handful of ad-hoc solutions: wxPython has wxversion.select, I think PyGTK has something similar, and who knows what else. It would be really nice to have a standard solution available. Note that the usual response I've gotten is to use py2exe or something to distribute, so you're defining the whole stack. That's good for some things, but not all (though py2app's "alias" bundles are nice), and really pretty worthless for development. Also, many, many packages are a pain to use with py2exe and friends anyway (see my forthcoming other long post...)
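For reference, the wxPython ad-hoc solution mentioned above looks like this (the version string is illustrative):

    import wxversion
    wxversion.select("2.8")   # must be called before the first "import wx"
    import wx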
- you cannot use sandboxing as a replacement for backward compatibility (that's why I don't care much about all the discussion about versioning - I don't think it is very useful as long as python itself does not support it natively).
could be -- I'd love to have Python support it natively, though wxversion isn't too bad. -Chris
On Tue, Dec 29, 2009 at 6:34 AM, David Cournapeau
Buildout and virtualenv all work by sandboxing from the system python: the sandboxes do not see each other, which may be useful for development, but as a deployment solution for the casual user who may not be familiar with python, it is useless. A scientist who installs numpy, scipy, etc... to try things out wants to have everything available in one python interpreter, and does not want to jump between different virtualenvs and whatnot to try different packages.
What I do -- and documented for people in my lab to do -- is set up one virtualenv in my user account, and use it as my default python. (I 'activate' it from my login scripts.) The advantage of this is that easy_install (or pip) just works, without any hassle about permissions etc. This should be easier, but I think the basic approach is sound. "Integration with the package system" is useless; the advantage of distribution packages is that distributions can provide a single coherent system with consistent version numbers across all packages, etc., and the only way to "integrate" with that is to, well, get the packages into the distribution. On another note, I hope toydist will provide a "source prepare" step that allows arbitrary code to be run on the source tree (for, e.g., cython->C conversion, ad-hoc template languages, etc.). IME this is a very common pain point with distutils; there is just no good way to do it, and it has to be supported in the distribution utility in order to get everything right. In particular:
-- Generated files should never be written to the source tree itself, but only the build directory
-- Building from a source checkout should run the "source prepare" step automatically
-- Building a source distribution should also run the "source prepare" step, and stash the results in such a way that, when later building from the source distribution, this step can be skipped
This is a common requirement for user convenience, and necessary if you want to avoid arbitrary code execution during builds. And if you just set up the distribution util so that the only place you can specify arbitrary code execution is in the "source prepare" step, then even people who know nothing about packaging will automatically get all of the above right. Cheers, -- Nathaniel
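A rough sketch of what such a "source prepare" step could do for the cython case, assuming the cython command-line tool is on the path (the function and directory names here are hypothetical):

    import os
    import subprocess

    def prepare_sources(src_dir, build_dir):
        # generate C files from cython sources into the build
        # directory, leaving the source tree untouched
        for root, dirs, files in os.walk(src_dir):
            for name in files:
                if name.endswith(".pyx"):
                    src = os.path.join(root, name)
                    rel = os.path.relpath(src, src_dir)
                    out = os.path.join(build_dir,
                                       os.path.splitext(rel)[0] + ".c")
                    if not os.path.isdir(os.path.dirname(out)):
                        os.makedirs(os.path.dirname(out))
                    subprocess.check_call(["cython", "-o", out, src])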
On Sun, Jan 03, 2010 at 03:05:54AM -0800, Nathaniel Smith wrote:
What I do -- and documented for people in my lab to do -- is set up one virtualenv in my user account, and use it as my default python. (I 'activate' it from my login scripts.) The advantage of this is that easy_install (or pip) just works, without any hassle about permissions etc. This should be easier, but I think the basic approach is sound. "Integration with the package system" is useless; the advantage of distribution packages is that distributions can provide a single coherent system with consistent version numbers across all packages, etc., and the only way to "integrate" with that is to, well, get the packages into the distribution.
That works because either you use packages that don't have many hard-core compiled dependencies, or these are already installed. Think about installing VTK or ITK this way, or even something simpler such as umfpack. I think you would lose most of your users. In my lab, I do lose users on such packages, actually. Besides, what you are describing is possible without package isolation; it is simply the use of a per-user local site-packages, which is now semi-automatic in python 2.6 via the '.local' directory. I do agree that, in a research lab, this is a best practice. Gaël
On Sun, Jan 3, 2010 at 8:05 PM, Nathaniel Smith
wrote:
What I do -- and documented for people in my lab to do -- is set up one virtualenv in my user account, and use it as my default python. (I 'activate' it from my login scripts.) The advantage of this is that easy_install (or pip) just works, without any hassle about permissions etc.
It just works if you happen to be able to build everything from sources. That alone means you ignore the majority of the users I intend to target. No other community (except maybe Ruby) pushes those isolated install solutions as general deployment solutions. If it were such a great idea, other people would have picked them up.
This should be easier, but I think the basic approach is sound. "Integration with the package system" is useless; the advantage of distribution packages is that distributions can provide a single coherent system with consistent version numbers across all packages, etc., and the only way to "integrate" with that is to, well, get the packages into the distribution.
Another way is to provide our own repository for a few major distributions, with automatically built packages. This is how most open source providers work. Miguel de Icaza explains this well: http://tirania.org/blog/archive/2007/Jan-26.html I hope we will be able to reuse much of the opensuse build service infrastructure.
On another note, I hope toydist will provide a "source prepare" step that allows arbitrary code to be run on the source tree (for, e.g., cython->C conversion, ad-hoc template languages, etc.). IME this is a very common pain point with distutils; there is just no good way to do it, and it has to be supported in the distribution utility in order to get everything right. In particular:
-- Generated files should never be written to the source tree itself, but only the build directory
-- Building from a source checkout should run the "source prepare" step automatically
-- Building a source distribution should also run the "source prepare" step, and stash the results in such a way that, when later building from the source distribution, this step can be skipped
This is a common requirement for user convenience, and necessary if you want to avoid arbitrary code execution during builds.
Build directories are hard to implement right. I don't think toydist will support this directly. IMO, those advanced builds warrant a real build tool - one main goal of toydist is to make integration with waf or scons much easier. Both waf and scons have the concept of a build directory, which should do everything you described. David
On Sun, Jan 3, 2010 at 4:23 AM, David Cournapeau
wrote:
It just works if you happen to be able to build everything from sources. That alone means you ignore the majority of the users I intend to target.
No other community (except maybe Ruby) pushes those isolated install solutions as general deployment solutions. If it were such a great idea, other people would have picked them up.
AFAICT, R works more-or-less identically (once I convinced it to use a per-user library directory); install.packages() builds from source, and doesn't automatically pull in and build random C library dependencies. I'm not advocating the 'every app in its own world' model that virtualenv's designers had in mind, but virtualenv is very useful for giving each user their own world. Normally I only use a fraction of virtualenv's power this way, but sometimes it's handy that they've solved the more general problem -- I can easily move my environment out of the way and rebuild if I've done something stupid, or experiment with new python versions in isolation, or whatever. And when you *do* have to reproduce some old environment -- if only to test that the new improved environment gives the same results -- then it's *really* handy.
This should be easier, but I think the basic approach is sound. "Integration with the package system" is useless; the advantage of distribution packages is that distributions can provide a single coherent system with consistent version numbers across all packages, etc., and the only way to "integrate" with that is to, well, get the packages into the distribution.
Another way is to provide our own repository for a few major distributions, with automatically built packages. This is how most open source providers work. Miguel de Icaza explains this well:
http://tirania.org/blog/archive/2007/Jan-26.html
I hope we will be able to reuse much of the opensuse build service infrastructure.
Sure, I'm aware of the opensuse build service, have built third-party packages for my projects, etc. It's a good attempt, but also has a lot of problems, and when talking about scientific software it's totally useless to me :-). First, I don't have root on our compute cluster. Second, even if I did I'd be very leery about installing third-party packages because there is no guarantee that the version numbering will be consistent between the third-party repo and the real distro repo -- suppose that the distro packages 0.1, then the third party packages 0.2, then the distro packages 0.3; will upgrades be seamless? What if the third party screws up the version numbering at some point? Debian has "epochs" to deal with this, but third-parties can't use them and maintain compatibility. What if the person making the third party packages is not an expert on these random distros that they don't even use? Will bug reporting tools work properly? Distros are complicated. Third, while we shouldn't advocate that people screw up backwards compatibility, version skew is a real issue. If I need one version of a package and my lab-mate needs another and we have submissions due tomorrow, then filing bugs is a great idea but not a solution. Fourth, even if we had expert maintainers taking care of all these third-party packages and all my concerns were answered, I couldn't convince our sysadmin of that; he's the one who'd have to clean up if something went wrong, and we don't have a big budget for overtime. Let's be honest -- scientists, on the whole, suck at IT infrastructure, and small individual packages are not going to be very expertly put together. IMHO any real solution should take this into account, keep them sandboxed from the rest of the system, and focus on providing the most friendly and seamless sandbox possible.
Build directories are hard to implement right. I don't think toydist will support this directly. IMO, those advanced builds warrant a real build tool - one main goal of toydist is to make integration with waf or scons much easier. Both waf and scons have the concept of a build directory, which should do everything you described.
Maybe I was unclear -- proper build directory handling is nice, Cython/Pyrex's distutils integration gets it wrong (not their fault; distutils is just impossible to do anything sensible with, as you've said), and I've never found build directories hard to implement (perhaps I'm missing something). But what I'm really talking about is having a "pre-build" step that integrates properly with the source and binary packaging stages, and that's not something waf or scons have any particular support for, AFAIK. -- Nathaniel
On Sun, Jan 3, 2010 at 17:42, Nathaniel Smith
wrote:
AFAICT, R works more-or-less identically (once I convinced it to use a per-user library directory); install.packages() builds from source, and doesn't automatically pull in and build random C library dependencies.
That's not quite the same. That is the R equivalent of Python's recent per-user site-packages feature (every user gets their own sandbox), not virtualenv (every project gets its own sandbox). The former feature has a long history in the multiuser UNIX world and is not really controversial. http://www.python.org/dev/peps/pep-0370/ -- Robert Kern
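The PEP 370 feature in question is visible from a stock python 2.6:

    import site
    print(site.USER_SITE)
    # packages installed with "python setup.py install --user" land here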
On Mon, Jan 4, 2010 at 8:42 AM, Nathaniel Smith
wrote:
AFAICT, R works more-or-less identically (once I convinced it to use a per-user library directory); install.packages() builds from source, and doesn't automatically pull in and build random C library dependencies.
As mentioned by Robert, this is different from the usual virtualenv approach. Per-user app installation is certainly a useful (and uncontroversial) feature. And R does support automatically-built binary installers.
Sure, I'm aware of the opensuse build service, have built third-party packages for my projects, etc. It's a good attempt, but also has a lot of problems, and when talking about scientific software it's totally useless to me :-). First, I don't have root on our compute cluster.
True, non-root install is a problem. Nothing *prevents* dpkg from running in a non-root environment in principle, if the package itself does not require it, but it is not really supported by the tools ATM.
Second, even if I did I'd be very leery about installing third-party packages because there is no guarantee that the version numbering will be consistent between the third-party repo and the real distro repo -- suppose that the distro packages 0.1, then the third party packages 0.2, then the distro packages 0.3; will upgrades be seamless? What if the third party screws up the version numbering at some point? Debian has "epochs" to deal with this, but third-parties can't use them and maintain compatibility.
Actually, at least with .deb-based distributions, this issue has a solution. As packages have their own version in addition to the upstream version, PPA-built packages get their own versions. https://help.launchpad.net/Packaging/PPA/BuildingASourcePackage Of course, this assumes a simple versioning scheme in the first place, instead of the cluster-fck that versioning has become within python packages (again, the scheme used in python is much more complicated than everyone else's, and it seems that nobody has ever stopped and thought for 5 minutes about the consequences, and whether this complexity was a good idea in the first place).
What if the person making the third party packages is not an expert on these random distros that they don't even use?
I think simple rules/conventions plus build farms would solve most issues. The problem is that if you allow total flexibility as input, then automatic and simple solutions become impossible. Certainly, PPA and the build service provide a much better experience than anything pypi has ever given me.
Third, while we shouldn't advocate that people screw up backwards compatibility, version skew is a real issue. If I need one version of a package and my lab-mate needs another and we have submissions due tomorrow, then filing bugs is a great idea but not a solution.
Nothing prevents you from using virtualenv in that case. (I may sound dismissive of those tools, but I am really not - I use them myself. What I strongly react to is when they are pushed as the de facto, standard method.)
Fourth, even if we had expert maintainers taking care of all these third-party packages and all my concerns were answered, I couldn't convince our sysadmin of that; he's the one who'd have to clean up if something went wrong, and we don't have a big budget for overtime.
I am not advocating using only packaged, binary installers. I am advocating using them as much as possible where it makes sense - on windows and mac os x in particular. Toydist also aims at making it easier to build customized installs. Although not yet implemented, a --user-like scheme would be quite simple to implement, because the toydist installer internally uses an autoconf-like description of directories (of which --user is a special case). If you need sandboxed or customized installs, toydist will not prevent it. It is certainly my intention to make it possible to use virtualenv and co (you already can, by building eggs, actually). I hope that by having our own "SciPi", we can have a more reliable approach. For example, the static dependency description plus mandated metadata would make this much easier and more robust, as there would be no need to run a setup.py to get the dependencies. If you look at hackageDB (http://hackage.haskell.org/packages/hackage.html), they have a very simple index structure, which makes it easy to download it entirely and reuse it locally to avoid any internet access.
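As a sketch of what such static, mandated metadata could look like - the field names are invented here in the spirit of a cabal-like file, not toydist's actual grammar:

    Name: example-scikit
    Version: 0.1.0
    Summary: an example package description
    DependsOn: numpy >= 1.3, scipy

    Library:
        Packages: example_scikit, example_scikit.core

A client (or "SciPi" itself) can parse this without running any code, which is what makes a downloadable index possible.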
Let's be honest -- scientists, on the whole, suck at IT infrastructure, and small individual packages are not going to be very expertly put together. IMHO any real solution should take this into account, keep them sandboxed from the rest of the system, and focus on providing the most friendly and seamless sandbox possible.
I agree packages will not always be well put together - but I don't see why this would be worse than the current situation. I also strongly disagree about sandboxing as the solution of choice. For most users, having only one install of most packages is the typical use-case. Once you start sandboxing, you create artificial barriers between the sandboxes, and this becomes too complicated for most users IMHO.
Maybe I was unclear -- proper build directory handling is nice, Cython/Pyrex's distutils integration gets it wrong (not their fault; distutils is just impossible to do anything sensible with, as you've said), and I've never found build directories hard to implement.
It is simple if you have a good infrastructure in place (node abstraction, etc...), but that infrastructure is hard to get right.
But what I'm really talking about is having a "pre-build" step that integrates properly with the source and binary packaging stages, and that's not something waf or scons have any particular support for, AFAIK.
Could you explain with a concrete example what a pre-build stage would look like? I don't think I understand what you want. cheers, David
Nathaniel Smith
On Sun, Jan 3, 2010 at 4:23 AM, David Cournapeau
wrote: Another way is to provide our own repository for a few major distributions, with automatically built packages. This is how most open source providers work. Miguel de Icaza explains this well:
http://tirania.org/blog/archive/2007/Jan-26.html
I hope we will be able to reuse much of the opensuse build service infrastructure.
Sure, I'm aware of the opensuse build service, have built third-party packages for my projects, etc. It's a good attempt, but also has a lot of problems, and when talking about scientific software it's totally useless to me :-). First, I don't have root on our compute cluster.
I use Sage for this very reason, and others use EPD or FEMHub or Python(x,y) for the same reasons. Rolling this into the Python package distribution scheme seems backwards though, since a lot of binary packages that have nothing to do with Python are used as well -- the Python stuff is simply thin wrappers around what should ideally be located in /usr/lib or similar (but is nowadays compiled into the Python extension .so because of distribution problems).

To solve the exact problem you (and I) have, I think the best solution is to integrate the tools mentioned above with what David is planning (SciPI etc.). Or, if that isn't good enough, find a generic "userland package manager" that has nothing to do with Python (I'm sure a dozen half-finished ones must have been written, but I didn't look), finish it, and connect it to SciPI.

What David does (I think) is separate the concerns. This makes the task feasible, and also has the advantage of convenience for the people who *do* want to use Ubuntu, Red Hat or whatever to roll out scientific software on hundreds of clients. Dag Sverre
On Mon, Jan 4, 2010 at 5:48 PM, Dag Sverre Seljebotn wrote:
Rolling this into the Python package distribution scheme seems backwards though, since a lot of binary packages that have nothing to do with Python are used as well
Yep, exactly.
To solve the exact problem you (and I) have, I think the best solution is to integrate the tools mentioned above with what David is planning (SciPI etc.). Or, if that isn't good enough, find a generic "userland package manager" that has nothing to do with Python (I'm sure a dozen half-finished ones must have been written, but I didn't look), finish it, and connect it to SciPI.
You have 0install, autopackage and klik, to cite the ones I know about. I wish people had looked at those before rolling toy solutions to complex problems.
What David does (I think) is separate the concerns.
Exactly - you've described this better than I did. David
Hi David,
On Mon, Dec 28, 2009 at 9:03 AM, David Cournapeau wrote:

Executable: grin
    module: grin
    function: grin_main

Executable: grind
    module: grin
    function: grind_main
Have you thought at all about operations that are currently performed by post-installation scripts? For example, it might be desirable for the ipython or MayaVi windows installers to create a folder in the Start menu that contains links to the executable and the documentation. This is probably a secondary issue at this point in toydist's development, but I think it is an important feature in the long run. Also, have you considered support for package extras (package variants in Ports, allowing you to specify features that pull in additional dependencies, like traits[qt4])? Enthought makes good use of them in ETS, and I think they would be worth keeping. Darren
On Wed, Dec 30, 2009 at 11:26 PM, Darren Dale wrote:
Have you thought at all about operations that are currently performed by post-installation scripts? For example, it might be desirable for the ipython or MayaVi windows installers to create a folder in the Start menu that contains links to the executable and the documentation. This is probably a secondary issue at this point in toydist's development, but I think it is an important feature in the long run.
The main problem I see with post hooks is how to support them in installers. For example, you would have a function which does the post install, and declare it as a post-install hook through a decorator:

    @hook.post_install
    def myfunc():
        pass

The main issue is how to communicate data - that's a major issue in every build system I know of (scons' solution is ugly: every function takes an env argument, which is basically a giant global variable).
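On the data-communication problem, one possibility (a rough sketch, not anything toydist implements; all names here are hypothetical) is to hand each hook an explicit context object instead of sharing a scons-style global environment:

    # Hypothetical sketch: hooks get an explicit context argument
    # rather than reading from one giant global environment.
    _post_install_hooks = []

    def post_install(func):
        # decorator registering func as a post-install hook
        _post_install_hooks.append(func)
        return func

    class InstallContext(object):
        def __init__(self, prefix, installed_files):
            self.prefix = prefix
            self.installed_files = list(installed_files)

    @post_install
    def create_shortcuts(ctx):
        # e.g. a windows installer could create Start menu entries here
        for path in ctx.installed_files:
            print("would link: %s" % path)

    def run_post_install_hooks(ctx):
        for hook in _post_install_hooks:
            hook(ctx)

    run_post_install_hooks(InstallContext("/usr/local", ["bin/grin"]))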
Also, have you considered support for package extras (package variants in Ports, allowing you to specify features that pull in additional dependencies like traits[qt4])? Enthought makes good use of them in ETS, and I think they would be worth keeping.
The declarative format may declare flags as follows:

    Flag: c_exts
        Description: Build (optional) C extensions
        Default: false

    Library:
        if flag(c_exts):
            Extension: foo
                sources: foo.c

And this is automatically available at the configure stage. It can be used anywhere in Library, not just for Extension (you could use it within the Requires section). I am considering adding more than Flag (flags are boolean), if that does not make the format too complex. The use case I have in mind is something like:

    toydist configure --with-lapack-dir=/opt/mkl/lib

which I have wished to implement for numpy for ages. David
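For what it is worth, a tiny sketch (hypothetical, not toydist's code) of how such a configure option could be parsed and turned into a variable the package description can reference:

    from optparse import OptionParser

    # parse "toydist configure --with-lapack-dir=..." style options
    parser = OptionParser(prog="toydist configure")
    parser.add_option("--with-lapack-dir", dest="lapack_dir", default=None,
                      help="directory containing the LAPACK libraries")
    options, args = parser.parse_args(["--with-lapack-dir=/opt/mkl/lib"])

    # the value becomes a configure-time variable for the build description
    variables = {"lapack_dir": options.lapack_dir}
    print(variables)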
On Wed, Dec 30, 2009 at 11:26 PM, Darren Dale wrote:
Also, have you considered support for package extras (package variants in Ports, allowing you to specify features that pull in additional dependencies like traits[qt4])? Enthought makes good use of them in ETS, and I think they would be worth keeping.
Does this example cover what you have in mind? I am not so familiar with this feature of setuptools:

    Name: hello
    Version: 1.0

    Library:
        BuildRequires: paver, sphinx, numpy
        if os(windows)
            BuildRequires: pywin32
        Packages: hello
        Extension: hello._bar
            sources: src/hellomodule.c
        if os(linux)
            Extension: hello._linux_backend
                sources: src/linbackend.c

Note that instead of os(os_name), you can use flag(flag_name), where flags are boolean variables which can be user-defined:

http://github.com/cournape/toydist/blob/master/examples/simples/conditional/...
http://github.com/cournape/toydist/blob/master/examples/var_example/toysetup...

David
On Wed, Dec 30, 2009 at 11:16 AM, David Cournapeau wrote:

Does this example cover what you have in mind? I am not so familiar with this feature of setuptools:
I should defer to the description of extras in the setuptools documentation. It is only a few paragraphs long: http://peak.telecommunity.com/DevCenter/setuptools#declaring-extras-optional... Darren
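For reference, a minimal sketch of what declaring such an extra looks like in a setup.py (following the pattern in the setuptools documentation; the names are illustrative, not ETS's actual setup):

    from setuptools import setup

    setup(
        name="traits",
        version="1.0",
        packages=["traits"],
        # extras: named groups of optional dependencies; requiring
        # "traits[qt4]" pulls in the qt4 group below as well
        extras_require={
            "qt4": ["PyQt4"],
        },
    )

A dependent project can then declare install_requires=["traits[qt4]"] to opt into the variant.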
On Thu, Dec 31, 2009 at 6:06 AM, Darren Dale
I should defer to the description of extras in the setuptools documentation. It is only a few paragraphs long:
http://peak.telecommunity.com/DevCenter/setuptools#declaring-extras-optional...
Ok, so there are two issues related to this feature:
- supporting variants at the build stage
- supporting different variants of the same package in the dependency graph at install time

The first issue is definitely supported - I fixed a bug in toydist to support it correctly, and this will be used when converting setuptools-based setup.py files which use the features argument. The second issue is more challenging: it complicates dependency handling quite a bit, and may cause difficult situations at dependency resolution time. This becomes particularly messy if you mix packages you build yourself with packages grabbed from a repository. I wonder if there is a simpler solution which would give a similar feature set. cheers, David
On Sat, Jan 02, 2010 at 11:32:00AM +0900, David Cournapeau wrote:
[snip] - supporting different variants of the same package in the dependency graph at install time
[snip]
The second issue is more challenging. It complicates the dependency handling quite a bit, and may cause difficult situations to happen at dependency resolution time. This becomes particularly messy if you mix packages you build yourself with packages grabbed from a repository. I wonder if there is a simpler solution which would give a similar feature set.
AFAICT, in Debian, the same feature is provided via virtual packages: you would have, for instance:

    python-matplotlib
    python-matplotlib-basemap

It is interesting to note that the same source package may be used to generate both binary, end-user packages. And happy new year! Gaël
On Sat, Jan 2, 2010 at 4:58 PM, Gael Varoquaux wrote:

AFAICT, in Debian, the same feature is provided via virtual packages: you would have, for instance:

    python-matplotlib
    python-matplotlib-basemap
I don't think virtual packages entirely fix the issue. AFAIK, virtual packages have two uses:
- handling dependencies where several packages may each resolve one particular dependency in an equivalent way (a good example is LAPACK: both liblapack and libatlas provide the lapack feature)
- closer to this discussion, you can build several variants of the same package, and each variant resolves the dependency on a virtual package capturing the commonalities.

For example, say we have two numpy packages, one built with lapack (python-numpy-full), the other without (python-numpy-core). What happens when a package foo depends on numpy-full, but numpy-core is installed? AFAICS, this can only work as long as the set containing every variant can be ordered (in the conventional set-ordering sense), and the dependency can be satisfied by the smallest one. cheers, David
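Schematically, in Debian control-file terms (hypothetical package names, just to illustrate the problem):

    Package: python-numpy-core
    Provides: python-numpy

    Package: python-numpy-full
    Provides: python-numpy

    Package: foo
    Depends: python-numpy-full

A dependency on the virtual name python-numpy could be satisfied by either variant, but once foo names the concrete python-numpy-full, an installed python-numpy-core cannot satisfy it, and the resolver has to swap one variant for the other.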
participants (11):
- Christopher Barker
- Dag Sverre Seljebotn
- Darren Dale
- David Cournapeau
- Gael Varoquaux
- josef.pktd@gmail.com
- Keith Goodman
- Nathaniel Smith
- Ravi
- René Dudfield
- Robert Kern