hello again,

On Tue, Dec 29, 2009 at 2:22 PM, David Cournapeau <cournape@gmail.com> wrote:
On Tue, Dec 29, 2009 at 10:27 PM, René Dudfield <renesd@gmail.com> wrote:
Hi,
In the toydist proposal/release notes, I would address 'what does toydist do better' more explicitly.
**** A big problem for science users is that numpy does not work with pypi + (easy_install, buildout or pip) and python 2.6. ****
Working with the rest of the python community as much as possible is likely a good goal.
Yes, but it is hopeless. Most of what is being discussed on distutils-sig is useless for us, and what matters is ignored at best. I think most people on distutils-sig are misguided, and I don't think the community is representative of people concerned with packaging anyway - most of the participants seem to be around web development, and are mostly dismissive of others' concerns (OS packagers, etc...).
Sitting down with Tarek (who is one of the current distutils maintainers) in Berlin, we had a little discussion about packaging over pizza and beer... and he was quite mindful of OS packagers' problems and issues. He was also interested to hear about game developers' issues with packaging (which are different again from scientific users'... but similar in many ways). However, these systems were developed by the zope/plone/web crowd, so they are naturally going to be thinking a lot about zope/plone/web issues.

Debian and ubuntu packages are mostly useless to them because of their age. Waiting a couple of years for your package to be released is just not an option (waiting even an hour for bug fixes is sometimes not an option). Also, isolation of packages is needed for machines that have 100s of different applications running, written by different people, each with dozens of packages used by each application. Tools like checkinstall and stdeb ( http://pypi.python.org/pypi/stdeb/ ) can help with older style packaging systems like deb/rpm. I think perhaps if toydist included something like stdeb, not as an extension to distutils but as a standalone tool (like toydist itself), there would be fewer problems with it.

One thing the various zope related communities do is make sure all the relevant and needed packages are built/tested by their compile farms. This makes pypi work for them a lot better than a non-coordinated effort does. There are also lots of people trying out new versions all of the time.
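For anyone who hasn't tried it, stdeb plugs into an existing setup.py as extra distutils commands - something like this (command names are from the stdeb docs; exact usage may vary between versions):

    # build a Debian source package from a regular setup.py
    python setup.py --command-packages=stdeb.command sdist_dsc

    # or go straight to a .deb
    python setup.py --command-packages=stdeb.command bdist_deb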
I want to note that I am not starting this out of thin air - I know most of distutils code very well, I have been the mostly sole maintainer of numpy.distutils for 2 years now. I have written extensive distutils extensions, in particular numscons which is able to fully build numpy, scipy and matplotlib on every platform that matters.
Simply put, distutils code is horrible (this is an objective fact) and flawed beyond repair (this is more controversial). IMHO, it has almost no useful feature, except being standard.
yes, I have also battled with distutils over the years. However, it is simpler than autotools (for me... maybe distutils has perverted my fragile mind), and works on more platforms for python than any other current system. It is much worse for C/C++ modules though. It needs dependency and configuration tools to work better (like the ones many C/C++ projects hack into distutils themselves). Monkey patching and extensions are especially a problem... as is the horrible code quality of distutils by modern standards. However, distutils has gained more tests and testing infrastructure, so refactoring/cleaning it up is becoming more feasible.
If you want a more detailed explanation of why I think distutils and all tools on top are deeply flawed, you can look here:
http://cournape.wordpress.com/2009/04/01/python-packaging-a-few-observations...
I agree with many things in that post, except your conclusion on multiple versions of packages in isolation. Package isolation is like processes, and package sharing is like threads - and threads are evil! Leave my python site-packages directory alone I say... and especially don't let setuptools infect it :) Many people currently find that the multiple-versions-in-isolation approach works well for them - so for some use cases the tools are working wonderfully.
numpy used to work with buildout in python2.5, but not with 2.6. buildout lets other team members get up to speed with a project by running one command. It installs things in the local directory, not system wide. So you can have different dependencies per project.
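For those who haven't used it, a minimal buildout.cfg looks something like this (the recipe is the standard zc.recipe.egg; the egg list is just an example):

    [buildout]
    parts = python

    [python]
    recipe = zc.recipe.egg
    interpreter = python
    eggs =
        numpy
        scipy

Running bin/buildout then gives you a bin/python with those eggs available, all installed under the project directory rather than system wide.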
I don't think it is a very useful feature, honestly. It seems to me that they created a huge infrastructure to split packages into tiny pieces, and then try to get them back together, imagining that multiple installed versions is a replacement for backward compatibility. Anyone with extensive packaging experience knows that's a deeply flawed model in general.
Science is supposed to allow repeatability. Without the same versions of packages, repeating experiments is harder. This is a big problem in science, and multiple versions of packages in _isolation_ can help solve the repeatability problem. Just pick some random paper and try to reproduce their results: it's generally very hard, unless the software is quite well packaged. Especially for graphics related papers there are often many different types of environments, so setting up the environments to try out their techniques and verify results quickly is difficult. Multiple versions are not a replacement for backwards compatibility, just a way to avoid the problem in the short term so you are not blocked. If a new package version breaks your app, then you can either pin it to an old version, fix your app, or fix the package. It is also not a replacement for building on stable, high quality components, but it helps you work with less stable and lower quality components - at a much faster rate of change, and with a much larger dependency list.
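Pinning is cheap in these tools. In buildout, for example, you can pin every dependency of a project with a [versions] section (the version number below is only an illustration):

    [buildout]
    parts = python
    versions = versions

    [python]
    recipe = zc.recipe.egg
    interpreter = python
    eggs = numpy

    [versions]
    numpy = 1.3.0

That makes an environment reproducible enough to rerun an experiment months later, without demanding backwards compatibility from every upstream package.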
Plenty of good work is going on with python packaging.
That's the opposite of my experience. What I care about is:
- tools which are hackable and easily extensible
- robust install/uninstall
- a real, DAG-based build system
- explicitness and repeatability
None of this is supported by the tools, and the current directions go even further away. When I have to explain at length why the command-based design of distutils is a nightmare to work with, I don't feel very confident that the current maintainers are aware of the issues, for example. It shows that they never had to extend distutils much.
All agreed! I'd add to the list parallel builds/tests (make -j 16), and outputting to native build systems, eg xcode and msvc projects, and makefiles.

It would be interesting to know your thoughts on buildout recipes (see creating recipes http://www.buildout.org/docs/recipe.html ). They seem to work better from my perspective. However, that is probably because of isolation: the recipes are only used by those projects that require them, so the chance of them interacting is lower, as they are not installed in the main python. How will you handle toydist extensions so that multiple extensions do not have problems with each other? I don't think this is possible without isolation, and even then it's still a problem.

Note, the section in the distutils docs on creating command extensions is only around three paragraphs. There is also no central place to go looking for extra commands (that I know of), or a place to document and share each other's command extensions. Many of the methods for extending distutils are not very well documented either. For example, 'how do you change compiler command line arguments for certain source files?' Basic things like that are possible with distutils, but not documented (very well) - see the sketch below for the kind of trick required.
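To illustrate, here is roughly the kind of undocumented hack needed for per-source compiler flags - a sketch only (file names and flags are made up), and it relies on the private _compile hook of the unix compiler classes (MSVC overrides compile() directly, so even this doesn't work everywhere):

    from distutils.core import setup, Extension
    from distutils.command.build_ext import build_ext

    # extra flags for particular source files - examples only
    EXTRA_FLAGS = {'src/fast_math.c': ['-O3', '-ffast-math']}

    class build_ext_perfile(build_ext):
        def build_extensions(self):
            compiler = self.compiler
            original_compile = compiler._compile

            def compile_with_flags(obj, src, ext, cc_args,
                                    extra_postargs, pp_opts):
                flags = extra_postargs + EXTRA_FLAGS.get(src, [])
                original_compile(obj, src, ext, cc_args, flags, pp_opts)

            # monkey patching a private method - exactly the kind of
            # fragility being complained about above
            compiler._compile = compile_with_flags
            build_ext.build_extensions(self)

    setup(name='example',
          ext_modules=[Extension('fast_math', ['src/fast_math.c'])],
          cmdclass={'build_ext': build_ext_perfile})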
There are build farms that build windows and OSX packages and upload them to pypi. Start uploading pre-releases to pypi, and you get these for free (once you make numpy compile out of the box on those compile farms). There are compile farms for other OSes too... like ubuntu/debian, macports, etc. Some distributions even automatically download, compile and package new releases once they spot a new file on your ftp/web site.
I am familiar with some of those systems (PPA and opensuse build service in particular). One of the goals of my proposal is to make it easier to interoperate with those tools.
yeah, cool.
I think Pypi is mostly useless. The lack of enforced metadata is a big no-no IMHO. The fact that Pypi is miles behind CRAN, for example, is quite significant. I want CRAN for scientific python, and I don't see Pypi becoming it in the near future.
The point of having our own Pypi-like server is that we could do the following:
- enforcing metadata
- making it easy to extend the service to support our needs
Yeah, cool. Many other projects have their own servers too - pygame.org, plone, etc etc - which meet their own needs. Patches are accepted for pypi btw. What kinds of metadata enforcement do you mean, and how would they help? I imagine this could be done in a number of ways with pypi:
- a distutils command extension that people could use (sketched below),
- changing the pypi source code,
- checking the metadata for certain packages, then emailing their authors about the issues.
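A sketch of the first option (the required-field list here is just a guess at what you would enforce):

    from distutils.core import Command

    class check_metadata(Command):
        description = 'fail if required metadata fields are missing'
        user_options = []

        def initialize_options(self):
            pass

        def finalize_options(self):
            pass

        def run(self):
            meta = self.distribution.metadata
            required = ['name', 'version', 'author', 'url',
                        'license', 'description']
            missing = [f for f in required if not getattr(meta, f, None)]
            if missing:
                raise SystemExit('missing metadata: ' + ', '.join(missing))

Hook it up with cmdclass={'check_metadata': check_metadata} in setup(), and a server could refuse uploads from clients that don't pass a check like this.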
It is interesting to note that one of the maintainers of pypm recently quit the discussion about Pypi, most likely out of frustration with the other participants.
yeah, big mailing list discussions hardly ever help I think :) oops, this is turning into one.
Documentation projects are being worked on to document, give tutorials on, and generally make python packaging easier all round. As witnessed by the 20 or so releases on pypi every day (and growing), lots of people are using the python packaging tools successfully.
This does not mean much IMO. Uploading to Pypi is almost required to use virtualenv, buildout, etc. An interesting metric is not how many packages are uploaded, but how much pypi is used outside of developers.
Yeah, it only means that there are lots of developers able to use the packaging system to put their own packages up there. However, there are over 500 science related packages on there now - which is pretty cool. A way to measure whether packages are actually used would be by downloads, and by which packages depend on which other packages (see the script sketch below). I think the science ones would be reused less than normal, since a much higher percentage are C/C++ based, and are likely to be more fragile packages.
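Download counts are already queryable through pypi's XML-RPC interface, so a rough reuse metric is scriptable - method names below are from the pypi XML-RPC docs, though I haven't checked how reliable the numbers are:

    import xmlrpclib

    client = xmlrpclib.ServerProxy('http://pypi.python.org/pypi')
    for version in client.package_releases('numpy', True):
        total = sum(f['downloads']
                    for f in client.release_urls('numpy', version))
        print version, total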
I'm not sure making a separate build tool is a good idea. I think going with the rest of the python community, and improving the tools there is a better idea.
It has been tried, and IMHO has been proved to have failed. You can look at the recent discussion (the one started by Guido in particular).
I don't think 500+ science related packages is a total failure really.
pps. some notes on toydist itself.
- toydist convert is cool for people converting a setup.py. This means that most people can try out toydist right away. But what does it gain these people who convert their setup.py files?
Not much ATM, except that it is easier to write a toysetup.info compared to a setup.py IMO, and that it supports a simple way to include data files (something which is currently *impossible* to do without writing your own distutils extensions). It also has the ability to build eggs without using setuptools (I consider not using setuptools a feature, given the too many failure modes of that package).
yeah, I avoid setuptools in my packages by default, but use command line arguments to enable the setuptools features I need (eggs, bdist_mpkg, etc etc). Having a tool to create eggs without setuptools would be great in itself. Definitely list this in the feature list :)
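The pattern I use is something like this in setup.py (the command names are just the ones I happen to need):

    import sys

    # only import setuptools when a setuptools-only command is requested;
    # plain distutils otherwise, so site-packages stays clean
    if set(sys.argv) & set(['bdist_egg', 'bdist_mpkg', 'develop']):
        from setuptools import setup
    else:
        from distutils.core import setup

    setup(name='example', version='0.1', py_modules=['example'])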
The main goals though are to make it easier to build your own tools on top of it, and to integrate with real build systems.
yeah, cool.
- a toydist convert that generates a setup.py file might be cool :)
toydist started like this, actually: you would write a setup.py file which loads the package description from toysetup.info and converts it to a dict argument for distutils.core.setup. I have not updated it recently, but that's definitely on the TODO list for a first alpha, as it would enable people to benefit from the format with 100% backward compatibility with distutils.
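The bridge would look something like this - every toydist name here is a guess at the eventual API, not something that exists today:

    from distutils.core import setup
    from toydist.core import PackageDescription  # assumed module path

    # parse the static description and hand it to plain distutils
    pkg = PackageDescription.from_file('toysetup.info')  # assumed helper
    setup(**pkg.to_distutils_dict())                     # assumed helper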
yeah, cool. That would let you develop things incrementally too, and still have toydist be useful for the whole development period until it catches up with the distutils features people need.
- arbitrary code execution happens when building or testing with toydist.
You are right for testing, but wrong for building. As long as the build is entirely driven by toysetup.info, you only have to trust toydist (which is not safe ATM, but that's an implementation detail), and your build tools of course.
If you execute build tools on arbitrary code, then arbitrary code execution is easy for someone who wants to do bad things. Trust, and secondarily sandboxing, are the best ways to solve these problems imho.
Obviously, if you have a package which uses an external build tool on top of toysetup.info (as will be required for numpy itself for example), all bets are off. But I think that's a tiny fraction of the interesting packages for scientific computing.
yeah, currently 1/5th of science packages use C/C++/fortran/cython etc (see http://pypi.python.org/pypi?:action=browse&c=40 110/458 on that page). There seem to be a lot more packages using C/C++ compared to other types on there (eg, zope3 packages list 0 out of 900 packages using C/C++). So the high number of C/C++ science related packages on pypi demonstrates that better C/C++ tools are a big need for scientific packages, especially compile/testing farms for all these packages.

Compile farms are a much bigger need here than for pure python packages, since C/C++ is MUCH harder to write/test in a portable way. I would say it is close to impossible to get code working without errors unless you have quite good knowledge of multiple platforms. There are many times with pygame development that I make changes on an osx, windows or linux box, commit the change, then wait for the compile/tests to run on the build farm ( http://thorbrian.com/pygame/builds.php ). Releasing packages otherwise makes the process *heaps* longer... and many times I still get errors on different platforms, despite many years of multi platform coding.
Sandboxing is particularly an issue on windows - I don't know a good solution for windows sandboxing, outside of full vms, which are heavyweight.
yeah, VMs are the way to go, if only to make each build a fresh install. However, I think automated distributed building and trust are more useful - ie, only build those packages whose authors you trust, and let anyone download, build, and then post their build/test results. MS have given out copies of windows to different members of the python community in the past to set up VMs for building.

By automated distributed building, I mean what happens on mailing lists usually, where people post their test results when they have a problem - except in a more automated manner. Adding a 'Do you want to upload your build/test results?' prompt at the end of a setup.py for subversion builds would give you dozens or hundreds of test results daily from all sorts of machines. Making it easy for people to set up package builders which also upload their packages somewhere gives you distributed package building in a fairly safe, automated manner. (more details here: http://renesd.blogspot.com/2009/09/python-build-bots-down-maybe-they-need.ht... )
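The setup.py hook could be as small as this - the collection URL is made up, and you would want the prompt to default to 'no':

    import platform, urllib, urllib2

    def offer_result_upload(log_text):
        answer = raw_input('Upload your build/test results? [y/N] ')
        if answer.lower() != 'y':
            return
        data = urllib.urlencode({
            'platform': platform.platform(),
            'python': platform.python_version(),
            'log': log_text,
        })
        # hypothetical collection endpoint
        urllib2.urlopen('http://example.org/builds/submit', data)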
- it should be possible to build this toydist functionality as a distutils/distribute/buildout extension.
No, it cannot, at least as far as distutils/distribute are concerned (I know nothing about buildout). Extending distutils is horrible, and fragile in general. Even autotools with its mix of generated sh scripts through m4 and perl is a breeze compared to distutils.
- extending toydist? How are extensions made? There are 175 buildout packages which extend buildout, and many that extend distutils/setuptools - so extension of build tools is a necessary thing.
See my answer earlier about interoperation with build tools.
I'm still not clear on how toydist will be extended, but I am a lot clearer about its goals now.
cheers,