I just found this comment on my blog. People have told me this in person too, so I believe it is real pain (even if the solution may be elusive and the suggested solutions may not work). But I don't know how to improve the world. Is the work on distutils-sig going to be enough? Or do we need some other kind of work in addition? Do we need more than PyPI?

--Guido

---------- Forwarded message ----------
From: dalloliogm <noreply-comment@blogger.com>
Date: Fri, Nov 6, 2009 at 8:01 AM
Subject: [Neopythonic] New comment on Python in the Scientific World.
To: gvanrossum@gmail.com

dalloliogm has left a new comment on your post "Python in the Scientific World":

Python is suffering a lot in the scientific world, because it does not have a CPAN-like repository. PyPI is fine, but it is still far from the level of CPAN, CRAN, Bioconductor, etc.

Scientists who use programming usually have a lot of different interests and approaches, therefore it is really difficult to write a package that can be useful to everyone. Other programming languages like Perl and R have repository-like structures which enable people to download packages easily, and to upload new ones and organize them without having to worry about integrating them into existing packages.

This is what is happening to biopython now: it is a monolithic package that is supposed to work for any bioinformatics problem; but this is so general that to accomplish it you would need to add a lot of dependencies: on numpy, networkx, suds, any kind of library. However, since easy_install is not yet as ready as its counterparts in other languages, if the biopython developers add too many dependencies, nobody will be able to install it properly, and nobody will use it.

Posted by dalloliogm to Neopythonic at November 6, 2009 8:01 AM

--
--Guido van Rossum (python.org/~guido)
Guido van Rossum wrote:
I just found this comment on my blog. People have told me this in person too, so I believe it is real pain (even if the solution may be elusive and the suggested solutions may not work). But I don't know how to improve the world. Is the work on distutils-sig going to be enough? Or do we need some other kind of work in addition? Do we need more than PyPI?
--Guido
---------- Forwarded message ---------- From: dalloliogm <noreply-comment@blogger.com> Date: Fri, Nov 6, 2009 at 8:01 AM Subject: [Neopythonic] New comment on Python in the Scientific World. To: gvanrossum@gmail.com
dalloliogm has left a new comment on your post "Python in the Scientific World":
Python is suffering a lot in the scientific world, because it does not have a CPAN-like repository.
PyPI is fine, but it is still far from the level of CPAN, CRAN, Bioconductor, etc.
Scientists who use programming usually have a lot of different interests and approaches, therefore it is really difficult to write a package that can be useful to everyone. Other programming languages like Perl and R have repository-like structures which enable people to download packages easily, and to upload new ones and organize them without having to worry about integrating them into existing packages.
This is what is happening to biopython now: it is a monolithic package that is supposed to work for any bioinformatics problem; but this is so general that to accomplish it you would need to add a lot of dependencies: on numpy, networkx, suds, any kind of library. However, since easy_install is not yet as ready as its counterparts in other languages, if the biopython developers add too many dependencies, nobody will be able to install it properly, and nobody will use it.
I for one did not understand the problem. What does CPAN have that PyPI doesn't? It is natural for packages (distributions, in distutils terms) to have dependencies on each other. Why is this a problem?
On 10:14 pm, alex.gronholm@nextday.fi wrote:
[...]
I for one did not understand the problem. What does CPAN have that PyPI doesn't? It is natural for packages (distributions, in distutils terms) to have dependencies on each other. Why is this a problem?
I'm also not sure I see what problem CPAN is solving that PyPI is failing at. At most, it sounds like the OP is complaining that the software available to scientists in perl is better than the software available to scientists in python (for some definition of better - there's more of it, it solves their particular problems better, whatever). PyPI *does* let you download packages easily, it does let you upload new ones and organize them without having to worry about integrating them into existing packages. So the features of CPAN that are described as desirable are already available in PyPI. Jean-Paul
On Fri, Nov 6, 2009 at 2:53 PM, <exarkun@twistedmatrix.com> wrote:
[...]
I'm also not sure I see what problem CPAN is solving that PyPI is failing at. At most, it sounds like the OP is complaining that the software available to scientists in perl is better than the software available to scientists in python (for some definition of better - there's more of it, it solves their particular problems better, whatever).
PyPI *does* let you download packages easily, it does let you upload new ones and organize them without having to worry about integrating them into existing packages. So the features of CPAN that are described as desirable are already available in PyPI.
It might be the CPAN user interface that makes the difference. Using CPAN you can search for and install applications from an interactive prompt. No web browser needed, just one command line invocation (perl -MCPAN -e shell). -bob
On Fri, 6 Nov 2009 14:56:51 -0800, Bob Ippolito <bob@redivi.com> wrote:
It might be the CPAN user interface that makes the difference. Using CPAN you can search for and install applications from an interactive prompt. No web browser needed, just one command line invocation (perl -MCPAN -e shell).
That's definitely one thing. We have nothing like that yet. David
On Fri, Nov 6, 2009 at 5:01 PM, David Lyon <david.lyon@preisshare.net> wrote:
On Fri, 6 Nov 2009 14:56:51 -0800, Bob Ippolito <bob@redivi.com> wrote:
It might be the CPAN user interface that makes the difference. Using CPAN you can search for and install applications from an interactive prompt. No web browser needed, just one command line invocation (perl -MCPAN -e shell).
That's definitely one thing. We have nothing like that yet.
My understanding is that they have all the same problems we do -- version management, dependencies, isolation, varying quality, etc. When I've tried to install things from CPAN it's been surprisingly obtuse. This is not to say there aren't good things, but "we need CPAN" doesn't really get to the heart of it.
From a (server-side) development perspective, CPAN seems to be a less monolithic environment. The core of it (at least from what Perl people have told me) is really a very simple mirrored file repository, with loose naming conventions. Everything else is built up as added, optional, decentralized services from that. (I wonder if they are using mirroring as a way to distribute these services? E.g., register your I-test-everything mirror, and it'll get packages uploaded to it automatically.)
We have some of the same things. Cheesecake is one example, and I believe it borrowed ideas and terminology from CPAN's equivalent service. Cheesecake, as an example, seems to be largely forgotten and unused. It might be useful to unforget that product.
There are also places where (ironically?) Perl has One Way To Do It, and benefits as a result. Documentation of course, though maybe now with Sphinx we can start doing vertical integration using that tool. Testing is a bit off; "setup.py test" hasn't caught on that well, and I believe it is more prescriptive than Perl's system (which is really just a script that outputs results in a specific format, while setup.py test is based on unittest). I think we'd do well to do *less* vertical integration for testing (though maybe it's fine; things like the nose entry point split the difference).
In terms of how a setup.py should be structured, and other developer packaging issues, I think we're getting closer, but we *really* need someone to document the conventions and lessons that are usually just spread through copy-and-paste and developer feedback. (In some communities paster create has had some positive influence, as a kind of documentation-through-code, but its adoption is somewhat coincidental and goes along social lines.)
In terms specifically of the CPAN interactive shell, I don't think there's any reason to duplicate it, as no one else makes interactive shell installers; I think it's an accident of history.
-- Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
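For reference, the setuptools hook being discussed here is roughly the following; the project and module names are made up for illustration:

    # setup.py fragment: with setuptools installed, "python setup.py test"
    # imports and runs the named unittest suite.
    from setuptools import setup

    setup(
        name='example',
        version='0.1',
        packages=['example'],
        test_suite='example.tests',
    )

Compared to Perl's convention of a test script that prints results in a known format, this ties the hook to unittest, which is the prescriptiveness mentioned above.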
On Fri, 6 Nov 2009 17:50:46 -0600, Ian Bicking <ianb@colorstudy.com> wrote:
My understanding is that they have all the same problems we do -- version management, dependencies, isolation, varying quality, etc.
Same problems. But CPAN does it a lot better.
When I've tried to install things from CPAN it's been surprisingly obtuse. This is not to say there aren't good things, but "we need CPAN" doesn't really get to the heart of it.
Works for me - it's like chalk and cheese, with python being the chalk in this case. Corporate contractor developers know the difference. Scientific dudes can obviously spot the difference. More than anything, it is a culture difference. CPAN is taken really seriously by those involved. There's so much passion to get things right and make things work smoothly. The "heart of it" is the CPAN culture. I never once experienced the 'we don't support your platform' issue in perl. To CPAN, it is all just perl.
Testing is a bit off; "setup.py test" hasn't caught on that well..
In perl there is more incentive to provide this. In Python we don't have any place where this can be automatically activated on a package. So there is no incentive. The real world relevance would be some tests to ensure that the package can work on a number of different platforms and python versions before being accepted, and that the package has some tests that can be automatically run.
In terms of how a setup.py should be structured, and other developer packaging issues, I think we're getting closer but we *really* need someone to document the conventions and lessons that are usually just spread through copy-and-paste and developer feedback...
The individual components that we have are improving and we are getting closer to having something. But CPAN has an energy level behind it that we are yet to duplicate. David
On Fri, Nov 6, 2009 at 11:56 PM, Bob Ippolito <bob@redivi.com> wrote: [...]
It might be the CPAN user interface that makes the difference. Using CPAN you can search for and install applications from an interactive prompt. No web browser needed, just one command line invocation (perl -MCPAN -e shell).
Although, we have an XML-RPC service for PyPI, and Yolk for example is a client that will let you query PyPI through a prompt. I agree it could be enhanced, but it exists. Maybe such a client could be added in Distutils ? Plus, once the work on PEP 390 (or an equivalent one) is over, we will be able to list a distribution's dependencies on a given target system just by querying PYPI. Tarek
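As a rough sketch of what such a prompt-style client builds on, the XML-RPC interface can be queried directly; the method names below are the ones documented for PyPI's XML-RPC API, though exact details should be checked against that documentation:

    import xmlrpclib  # xmlrpc.client in Python 3

    pypi = xmlrpclib.ServerProxy('http://pypi.python.org/pypi')

    # Search for distributions whose name matches "yolk"
    for hit in pypi.search({'name': 'yolk'}):
        print hit['name'], hit['version'], hit['summary']

    # List the released versions of a given distribution
    print pypi.package_releases('yolk')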
On Fri, Nov 6, 2009 at 5:50 PM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
On Fri, Nov 6, 2009 at 11:56 PM, Bob Ippolito <bob@redivi.com> wrote: [...]
It might be the CPAN user interface that makes the difference. Using CPAN you can search for and install applications from an interactive prompt. No web browser needed, just one command line invocation (perl -MCPAN -e shell).
Although, we have an XML-RPC service for PyPI, and Yolk for example is a client that will let you query PyPI through a prompt. I agree it could be enhanced, but it exists. Maybe such a client could be added in Distutils ?
Plus, once the work on PEP 390 (or an equivalent one) is over, we will be able to list a distribution's dependencies on a given target system just by querying PYPI.
It's quite possible that pip will grow some of these features. At least search and querying the currently installed packages are planned in some fashion (someone has indicated the intent to do both these, I believe). -- Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
On Fri, Nov 6, 2009 at 11:53 PM, <exarkun@twistedmatrix.com> wrote: [..]
PyPI *does* let you download packages easily, it does let you upload new ones and organize them without having to worry about integrating them into existing packages. So the features of CPAN that are described as desirable are already available in PyPI.
imho, PyPI + PEP 381 + PEP 381 compliant clients == CPAN Tarek
QUOTE
imho,
PyPI + PEP 381 + PEP 381 compliant clients == CPAN
UNQUOTE

In addition to PyPI, PEP 381, and PEP 381 compliant clients, we need a dash of changed outlook to get CPAN. The Perl community is -- by hearsay -- oriented sympathetically towards people who are not primarily programmers. In particular, I have heard scientists and finance persons developing web applications say this when introduced to python. But the same people were more satisfied with the python stuff when somebody helped them through the technicalities.

When someone encounters PyPI for the first time, is it easy to learn about it from the PyPI page itself? Hopeful clicks on the "tutorial" link bring up a bewildering CheeseShop Tutorial, without an explanation of what CheeseShop is. Reading through the options detailed in the page makes sense only if you already know about them. Definitely not the material for the beginner or for the technologically-just-adequate.

By the way, the PyPI tutorial is just an example. The python community, while excellent in the support it provides to the technically competent, needs to be more friendly towards mere end-users. The attitude should be more charitable than "GOD helps them, and only them, who help themselves".

Or, rather than between communities, is it actually the difference between Python and Perl as languages themselves?

Regards
Milind Khadilkar

On Sat, Nov 7, 2009 at 5:16 AM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
On Fri, Nov 6, 2009 at 11:53 PM, <exarkun@twistedmatrix.com> wrote: [..]
PyPI *does* let you download packages easily, it does let you upload new ones and organize them without having to worry about integrating them
into
existing packages. So the features of CPAN that are described as desirable are already available in PyPI.
imho,
PyPI + PEP 381 + PEP 381 compliant clients == CPAN
Tarek
On Sat, Nov 7, 2009 at 1:35 AM, Milind Khadilkar <zedobject@gmail.com> wrote:
QUOTE imho,
PyPI + PEP 381 + PEP 381 compliant clients == CPAN UNQUOTE In addition to PyPI, PEP381, PEP381 compliant clients, we need a dash of changed outlook to get CPAN. The Perl community is -- by hearsay -- oriented sympathetically towards people who are not primarily programmers. In particular, I have heard scientists and finance persons developing web applications say this when introduced to python.
But the same people were more satisfied with the python stuff when somebody helped them through the technicalities.
When someone encounters PyPI for the first time, is it easy to learn about it from the PyPI page itself? Hopeful clicks on the "tutorial" link brings up a bewildering CheeseShop Tutorial, without an explanation of what CheeseShop is. Reading through the options detailed in the page makes sense only if you already know about them. Definitely not the material for the beginner or for the technologically-just-adequate.
By the way, the PyPI tutorial is just an example. The python community, while excellent in the support it provides to the technically competent, needs to be more friendly towards mere end-users. The attitude should be more charitable than "GOD helps them, and only them, who help themselves".
Or, rather than between communities, is it actually the difference between Python and Perl as languages themselves?
I don't know about the difference between Perl and Python, but I do know this about me (and I believe this applies to other people in this room):

As a "packaging tools" developer I am totally plunged into the technical aspects of the project, and I unfortunately don't have all the time I wish I had to work on this more. Especially right now, because there's a lot of work going on. So, for the documentation part, I am hiding behind the agile manifesto, which says "working software over comprehensive documentation" -- but that's a shame :)

Now, if you do want to help in the documentation so it's better for newcomers (in PyPI, Distutils, etc), I think we would all be happy about such contributions.

Tarek
Since part of my job includes assisting bioinformaticians with installing packages and providing libraries which they can use, I'll chime in with a couple of major thrusts behind the statement "CPAN > PyPI":

1) Automatic dependency handling.

Python still doesn't officially support automatic dependency handling. The addition of the 'Requires' field specifying import names rather than project distribution names was unfortunate. Thankfully PJE gave us 'install_requires', and Tarek and others are helping to PEP that field into the official package metadata. So now it's possible to install many Python projects today with automatic dependency handling (a minimal sketch of this declarative style appears at the end of this message).

However, for scientific python projects, there has been little movement towards this. Many projects still have setup.py scripts that are written such that they can only be installed with manual hand-holding. BioPython is one such example, where running 'python setup.py install' asks a series of interactive questions on the command prompt... which I imagine was written that way because Python didn't have any dependency-handling tools at the time the setup.py was created, so they essentially wrote a mini-dependency handler in setup.py. This is a common pattern in many scientific python projects (and it sucks!). This means that you can't write a python project, state that you require BioPython, and automatically install that dependency when your project is installed. With the CPAN client being ubiquitous in Perl, any author who wrote a Makefile.PL that behaved like that would be harangued until they fixed it. But things are moving in the right direction in Python, so we'll get there, just a decade or so later than Perl. But better late than never!

Client-wise, I think Python is starting to do really well (if you are fortunate enough to only need to install projects with well-written setup.py scripts). Our bioinformaticians use Buildout or pip+virtualenv, depending on which tool is better suited for them (e.g. they just need some libraries to run against, or a team collaborating on a more complex web application, or a project that includes some non-python parts as well). In this area I think CPAN's first-mover advantage is working against them, as inertia and the "CPAN is better" mantra have meant that tools that allow repeatable installations like Buildout or pip either don't exist or are very immature and rarely used. I'd much rather declaratively state up front (and store in version control) what packages and configuration are needed, then grab a coffee while the tool does its job, than sit there guiding an interactive session along like the CPAN client requires.

Being able to run private PyPI servers is a nice PyPI advantage on the server side. Bioinformaticians need to share packages where the code either can't be publicly released, or they simply don't spend enough time coding to want to release code which "anyone could see".

2) The python packaging "morass" as a scapegoat

Yeah, so installing, creating and distributing Python packages is a bit of a morass. The fact that Python has a much larger Standard Library than Perl means that out-of-the-box Python is easier, but folks using Perl are forced to learn how to install libraries much earlier on, so they get over that initial hump of "how to deal with 3rd party libraries" a bit earlier. And as many have noted, the documentation surrounding Python packaging is not the greatest.
However, in my experience scientists can be notorious for not really wanting to think about what code they are running, or what version it is, or how they should be deploying it. Their brains are already quite full trying to deal with the complex scientific problems they're working on, and there is no glamour or kudos for being "good at installing software" as a scientist. They often take the same attitude to automated testing, but at least the scientist who says "ah, automated testing is a waste of time and adds little value" may feel quite a bit of peer pressure to change this opinion. But a scientist who says "ah, packaging and installation is a morass and a waste of time" tends only to be met with "amen! You're better off not even trying, you'll just be wasting your time". So it doesn't take much for them to throw their arms up and say "augh! It's hopeless!", and they have no shortage of scapegoats in Python packaging right now. But our organization has Perl-using scientists and Ruby-using scientists, and they love to say the same thing: "Library management is too hard with Perl", "Ruby gems suck", and "Java CLASSPATH management is a nightmare".
Is the work on distutils-sig going to be enough? Or do we need some other kind of work in addition? Do we need more than PyPI?
I think the current works-in-progress will eventually get us equal to or greater than CPAN. But given the history and design of Distutils, it is sometimes tempting to say, "Ah, let's just chuck the whole thing out and start with something clean and well-designed." Of course, pragmatically I don't think such an effort would do much more than generate a lot of hand-wringing.

However, if we really had the interests of the scientific community in mind, we'd be thinking about the fact that scientists write code in Python, Ruby, Perl, R and whatever language happens to be handy and gets the job done. And from their perspective, if they happen to use two different programming languages, they need to learn two different packaging formats and two different package management tools. If packaging formats and installation formats were standardized enough that they worked for multiple languages, that would be something that many might consider jumping ship for. And it would provide a level playing field for all F/OSS languages. It's a terribly ambitious project, but I can dream, can't I?

(And yes, we do use system package managers such as RPM+yum and dpkg+apt-get where it makes sense - but those tools and formats are really geared more towards sysadmins, and I've never seen them used in a way that integrates well with the requirements and workflow of a developer or scientist.)
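To make point 1 above concrete, this is the kind of declarative setup.py being contrasted with interactive prompts; the project name and version pins are illustrative only:

    # Dependencies are declared up front, so a tool (easy_install, pip,
    # buildout) can resolve them instead of setup.py asking questions.
    from setuptools import setup, find_packages

    setup(
        name='myanalysis',
        version='0.1',
        packages=find_packages(),
        install_requires=[
            'numpy>=1.3',
            'networkx',
        ],
    )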
On Sat, 7 Nov 2009 02:19:02 +0100, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
I unfortunately don't have all the time I wish I had to work on this more.
We all know that. It needs somebody new to take on the challenge.
Now, if you do want to help in the documentation so it's better for newcomers (in PyPI, Distutils, etc), I think we would all be happy about such contributions
The problem isn't only about documentation. We need the following:
- Overhaul the build system (simplify)
- Implement a simple install system (metadata install?)
- Specify an internal test framework for packages
- Set up a buildbot testing framework on PyPI
After that, we could modify the documentation.
The Perl community is -- by hearsay -- oriented sympathetically towards people who are not primarily programmers.
That's what we don't have.... and need. I think this is just too big for distutils alone. Distutils and the building of packages is only one part of what needs to change to bring about the changes that Guido is requesting. David
On Sat, Nov 7, 2009 at 5:07 AM, David Lyon <david.lyon@preisshare.net> wrote:
On Sat, 7 Nov 2009 02:19:02 +0100, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
I unfortunately don't have all the time I wish I had to work on this more.
We all know that.
It needs somebody new to take on the challenge.
David,

You don't understand at all what is going on, I think. You have kept sending passive-aggressive emails to python-dev and distutils-sig every time we mention the work that's going on in Distutils. This has been happening since we said we didn't want to integrate your GUI system in Python.

When I say "I unfortunately don't have all the time I wish I had to work on this more", it doesn't mean that we need to find somebody "new" to take on the challenge; it means that we can take more help from more people (and that includes you of course).

Whether you like it or not, I am maintaining Distutils, so I am trying to coordinate the work that is going on for that matter. And this work is currently organized in several PEPs we are changing every time someone provides new feedback and insights.

You have started some kind of "counter-PEP" to push another build system, and I really doubt this is useful in any way. Discussing new ideas is useful of course, but changing Distutils for another build system in the stdlib will not happen, unless it's proven to have been successfully used by the community for some time.

I will do at this point what others have been doing at python-dev and just ignore your mails from now on.

Tarek
Hi Tarek, On Sat, 7 Nov 2009 12:12:44 +0100, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
You don't understand at all what is going on I think.
I guess so.
.. it means that we can take more help from more people (and that includes you of course).
I really do accept that. What's hard for me to understand exactly is what you will actually allow me to do.
Wether you like it or not, I am maintaining Distutils, so I am trying to coordinate the work that is going on for that matter. And this work is currently organized in several PEPs we are changing everytime someone provides some new feedback and insights.
Why would I mind that you're maintaining distutils ? I don't pretend to know it as well as you do. I've been working in software for over twenty years and I've seen many things but nothing really like distutils. I certainly couldn't maintain it the way you are doing. I have real trouble understanding it.
You have started some kind of "counter-PEP" to push another build system, and I really doubt this is useful in any ways.
You shouldn't take it personally that another person tried to submit a PEP. So many people helped me with my proposal on the mailing list - and I owe it to them to at least get their efforts taken into consideration as a PEP. In any case, if you feel strongly about this then I can revise my proposal so that it doesn't look like a proposal for a build system (it was for a metadata setup - not a build system).
Discussing new ideas is useful of course, ..
The implication there is you don't want me to submit PEPs because they somehow counter your work. Well if that's the case I can be more careful in the future. There's plenty of things that I can write PEPs for now... A metadata install was just the first.. And to my knowledge, there still isn't a PEP about that. Right? David
On Sat, Nov 7, 2009 at 3:46 PM, David Lyon <david.lyon@preisshare.net> wrote: [..]
.. it means that we can take more help from more people (and that includes you of course).
I really do accept that.
What's hard for me to understand exactly is what you will actually allow me to do.
Gosh. I am not your boss, and I am not telling you what to do. This is open source, but this is also a community... You can't just come around and ignore some parts of what is being done. I am just telling you that if you want to help in the static metadata field, and if your goal is to help *Distutils* improve, you have to work with what has been started! [..]
You have started some kind of "counter-PEP" to push another build system, and I really doubt this is useful in any ways.
You shouldn't take it personally that another person tried to submit a PEP. So many people helped me with my proposal on the mailing list - and I owe it to them to at least get their efforts taken into consideration as a PEP.
In any case, if you feel strongly about this then I can revise my proposal so that it doesn't look like a proposal for a build system (it was for a metadata setup - not a build system). [..] The implication there is you don't want me to submit PEPs because they somehow counter your work. Well if that's the case I can be more careful in the future. There's plenty of things that I can write PEPs for now...
Neither Guido, nor Brett, nor I am taking this personally. But at some point, if your goal is to help improve Distutils, you have to work with what has been started.
A metadata install was just the first..
And to my knowledge, there still isn't a PEP about that. Right?
If you work on a PEP that is related to any PEP already started in the same area, I will strongly oppose adding your PEP, because it is nonsense. Take and read the existing PEPs and help us improve them. We don't add new PEPs for the heck of it. If it's a credit issue, your name will be added to any PEP you provide valuable help on. Tarek
On Sat, 7 Nov 2009 16:08:46 +0100, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
Gosh. I am not your boss, and I am not telling you what to do.
otoh you're the boss of distutils. So you can direct people to work on certain things to help you along. That would have kept me much quieter with work.
I am just telling you that if you want to help in the static metadata field, and if your goal is to help *Distutils* improve, you have to work with what has been started!
Well talk about static metadata installation was started at pycon last year. Nothing happened till I started working on it. And I believe that I'm the only one working on it now.
If you work on a PEP that is related to any PEP started in the same area, I will strongly oppose against adding your PEP because it is a non-sense.
Same area? as in any packaging PEP? any distutils PEP? There are PEPs in every area. And PEPs are inter-related. Does rejection of PEPs on the above grounds apply to everyone? or just me? Other people seem to have PEPs in related areas.
We don't add new PEPs for the heck of it.
And people don't write them for the heck of it either. If I see something I can do to assist existing PEPs I'll try to do so. David
On Sat, Nov 07, 2009 at 10:56:24AM -0500, David Lyon wrote:
On Sat, 7 Nov 2009 16:08:46 +0100, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
Gosh. I am not your boss, and I am not telling you what to do.
otoh you're the boss of distutils. So you can direct people to work on certain things to help you along. That would have kept me much quieter with work.
If you have time to burn and need to be told how to spend it: off the top of my head, a useful contribution would be to improve the PEP 386 reference implementation, for example (I pick that one as I know most about it of all the current PEP proposals). This is a PEP that should maybe be finished first of the bunch, so that's definitely useful. Useful things:

* Read up on the PEP and all the documentation in the reference implementation.
* Check out previous discussions and make sure nothing is missed.
* Add more tests! The core class could be tested better, and suggest_rational_version is massively undertested. For this you need to read up on the old distutils version scheme as well as the setuptools scheme and build lots of test cases for it. Also look at all the versions on the cheeseshop to figure out tricky version numbers and test the function. All the "reasonable" version numbers used on PyPI should really be tested.
* Check the history of the implementation. Maybe there have been checkins that correct things in one area but left similar bugs in other areas (e.g. a bug fixed in the class but not in suggest_rational_version). If so, add tests.
* If you managed to create tests that fail, see if you can figure out ways to make them pass.
* Improve the documentation of the reference implementation; after you've done many of the above things you'll definitely have found some places where it could be improved.
* Create patches out of the above work and submit them to the reference implementation. If they're useful and good they'll get accepted, but don't be discouraged if a re-work is asked for initially.

I'm not telling you the above things because I want to be your (or anyone's) boss and tell other people what to do. I'm telling you them as an example of how to contribute to the current work. The important thing is that I don't know anything more than you do; I haven't had secret conversations with a cabal or so (I'm not even on IRC usually). The only thing I do is read the mailing list and look at proposals I'm interested in. Finding and carrying out work like this is contributing.

Regards
Floris

--
Debian GNU/Linux -- The Power of Freedom
www.debian.org | www.gnu.org | www.kernel.org
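For what it's worth, a rough sketch of the "test against everything on the cheeseshop" idea; the verlib module name and the assumption that suggest_rational_version returns None for hopeless version strings should be checked against the actual reference implementation:

    import xmlrpclib  # xmlrpc.client in Python 3

    # Assumed import path for the PEP 386 reference implementation.
    from verlib import suggest_rational_version

    pypi = xmlrpclib.ServerProxy('http://pypi.python.org/pypi')

    problem_versions = []
    for name in pypi.list_packages():
        # True: include hidden releases as well
        for version in pypi.package_releases(name, True):
            if suggest_rational_version(version) is None:
                problem_versions.append((name, version))

    # Every entry collected here is a candidate test case for the PEP.
    for name, version in problem_versions:
        print name, version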
On Sun, 8 Nov 2009 19:19:44 +0000, Floris Bruynooghe <floris.bruynooghe@gmail.com> wrote:
.. improve the PEP 386 reference implementation for example (I pick that one as I know most about it from all the PEP proposals currently).
I'm +1 on PEP-386. It makes sense to me.
This is a PEP that should maybe be finished first of the bunch
PEP-345 is perhaps more important. More depends on that. I think/hope Guido's post has changed things a little. I agree with those who say everybody should work together to get some PEPs closed off, and I also agree with the push to offer something of a more comparable standard to CPAN. To get to where Guido is asking for, I think there are some gaps in the PEP coverage. So we need to cover those bases also. David
On Sat, 7 Nov 2009 00:46:01 +0100, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
PyPI + PEP 381 + PEP 381 compliant clients == CPAN
Not sure about that... You mustn't have used CPAN much. And if we try to pretend they are, or cobble together a solution, it's very much like adding a turbocharger to a Lada Niva in an attempt to come up with a competition rally car like a Subaru WRX.
PyPI + PEP 381 + PEP 381 compliant clients == still imaginary solution
CPAN == seriously good solution that's been working for years. Even if you implement PyPI + PEP 381 + PEP 381 compliant clients tomorrow, I promise you that you won't be anywhere close to CPAN. It's a much more serious challenge than perhaps you realise. David
David Lyon wrote:
Even if you implement PyPI + PEP 381 + PEP 381 compliant clients tomorrow, I promise you that you won't be anywhere close to CPAN.
It's a much more serious challenge than perhaps you realise.
I keep reading and I keep hearing you and others saying this, but as someone who has never used CPAN, I'm not seeing the large number of specific implementable tasks that CPAN clearly has and PyPI clearly does not. Without handwaving please, in a technical sense what does CPAN have that is so wonderful, other than the items mentioned so far of:

- buildbot on the server
- mirrored, hierarchical servers
- one widely known and accepted way of doing packaging

If we implement those, if we implement PEP 381, in what specific way will we still fall *far short* as you suggest? I can't fix the social aspects so those are not of use to me. Only architecture.

BTW at the recent pyTexas regional conference we had a good group discussion about packaging, with people offering analysis from the Perl, Java, Ruby and Haskell communities. It seems each language still only covers part of the solution, albeit different parts.

-Jeff
On Sat, Nov 7, 2009 at 6:04 AM, Jeff Rush <jeff@taupro.com> wrote:
BTW at the recent pyTexas regional conference we had a good group discussion about packaging, with people offering analysis from the Perl, Java, Ruby and Haskell communities. It seems each language still only covers part of the solution, albeit different parts.
During that conversation Walker mentioned that he thought Java's Maven packaging system had been really successful with the idea of having a metadata file stored outside each package. That made it easier for the metadata format to evolve; there is no need to change every package in a repository every time the metadata format is improved. It also allows convenient inspection of dependencies between packages without having to download/decompress those packages. (Maven also stores a copy of the metadata inside the package, but the authoritative/complete copy is considered the one outside the package.) I've cc'd Walker hoping he will comment more about the other advantages of externally stored metadata.
On Sat, 7 Nov 2009 06:53:17 -0600, Brad Allen <bradallen137@gmail.com> wrote:
During that conversation Walker mentioned that he thought that Java's Maven packaging system had been really successful with the idea of having a metadata file stored outside each package. That made it easier for the metadata format to evolve; there is no need change every package in a repository every time the metadata format is improved. It also allows convenient inspection of dependencies between packages without having to download/decompress those packages. (Maven also stores a copy of the metadata inside the package, but the authoritative/complete copy is considered the one outside the package).
So too in python and pypi. I have used it and it works fine. http://wiki.python.org/moin/PyPiXmlRpc?action=show&redirect=CheeseShopXmlRpc The problem is not that the functionality doesn't exist, just that users are expected to do an xmlrpc call to get it... (more logical would be to have a pypi module) David
Hi Jeff, On Sat, 07 Nov 2009 06:04:15 -0600, Jeff Rush <jeff@taupro.com> wrote:
I keep reading and I keep hearing you and others saying this, but as someone who has never used CPAN, I'm not seeing the large number of specific implementable tasks that CPAN clearly has and PyPI clearly does not.
ok. Fair enough. Let me clear up the terminology.

In perl, CPAN means packages + local perl + repository. In python, pypi means repository.

If people complain about pypi versus cpan, the complaint actually has nothing to do with pypi. What they are actually complaining about is the package management support (or lack thereof) that is built into python. Let me say it a different way... "Where can I find the pypi module in python?" Of course there is no pypi module in python. That's why this whole thing is just so confusing. If there was such a thing as a pypi module, then users would automatically use that to access pypi and download and install packages.

But what python offers is:
- setuptools
- distribute
- pip
- distutils

In perl, there's only one word for everything: cpan. Even though its components exist as libraries and command interfaces on the local system.
Without handwaving please, in a technical sense what does CPAN have that is so wonderful other than the items mentioned so far of:
- buildbot on the server
- mirrored, hierarchical servers
- one widely known and accepted way of doing packaging
+ Easy-to-find modules and command interfaces to the repository that exist on the local machine, all carrying the matching cpan name.
If we implement those, if we implement PEP 381, in what specific way will we still fall *far short* as you suggest?
Pypi itself isn't a problem. Buildbots for packages shouldn't go on pypi itself in any case. They should go on an entirely separate server.

So, let me sum up. Pypi itself has no major external problem. It's short on a package buildbot, and local python has no pypi module. Add a package buildbot and add pypi package support and we're pretty much caught up.

David
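Purely as a strawman, such a "pypi module" could start life as a thin wrapper over the existing XML-RPC interface; nothing like this ships with Python, and all names here are hypothetical:

    # Hypothetical pypi.py helper -- a sketch of the kind of interface
    # being asked for, built on the XML-RPC service mentioned earlier.
    import xmlrpclib  # xmlrpc.client in Python 3

    _server = xmlrpclib.ServerProxy('http://pypi.python.org/pypi')

    def search(name):
        """Return (name, version, summary) tuples for matching distributions."""
        return [(hit['name'], hit['version'], hit['summary'])
                for hit in _server.search({'name': name})]

    def releases(name):
        """Return the list of released versions for a distribution."""
        return _server.package_releases(name)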
On Nov 6, 2009, at 2:14 PM, Alex Grönholm wrote:
[...]
I for one did not understand the problem. What does CPAN have that PyPI doesn't? It is natural for packages (distributions, in distutils terms) to have dependencies on each other. Why is this a problem?
As both a CPAN contributor and a recent PyPI contributor I think I can speak to this:

When I packaged my first Perl module for CPAN distribution, there was no question about how to go about it. There was a clear recipe and it worked without a hitch the first time I tried.

When I recently packaged my first Python module for PyPI I was a bit dismayed to learn that easy_install is just one of several ways to disseminate Python code. The documentation is fragmented... The stuff I read on easy_install basically assumed you already knew all about distutils, and I eventually had to reorganize my code from a simple module to a package just to get a few non-code files to be pulled in by the install. A less persistent would-be contributor might have given up in frustration.

I joined this mailing list instead, and got my package working with some kind assistance. :-) But CPAN was just easier to contribute to.

-- Kaelin
2009/11/7 Kaelin Colclasure <kaelin@acm.org>: [...]
As both a CPAN contributor and a recent PyPI contributor I think I can speak to this:
When I packaged my first Perl module for CPAN distribution, there was no question about how to go about it. There was a clear recipe and it worked without a hitch the first time I tried.
When I recently packaged my first Python module for PyPI I was a bit dismayed to learn that easy_install is just one of several ways to disseminate Python code. The documentation is fragmented... The stuff I read on easy_install basically assumed you already knew all about distutils, and I eventually had to reorganize my code from a simple module to a package just to get a few non-code files to be pulled in by the install. A less persistent would-be contributor might have given up in frustration.
I joined this mailing list instead, and got my package working with some kind assistance. :-) But CPAN was just easier to contribute to.
What were, in particular, the pieces missing in Distutils (doc+code) that made you use Setuptools (doc+code)? ++ Tarek
On Nov 6, 2009, at 3:54 PM, Tarek Ziadé wrote:
2009/11/7 Kaelin Colclasure <kaelin@acm.org>: [...]
As both a CPAN contributor and a recent PyPI contributor I think I can speak to this:
When I packaged my first Perl module for CPAN distribution, there was no question about how to go about it. There was a clear recipe and it worked without a hitch the first time I tried.
When I recently packaged my first Python module for PyPI I was a bit dismayed to learn that easy_install is just one of several ways to disseminate Python code. The documentation is fragmented... The stuff I read on easy_install basically assumed you already knew all about distutils, and I eventually had to reorganize my code from a simple module to a package just to get a few non-code files to be pulled in by the install. A less persistent would-be contributor might have given up in frustration.
I joined this mailing list instead, and got my package working with some kind assistance. :-) But CPAN was just easier to contribute to.
What was in particular, the pieces missing in Distutils (doc+code) that made you use Setuptools (doc+code) ?
Since I bootstrapped the environment I used to learn Python with easy_install, I naturally went straight to the easy_install docs to learn how to give back. I wasn't even aware of this separate thing called distutils until I read about it in the easy_install (err setuptools) documentation (sic). -- Kaelin
On Sat, Nov 7, 2009 at 1:01 AM, Kaelin Colclasure <kaelin@acm.org> wrote:
On Nov 6, 2009, at 3:54 PM, Tarek Ziadé wrote:
[...]
What was in particular, the pieces missing in Distutils (doc+code) that made you use Setuptools (doc+code) ?
Since I bootstrapped the environment I used to learn Python with easy_install, I naturally went straight to the easy_install docs to learn how to give back. I wasn't even aware of this separate thing called distutils until I read about it in the easy_install (err setuptools) documentation (sic).
Yes, that's the great thing *and* the bad thing about Setuptools. It provided missing features and bootstrapped what people needed on top of Distutils. But in the meantime, it makes it fuzzy for the end-user that there are two projects, and hard to understand what's from the Distutils project and what's from the Setuptools project.

What is clear for us, though, is that we need to change Distutils w.r.t. the Setuptools experience and feedback. For the code part, PEP 376 is one important work (http://www.python.org/dev/peps/pep-0376/): it enhances Distutils with a lot of ideas that were taken from Setuptools (with its author's help), and also adds missing features we need (like an uninstall command).

For the documentation part, I am afraid it will be messy for the end users trying to package apps in Python *until* all PEPs have made it into Python. Although, as Ian Bicking says, we could write today some kind of all-in-one tutorial so end-users can work out without having to run after the documentation in several places.

Tarek
Tarek Ziadé schrieb:
For the documentation part I am afraid it will be messy for the end users trying to package apps in Python *until* all PEPs have made it into Python.
Although, as Ian Bicking says: we could write today some kind of all-in-one tutorial so end-users can work out without having to run after the documenation in several places.
Big +1, and I would be more than willing to include it in the standard Python documentation, *even* if it mostly describes setuptools/distribute/pip. When people want to package a library, they *will* look for docs in Python, and at the moment they only find the distutils reference. While the latter is necessary, a more howto-like standard document is much needed. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
2009/11/7 Kaelin Colclasure <kaelin@acm.org>:
Since I bootstrapped the environment I used to learn Python with easy_install, I naturally went straight to the easy_install docs to learn how to give back. I wasn't even aware of this separate thing called distutils until I read about it in the easy_install (err setuptools) documentation (sic).
Yes, this seems to be a reasonable way to realize that packaging is done with distutils. What was then complicated with the distutils docs? It's so long ago I did this the first time that I don't remember if I found it difficult. -- Lennart Regebro: Python, Zope, Plone, Grok http://regebro.wordpress.com/ +33 661 58 14 64
On Nov 7, 2009, at 12:39 AM, Lennart Regebro wrote:
2009/11/7 Kaelin Colclasure <kaelin@acm.org>:
Since I bootstrapped the environment I used to learn Python with easy_install, I naturally went straight to the easy_install docs to learn how to give back. I wasn't even aware of this separate thing called distutils until I read about it in the easy_install (err setuptools) documentation (sic).
Yes, this seems to be a reasonably way to realize that packaging is done with distutils. What was then complicated with the distutils docs? It's so long ago I did this the first time I don't remember if I found it difficult.
The setuptools docs I read left me with the impression that distutils was more about building C extensions, and that if my package was pure Python source (which it was) I should not need anything more than setuptools. And this did prove true, eventually -- but only after I restructured my source into a package, to be more like what setuptools wanted.

What wasted quite a bit of time along the way was that I found other examples of setup.py files that were using setuptools but were also using some underlying distutils conventions for packaging a single module. This approach wasn't mentioned in the setuptools docs, but it *looked* more like what I was trying to do. But then it turned out there was no simple way to use the single-module spec *and* get my static data installed along with the code.

HTH,

-- Kaelin
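For reference, a minimal sketch of the layout that makes this work for a package with static data files; all of the names are made up:

    # Layout:
    #   setup.py
    #   mytool/__init__.py
    #   mytool/data/lookup.dat
    from setuptools import setup, find_packages

    setup(
        name='mytool',
        version='0.1',
        packages=find_packages(),
        # Ship the non-code files inside the package
        package_data={'mytool': ['data/*.dat']},
    )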
Sat, 2009-11-07 at 00:14 +0200, Alex Grönholm wrote: [clip: problems in distributing scientific Python packages]
I for one did not understand the problem. What does CPAN have that PyPI doesn't? It is natural for packages (distributions, in distutils terms) to have dependencies on each other. Why is this a problem?
Personally, I've not had so much trouble with PyPI, but with the rest of the toolchain. What's special about scientific software is that:

- It's usually not pure Python
- It needs support not only for C, but e.g. for Fortran compilers
- It may be necessary to build it on platforms where libraries etc. are in non-standard places
- It may be useful to be able to build it with non-gcc compilers
- It may need to ship more data files etc. than plain Python modules
- Python is a newcomer on the scientific scene. Not all people want to spend time on installation problems. Not all people are experienced Python users.

So it may be more likely that the following things hurt in distributing these Python modules:

1. Incomplete documentation for distutils. For example, where can you find out what the `package_data` option of setup() wants as input? What if you have your package in src/packagename and data files under data/? What are the paths given to it relative to? The Distribute documentation is starting to look quite reasonable -- so documentation is becoming less of a problem. But it still seems to assume that the reader is familiar with distutils.

2. Magic. For example, what decides which files are included by sdist? It appears this depends on (i) what's in the autogenerated *.egg-info/SOURCES.txt, (ii) whether you are using SVN and are using setuptools, (iii) possible package_data etc. options, (iv) MANIFEST or maybe MANIFEST.in. IMHO, the system is too byzantine in ordinary matters, which increases the number of things you need to learn.

3. Many layers: distutils, setuptools, numpy.distutils. Numpy has its own distutils extensions, primarily for Fortran support.

4. Inflexibility. The toolchain is a bit inflexible: suppose you need to do something "custom" during the build, say, detect sizeof(long double) and add a #define to the build options according to it. Finding out how to do this properly again takes time.

5. Distutils, and tools derived from it, have bad failure modes. This hurts most when building extension modules. Given the many layers, and the fact that the build is driven by software that few really understand, it's difficult to understand and fix even simple errors encountered. Suppose a build fails because your C or Fortran compiler gets passed a flag it doesn't like. How do you work around this? Suppose you have a library installed in a non-standard location. How do you tell distutils to look for it in the correct place? (The answer is to use the "build_ext" command separately and pass it -L, but this is difficult to find out, as "build" does not accept -L.)

The last one is in practice quite annoying: given the heterogeneous environments, it's not easy to make your package buildable on all possible platforms where people might want to use it. When people run into problems, they are stumped by the complexity of distutils.

The above concerns only building packages -- perhaps there is more to say also about other parts. Also, I don't really have much experience with CPAN or CRAN, so I can't say how much Python is better or worse off here.

-- Pauli Virtanen
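As an illustration of point 4, one way to inject a configure-style check is to subclass build_ext; this is only a sketch - the probe uses ctypes instead of compiling a test program, and the macro name is made up:

    import ctypes
    from distutils.core import setup, Extension
    from distutils.command.build_ext import build_ext

    class probing_build_ext(build_ext):
        def build_extensions(self):
            # Determine sizeof(long double) on the build machine and pass
            # it to the compiler as a #define on every extension.
            size = ctypes.sizeof(ctypes.c_longdouble)
            for ext in self.extensions:
                ext.define_macros.append(('MYPKG_SIZEOF_LONG_DOUBLE', str(size)))
            build_ext.build_extensions(self)

    setup(
        name='mypkg',
        ext_modules=[Extension('mypkg._core', ['src/_core.c'])],
        cmdclass={'build_ext': probing_build_ext},
    )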
On Fri, 6 Nov 2009 09:53:44 -0800, Guido van Rossum <guido@python.org> wrote:
I just found this comment on my blog. People have told me this in person too, so I believe it is real pain (even if the solution may be elusive and the suggested solutions may not work).
Hi Guido, I'm a mere application developer - developing things for companies as they require, sometimes in Perl, more often in Python. I've worked in Perl with CPAN and yes, I can attest to how good CPAN is for the application developer. Anything you need is there. It's easy to pull down stuff and load it in. CPAN has an 'over-the-top' philosophy towards developers. It works.
Is the work on distutils-sig going to be enough?
It's not on the roadmap circulated thus far - but to answer the question, probably not. At the moment, we don't have our target set towards working towards that. CPAN actually does so much, things that we could do, but are not. For example, if Fred the developer makes a package 'microscope-calibrate' and uploads it, it will get tested and analysed on CPAN with a buildbot. We don't do that yet. So the packages on CPAN are typically of a higher quality, simply because they've been machine checked. I like that. Then, on the client machine, packages are so easy to install. We don't have anything quite like that with Python yet, but PJE did get us one third of the way with setuptools. The skills to get fully up to CPAN's level in this regard are there collectively with us now. Not in any one individual, but as a group. No one individual can do it - but one individual can say 'make it so'.
Or do we need some other kind of work in addition? Do we need more than PyPI?
Yes we do. The Perl/CPAN developer experience is so smooth. May I say that the current PyPI website is 100% as good as CPAN. As a developer, I can't find any significant fault with PyPI, apart from not having a buildbot running in the background for package smoke testing. It's just when a developer goes to download and install a package that things get complicated with Python. In an ideal world, a developer would just like to get eggs or packages off PyPI, install them and go back to work. It would be fantastic for us to be able to forget about what platform we have and what Python version we have. CPAN does that. The totally best thing would be to see a package on PyPI, click on it and have it installed automatically there and then. The technical framework to do all this exists or is in close to working order. Some things need doing, but I would regard the work as bridging rather than doing things from scratch. To me, the things Tarek is doing seem pretty sound and sane. We just need to have a 'developer experience' program or something along those lines to connect the dots and make things run more smoothly. Then having something as good as CPAN is totally realistic. David
On Fri, Nov 6, 2009 at 2:52 PM, David Lyon <david.lyon@preisshare.net> wrote:
So the packages on CPAN are typically of a higher quality, simply because they've been machine checked. I like that.
Speaking purely on hearsay, I don't believe that. In fact, I've heard plenty of laments about the complete lack of quality control on CPAN. -- --Guido van Rossum (python.org/~guido)
Hi Guido, Guido van Rossum wrote:
On Fri, Nov 6, 2009 at 2:52 PM, David Lyon <david.lyon@preisshare.net> wrote:
So the packages on CPAN are typically of a higher quality, simply because they've been machine checked. I like that.
Speaking purely on hearsay, I don't believe that. In fact, I've heard plenty of laments about the complete lack of quality control on CPAN.
I cannot speak for CPAN, as I have never used it. But CRAN (which is CPAN for R) works much better than PyPI today in practice. I am not sure what exactly makes it work better, but it has the following properties, both technical and more 'social':

- R is a niche language, and targets mostly scientists. It is a smaller, more focused community. They can push solutions more easily.
- There is extensive documentation on how to develop R extensions (you can download a 130-page pdf).
- R packages are much more constrained: there is a standard source organization, which makes for a more consistent experience.
- There are regular checks of the packages (all the packages are checked daily on a build farm on Fedora and Debian). It also has a machine to check Windows.
  http://cran.r-project.org/web/checks/check_summary.html
  http://cran.r-project.org/bin/windows/contrib/checkSummaryWin.html

I am obviously quite excited by Snakebite's potential here.

Concerning distutils, I think it is important to improve it, but I think it is inherently flawed for serious and repeatable packaging. I have written a quite extensive article on it from my point of view as a numpy/scipy core developer and release manager (http://cournape.wordpress.com/2009/04/01/python-packaging-a-few-observations...). I won't rehearse it here, but basically:

- distutils is too complex for simple packages, and too inflexible for complex ones. Adding new features to distutils is a painful experience. Even autotools, with its mix of 100 000 lines of autogenerated shell code, perl and m4, is more pleasant.
- Most simple packages could be "buildable" from a purely declarative description. This is important IMHO because it means they are simple to package for OS vendors, and you can more easily automate building and testing.
- It is hard to interact with other build/distribution tools, which is sometimes needed.
- distutils is too flexible for some things where it should not be (like specifying the content of sdist tarballs), and that makes it very difficult to repeat things in different environments.

Contrary to other people, I don't think a successor to distutils is that hard to develop, especially since good designs in other languages already exist. It would take time, especially to get a transition story right, but I think it would be worthwhile.

cheers,

David
Hi David, I have to agree with you on every point. On Sat, 07 Nov 2009 16:49:12 +0900, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
Contrary to other people, I don't think a successor to distutils is that hard to develop, especially since good designs in other languages already exist. It would take time, especially to get a transition story right, but I think it would be worthwhile.
That's just so controversial... But since Guido has asked, let's just have some good old fashioned pragmatic software engineering discussion. Let's toss it around as pure numbers (for what you are suggesting):

- how many lines of code would it take? (500, 5000, 10000, 20000)?
- how much of the existing library could be used? (10%, 25%, 50%, 75%, 90%)?
- how many months would it take? (3, 6, 9, 12, 18, 36, 72)?
- how many people would it require? (1, 2, 3, 5, 10)?
- Could a PEP be started? Or co-authored with somebody else? Are there existing PEPs? Would a PEP for this have any chance of success? Could Tarek help with a PEP? He must have some ideas on this.
- What's the risk?

I'd suggest we not call it a distutils replacement system. Let's call it a stripped down, simplified package builder. To me, it's either the existing code from distutils that builds what we call a source package, or it's a new piece of code that builds a 'standard' Python package. The end result needs to be oriented to a non-technical audience - so we need to keep it pretty simple - not too many options.

From my current understanding, the key files in any Python package seem to be sources.txt and PKG_INFO. So it could be possible, imo, to prompt for metadata on the console, write it to a PKG_INFO, scan for files and write the sources.txt, and then archive the whole lot up using the standard library. That's ultra simplistic... perhaps too much so. But imo it's worth doing estimations on.

David
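For concreteness, David's 'ultra simplistic' idea could be sketched with nothing but the standard library; all names here are hypothetical and this is not a proposal for an actual tool:

    import os
    import tarfile

    def build_package(root="."):
        name = raw_input("Package name: ")      # prompt for metadata on the console
        version = raw_input("Version: ")

        # Write a minimal PKG-INFO from the answers.
        f = open(os.path.join(root, "PKG-INFO"), "w")
        f.write("Metadata-Version: 1.0\nName: %s\nVersion: %s\n" % (name, version))
        f.close()

        # Scan for files and record them, a la SOURCES.txt.
        sources = []
        for dirpath, dirnames, filenames in os.walk(root):
            for fn in filenames:
                sources.append(os.path.join(dirpath, fn))
        f = open(os.path.join(root, "SOURCES.txt"), "w")
        f.write("\n".join(sources) + "\n")
        f.close()

        # Archive the whole lot with the standard library.
        archive = "%s-%s.tar.gz" % (name, version)
        tar = tarfile.open(archive, "w:gz")
        for path in sources + [os.path.join(root, "SOURCES.txt")]:
            tar.add(path)
        tar.close()
        return archive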
David Cournapeau schrieb:
Hi Guido,
Guido van Rossum wrote:
On Fri, Nov 6, 2009 at 2:52 PM, David Lyon <david.lyon@preisshare.net> wrote:
So the packages on CPAN are typically of a higher quality, simply because they've been machine checked. I like that.
Speaking purely on hearsay, I don't believe that. In fact, I've heard plenty of laments about the complete lack of quality control on CPAN.
One thing about CPAN (and Haskell's libraries on hackage) that I think many people see favorably, even though it's only superficial, is the more-or-less consistent hierarchical naming for the individual packages (or the contained modules in Haskell). Compared with that, the Python package namespace looks untidy.
I cannot speak for CPAN, as I have never used it. But CRAN (which is CPAN for R) works much better that PyPI today in practice. I am not sure what exactly makes it work better, but it has the following properties, both technical and more 'social': - R is a niche language, and targets mostly scientists. It is a smaller community, more focused. They can push solutions more easily. - There is an extensive doc on how to develop R extensions (you can download a 130 pages pdf).
Note that the downloadable distutils manual has 94 pages and *should* be enough to explain the basics of packaging. It has to be updated, of course, once the more advanced mechanisms are part of the core.
- R packages are much more constraints: there is a standard source organization, which makes for a more consistant experience - There are regular checks of the packages (all the packages are daily checked on a build farm on fedora and debian). It also has a machine to check windows.
http://cran.r-project.org/web/checks/check_summary.html http://cran.r-project.org/bin/windows/contrib/checkSummaryWin.html
I am obviously quite excited by Snakebite potential here.
Me too. Though it would be Snakebite + serious sandboxing.
Concerning distutils, I think it is important to improve it, but I think it is inherently flawed for serious and repeatable packaging. I have written a quite extensive article on it from my point of view as a numpy/scipy core developer and release manager (http://cournape.wordpress.com/2009/04/01/python-packaging-a-few-observations...),
What you're saying there about Cabal is exactly my experience. It is very nice to work with, and I've not yet seen a conceptual failure. But we're not far from that, with a static metadata file.
I won't rehearse it here, but basically: - distutils is too complex for simple packages, and too inflexible for complex ones. Adding new features to distutils is a painful experience. Even autotools with its mix of 100 000 lines autogenerated shell code, perl, m4 is more pleasant.
Really? I would have assumed that even writing a whole new distutils command/build step can't be more painful than adding the equivalent to an autotools-style build system, being Python after all. However, I've never done such a thing, so I have to believe you. Coming back to Cabal, do you know how easy it is to customize its build steps?
- Most simple packages could be "buildable" from purely declarative description. This is important IMHO because it means they are simple to package by OS vendors, and you can more easily automate building and testing.
That's what we're heading towards, I think. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
Georg Brandl wrote:
One thing about CPAN (and Haskell's libraries on hackage) that I think many people see favorably, even though it's only superficial, is the more-or-less consistent hierarchical naming for the individual packages (or the contained modules in Haskell). Compared with that, the Python package namespace looks untidy.
That's true, but there is not much we can do on this one, so I did not mention it.
Note that the downloadable distutils manual has 94 pages and *should* be enough to explain the basics of packaging. It has to be updated, of course, once the more advanced mechanisms are part of the core.
The manual is too complicated for simple tasks, and not very useful for complex ones. Mostly because distutils does not follow the "only one way to do things" mantra. I can help to improve the distutils documentation for the build part, which is mostly undocumented (things like how to create a new command to build ctypes extensions, etc...).
Me too. Though it would be Snakebite + serious sandboxing.
Sandboxing is of course needed, but that's a known problem, and people have already thought hard about it. The open suse build system, albeit linux specific, works quite well, for example. For environment sandboxing, chroot works on all unix I know (including mac os x) - security is more challenging, I don't have any expertise there. Windows is more difficult to handle, though (maybe windows people know good sandboxing solutions outside full-blown vm).
What you're saying there about Cabal is exactly my experience. It is very nice to work with, and I've not yet seen a conceptual failure.
But we're not far from that, with a static metadata file.
Several people have claimed this so far, but I don't understand why - could you expand on this ? My impression is that the focus is mostly on version specification and install/build requirements in the static data, but to me that's a tiny detail. I want something like .cabal files, where you can specify documentation, data, source files, etc... Something like what I started to prototype there: http://github.com/cournape/toydist/ To take an example you are well familiar with, you can fully describe sphinx with it, and the conversion is mostly automatic. This is not even 500 LOC. With this kind of design, you can use different build systems on top of it (there is for example unpublished code in toydist to use a scons-based build system instead of distutils as currently done).
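To make the 'declarative description' idea concrete, here is a purely illustrative sketch; this is not toydist's actual format, just the flavour of a description that different build tools could consume:

    from distutils.core import setup

    # Everything about the package is plain data; no logic runs at description time.
    PACKAGE = {
        "name": "frobnicate",               # hypothetical project
        "version": "0.3",
        "description": "A package described declaratively",
        "packages": ["frobnicate"],
        "package_data": {"frobnicate": ["templates/*.html"]},
        "scripts": ["bin/frobnicate"],
    }

    if __name__ == "__main__":
        # One possible consumer: plain distutils. A scons- or waf-based tool
        # could read the same description instead.
        setup(**PACKAGE)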
I won't rehearse it here, but basically: - distutils is too complex for simple packages, and too inflexible for complex ones. Adding new features to distutils is a painful experience. Even autotools with its mix of 100 000 lines autogenerated shell code, perl, m4 is more pleasant.
Really?
Sure, the perl/shell/awk/m4 mix is painful, but at least the result is reasonably robust, and can be extended.
I would have assumed that even writing a whole new distutils command/build step can't be more painful than adding the equivalent to an autotools-style build system, being Python after all. However, I've never done such a thing, so I have to believe you.
Let me expand on that, because I think few people understand the problem here, and that's maybe the main source of frustration for core numpy developers as far as distutils is concerned. True, writing your own command is easy. But it has many failure modes:

- If you extend an existing command, you have to take care whether you run under setuptools or distutils (and soon distribute will make this worse). Those commands will not work the same when you run them under paver either.
- The division into subcommands is painful, and the abstraction does not make much sense IMHO. Recently, I needed to access simple things like the library filename (foo -> libfoo.a/foo.lib/etc.) and the install prefix. But those are not accessible to each command. The install prefix was particularly painful: it took me several hours to get it to work right with distutils in-place and develop mode on all platforms. All this is trivially easy to get with autotools, scons or waf. Every new feature I needed to add to numpy.distutils was an unpleasant experience. I had to read the distutils sources (for every supported Python version), run it on several platforms, and got it working by trial and error.
- If you want to add a new source file extension, you have to rewrite the build_ext or build_src command, and you often cannot reuse the base class methods.
- etc...

Also, the distutils code is horrible: you don't really know what's public and what's not, most attributes are added at runtime (and sometimes differ depending on the platform). Often, you get strange errors with the exception swallowed, and that happens only on some platforms for some users; in that case, the only way to debug it is to be able to run on their platform. When you write extensions to distutils, this contributes to the whole unpleasant experience.
Coming back to Cabal, do you know how easy it is to customize its build steps?
No, I don't. I know you have to use makefiles/autoconf for complex packages (for example, the gtk wrapper for Haskell does not use Cabal AFAIK). But I think the only thing which matters is to have a basic, simple system you can interoperate with. It would not make sense to require a standard build/packaging system to support Fortran or most of what we need in numpy. For example, I have written numscons to get away from distutils: it enables the use of scons for all the building part, and can build complicated packages such as numpy and scipy on many platforms (including Windows), but we cannot use it as our main tool because you can't easily interoperate with distutils (to get sdist, bdist_wininst, etc. working). A system which would make this possible would already be great - such a system would be both simpler and more reliable than current distutils IMHO.
That's what we're heading towards, I think.
Guido wanted to know how the scientific Python people feel about the whole situation, and my own impression is that we are moving further from what we need. I don't think anything based on distutils can help us. This is not to criticize Tarek, PJE and other people's work: I understand that distutils and setuptools solve a lot of problems for many people, and I may just be a minority. David
On Sun, Nov 8, 2009 at 2:22 PM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> [..]
- if you extend an existing command, you have to take care whether you run under setuptools or distutils (and soon distribute will make this worse).
No, because unlike setuptools, we want to remove the patching that is done on the Command class and on the Distribution class, and make Distribute a good Distutils citizen in that matter. IOW, you won't suffer anymore from what you've described.
Those commands will not work the same when you run them under paver either. - the division in subcommands is painful, and the abstraction does not make much sense IMHO. Recently, I needed to access simple things like library filename (foo ->libfoo.a/foo.lib/etc..), install prefix. But those are not accessible to each command. [...] - if you want to add a new source file extension, you have to rewrite the build_ext or build_src command and you often cannot reuse the base class methods. - etc...
This is going to be changed, because I am currently refactoring the build_ext command so that such methods are in a base "compiler" class and eventually in util.py. You are welcome to help in this refactoring.
Also, the distutils code is horrible: you don't really know what's public and what's not, most attributes are added at runtime (and sometimes differ depending on the platform). Often, you get strange errors with the exception swallowed, and that happens only on some platforms for some users; in that case, the only way to debug it is to be able to run their platform. When you write extensions to distutils, this contributes to the whole unpleasant experience.
I agree the code is not modern, but things are getting changed through small steps. Although, I don't buy the "strange errors" part and things getting swallowed :) [..]
Guido wanted to know how scientific python people feel about the whole situation, and my own impression is that we are going further from what we need. I don't think anything based on distutils can help us. This is not to criticize Tarek, PJE and other people's work: I understand that distutils and setuptools solve a lot of problems for many people, and I may just be a minority.
My opinion is that you've built something else when Distutils was not evolving anymore. This is not true anymore. It's moving again. And I think that the work that is going on is heading in the right direction, even for your use cases imho. If projects that maintain patched distutils versions push those patches now to the Python issue tracker, diffing against the current trunk, and if those patches make sense and come with tests, that's by far the *easiest* way to help improve Distutils. And that's the easiest work for me: I'll just review them and commit them if they do improve Distutils. Tarek
Tarek Ziadé wrote:
On Sun, Nov 8, 2009 at 2:22 PM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> [..]
- if you extend an existing command, you have to take care whether you run under setuptools or distutils (and soon distribute will make this worse).
No, because unlike setuptools, we want to remove the patching that is done on the Command class and on the DIstribution class, and make Distribute a good Distutils citizen in that matter.
I don't think that's possible, not without major changes to the distutils design at least. Right now, in numpy.distutils, we extend the Distribution class to hold some additional data. We inherit from setuptools or distutils depending on whether setuptools has been imported. The fundamental problem is not so much that setuptools does monkey-patching, but that you need to inherit to extend. Once you have several packages which inherit independently from those classes, you have to special-case each of them if you want to support all of them.
This is going to be changed because I am currently refactoring the build_ext command so such methods are in a base "compiler" class and eventually in util.py
Refactoring cannot help, unless you fundamentally change how it works. I gave an example of a simple feature which is hard to do with current distutils: obtaining the install prefix early in the build scheme. When you do:

    python setup.py install --prefix=foo

the build_* commands don't know the prefix, only install knows it. In numpy, I added various hacks, each uglier than the last, to fake running the install command before running the build commands to obtain this information. I would be surprised if subsequent refactoring of distutils does not break it.
I agree the code is not modern, but things are getting changed through small steps.
Although, I don't buy the "strange errors" part and things getting swallowed :)
I don't understand what's there to buy. Several people reported distutils errors without any backtrace, though a fair share of those were caused by our own extensions.

Concerning changing in small steps: I understand that changing gradually is often better than starting from scratch, but my own experience with numpy.distutils and distutils convinced me that there is not much to save from the distutils code. I consider the API, the UI and the implementation deeply flawed. For me, an "improved backward compatible" distutils is an oxymoron.

More practically, I think the recent distutils-related changes in 2.6.3 and 2.6.4 will keep happening. In numpy.distutils, we depend a lot on internal details of distutils, because we need things not available in the API, and I don't think we are alone. I would mention that numpy is one of the packages with the most invested in distutils in terms of code (sloccount tells me that numpy.distutils is 10315 LOC, whereas Python 2.6.2's distutils is 10648 LOC), so I am certainly aware of the cost of breaking backward compatibility.
My opinion is that you've build something else when Distutils was not evolving anymore.
It has nothing to do with distutils evolving. We have had our own extensions for years (I am a relative newcomer, but some numpy.distutils code goes back to 2001), so we could have done pretty much what we wanted. We now have two build systems: one based on distutils, one based on scons. Every time I added a build feature to numpy, it took me much more time to add it to the distutils build than to the scons build. The code needed to support the scons build is ~2000 lines of code, is more robust, handles dependencies automatically, supports reliable parallel builds, and is more maintainable. cheers, David
David Cournapeau wrote:
I don't understand what's there to buy. Several people reported distutils errors without any backtrace, though a fair shared of those were caused by our own extensions.
distutils specifically swallows exceptions and formats them for users by default. After all, it is trying to behave like a regular command line program that interacts with users, not developers. This is easily overridable for developers who are trying to debug problems by setting the environment variable DISTUTILS_DEBUG=1. This will make distutils just give the traceback. Is this what you are referring to? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Robert Kern wrote:
After all, it is trying to behave like a regular command line program that interacts with users, not developers. This is easily overridable for developers who are trying debug problems by setting the environment variable DISTUTILS_DEBUG=1. This will make distutils just give the traceback.
I did not know about that option, thanks. The last example I remember was Python 2.6 breaking our mingw support: the raised exception had an empty message, so you got just "error:" as your error message when the exception was caught in setup. I am not sure it is better than a traceback: maybe distutils could distinguish between 'expected' and 'unexpected' exceptions to make this clearer. David
On Sun, Nov 8, 2009 at 10:28 PM, Robert Kern <robert.kern@gmail.com> wrote:
David Cournapeau wrote:
I don't understand what's there to buy. Several people reported distutils errors without any backtrace, though a fair shared of those were caused by our own extensions.
distutils specifically swallows exceptions and formats them for users by default. After all, it is trying to behave like a regular command line program that interacts with users, not developers. This is easily overridable for developers who are trying debug problems by setting the environment variable DISTUTILS_DEBUG=1. This will make distutils just give the traceback.
In the tools I've written, I generally give the traceback if the verbosity is turned up, and in a case like this (an unexpected exception -- for distutils that's any exception except a few that distutils defines) I include "use -v for more". I think distutils (or Distribute) could do the same. -- Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
On Sun, 8 Nov 2009 23:28:42 -0600, Ian Bicking <ianb@colorstudy.com> wrote:
In the tools I've written, I generally give the traceback if the verbosity is turned up, and in a case like this (an unexpected exception -- for distutils that's any exception except a few that distutils defines) I include "use -v for more".
I think distutils (or Distribute) could do the same.
Hi Ian, I think pip is quite an accomplishment. But don't you think that it's a big ask to refactor distutils/distribute to redo their error messages for package building? imo the basic problem in setup.py and package building comes from ambiguity in the file specification section. Everything else is pretty much ok. With the declarative format David C is hinting at (from Haskell) it should be a lot easier to collect those files up and put them in a .tar.gz archive. I'm just wondering if you've ever thought about the tool chain on the other side from pip - like how hard it is to create archive files of source with all the right stuff. So I guess, what's your take on how close David C is to being right? Best Regards David
David Lyon <david.lyon@preisshare.net> writes:
On Sun, 8 Nov 2009 23:28:42 -0600, Ian Bicking <ianb@colorstudy.com> wrote:
In the tools I've written, I generally give the traceback if the verbosity is turned up, and in a case like this (an unexpected exception -- for distutils that's any exception except a few that distutils defines) I include "use -v for more".
I think distutils (or Distribute) could do the same.
Hi Ian,
I think PIP is quite an accomplishment.
But don't you think that its a big ask to refactor distutils/distribute to redo their error messages for package building?
I've just had a read through the code for ‘pip’; AFAICT, the “redo the error messages for package building” essentially amounts to using the ‘logging’ module. Is that a big ask? -- \ “We have to go forth and crush every world view that doesn't | `\ believe in tolerance and free speech.” —David Brin | _o__) | Ben Finney
On Mon, Nov 9, 2009 at 1:09 AM, Ben Finney <ben+python@benfinney.id.au> wrote:
David Lyon <david.lyon@preisshare.net> writes:
On Sun, 8 Nov 2009 23:28:42 -0600, Ian Bicking <ianb@colorstudy.com> wrote:
In the tools I've written, I generally give the traceback if the verbosity is turned up, and in a case like this (an unexpected exception -- for distutils that's any exception except a few that distutils defines) I include "use -v for more".
I think distutils (or Distribute) could do the same.
Hi Ian,
I think PIP is quite an accomplishment.
But don't you think that its a big ask to refactor distutils/distribute to redo their error messages for package building?
I've just had a read through the code for ‘pip’; AFAICT, the “redo the error messages for package building” essentially amounts to using the ‘logging’ module. Is that a big ask?
pip doesn't use the logging module, it uses its own logger, which is intended more for managing the output of a command-line program and not just post-mortem debugging. I don't think changing distutils to improve error output would be hard at all. It looks like there's a line in distutils.core that catches these exceptions (and doesn't look like it actually catches all exceptions?), and that can just be fixed. Another topic that has come up: I do agree subclassing makes it really hard to have multiple lines of development (specifically something like Setuptools or Distribute, along with ad hoc development in setup.py files). But I think that can be resolved. Perhaps, for instance, Distribute can have implementations of commands like build, that can easily be customized or extended without subclassing (e.g., by pre-build or post-build functions). I'd really be shocked if a rewrite of distutils was necessary, or even necessary to simplify things. -- Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker
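A rough sketch of the error-reporting behaviour Ian describes, with made-up names (this is not pip's or distutils' actual code):

    import sys
    import traceback

    class ExpectedError(Exception):
        """An error the tool defines itself and reports without a traceback."""

    def run(main, verbosity=0):
        try:
            main()
        except ExpectedError as e:
            sys.stderr.write("error: %s\n" % e)
            sys.exit(1)
        except Exception as e:
            if verbosity > 0:
                traceback.print_exc()            # full traceback with -v
            else:
                sys.stderr.write("error: %s (use -v for more)\n" % e)
            sys.exit(1)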
Ian Bicking wrote:
I don't think changing distutils to improve error output would be hard at all. It looks like there's a line in distutils.core that catches these exceptions (and doesn't look like it actually catches all exceptions?), and that can just be fixed.
I agree this is one of the thing which can be improved without unintended consequences.
Another topic that has come up: I do agree subclassing makes it really hard to have multiple lines of development (specifically something like Setuptools or Distribute, along with ad hoc development in setup.py files). But I think that can be resolved. Perhaps, for instance, Distribute can have implementations of commands like build, that can easily be customized or extended without subclassing (e.g., by pre-build or post-build functions).
In numpy's case, we subclass the Distribution class mainly to add new data which is shared between commands (for example, to build pure C libraries made available to 3rd parties, or to keep track of the scons scripts). I have myself tried the pre/post hooks for commands (for numpy.distutils), but I did not get very far: the problem was almost always linked to commands which need to share knowledge with each other. OTOH, I tried this when I started poking into numpy.distutils 2 years ago; maybe I could go further today.

Some things are fixable in distutils: for example, to build things, you should be able to get rid of the imperative operations, and have instead a registry of extension -> action (a la scons/waf). This would make adding new tools (cython, assembly, etc.) easier, as you would add tools through a registry instead of overriding build_ext as currently done by e.g. numpy.distutils or cython.distutils. Doing so while keeping backward compatibility would be hard, though.
I'd really be shocked if a rewrite of distutils was necessary, or even necessary to simplify things.
That's the opposite of my own experience. I think I have given several reasonable examples of shortcomings of distutils: I would be glad to hear that each of them has a simple solution which is backward-compatible in distutils.

But another way to look at it is to ask: what is there in distutils that you would consider important to keep? It is important that people who maintain packages do not have to rewrite their setup for a new hypothetical system: asking thousands of developers to rewrite their setup would be insane. But this can be done without being tied to the distutils API. David
On 11/9/09, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote: [...]
Some things are fixable in distutils: for example, to build things, you should be able to get rid of the imperative operations, and have instead of registry of extension -> action (ala scons/waf).
What is a registry of extensions exactly? Distutils lets you register your own commands, which you can use through the CLI. Can you provide more details?
I'd really be shocked if a rewrite of distutils was necessary, or even necessary to simplify things.
That's the opposite of my own experience. I think I have given several reasonable examples of shortcomings of distutils: I would be glad to hear that each of them has a simple solution which is backward-compatible in distutils.
I am in for making Distutils evolve, but I need very precise real-world use cases (not just saying that Distutils shouldn't do imperative operations). Last, I am not sure why you want only backward-compatible changes in distutils. There's no plan to keep backward compatibility if breaking it makes Distutils better. We will have pending deprecation warnings, that's all. Tarek
Tarek Ziadé wrote:
What is a registery of extension exactly ? Distutils let you register your own commands, you can use through the CLI.
Can you provide more details ?
Sure. Right now, if you want to deal with a new source or a new target, you need to create a new command or override one. For example, cython has a distutils extension which subclasses build_ext; we in numpy do the same. This causes several issues:

- cython's build_ext subclasses the distutils command, and we do the same. How do you use both at the same time? That's one typical example of the subclassing issue Ian Bicking mentioned.
- In the case of compiled extensions, the basic structure is to build all object files for each extension from the compiler class (compile method). The compile method structure is:

    for obj in objects:
        if src == "bla":
            do_bla()
        elif src == "blabla":
            do_blabla()
        ...
        else:
            raise  # some exception for the unrecognized extension

If you want to support a new source/target (say assembler), you need to deal with it at this stage (or use hacks to deal with the new file extension you want to handle, and remove it later...). So you need to copy build_ext, you cannot extend it. If you look at msvccompiler vs msvc9compiler, both compile methods are almost the same and copied from each other.

Now, if instead you have a dictionary {src_extension: callable}, you would be able to add a new tool and extend existing code without touching or copying anything. You could have syntactic sugar to have something like:

    @extension(".asm")
    def assemble(....):
        do_assembler_stuff()

All the assembler-related things would be in a separate module, and you would only need to register the tool explicitly when you need it (every tool could work the same way, and current distutils tools would then be registered automatically).
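A minimal, self-contained sketch of the registry idea; the decorator and function names are hypothetical, not an existing distutils API:

    import os

    _tools = {}

    def extension(ext):
        """Register a callable that knows how to handle sources ending in ext."""
        def register(func):
            _tools[ext] = func
            return func
        return register

    @extension(".c")
    def compile_c(src):
        return "cc -c %s" % src           # placeholder action

    @extension(".asm")
    def assemble(src):
        return "as %s" % src              # a new tool plugs in without touching build_ext

    def build(sources):
        for src in sources:
            ext = os.path.splitext(src)[1]
            if ext not in _tools:
                raise ValueError("unrecognized extension: %s" % src)
            print(_tools[ext](src))

    # build(["foo.c", "lowlevel.asm"]) dispatches each file to its registered tool.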
I am in for making Distutils evolve, but I need very precise real world use cases not saying that Distutils shouldn't do imperative operations).
I have given very explicit examples in this discussion - I have written them up on the wiki last time this was discussed, as requested:

http://wiki.python.org/moin/Distutils/PluginSystem

I don't think it is accurate to summarize my critique as a vague "do imperative operations".
Last, I am not sure why you want only backward-compatible changes in distutils.
There's no plan to keep backward-compatibility if breaking it makes DIstutils better. We will have pending deprecation warnings, that's all.
What is the intended policy: deprecate something in vN and break it in vN+1? I am not sure this works well when you need to correct deep design issues which impact a lot of code (and this will be required). For example, I would prefer not to change numpy.distutils several times just to make a few APIs cleaner.

I guess I don't see the point of breaking things while keeping the current distutils codebase. I would rather do the exact contrary: throw away the codebase, but keep current setup.py files working through a conversion script. Things like sandboxing PyPI and other things become much easier that way. cheers, David
On Tue, Nov 10, 2009 at 2:31 AM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote: [..]
If you want to support a new source/target (say assembler), you need to deal at this stage (or use hacks to deal with the new file extension you want to deal with, and remove it later...). So you need to copy build_ext, you cannot extend it.
If you look at msvccompiler vs msvc9compiler, both compile methods are almost the same and copied from each other.
So here's a refactoring proposal:

- build_ext doesn't handle a compiler instance anymore (therefore it doesn't have a "compile_type" anymore)
- instead, the Extension class handles this compile type
- right now we have a dict in ccompiler (compiler_class) that holds all the compiler classes; let's make it extensible through a registry using distutils.cfg (like the commands)

So: when build_ext is run, it compiles the Extensions by using this registry of compilers, and the type of compiler contained in each extension. It keeps instances in a pool to reuse them if needed. [..]
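Very roughly, and with hypothetical names (this is not the actual Distribute/Distutils API), the proposal might look like:

    compiler_registry = {}

    def register_compiler(name, cls):
        # In the real thing this registry could also be fed from distutils.cfg.
        compiler_registry[name] = cls

    class Extension(object):
        def __init__(self, name, sources, compiler="unix"):
            self.name = name
            self.sources = sources
            self.compiler = compiler          # the compile type lives on the extension

    class UnixCCompiler(object):              # stand-in for a real compiler class
        def compile(self, sources):
            print("compiling %r" % (sources,))

    register_compiler("unix", UnixCCompiler)

    def build_extensions(extensions):
        pool = {}                             # reuse compiler instances if needed
        for ext in extensions:
            if ext.compiler not in pool:
                pool[ext.compiler] = compiler_registry[ext.compiler]()
            pool[ext.compiler].compile(ext.sources)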
I have given very explicit examples in this discussion - I have written them on the wiki last time it was discussed as requested:
http://wiki.python.org/moin/Distutils/PluginSystem
I don't think it is accurate to summarize my critic as a vague "do imperative operations".
Sorry about that, I lost track of this.
Last, I am not sure why you want only backward-compatible changes in distutils.
There's no plan to keep backward-compatibility if breaking it makes DIstutils better. We will have pending deprecation warnings, that's all.
What is the intended policy: deprecate something in v1 and break it in v1+1 ? I am not sure this works well when you need to correct deep design issues which impact a lot of code (and this will be required). For example, I would prefer not changing numpy.distutils several times just to make a few API cleaner.
You just don't replace a stdlib package with a v2 like that; there is a deprecation process to follow. Distutils does not live only through setup.py. It has some public APIs imported and used by people. While some deep changes are required, if setting up a deprecation is possible, I will do it.
I guess I don't see the point of breaking things while keeping the current distutils codebase. I would rather do the exact contrary: throw away the codebase, but keep current setup.py working through a conversion script. Things like sandboxing pypi and other things become much easier that way.
-1. I'm hearing this sometimes. That's what I thought too when I started to work with Distutils a year and a half ago. And I couldn't understand Guido when he was saying that redoing it from scratch would be a mistake. Now I agree with him. There's a *lot* of knowledge in Distutils. Throwing it away means entering a new cycle that will last for several years. Small refactorings can be done today, as long as the code is tested. While the code is not yet as modern as other modules in the stdlib, I invite you to compare Python 2.4's distutils with the current trunk, and realize the amount of work I've been doing in the past year. My goal was to have good coverage before starting any big refactoring, and to fix the old pending bugs. Two years ago, the code was barely tested (the coverage was 18%) and now it's well tested (80%+ depending on the platform). It's now almost fully PEP 8, and I am now more confident when I start any refactoring. I understand the pain with build_ext. Let's work it out. Tarek
Tarek Ziadé wrote:
On Tue, Nov 10, 2009 at 2:31 AM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote: [..]
If you want to support a new source/target (say assembler), you need to deal at this stage (or use hacks to deal with the new file extension you want to deal with, and remove it later...). So you need to copy build_ext, you cannot extend it.
If you look at msvccompiler vs msvc9compiler, both compile methods are almost the same and copied from each other.
So here's a refactoring proposal :
- build_ext doesn't handle a compiler instance anymore (therefore it doesn't have a "compile_type" anymore)
- instead, the Extension class handles this compile type
- right now we have a dict in ccompiler (compiler_class) that has all the compiler classes. let's make it extensible through a registery using distutils.cfg (like the commands)
so: when build_ext is run, it compiles the Extensions by using this registery of compilers, and the type of compiler contained in each extension. It keeps instances in a pool to reuse them if needed/
What is important here is how to add new tools without touching or impacting distutils in many places. In particular, what distutils expects from a tool should be clearly defined (right now it is implementation-defined). Adding cython or assembly support would be good use cases for this refactoring, I think.
You just don't replace a stdlib package with its v2 like that, there is a deprecation process to have.
Distutils is not living only through setup.py. It has some public APIs imported and used by people.
I am aware of the usage of distutils: I don't think it has a public API, though. It has functions that people use, but it is far from clear what is public and what is private. Many things we do in numpy.distutils are clearly dependent on implementation details of distutils. The problem with moving distutils forward is not deprecation, the problem is when you break/remove the old API. At this point, you will break the code of people who rely on the old API. If the changes are big enough (as they should be to warrant breaking the old API in the first place), it will require significant effort from distutils API users. So people will see this as a trade-off: is it worth my time to move to the new version of distutils, and what will it bring me? Without significant new features, this will be difficult.
-1
I'm hearing this sometimes. That's what I thought too when I started to work with Distutils a year and a half ago. And I couldn't understand Guido when he was saying that redoing it from scratch would be a mistake. Now I agree with him.
IIRC, the main argument was that a new system would mean giving up existing setup.py files, but that can be worked around. I don't know whether Guido's opinion has changed since, but he wondered last year whether dropping backward compatibility was an option (http://www.opensubscriber.com/message/distutils-sig@python.org/10291496.html). Otherwise, a new system would look nothing like distutils.

One of the main arguments against a rewrite is that you will end up making the same mistakes, and old code is more complicated because it has been tested. But here, we know what a good design looks like, as other languages have vastly superior solutions to this problem. As far as compilation is concerned at least, the distutils knowledge is vastly overblown. First, most of it comes from autoconf on unices. You have the MSVC tools knowledge, but that can be easily reused (in numscons, I added MSVC support from the Python 2.6 msvc9compiler; this was not difficult). Most other tools are rather obsolete - and if you break any API in distribute there, you will most likely lose them as well anyway (I am thinking about the OS/2, Metrowerks kind of tools).

Again, I don't mean to say that working on distribute is a mistake, or to criticize what you do in any way. I just don't think it will solve any significant issue for the scientific Python community. But enough "stop-energy": at this point, I will just shut up and continue working on my own ideas - if they make sense, the scientific community is big enough that the solution could be used there only, at least for some time.

cheers,

David
David Cournapeau wrote:
One of the main argument to avoid rewrite is that you will end up doing the same mistakes, and old code is more complicated because it has been tested. But here, we know what a good design is like as other languages have vastly superior solutions to this problem.
Also, it seems to me that in this case, the basic architecture of distutils is already so full of mistakes that there just isn't an incremental way of getting to a better place, especially given the requirement of not breaking any existing setup.py scripts together with the fact that the API of distutils effectively consists of its *entire implementation*. So while complete rewrites are best avoided *if possible*, I don't think we have a choice in this case. -- Greg
Greg Ewing wrote:
Also, it seems to me that in this case, the basic architecture of distutils is already so full of mistakes that there just isn't an incremental way of getting to a better place, especially given the requirement of not breaking any existing setup.py scripts together with the fact that the API of distutils effectively consists of its *entire implementation*.
Exactly. The fact that we in numpy consider distutils backward compatibility not worth the cost, even though we are most likely the most tied up with distutils, is quite telling about the state of affairs IMHO. David
On Wed, Nov 11, 2009 at 7:05 AM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote: [..]
Exactly. The fact that we in numpy consider distutils backward compatibility not worth the cost, even though we are most likely the most tied up with distutils, is quite telling about the state of affairs IMHO.
That doesn't prove Distutils can't evolve. That just shows that numpy worked on its side. If the Numpy project made some refactorings/improvements, why didn't the project try to push them back into Distutils itself?
To the people who want a rewrite, two things need to be asked:

1. Do you think the new PEPs in development should be followed? In that case, what is the benefit of rewriting, instead of fixing?
2. When are you done? :-)

I'm not being rude, but this is open source. There is no "Someone" that can rewrite distutils from scratch; it must be done by those who think it should be done. Those who think distutils should be rewritten from scratch need to sit down and do it. If nobody is willing to write the code, the code is not needed. -- Lennart Regebro: Python, Zope, Plone, Grok http://regebro.wordpress.com/ +33 661 58 14 64
On Nov 11, 2009, at 4:39 AM, Lennart Regebro wrote:
If nobody is willing to write the code, the code is not needed.
I think it would be more accurate to say that nobody is deciding that the "need" is sufficient to invest the resources to fill it. There are lots of things that many people would be willing and able to write, that would fill a great "need", and that many people would benefit from, but that aren't getting written because there's no one willing to invest the time or money. In some cases people are both willing and able, but unable to invest the time due to the pressing need to make a living. I'd love to work on open source all day, but my wife and kids get cranky if they don't eat for a couple of days. People don't always agree on what is the "rational" need to fill, and sometimes what we, as programmers, "need", and know would make us more productive, is not what the people controlling the pursestrings are willing to finance, even if it would ultimately benefit them financially. S
2009/11/11 ssteinerX@gmail.com <ssteinerx@gmail.com>:
On Nov 11, 2009, at 4:39 AM, Lennart Regebro wrote:
If nobody is willing to write the code, the code is not needed.
I think it would be more accurate to say that nobody is deciding that the "need" is sufficient to invest the resources to fill it.
More accurate, but longer and redundant. :) -- Lennart Regebro: Python, Zope, Plone, Grok http://regebro.wordpress.com/ +33 661 58 14 64
On Nov 11, 2009, at 7:38 AM, Lennart Regebro wrote:
2009/11/11 ssteinerX@gmail.com <ssteinerx@gmail.com>:
On Nov 11, 2009, at 4:39 AM, Lennart Regebro wrote:
If nobody is willing to write the code, the code is not needed.
I think it would be more accurate to say that nobody is deciding that the "need" is sufficient to invest the resources to fill it.
More accurate, but longer and redundant. :)
I don't think it's redundant because a lot of "needs" in Python go unmet, not due to people's inability or unwillingness, but due to a lack of time/funding which amount to the same thing. S
2009/11/11 ssteinerX@gmail.com <ssteinerx@gmail.com>:
I don't think it's redundant because a lot of "needs" in Python go unmet, not due to people's inability or unwillingness, but due to a lack of time/funding which amount to the same thing.
People *want* a lot of things. But if they truly *need* it, they'll do it. -- Lennart Regebro: Python, Zope, Plone, Grok http://regebro.wordpress.com/ +33 661 58 14 64
Tarek Ziadé wrote:
On Wed, Nov 11, 2009 at 7:05 AM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote: [..]
Exactly. The fact that we in numpy consider distutils backward compatibility not worth the cost, even though we are most likely the most tied up with distutils, is quite telling about the state of affairs IMHO.
That doesn't prove Distutils can't evolve.
No, but that's not the point I was trying to make: I meant that we consider the distutils API not to be a significant asset, and would be happy to throw away all of numpy.distutils for a significantly better system. We now have two build systems in numpy (one based on scons): I think it takes me 5 to 10 times more effort on average to add a feature to the distutils-based one compared to the scons one. There are some features I cannot implement because I have not found a solution for them in distutils. There are arbitrary limitations, like the inability to call commands directly from setup.py or to retrieve global information from setup.py, and classes which behave differently on different platforms.

Example: how to retrieve the install prefix in setup.py. You need a good understanding of distutils to understand why it is so complicated, and the example shows almost everything that's wrong in the distutils design. Many expectations are undocumented, like what method/class can be called where and when. All this implicit behavior is part of the API, and that's not documented anywhere that I know of. Also, what happened in Python 2.6.3/2.6.4 w.r.t. distribute has happened quite often for numpy.distutils, and I consider it inherent to the distutils way of working.
If the Numpy projects made some refactoring/improvement, why the project didn't try to push it back in Distutils itself ?
They are not improvements or refactorings for the most part. They are things quite specific to our needs: Fortran support, support for our internal code generator, swig, f2py, etc. A few things could be useful to distutils, like manifest support for mingw for 2.6 and later, as well as basic autoconf-like tests (checking for functions, types, type sizes, etc.). We would be happy to contribute patches to Distribute if this is considered useful. cheers, David
On Wed, Nov 11, 2009 at 10:46 AM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote: [..]
Example: how to retrieve the install prefix in setup.py. You need a good understanding of distutils to understand why it is so complicated, and the example shows almost everything that's wrong in distutils design. Many expectations are undocumented, like what method/class can be called where and when. All this implicit behavior is part of the API, and that's not documented anywhere that I know of.
Ok, so I read this example as a lack of documentation and a lack of clear APIs to get the installation paths. Also, notice that we are in the process of adding a new Python module to the stdlib, called "sysconfig", that will contain all installation path schemes for all supported platforms. I have a branch going on in the svn for that. Knowing where to install things and what is up with a given platform is a wider problem than Distutils in fact; it concerns site.py as well, and having a sysconfig module that handles this will help. I also expect it to help the work people from Jython, PyPy, and IronPython will do in the future.
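Assuming the new module ends up exposing something along the lines of get_paths() (the exact API was still being settled at the time), querying the installation scheme could look like:

    import sysconfig   # the new stdlib module Tarek mentions

    paths = sysconfig.get_paths()      # installation scheme for the running platform
    print(paths["purelib"])            # e.g. .../site-packages
    print(paths["include"])            # where C headers go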
Also, what happened in python 2.6.3/2.6.4 w.r.t. distribute has happened quite often for numpy.distutils, and I consider it inherent to distutils way of working.
I fully agree that this particular example demonstrates that build_ext is suffering from a lack of API documentation. I am still waiting for your comment on the solution to remove the compile code from build_ext and have it at the Extension level, with an option to add new compilers in an easier way.
If the Numpy projects made some refactoring/improvement, why the project didn't try to push it back in Distutils itself ?
They are not improvements or refactoring for most part. They are things quite specific to our needs: fortran support, support for our internal code generator, swig, f2py, etc... A few things could be useful to distutils, like manifest support for mingw for 2.6 and later, as well as basic autoconf-like tests (checking for functions, type, type size, etc...). We would be happy to contribute patches to Distribute if this is considered useful.
If there's someone from the Numpy project who can help in isolating patches against the Distutils trunk in our issue tracker, I'd be more than happy to reduce the gap between the two projects. Notice that this may have been done in the past; I haven't managed to review all the distutils bugs yet (some are 5 years old). Tarek
Tarek Ziadé <ziade.tarek <at> gmail.com> writes:
Also, notice that we are in the process of adding a new python module in the stdlib, called "sysconfig", that will contains all installation paths scheme for all supported platforms.
I don't think we are talking about the same thing. If the problem was just a missing API, we could have just added it to numpy.distutils. We override pretty much every command of distutils and setuptools anyway.

I needed to obtain the install prefix, be it the default one, or the one set through --prefix, or the current directory for an in-place build/develop install. I need to know it when building C libraries, in build_clib. AFAIK, there is only one way to get the information: since the --prefix is only known once install.finalize_options() has run, you need to call that method from build_clib. Naive code could be something like:

    import copy
    from distutils.command.build_clib import build_clib

    class MyBuildClib(build_clib):
        def run(self):
            install_cmd = self.distribution.get_command_obj("install")
            if not install_cmd.finalized == 1:
                install_cmd.finalize_options()

            if self.inplace == 1:
                top = "."
            else:
                top = install_cmd.install_libbase
            # top is the top directory where libraries should be installed
            build_clib.run(self)

It is already quite ugly, but it looks like it could work. It actually does in the simple case, like python setup.py install. But if you call python setup.py build_clib install --prefix=something, it does not. The error message is a typical example of distutils style: "error: must supply either prefix/exec-prefix/home or install-base/install-platbase -- not both". The only way I managed to make this work is to replace:

    install_cmd = self.distribution.get_command_obj("install")

by

    install_cmd = copy.copy(self.distribution.get_command_obj("install"))

That's not an API problem, or a documentation problem. That's a fundamental issue: --prefix is only known from install, and when you start running commands in an order different than the one expected by distutils, you get weird errors (the above error is actually almost sensical if you really know the distutils code). I passed over the fact that under some conditions that elude me ATM and are platform specific, install_libbase does not exist and raises an attribute error.

If you have a configure / build / install split like the vast majority of build systems out there, this problem disappears. I don't see how the problem can be fixed without touching how commands work. Moreover, that's a typical example where the only way to be robust is to check that every attribute you are using actually exists beforehand. At that point, you really want to run back to m4, perl, autogenerated makefiles and shell programming :)
I am still waiting for your comment on the solution to remove the compile code from build_ext and have it at the Extension level, with an option to add new compilers in an easier way.
I will try to document how scons does it. I think the basic idea could be reused in distutils.
If there's someone from the Numpy project who can help in isolating patches against the Distutils trunk in our issue tracker, I'd be more than happy to reduce the gap between the two projects.
If that was not clear, I am that guy. I have been the main maintainer of numpy.distutils and of numpy/scipy build infrastructure for some time now, David
On Wed, Nov 11, 2009 at 1:31 PM, David Cournapeau <cournape@gmail.com> wrote: [..]
AFAIK, there is only one way to get the information: since the --prefix is only known once install.finalize_options() has run, you need to call the method from build_clib.
Naive code could be something like
class MyBuildClib(build_clib):
    def run(self):
        install_cmd = self.distribution.get_command_obj("install")
        if not install_cmd.finalized == 1:
            install_cmd.finalize_options()
        if self.inplace == 1:
            top = "."
        else:
            top = install_cmd.install_libbase
        # top is the top directory where libraries should be installed
        build_clib.run(self)
It is already quite ugly, but it looks like it could work. It actually does in the simple case, like python setup.py install. But if you call python setup.py build_clib install --prefix=something, it does not. The error message is a typical example of distutils style: "error: must supply either prefix/exec-prefix/home or install-base/install-platbase -- not both". The only way I managed to make this work is to replace:
Ouch, that's not to be done. Something is wrong with your build_clib design here: you are roughly calling "install" as a sub-command.

If you want to have install options for your command, and if your command is about "installing", it means that your command has to be a subcommand of "install". Those get called once the options passed to install have been finalized. IOW, build_clib is not a subcommand of install, so you get into trouble. The subcommands are: install_lib, install_headers, install_scripts, install_data, install_egg_info (and I agree that it's not simple to extend this list).

But why, in the first place, do you need the install --prefix in a build command? For build_clib, if you need to build something, it goes in the build_dir, or in place, and this is not impacted by the install command.
If you have a configure / build / install like the vast majority of build systems out there, this problem disappears. I don't see how the problem can be fixed without touching how commands work.
I fail to understand your demonstration. Commands that are in charge of the *build* have nothing to do with commands that are in charge of *installing* files in various places on the target system. So I fail to understand why build_clib interacts with install, and why it has to impact it in any way (or vice-versa).

Now, if we take the generic use case (even if I don't think it should be used in your case): "a simple way to share options amongst commands". As a matter of fact, there's a hackish way to perform this, by using the distribution instance as a placeholder for these "common" options, so several commands can read/write them (as opposed to local options and global options) without having to get the command that manages the option.

But, at the end, since an option is either global or specific to a command, I guess a simple API in the command class should be enough to avoid this hack, as sketched below:

    get_option(command_name, option_name)

This is similar to getting the command (instantiate it + finalize it if it doesn't exist yet) and returning a finalized option.
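A rough sketch of what such a helper could look like (purely illustrative; this is not an existing distutils API), written as a mixin a command class could inherit from:

    class OptionLookupMixin:
        def get_option(self, command_name, option_name):
            # Instantiate the target command if needed, finalize it,
            # then return the requested finalized option.
            cmd = self.distribution.get_command_obj(command_name)
            cmd.ensure_finalized()
            return getattr(cmd, option_name)

    # hypothetical usage from inside another command's run():
    #     prefix = self.get_option("install", "prefix")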
Moreover, that's a typical example where the only way to be robust is to check that every attribute you are using actually exists beforehand. At that point, you really want to run back to m4, perl, autogenerated makefiles and shell programming :)
I am still waiting for your comment on the solution to remove the compile code from build_ext and have it at the Extension level, with an option to add new compilers in an easier way.
I will try to document how scons does it. I think the basic idea could be reused in distutils.
But, you didn't answer: if we add the ability to define a different compiler for each Extension, will it solve your use case ?
If there's someone from the Numpy project who can help in isolating patches against the Distutils trunk in our issue tracker, I'd be more than happy to reduce the gap between the two projects.
If that was not clear, I am that guy. I have been the main maintainer of numpy.distutils and of numpy/scipy build infrastructure for some time now,
If you are willing to spend some time on that, I am the guy who can commit your patches in Python. Regards Tarek
On Wed, Nov 11, 2009 at 10:16 PM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
Ouch, that's not to be done. Something is wrong with your build_clib design here: you are roughly calling "install" as a sub-command.
Yes, it is "wrong" from distutils POV, but there is no other solution to get the prefix. The problem is that prefix is available in install, not that I want to know the prefix when building.
If you want to have install options for your command, and if your command is about "installing",
I did not say my command was about installing - it does not install anything. To be complete, we do not use this in build_clib, but in build_src, to generate pkg-config-like files (build_src is a numpy.distutils-specific command that builds source files, for example automatically generated wrappers around fortran libraries, and is typically run before any other build_ command).
For build_clib, if you need to build something, it goes in the build_dir, or in place, and this is not impacted by the install command.
That's exactly the problem. The build + install description is too simplistic for complex builds: you often need to know install options at build time. You need to know some build options when building configuration files, you need to pass configuration options to other subsequent steps, etc...
This is similar to getting the command (instantiate it + finalize it if it doesn't exist yet) and returning a finalized option.
This does not solve the issue IMHO. Since we both seem to like thinking about use-cases, consider this one: you have a python package with a complex extension built with make (say you have a build_make command which calls a makefile). How do you do it? How do you communicate path information and compiler options between make and distutils? How do you handle relinking (changing rpath at install time)?
I will try to document how scons does it. I think the basic idea could be reused in distutils.
But, you didn't answer: if we add the ability to define a different compiler for each Extension, will it solve your use case ?
I did not answer because my answer will take time: I want to accurately summarize the problems in build tools. We have discussed those issues quite a bit in scons as well, and I want to be sure that my answer takes them into account.
If you are willing to spend some time in that, I am the guy who can commit your patches in Python.
I will try to prepare a couple of patches against the hg repo this WE, David
On Wed, Nov 11, 2009 at 2:48 PM, David Cournapeau <cournape@gmail.com> wrote:
If you want to have install options for your command, and if your command is about "installing",
I did not say my command was about installing - it does not install anything. To be complete, we do not use this in build_clib, but in build_src, to generate pkg-config-like files (build_src is a numpy.distutils-specific command that builds source files, for example automatically generated wrappers around fortran libraries, and is typically run before any other build_ command).
But you call it with "install" in your example, meaning that is is called at install time, right ? Or it is just that you want to get the "--prefix" value finalized and computed by the install command. If it's the later, I guess you will be able to use the upcoming "sysconfig" module, that gives you the install schemes, depending on sys.prefix/sys.exec_prefix. And I will probably add a way to override those prefix, meaning that you will be able to get from your command all install paths depending on a root prefix. If not, it means that you are doing a post or pre install step during installation. e.g. like RPM's pre/post commits hooks.
For build_clib, if you need to build something, it goes in the build_dir, or in place, and this is not impacted by the install command.
That's exactly the problem. The build + install description is too simplistic for complex builds: you often need to know install options at build time. You need to know some build options when building configuration files, you need to pass configuration options to other subsequent steps, etc...
Sorry, I can't see it yet, it's still fuzzy. Does that mean your binary distribution will not be relocatable? E.g. meaning that once you have done your build for a given --prefix, it won't be installable anywhere else? In that case you would need to remove the --prefix option from install, and have it on your build command. Sounds like a pre/post install hook, but I am not sure.
This is similar to getting the command (instantiate it + finalize it if it doesn't exist yet) and returning a finalized option.
This does not solve the issue IMHO. Since we both seem to like thinking about use-cases, consider this one: you have a python package with a complex extension built with make (say you have a build_make command which calls a makefile). How do you do it? How do you communicate path information and compiler options between make and distutils? How do you handle relinking (changing rpath at install time)?
I don't know for the first part. I have to try it out. Can you provide me with such an extension? For the relinking-at-installation-time problem, it is obvious that something can be done. We could have a pre/post install option where an arbitrary command could be launched, as an install subcommand. pre: works on the build_dir built by install or a previous build. post: works on the installed file list [..]
If you are willing to spend some time in that, I am the guy who can commit your patches in Python.
I will try to prepare a couple of patches against the hg repo this WE,
yeah, thanks ! \o/ Tarek
On Wed, Nov 11, 2009 at 11:13 PM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
Or it is just that you want to get the "--prefix" value finalized and computed by the install command.
Yes.
If it's the latter, I guess you will be able to use the upcoming "sysconfig" module, which gives you the install schemes, depending on sys.prefix/sys.exec_prefix.
Where are the sysconfig sources? I don't see them in bitbucket.
Sorry, I can't see it yet, it's still fuzzy. Does that mean your binary distribution will not be relocatable?
Relinking was just an example, but yes, the binary would not be relocatable in that case (although you can actually build relocatable binaries through $ORIGIN, but we are not here to talk about advanced deployment issues of binaries). Just to be clear, I am not asking distutils to do it or even implement it at all, just to make it possible.
This is similar to getting the command (instantiate it + finalize it if it doesn't exist yet) and returning a finalized option.
This does not solve the issue IMHO. Since we both seem to like thinking about use-cases, consider this one: you have a python package with a complex extension built with make (say you have a build_make command which calls a makefile). How do you do it? How do you communicate path information and compiler options between make and distutils? How do you handle relinking (changing rpath at install time)?
I don't know for the first part. I have to try it out. Can you provide me such an extension ?
Not for make, but I can try to port numpy.distutils.command.scons to distutils (or distribute). The current code is damn ugly (I did most of it when I started digging into distutils), but you can get an idea here: http://github.com/cournape/numpy/blob/master/numpy/distutils/command/scons.p...

It calls scons, and you can thus build any C extension using scons. Now, both distutils and scons make this difficult (in particular, there is no way to call scons from distutils, you need to launch the scons executable). For me, one of the core ideas of an improved distutils would be to make this much easier. All compiler options from distutils would be in simple data files with a simple API, no objects, no classes with countless methods and complex protocols. Distutils v2 would have a default "dumb" build tool, but you could use whatever tool instead if desired. David
On Wed, Nov 11, 2009 at 3:47 PM, David Cournapeau <cournape@gmail.com> wrote:
On Wed, Nov 11, 2009 at 11:13 PM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
Or it is just that you want to get the "--prefix" value finalized and computed by the install command.
Yes.
Ok. What is obvious to me now is that the "install" command does too much. Knowing the paths is something useful for any command. So "sysconfig" will help, I think.
If it's the latter, I guess you will be able to use the upcoming "sysconfig" module, which gives you the install schemes, depending on sys.prefix/sys.exec_prefix.
Where are the sysconfig sources? I don't see them in bitbucket.
that's in python's svn, in a tarek_sysconfig branch. It's a revamp of distutils/sysconfig.py with the schemes from distutils/command/install.py (work in progress) [..]
I don't know for the first part. I have to try it out. Can you provide me such an extension ?
Not for make, but I can try to port numpy.distutils.command.scons to distutils (or distribute). The current code is damn ugly (I did most of it when I started digging into distutils), but you can get an idea here:
http://github.com/cournape/numpy/blob/master/numpy/distutils/command/scons.p...
It calls scons, and you can thus build any C extension using scons. Now, both distutils and scons make this difficult (in particular, there is no way to call scons from distutils, you need to launch the scons executable).
I see. I'll take a look asap. Are you coming to Pycon btw ?
For me, one of the core ideas of an improved distutils would be to make this much easier. All compiler options from distutils would be in simple data files with a simple API, no objects, no classes with countless methods and complex protocols. Distutils v2 would have a default "dumb" build tool, but you could use whatever tool instead if desired.
The default compiler class exists for that, it's CCompiler, and it is mostly a placeholder for options. But it's C oriented. In my mind, implementing a new compiler for Distutils means overriding it and implementing, mainly:

- preprocess()
- compile()
- create_static_lib()
- link()

Now that's quite complex, and we could probably have a single method (compile) that would do the required work of compiling an extension the way it wants to. So, yes, being able to register an arbitrary compiler class, with arbitrary options passed through the Extension, could make it simpler:

    setup(
        ..
        ext_modules=[Extension('Foo', files=['foo.d'], compiler='pyd')],
        ..)

where "pyd" is the name of the compiler that knows how to compile D files. This compiler would do whatever it wants, as long as it is done in a .compile() method:

    .compile(name, files, *args, **kw)

Tarek
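A rough sketch of what such a single-method compiler could look like (the class, its registration name and the D compiler invocation are hypothetical illustrations, not existing distutils code):

    import subprocess

    class PydCompiler:
        """Hypothetical compiler that turns D sources into a shared extension."""

        def compile(self, name, files, *args, **kw):
            # A real implementation would take its flags from configuration
            # data; here we simply shell out to the dmd compiler.
            cmd = ["dmd", "-shared", "-of" + name + ".so"] + list(files)
            subprocess.check_call(cmd)

Registered under the name "pyd", this is the object build_ext would hand the Extension's files to.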
Wed, 2009-11-11 at 16:47 +0100, Tarek Ziadé wrote: [clip]
If it's the latter, I guess you will be able to use the upcoming "sysconfig" module, which gives you the install schemes, depending on sys.prefix/sys.exec_prefix.
Where are the sysconfig sources? I don't see them in bitbucket.
That's in python's svn, in a tarek_sysconfig branch. It's a revamp of distutils/sysconfig.py with the schemes from distutils/command/install.py (work in progress)
What if the user passes a different install prefix via

    python setup.py install --prefix=FOO

I believe David would like to know FOO here. Since sysconfig is not a part of distutils, will it know what FOO is? [clip]
For me, one of the core idea of an improved distutils would be to make this much easier. All compilers options form distutils would be in simple data files with simple API, no objects, no class with countless methods and complex protocol. Distutils v2 would have a default "dumb" build tool, but you could use whatever tool instead if desired.
The default compiler class exists for that, it's CCompiler, and is mostly a placeholder for options. But it's C oriented. In my mind, implementing a new compiler for Distutils means overriding it, and implementing, mainly:
- preprocess()
- compile()
- create_static_lib()
- link()
Now that's quite complex, and we could probably have a single method (compile) that would do the required work of compiling an extension the way it wants to.
I think one question here is how the different compilers speak to each other. Consider the following chain, needed to compile a Cython file linking to some Fortran and C files into a Python module:

- cython foo.pyx -> foo.c
- compile foo.c -> foo.o
- compile bar.c -> bar.o
- compile quux.f90 -> quux.o
- link foo.o bar.o quux.o -> foo.so

This is a completely possible use-case, so it would be good if distutils could handle it. Also, dependency handling would be nice here. Changing bar.c or foo.pyx should trigger relinking foo.so etc.
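For concreteness, the chain above could be driven by hand roughly like this (a sketch only; the compilers and flags are illustrative and platform-dependent, not something distutils provides today):

    import subprocess
    from distutils.sysconfig import get_python_inc

    inc = get_python_inc()                            # Python headers
    subprocess.check_call(["cython", "foo.pyx"])      # foo.pyx -> foo.c
    for src in ["foo.c", "bar.c"]:
        subprocess.check_call(["cc", "-fPIC", "-I", inc, "-c", src])
    subprocess.check_call(["gfortran", "-fPIC", "-c", "quux.f90"])
    subprocess.check_call(["cc", "-shared",
                           "foo.o", "bar.o", "quux.o", "-lgfortran",
                           "-o", "foo.so"])

A build system would additionally track that foo.so depends on foo.o, bar.o and quux.o, and rebuild only what changed.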
So, yes, being able to register an arbitrary compiler class, with arbitrary options passed through the Extension could make it simpler:
setup(
    ..
    ext_modules=[Extension('Foo', files=['foo.d'], compiler='pyd')],
    ..)
Does this work out easily when there are multiple source files (in different languages) for a single extension module? Also, the option of determining the necessary compiler and compiler options from file extensions does not always work. For example, Fortran 90 files come in two flavors, fixed-form and free-form, which often are not differentiated by the file extension. However, they may require different compiler flags to compile. (Hopefully, however, nobody uses the fixed-form for F90 any more...) Cheers, -- Pauli Virtanen
On Wed, Nov 11, 2009 at 9:22 PM, Pauli Virtanen <pav@iki.fi> wrote:
Wed, 2009-11-11 at 16:47 +0100, Tarek Ziadé wrote: [clip]
If it's the latter, I guess you will be able to use the upcoming "sysconfig" module, which gives you the install schemes, depending on sys.prefix/sys.exec_prefix.
Where are the sysconfig sources? I don't see them in bitbucket.
That's in python's svn, in a tarek_sysconfig branch. It's a revamp of distutils/sysconfig.py with the schemes from distutils/command/install.py (work in progress)
What if the user passes a different install prefix via
python setup.py install --prefix=FOO
I believe David would like to know FOO here. Since sysconfig is not a part of distutils, will it know what FOO is?
I am not sure he wants FOO; I think he wants all the installation paths that get built by the install command with the provided "FOO" root prefix. That could be, in pseudo code:
get_install_paths('FOO')
And that's the API we want to add in sysconfig, roughly. [..]
I think one question here is that how do the different compilers speak to each other?
Consider the following chain needed to compile a Cython file linking to some Fortran and C files to a Python module:
- cython foo.pyx -> foo.c
- compile foo.c -> foo.o
- compile bar.c -> bar.o
- compile quux.f90 -> quux.o
- link foo.o bar.o quux.o -> foo.so
This is a completely possible use-case, so it would be good if distutils could handle it. Also, dependency handling would be nice here. Changing bar.c or foo.pyx should trigger relinking foo.so etc.
So, yes, being able to register an arbitrary compiler class, with arbitrary options passed through the Extension could make it simpler:
setup(
    ..
    ext_modules=[Extension('Foo', files=['foo.d'], compiler='pyd')],
    ..)
Does this work out easily when there are multiple source files (in different languages) for a single extension module?
Do you mean an Extension that would require several compilers? I was thinking of a one-to-one relation between an Extension and a compiler type, even if there are multiple source files (in different languages) for this extension. Meaning that *one* compiler would have to handle all of those files. Does that fit the cython use case?
Also, the option of determining the necessary compiler and compiler options from file extensions does not always work. For example, Fortran 90 files come in two flavors, fixed-form and free-form, which often are not differentiated by the file extension. However, they may require different compiler flags to compile. (Hopefully, however, nobody uses the fixed-form for F90 any more...)
How do they do it, then? Are file extensions, inline options in setup.py, and the environment enough? regards Tarek
Wed, 2009-11-11 at 21:42 +0100, Tarek Ziadé wrote: [clip]
Do you mean, an Extension that would require several compilers ?
I was thinking of a one-to-one relation between an Extension and a compiler type, even if there are multiple source files (in different languages) for this extension.
Meaning that *one* compiler would have to handle all of those files. Does that fit the cython use case?
It might. The distutils Cython compiler would however need to be smart enough to handle all the languages used in the other source files by itself, or somehow delegate this back to distutils (which still sounds doable). Also, the f2py compiler does similar things. Ditto for Pyrex. How much code duplication would be needed between these? Source file preprocessing / generation is another thing that comes in useful. For example, Numpy has its own templated C code generator file format, and also generates some of the source files automatically from scratch during the build. These are compiled and linked as part of an ordinary C extension module. Currently, as I understand it, numpy.distutils does source generation in a separate build_src command.
Also, the option of determining the necessary compiler and compiler options from file extensions does not always work. For example, Fortran 90 files come in two flavors, fixed-form and free-form, which often are not differentiated by the file extension. However, they may require different compiler flags to compile. (Hopefully, however, nobody uses the fixed-form for F90 any more...)
How do they do it, then? Are file extensions, inline options in setup.py, and the environment enough?
numpy.distutils's Fortran compiler class reads the file and tries to analyze whether it's fixed-form or not, and chooses the appropriate compiler flags. This is sort of a special "worst case" scenario for compiler selection, of course.

***

Just to throw some wild, perhaps obvious, and definitely unasked-for ideas in the air (especially as I can't promise I can give any sustained help here :/ ):

I suppose one option would be to factor *everything* related to extension module building into build_ext (and abolish build_clib): the rest of distutils would just say to build_ext

"""
Here's some info about the environment:

    dict(python_lib_path=/usr/lib/...,
         optimize=yes/no,
         python_lib_name=...,
         python_includes=/usr/include/...,
         install_prefix=/usr/lib/python2.6,
         ...,
         python_extension=...,
         build_temp_dir=...)

Look the rest up from sysconfig.

Please build any extensions and data files you like, and tell us the file and directory names where you placed them and where (relative paths) they should go.
"""

Information on how to build each component would be passed to the build_ext subcomponent directly from setup.py or from a config file. Now, perhaps it is already like this --- I don't really know the internals of distutils --- and the build subcomponent is insulated from the others. In any case, something like this could make refactoring the build system easier.

I think this idea quickly boils down more or less to David's idea about a pluggable build system -- implementing a good one takes a lot of work, so it might make sense to refactor distutils so that it would be possible [1] to use some of the existing ones (scons, waf, whatever, heck, even autoconf+make). The *default* build system could be a simple one, and backwards compatible. Especially so, since it seems to me that building extension modules is orthogonal to what distutils and setuptools want to do -- package and install Python modules in a Python-specific way, not really worry about how to properly call different compilers on obscure platforms.

Anyway, even in the case pluggability is a bad idea, refactoring the build system out from the rest of distutils might make sense.

.. [1] possible and easy -- I understand numpyscons is a stab at the possible, but it sounds like it was not easy to do.

-- Pauli Virtanen
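A very rough sketch of the insulated interface described above (every name here is hypothetical):

    def build_everything(env):
        """Build extensions/data files, report where the results should go.

        `env` is a plain dict such as:
            {"python_includes": "/usr/include/python2.6",
             "install_prefix": "/usr/lib/python2.6",
             "build_temp_dir": "build/temp"}
        Returns a list of (built_file, relative_install_path) pairs; the
        rest of distutils only does the copying.
        """
        built = []
        # ... invoke whatever build tool you like here (make, scons, waf) ...
        built.append(("build/lib/foo.so", "mypkg/foo.so"))
        return built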
On Wed, Nov 11, 2009 at 11:49 PM, Pauli Virtanen <pav@iki.fi> wrote: [..]
Just to throw some wild, perhaps obvious, and definitely unasked-for ideas in the air (especially as I can't promise I can give any sustained help here :/ ):
I suppose one option would be to factor *everything* related to extension module building into build_ext (and abolish build_clib): the rest of distutils would just say to build_ext
""" Here's some info about the environment:
dict(python_lib_path=/usr/lib/...,
     optimize=yes/no,
     python_lib_name=...,
     python_includes=/usr/include/...,
     install_prefix=/usr/lib/python2.6,
     ...,
     python_extension=...,
     build_temp_dir=...)
Look the rest up from sysconfig.
Please build any extensions and data files you like, and tell the file and directory names where you placed them and where (relative paths) they should go. """
It ought to be something like that, but what is unclear to me is how to describe which compiler to use for which files. I had this "one extension == one compiler type" pattern in my head, but it seems more complex than that. IOW an extension can invoke several compilers and tools to be built. So "one extension == one extension builder" might best describe it, and I am wondering if we can't define a simple interface for these extension builders, from the simplest case (one tool uses one compiler) to the weirdest one (one tool uses a complex toolchain to create the extension).

So at the end we would have:

- Extension (the existing extension class, which takes an "extension_builder_type")
- ExtensionBuilder (the class in charge of creating an extension)
- a registry for ExtensionBuilder subclasses

And have the community create new ExtensionBuilder subclasses that could be registered like commands. build_ext would then become an empty shell, just in charge of looping through the extensions, so each extension invokes its builder (see the sketch below). [..]
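A rough sketch of those three pieces and the registry (all of it hypothetical, nothing here exists in distutils today):

    _BUILDERS = {}

    def register_builder(name, cls):
        _BUILDERS[name] = cls

    class ExtensionBuilder:
        """Knows how to turn an Extension's sources into a built file."""
        def __init__(self, extension):
            self.extension = extension
        def build(self, build_dir):
            raise NotImplementedError

    class Extension:
        def __init__(self, name, sources, builder="default"):
            self.name = name
            self.sources = sources
            self.builder = builder          # the extension_builder_type

    def build_extensions(extensions, build_dir):
        # what a slimmed-down build_ext would do: loop and delegate
        for ext in extensions:
            _BUILDERS[ext.builder](ext).build(build_dir)

Note this Extension shadows the real distutils.extension.Extension; it is only here to make the idea concrete.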
I think this idea quickly boils down more or less to David's idea about a pluggable build system -- implementing a good one takes a lot of work, so it might make sense to refactor distutils so that it would be possible [1] to use some of the existing ones (scons, waf, whatever, heck, even autoconf+make). The *default* build system could be a simple one, and backwards compatible. Especially so, since it seems to me that building extension modules is orthogonal to what distutils and setuptools want to do -- package and install Python modules in a Python-specific way, not really worry about how to properly call different compilers on obscure platforms.
Anyway, even in the case pluggability is a bad idea, refactoring the build system out from the rest of distutils might make sense.
Agreed. It seems that the addition of the "configure" command, and the refactoring of the "build_ext" one, are the right things to do, together with the addition of the "sysconfig" stdlib module (which allows configure to get more info than what distutils.sysconfig provides). Now, if we can take back the work done in 4suite as suggested, or in scons, etc., that's even better. Tarek
Tarek Ziadé wrote:
And have the community create new ExtensionBuilder subclasses that could be registered like command.
I don't see a need for registering anything. You should just be able to explicitly say what tool to use for each stage of the process. I envisage something like this:

    from distutils import Extension, CCompile
    from pyrex.distutils import PyrexCompile

    foo_ext = Extension("foo",
                        CCompile(
                            PyrexCompile("foo.pyx"),
                            "somelib.c"))

Here Extension, CCompile and PyrexCompile are constructors for dependency graph nodes. Their responsibilities are:

Extension -- takes compiled object files and libraries and links them into a Python extension.

CCompile -- takes C source files and turns them into object files.

PyrexCompile -- takes Pyrex source files and turns them into C source files.

They would of course also take other relevant arguments such as compiler flags, search paths, etc.

-- Greg
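A rough sketch of how such node constructors might hang together (hypothetical classes, just to make the shape of the idea concrete):

    class Node:
        def __init__(self, *inputs, **flags):
            self.inputs = inputs    # file names or other nodes
            self.flags = flags      # compiler flags, search paths, ...

    class PyrexCompile(Node):
        """Pyrex sources in, C sources out."""

    class CCompile(Node):
        """C sources (or nodes producing them) in, object files out."""

    class Extension(Node):
        """Object files and libraries in, a linked Python extension out."""
        def __init__(self, name, *inputs, **flags):
            self.name = name
            Node.__init__(self, *inputs, **flags)

    foo_ext = Extension("foo",
                        CCompile(PyrexCompile("foo.pyx"), "somelib.c"),
                        extra_link_args=["-lm"])

A build driver would then walk this graph bottom-up, running each node's tool and feeding its outputs to the node above.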
On Sun, Nov 15, 2009 at 7:24 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Tarek Ziadé wrote:
And have the community create new ExtensionBuilder subclasses that could be registered like command.
I don't see a need for registering anything. You should just be able to explicitly say what tool to use for each stage of the process.
I envisage something like this:
from distutils import Extension, CCompile
from pyrex.distutils import PyrexCompile
foo_ext = Extension("foo", CCompile( PyrexCompile("foo.pyx"), "somelib.c"))
Here Extension, CCompile and PyrexCompile are constructors for dependency graph nodes. Their responsibilities are:
Extension -- takes compiled object files and libraries and links them into a Python extension.
CCompile -- takes C source files and turns them into object files.
PyrexCompile -- takes Pyrex source files and turns them into C source files.
They would of course also take other relevant arguments such as compiler flags, search paths, etc.
The advantage of the registry is that a project can provide a compiler type, let's say "Pyrex". Then you can use this compiler in your own project's setup.py without explicitly importing anything. But the result is similar, and explicit imports should work too, so maybe registries are just sugar on top of something we first need to make work. Tarek
Tarek Ziadé wrote:
But the result is similar, and explicit imports should work too, so maybe registries are just sugar on top of something we first need to make work.
It's completely unnecessary sugar, if you ask me. I don't see what's bad about importing functionality you want to use. Where and how do you intend the registration to happen, anyway? Would it be done by the setup.py script? In that case I don't see how it saves you anything, since you would have to first import the thing you want to register anyway. Or are you envisaging that Pyrex or whatever tool is involved would somehow patch itself into distutils when you install it? I don't like that idea much, since it smacks of the kind of monkeypatching that setuptools is reviled for. -- Greg
On Sun, Nov 15, 2009 at 10:33 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Tarek Ziadé wrote:
But the result is similar, and explicit imports should work too, so maybe registeries are just sugar on the top of something we first need to make work.
It's completely unnecessary sugar, if you ask me. I don't see what's bad about importing functionality you want to use.
Where and how do you intend the registration to happen, anyway? Would it be done by the setup.py script? In that case I don't see how it saves you anything, since you would have to first import the thing you want to register anyway.
Or are you envisaging that Pyrex or whatever tool is involved would somehow patch itself into distutils when you install it? I don't like that idea much, since it smacks of the kind of monkeypatching that setuptools is reviled for.
Patching? No, I was thinking about a basic plugin registry, exactly like what we have *now* for commands with distutils.cfg, which is a simple configparser file where you can point to packages that contain commands, so they are loaded when Distutils is run (that's the "command-packages" option). So, using the same technique, we can explicitly list in such a .cfg which compilers there are and where they live:

    [compilers]
    pyrex=pyrex.distutils:PyrexCompile

Tarek
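A rough sketch of how such a [compilers] section could be loaded (the section name and the pyrex.distutils entry are hypothetical, following the example above):

    from ConfigParser import RawConfigParser   # configparser on Python 3

    def load_compilers(cfg_path):
        parser = RawConfigParser()
        parser.read([cfg_path])
        compilers = {}
        if parser.has_section("compilers"):
            for name, location in parser.items("compilers"):
                module_name, class_name = location.split(":")
                module = __import__(module_name, fromlist=[class_name])
                compilers[name] = getattr(module, class_name)
        return compilers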
On Thu, Nov 12, 2009 at 5:42 AM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
I am not sure he wants FOO, I think he wants all installation paths, that gets built by the install command with the provided "FOO" root prefix.
that could be in pseudo code:
get_install_paths('FOO')
And that's the API we want to add in sysconfig, roughly.
That does not solve the problem of getting FOO in the first place when calling get_install_path. Having a function which gives you all the build directories as well as the install paths would already be great, though. Right now, for numscons, I needed to reimplement all the logic, with the different paths in different modes (inplace vs standard, Release vs Debug on windows, etc...). Some of them are even python-version dependent IIRC.
Do you mean, an Extension that would require several compilers ?
I was thinking of a one-to-one relation between an Extension and a compiler type, even if there are are multiple source files (in different languages) for this extension.
This cannot work - you need different tools at different stages. I don't think any imperative method can work here. Note also that the file extension alone is not enough to trigger the right tool: it should be overridable on a per-extension basis (actually, it should be overridable on a per-source basis if needed).

I think the compiler class and the like should simply be removed from the picture here. You need tasks to transform a source into a target, and tasks would use compiler configuration. There should not be any objects/classes for compilers, it is not flexible enough. Although the details differ in significant ways, both waf and scons use strings for command lines (waf "compiles" them for performance reasons), consisting of different parts which can be altered at will. In scons, the task to compile C code is something like

    $CC $CFLAGS $CPPDEFINES $CPPPATH -c $SOURCE $TARGET

You need to be able to control the content of those variables in a very fine-grained manner: prepending and appending may lead to different compiler behavior, for example. This is especially important when linking, where the wrong order may be the difference between a working extension and a crashing extension. You cannot obtain this with classes and objects (especially when you start talking about performance: thousands of source files for one extension is not a crazy use case).

David
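A toy illustration of the string-substitution approach (the variable names follow the scons example above; the helper itself is hypothetical):

    import shlex, string, subprocess
    from distutils.sysconfig import get_python_inc

    def run_task(template, env):
        # expand $VAR pieces from a plain dict, then run the command
        cmd = string.Template(template).substitute(env)
        subprocess.check_call(shlex.split(cmd))

    env = {
        "CC": "cc",
        "CFLAGS": "-O2 -fPIC",
        "CPPDEFINES": "-DNDEBUG",
        "CPPPATH": "-I" + get_python_inc(),
        "SOURCE": "foo.c",
        "TARGET": "-o foo.o",
    }
    # prepending or appending flags is plain string manipulation on the dict
    env["CFLAGS"] = "-g " + env["CFLAGS"]
    run_task("$CC $CFLAGS $CPPDEFINES $CPPPATH -c $SOURCE $TARGET", env)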
On Thu, Nov 12, 2009 at 12:03 AM, David Cournapeau <cournape@gmail.com> wrote:
On Thu, Nov 12, 2009 at 5:42 AM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
I am not sure he wants FOO, I think he wants all installation paths, that gets built by the install command with the provided "FOO" root prefix.
that could be in pseudo code:
get_install_paths('FOO')
And that's the API we want to add in sysconfig, roughly.
That does not solve the problem about getting FOO in the first place when calling get_install_path.
Why that ? where "FOO" comes from ? if it's an option you provide at build time like you said, earlier, you just pass it to the API to get the paths. FOO is not coming from nowhere... [..]
Do you mean, an Extension that would require several compilers ?
I was thinking of a one-to-one relation between an Extension and a compiler type, even if there are are multiple source files (in different languages) for this extension.
This cannot work - you need different tools at different stages. I don't think any imperative method can work here. Note also that extension alone is not enough to trigger the right tool: it should be overridable on a per extension-basis (actually, it should be overridable on a per source basis if needed).
I think the compiler class and the like should simply be removed from the picture here. You need tasks to transform a source into a target, and tasks would use compiler configuration. There should not be any objects/classes for compilers, it is not flexible enough. Although the details differ in significant ways, both waf and scons use strings for command lines (waf "compiles" them for performance reasons), consisting of different parts which can be altered at will. In scons, the task to compile C code is something like
$CC $CFLAGS $CPPDEFINES $CPPPATH -c $SOURCE $TARGET
You need to be able to control the content of those variables in a very fine-grained manner: prepending and appending may lead to different compiler behavior, for example. This is especially important when linking, where the wrong order may be the difference between a working extension and a crashing extension.
You cannot obtain this with classes and objects (especially when you start talking about performance: thousands of source files for one extension is not a crazy use case).
Sorry, I am getting confused here. This is getting all mixed up. What does OOP have to do with performance in the first place? OOP is useful to describe patterns and reusability, and that's what Distutils does. OOP is not a bottleneck for speed here (everything is an object in Python anyway).

Now we are saying that the compiler pattern in Distutils doesn't fit some requirement; fine, let's see what other design we can have. But that will be based on OOP with classes and objects, and there ought to be an object that orchestrates the building of one given extension at the end. That might be the existing Extension class. This class is instantiated with a name for the extension and a list of files.

Here's a proposal to restrict the scope: let's drop the concept of a "compiler class" and work only with the Extension class. This class is used by a distutils command to build a given extension. Let's say it is standalone, and it has a unique "build()" method. Could we take it from here and try to prototype this Extension class? Tarek
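A first, deliberately dumb prototype of such a standalone Extension, just to make the proposal concrete (hypothetical code, and it only handles the trivial one-shot C case):

    import subprocess

    class Extension:
        def __init__(self, name, files, include_dirs=None, extra_args=None):
            self.name = name
            self.files = files
            self.include_dirs = include_dirs or []
            self.extra_args = extra_args or []

        def build(self, build_dir="build"):
            """Compile and link self.files into <build_dir>/<name>.so."""
            args = ["cc", "-shared", "-fPIC"]
            for d in self.include_dirs:
                args += ["-I", d]
            args += self.extra_args + self.files
            args += ["-o", "%s/%s.so" % (build_dir, self.name)]
            subprocess.check_call(args)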
Tarek Ziadé wrote:
On Thu, Nov 12, 2009 at 12:03 AM, David Cournapeau <cournape@gmail.com> wrote:
You cannot obtain this with classes and objects (especially when you start talking about performance: thousand of source files for one extension is not a crazy usercase).
Sorry, I am getting confused here. This is getting all mixed up. What OOP has to do with performance in the first place ?
As far as I can tell David seems to be saying that instantiating a class for every file in the system would be too much overhead. I'm not convinced about that -- make builds a dependency graph with a node for every target before it goes to work, and I think scons does something similar (although I might be wrong, I haven't looked at scons very closely yet).
$CC $CFLAGS $CPPDEFINES $CPPPATH -c $SOURCE $TARGET
You need to be able to control those variable content in a very fine grained manner: prepending and appending may lead to different compiler behavior
It's true that the current distutils compiler classes are lacking in flexibility here -- there's an option for "extra flags", for example, but it sticks them all at the end of the command. Sometimes that's wrong -- I've had trouble trying to use it for MacOSX -F arguments, for example. But that's not to say that a class couldn't be devised that allowed the required flexibility without degenerating into just a string with textual substitutions. -- Greg
Greg Ewing wrote:
As far as I can tell David seems to be saying that instantiating a class for every file in the system would be too much overhead.
I'm not convinced about that -- make builds a dependency graph with a node for every target before it goes to work, and I think scons does something similar
Yes, and scons has scalability problems because of this (from both a CPU and a memory POV). Waf also has an object per node, but it is aggressively optimized. If you are interested in the details, I can point you to the corresponding discussions in both scons and waf, where the main designers were involved.
But that's not to say that a class couldn't be devised that allowed the required flexibility without degenerating into just a string with textual substitutions.
I am not saying that's impossible, but both waf and scons use string substitution to do it (they do it differently, though). So you would have to find a good reason not to do it IMHO. cheers, David
On Thu, Nov 12, 2009 at 5:30 AM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
Greg Ewing wrote:
As far as I can tell David seems to be saying that instantiating a class for every file in the system would be too much overhead.
I'm not convinced about that -- make builds a dependency graph with a node for every target before it goes to work, and I think scons does something similar
Yes, and scons has scalability problems because of this (both from CPU and memory POV).
But you were saying in the discussion that Distutils has a *design* problem because it uses OOP... Anyways, let's not drop the other part of this thread:
get_install_paths('FOO')
And that's the API we want to add in sysconfig, roughly.
That does not solve the problem about getting FOO in the first place when calling get_install_path.
Why that ? where "FOO" comes from ? if it's an option you provide at build time like you said, earlier, you just pass it to the API to get the paths. FOO is not coming from nowhere...
Thu, 12 Nov 2009 10:54:41 +0100, Tarek Ziadé wrote: [clip]
get_install_paths('FOO')
And that's the API we want to add in sysconfig, roughly.
That does not solve the problem about getting FOO in the first place when calling get_install_path.
Why that ? where "FOO" comes from ? if it's an option you provide at build time like you said, earlier, you just pass it to the API to get the paths.
FOO is not coming from nowhere...
If this is not painfully clear already, the user passes FOO to the command

    python setup.py install --prefix=FOO

Now, clearly the distutils install subcommand knows what FOO is. But does build_ext? Does sysconfig?
On Thu, Nov 12, 2009 at 12:31 PM, Pauli Virtanen <pav+sp@iki.fi> wrote:
Thu, 12 Nov 2009 10:54:41 +0100, Tarek Ziadé wrote: [clip]
> get_install_paths('FOO')
And that's the API we want to add in sysconfig, roughly.
That does not solve the problem about getting FOO in the first place when calling get_install_path.
Why that ? where "FOO" comes from ? if it's an option you provide at build time like you said, earlier, you just pass it to the API to get the paths.
FOO is not coming from nowhere...
If this is not painfully clear already, the user passes FOO to the command
python setup.py install --prefix=FOO
Now, clearly the distutils install subcommand knows what FOO is. But does build_ext? Does sysconfig?
The install command takes FOO, and builds several paths with it:

    /FOO/lib/site-packages/
    /FOO/xxx

The other commands that need to get the same paths can rebuild them using an API that does:

    get_paths(scheme, vars={'prefix': 'FOO'})

Instead of doing what David did:

    $ python setup.py build_ext install --prefix=FOO

they can do:

    $ python setup.py build_ext --prefix=FOO

and no longer require the install command to get these paths cooked. Remember that the build step has nothing to do with the install step, and David in his example was doing a build that was not relocatable, e.g. receiving the install paths and applying them in the binaries. So instead of having these paths created by code in the install command, they are moved into an API that can be used by install or by any other command. Is that clearer? Tarek
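To make that concrete, here is a toy illustration of the kind of API being described (the scheme layout and variable names are made up for the example; the real sysconfig work may differ):

    SCHEMES = {
        "posix_prefix": {
            "purelib": "{prefix}/lib/python{py_version_short}/site-packages",
            "platlib": "{prefix}/lib/python{py_version_short}/site-packages",
            "scripts": "{prefix}/bin",
            "data":    "{prefix}",
        },
    }

    def get_paths(scheme, vars):
        # expand every template of the chosen scheme with the supplied variables
        return dict((key, template.format(**vars))
                    for key, template in SCHEMES[scheme].items())

    print(get_paths("posix_prefix",
                    {"prefix": "/FOO", "py_version_short": "2.6"}))
    # -> {'purelib': '/FOO/lib/python2.6/site-packages', ...}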
On Fri, Nov 13, 2009 at 7:12 AM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
On Thu, Nov 12, 2009 at 12:31 PM, Pauli Virtanen <pav+sp@iki.fi> wrote:
Thu, 12 Nov 2009 10:54:41 +0100, Tarek Ziadé wrote: [clip]
>> get_install_paths('FOO')
And that's the API we want to add in sysconfig, roughly.
That does not solve the problem about getting FOO in the first place when calling get_install_path.
Why that ? where "FOO" comes from ? if it's an option you provide at build time like you said, earlier, you just pass it to the API to get the paths.
FOO is not coming from nowhere...
If this is not painfully clear already, the user passes FOO to the command
python setup.py install --prefix=FOO
Now, clearly the distutils install subcommand knows what FOO is. But does build_ext? Does sysconfig?
The install command takes FOO, and build several paths with it:
/FOO/lib/site-packages/
/FOO/xxx
The other commands that needs to get the same path can rebuild it using an API, that does:
get_paths(scheme, vars={'prefix': 'FOO'})
Instead of doing what David did:
$ python setup.py build_ext install --prefix=FOO
They can do:
$ python setup.py build_ext --prefix=FOO
and don't require to use the install command anymore to get these paths cooked.
I think I was confusing things with my rpath example, which may be the source of our misunderstanding. I don't want to add a --prefix option to build_ext. I want to support the following use cases:

    python setup.py build                 # here, prefix would be whatever default is available from sysconfig
    python setup.py install --prefix=foo  # here, prefix would be the one computed by the install command
    python setup.py build_ext -i          # here, prefix is the current directory

Requiring users to handle new options of commands is impractical IMHO, and a prefix option to build has a strange feel to it.
Remember that the build step has nothing to do with the install step
Ideally, what we want here is what autoconf does. You have a configure step, and then at the build/install stages, you have access to *all* options. Those options can be customized at build time through the usual make mechanisms, but this is taken into account without the makefile writer having to care.
On Thu, Nov 12, 2009 at 11:45 PM, David Cournapeau <cournape@gmail.com> wrote: [..]
I think I was confusing things with my rpath example, which may be the source of our misunderstanding. I don't want to add a --prefix option to build_ext. I want to support the following use cases:
python setup.py build                 # here, prefix would be whatever default is available from sysconfig
python setup.py install --prefix=foo  # here, prefix would be the one computed by the install command
python setup.py build_ext -i          # here, prefix is the current directory
Requiring users to handle new options of commands is impractical IMHO, and a prefix option to build has a strange feeling to it.
I am not sure I follow here: let's forget about your example where you call build_src and install together.

-> in the real world, how is a --prefix passed to install going to impact a build command, and vice-versa? install just copies files where it's told to copy them, and build does some work with some options, and as a matter of fact you seem to use the installation prefix at this stage. So, if you are not *installing*, it doesn't make sense to call the *install* command, and build could have its own --prefix option in your case.
Remember that the build step has nothing to do with the install step
Ideally, what we want here is like autoconf does it. You have a configure step, and then at build/install stages, you have access to *all* options. Those options can be customized at build through usual make mechanism, but this is taken into account as well without the makefile writer having to care.
IOW, we need to know *all* the finalized options *before* the first build command is even run.
So, IOW, these options cannot be finalized by *any* single command. So the proposal I've made, to have global options that are common to all commands and that can be used to create an execution environment through public APIs, would help there. If the install command transforms "prefix" into "path1", "path2" and "path3", we need this transformation to occur outside install, so it can be done before the commands are run. That is:

    $ python setup.py --prefix=foo cmd1 cmd2 etc.

and the result would be in

    Distribution.options = {'path1': xxx, 'path2': xxx}

Tarek
On Fri, Nov 13, 2009 at 8:47 AM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
I am not sure to follow here: let's forget about your example where you call build_src and install together.
-> in the real world, how a --prefix passed to install is going to impact a build command and vice-versa ?
In many cases. I gave the rpath example, but there is also the pkg-config-like files generation example, etc... It is not possible to foresee usage of this in a build tool; any sufficiently complex project will need to share those options, and different ones depending on the tool. I think it is an uncontroversial design decision followed by almost every build tool.
So, if you are not *installing*, it doesn't make sense to call the *install* command, and build could have its --prefix option in your case.
I don't *want* to call the install command, but I want to know the prefix option of install. I do not want a build-specific prefix option, I want to know the global install option, whatever the user's command line is. prefix is only an example - as I mentioned in my previous email, I potentially need every option of install in a build command.
$ python setup.py --prefix=foo cmd1 cmd2 etc.
and the result would be in Distribution.options = {'path1': xxx, 'path2': xx}
This is a major change in distutils behavior, so we need to solve the following issues:

- every user will have to change how they call distutils
- what happens if people still use python setup.py install --prefix=foo instead of python setup.py --prefix=foo install

cheers, David
On Fri, Nov 13, 2009 at 1:08 AM, David Cournapeau <cournape@gmail.com> wrote: [..]
$ python setup.py --prefix=foo cmd1 cmd1 etc
and the result would be in Distribution.options = {'path1': xxx, 'path2': xx}
This is a major change in distutils behavior, so we need to solve the following issues:

- every user will have to change how they call distutils
- what happens if people still use python setup.py install --prefix=foo instead of python setup.py --prefix=foo install
A deprecation warning would be added in install if it finds a local option rather than a global one. Meaning both would work in 2.7/3.2. That would give them 18 to 24 months * 2 to move to the new style. Tarek
Tarek Ziadé wrote:
A deprecation warning would be added in install, if it finds a local option, rather than a global. Meaning both would work in 2.7/3.2.
If changing the command line in incompatible ways is acceptable, what do you think of scrapping the commands (at the UI level only) altogether? This would be more consistent and easier to deal with for the user, and easier to implement as well:

    python setup.py configure --option1 --option2=value2 ....
    python setup.py build
    python setup.py install

We could then make this work:

    python setup.py install

(it would run both build and configure implicitly). Making all finalized options available at the build stage would then be easier. David
On Fri, Nov 13, 2009 at 6:22 AM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
Tarek Ziadé wrote:
A deprecation warning would be added in install, if it finds a local option, rather than a global. Meaning both would work in 2.7/3.2.
If changing the command line in incompatible ways is acceptable, what do you think of scrapping the commands (at the UI level only) altogether? This would be more consistent and easier to deal with for the user, and easier to implement as well:
python setup.py configure --option1 --option2=value2 ....
python setup.py build
python setup.py install
We could then make this work:
python setup.py install (would run both build and configure implicitly). Making all finalized options available at the build stage would then be easier.
Is that scrapping, or just preparing finalized options using "configure"? Meaning the other commands would just have to get them when they run, if present? How would that work? Would configure create a file?

It seems that you are pushing all the work into the "configure" command, which is fine by me, but it also looks like you can already achieve this with the existing system, by changing the subcommands that are in the install command and their order. That is:

    install
        configure
        build
        all the install_*

But if we want to see this working with "build" alone, configure has to be a subcommand of build:

    install
        build
            configure
        all the install_*

IOW this would just require:

1/ adding a "configure" command
2/ inserting it as the first sub-command of "build"
3/ making it possible to share the passed arguments in the whole chain of commands

A sketch of what 1/ and 2/ could look like is below. Tarek
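A rough sketch of steps 1/ and 2/ using the existing sub_commands machinery (the configure command itself is hypothetical; only the option-sharing part, step 3/, needs new machinery):

    from distutils.cmd import Command
    from distutils.command.build import build as _build

    class configure(Command):
        description = "finalize global options before any build step"
        user_options = [("prefix=", None, "installation prefix")]

        def initialize_options(self):
            self.prefix = None

        def finalize_options(self):
            pass

        def run(self):
            # step 3/ would store the finalized options somewhere every
            # other command can read them (a plain file, for instance)
            pass

    class build(_build):
        # run "configure" before the regular build_* sub-commands
        sub_commands = [("configure", None)] + _build.sub_commands

    # in setup.py: setup(..., cmdclass={"configure": configure, "build": build})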
On Fri, Nov 13, 2009 at 11:26:23AM +0100, Tarek Ziadé wrote:
On Fri, Nov 13, 2009 at 6:22 AM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
Tarek Ziadé wrote:
A deprecation warning would be added in install, if it finds a local option, rather than a global. Meaning both would work in 2.7/3.2.
If changing the command line in incompatible ways is acceptable, what do you think of scrapping the commands (at the UI level only) altogether ? This would be more consistent and easier to deal for the user, and easier to implement as well:
python setup.py configure --option1 --option2=value2 ....
python setup.py build
python setup.py install
We could then make this work:
python setup.py install (would run both build and configure implicitly). Making all finalized options available at the build stage would then be easier.
Is that scraping, or just preparing finalized options using "configure" ? Meaning other command would just have to get them when they run, if present ?
How would that work ? configure would create a file ?
That would be the obvious solution. Both autoconf and CPAN do this by my understanding so it's a pretty logical thing to do.
It seems that you are pushing all the work into the "configure" command, which is fine by me, but it also looks like you can already achieve this with the existing system, by changing the subcommands that are in the install command and their order. That is:
install
    configure
    build
    all the install_*
But if we want to see this working with "build" alone, configure has to be a subcommand of build:
install
    build
        configure
    all the install_*
Would it not be harder to add new commands (or "build tasks") that require information from the configure step if you structure it like this? Regards Floris -- Debian GNU/Linux -- The Power of Freedom www.debian.org | www.gnu.org | www.kernel.org
On Fri, Nov 13, 2009 at 1:18 PM, Floris Bruynooghe <floris.bruynooghe@gmail.com> wrote: [..]
Is that scraping, or just preparing finalized options using "configure" ? Meaning other command would just have to get them when they run, if present ?
How would that work ? configure would create a file ?
That would be the obvious solution. Both autoconf and CPAN do this by my understanding so it's a pretty logical thing to do.
I find it logical too now. [..]
Would it not be harder to add new command (or "build tasks") that require information from the configure step in you structure it like this?
I was thinking about an API that would allow any command to read/write the configuration data, using it from the "configure" command to write it, and from the others to read it. That would allow including this new behaviour in existing commands with a deprecation step (in today's distutils, a build is triggered when you call install in any case, when there are some extensions). Tarek
On Fri, Nov 13, 2009 at 9:46 PM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
On Fri, Nov 13, 2009 at 1:18 PM, Floris Bruynooghe <floris.bruynooghe@gmail.com> wrote: [..]
Is that scraping, or just preparing finalized options using "configure" ? Meaning other command would just have to get them when they run, if present ?
How would that work ? configure would create a file ?
That would be the obvious solution. Both autoconf and CPAN do this by my understanding so it's a pretty logical thing to do.
I find it logical too now.
Scons and waf do it through a db kind of file, but that's because they are much more fancy (scons for example keeps so-called signatures of every command and node to know what has already been built or not). A first step as a plain file, which should clearly be marked as implementation-defined and only guaranteed to work through an API, is the way to go I think. David
On Fri, Nov 13, 2009 at 1:52 PM, David Cournapeau <cournape@gmail.com> wrote:
On Fri, Nov 13, 2009 at 9:46 PM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
On Fri, Nov 13, 2009 at 1:18 PM, Floris Bruynooghe <floris.bruynooghe@gmail.com> wrote: [..]
Is that scraping, or just preparing finalized options using "configure" ? Meaning other command would just have to get them when they run, if present ?
How would that work ? configure would create a file ?
That would be the obvious solution. Both autoconf and CPAN do this by my understanding so it's a pretty logical thing to do.
I find it logical too now.
Scons and waf do it through a db kind of file, but that's because they are much more fancy (scons for example keeps so called signature of every command and nodes to know what has already been built or not).
A first step as a plain file, which should clearly be marked as implementation-defined and only guaranteed to work through an API is the way to go I think.
Here's my proposal for this to happen, if you (and others) want to contribute: Let's build this new "configure" command in Distribute 0.7, together with the APIs to read/write the data. Then let's change the other commands accordingly (without thinking about backward compat first) so we can try out this new behaviour. Once it's proven to work well, we could publish Distribute 0.7 with it, and depending on the community feedback, we could integrate it into Distutils and work on the backward compat part. This two-phase approach wouldn't be a problem imho for early adopters who would use and test it. Tarek -- Tarek Ziadé | http://ziade.org | オープンソースはすごい! | 开源传万世,因有你参与
On Friday 13 November 2009 06:44:37 am Tarek Ziadé wrote:
Here's my proposal for this to happen, if you (and others) want to contribute:
Let's build this new "configure" command in Distribute 0.7, together with the APIs to read/write the data.
Then let's change the other commands accordingly (without thinking about backward compat first) so we can try out this new behaviour.
Fourthought's 4Suite distutils extensions already provide this behavior. You may consider looking at how the 'config' command is handled there as a guide in implementing this. Or just copy it wholesale, for that matter, as it is my doing.
Once it's proven to work well, we could publish Distribute 0.7 with it, and depending on the community feedback, we could integrate it into Distutils and work on the backward compat part.
This two-phase approach wouldn't be a problem imho for early adopters who would use and test it.
The exact thing being described has been done in 4Suite for 6 years (along with many other distutils improvements). Feel free to take or discuss or request help with any of the features/additions (like FHS layout of files) in the 4Suite distutils extensions. The code is available for browsing: http://cvs.4suite.org/viewcvs/4Suite/Ft/Lib/DistExt/ The way we've impl'd 'config' was as a prerequisite for 'build', just as 'build' is for 'install'. If any of the options stored by the 'config' command are overridden via 'build' or 'install' options, the 'config' command would be re-run to store the "new" choices. Any questions or just simple help with integrating a similar system, just let me know. Been there, done that. Jeremy -- Jeremy Kloth http://4suite.org/
On Fri, Nov 13, 2009 at 3:46 PM, Jeremy Kloth <jeremy.kloth@gmail.com> wrote: [..]
The exact thing being described has been done in 4Suite for 6 years (along with many other distutils improvements). Feel free to take or discuss or request help with any of the features/additions (like FHS layout of files) in the 4Suite distutils extensions. The code is available for browsing: http://cvs.4suite.org/viewcvs/4Suite/Ft/Lib/DistExt/
That's great, thanks for the pointer ! I'll look into it asap
The way we've impl'd 'config' was as a prerequisite for 'build', just as 'build' is for 'install'. If any of the options stored by the 'config' command are overridden via 'build' or 'install' options, the 'config' command would be re-run to store the "new" choices.
Any questions or just simple help with integrating a similar system, just let me know. Been there, done that.
So did you end up changing the way options are passed to the commands, or do you just have a specific "config" command that looks over other options passed to the other commands ? Tarek
On Saturday 14 November 2009 05:14:05 pm Tarek Ziadé wrote:
On Fri, Nov 13, 2009 at 3:46 PM, Jeremy Kloth <jeremy.kloth@gmail.com>
The way we've impl'd 'config' was as a prerequisite for 'build', just as 'build' is for 'install'. If any of the options stored by the 'config' command are overridden via 'build' or 'install' options, the 'config' command would be re-run to store the "new" choices.
Any questions or just simple help with integrating a similar system, just let me know. Been there, done that.
So did you end up changing the way options are passed to the commands, or do you just have a specific "config" command that looks over other options passed to the other commands ?
It is done with the 'config' command having all the options used by the other commands. The other commands would then look up their options' values from 'config' (if not supplied on the command-line). If, for example, `--prefix` was supplied to 'install', the 'install' command would then cause the 'config' to redo its stored options. If at all possible, I would eliminate the redundant options on build_*/install_* and leave them solely on 'config' as it greatly simplifies the interaction of the commands. Jeremy -- Jeremy Kloth http://4suite.org/
On Sun, Nov 15, 2009 at 3:44 AM, Jeremy Kloth <jeremy.kloth@gmail.com> wrote: [..]
So did you end up changing the way options are passed to the commands, or do you just have a specific "config" command that looks over other options passed to the other commands ?
It is done with the 'config' command having all the options used by the other commands. The other commands would then look up their options' values from 'config' (if not supplied on the command-line). If, for example, `--prefix` was supplied to 'install', the 'install' command would then cause the 'config' to redo its stored options.
If at all possible, I would eliminate the redundant options on build_*/install_* and leave them solely on 'config' as it greatly simplifies the interaction of the commands.
So, practically speaking, if: $ python setup.py install is called, the install command will instantiate the configure command, which will return the options that were stored in some file by a previous call? But, for the option-redundancy problem, the simplest way I can see to make the "configure" options available in "install", or to let the end user pass them along, would be to make "configure" the base class for all commands that are part of the configure-make-install story, so when they run they can read and write options in the stored file and, if needed, use the options passed on the command line. Tarek
On Wed, Nov 11, 2009 at 14:16 +0100, Tarek Ziadé wrote:
On Wed, Nov 11, 2009 at 1:31 PM, David Cournapeau <cournape@gmail.com> wrote: But, in the end, since an option is either global or specific to a command, I guess a simple API in the command class should be enough to avoid this hack:
get_option(command_name, option_name)
+1 -- .''`. Wolodja Wentland <wentland@cl.uni-heidelberg.de> : :' : `. `'` 4096R/CAF14EFC `- 081C B7CD FF04 2BA9 94EA 36B2 8B7F 7D30 CAF1 4EFC
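A rough, hypothetical illustration of what that accessor could look like on a command class (neither get_option nor the stored-configuration fallback exists in distutils; both are assumptions in this sketch):

    import json
    import os
    from distutils.cmd import Command

    def _stored_config(path='configure.json'):
        # Hypothetical: options persisted by a 'configure' step (see earlier sketch).
        if os.path.exists(path):
            with open(path) as f:
                return json.load(f)
        return {}

    class ConfigAwareCommand(Command):
        """Hypothetical mixin; not part of distutils as released."""

        def get_option(self, command_name, option_name):
            # Prefer the option as finalized on the other command object, if any...
            cmd = self.distribution.get_command_obj(command_name, create=0)
            value = getattr(cmd, option_name, None) if cmd is not None else None
            if value is not None:
                return value
            # ...otherwise fall back to what the 'configure' step stored on disk.
            return _stored_config().get(option_name)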
On Wed, Nov 11, 2009 at 7:14 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
David Cournapeau wrote:
One of the main arguments against a rewrite is that you will end up making the same mistakes, and old code is more complicated because it has been tested. But here, we know what a good design is like, as other languages have vastly superior solutions to this problem.
Also, it seems to me that in this case, the basic architecture of distutils is already so full of mistakes that there just isn't an incremental way of getting to a better place, especially given the requirement of not breaking any existing setup.py scripts together with the fact that the API of distutils effectively consists of its *entire implementation*.
So while complete rewrites are best avoided *if possible*, I don't think we have a choice in this case.
While "build_ext" is not handy, I don't buy the fact that Distutils is "full of mistakes". We have to work with use cases. David gave a use case: being able to compile cython or assembly files. I proposed a solution based on being able to define a compiler at the Extension level, rather that for the entire build_ext command. If the answer to that solution is just: "Distutils sucks anyways..", it is not really helpfull imho.. I don't see the point to write Distutils from scratch, instead of making it evolve. Tarek
Tarek Ziadé wrote:
If the answer to that solution is just "Distutils sucks anyway..", it is not really helpful imho.
I don't see the point of rewriting Distutils from scratch instead of making it evolve.
If you can see a way to get from the current distutils code to something with a well-designed and well-documented API and a clean implementation, that would be fine by me. When I say it's fundamentally broken, I'm really talking about the API. My idea of what an API for a build system should be like is more like make or scons, which slice the functionality up in a completely orthogonal direction to the way distutils does. Maybe it would be possible to plug such a system in under the existing build_ext class. I don't know. I think I would like the same philosophy applied to other areas of distutils, not just compiling extensions. Otherwise it would feel like two incompatible systems bolted together. What we need right now, I think, is some discussion about a new API, unconstrained by any considerations of backwards compatibility or reuse of existing distutils code. Once we know where we're going, then we can think about the best way to get there. -- Greg
On 2009-11-11 15:02 PM, Greg Ewing wrote:
Tarek Ziadé wrote:
If the answer to that solution is just "Distutils sucks anyway..", it is not really helpful imho.
I don't see the point of rewriting Distutils from scratch instead of making it evolve.
If you can see a way to get from the current distutils code to something with a well-designed and well-documented API and a clean implementation, that would be fine by me.
When I say it's fundamentally broken, I'm really talking about the API. My idea of what an API for a build system should be like is more like make or scons, which slice the functionality up in a completely orthogonal direction to the way distutils does.
Maybe it would be possible to plug such a system in under the existing build_ext class. I don't know.
In fact, David has. http://pypi.python.org/pypi/numscons/
I think I would like the same philosophy applied to other areas of distutils, not just compiling extensions. Otherwise it would feel like two incompatible systems bolted together.
It does feel something like that. The build system is just one of the problems with distutils' internals, in my experience. You can think of the rest of distutils as a little application framework for command line utilities. I think this framework simply fails to provide in very fundamental ways, like the "extend commands by subclassing" design pattern. That choice makes it fundamentally difficult to combine extensions together. I really don't see a way to evolve away from that (and believe me, over the last decade, I've tried). You just need to redesign the internals if you want to get away from that. You can't get from any point A to any point B by evolving in small steps that are functional (not to mention backwards compatible!) all the way through. With all respect to Greg Ward and the rest of the original distutils authors, it was a fantastic improvement over the state of affairs ten years ago, but we've learned a lot about application frameworks and about building software since then. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Wed, Nov 11, 2009 at 10:36 PM, Robert Kern <robert.kern@gmail.com> wrote: [..]
It does feel something like that. The build system is just one of the problems with distutils' internals, in my experience. You can think of the rest of distutils as a little application framework for command line utilities. I think this framework simply fails to provide in very fundamental ways, like the "extend commands by subclassing" design pattern. That choice makes it fundamentally difficult to combine extensions together. I really don't see a way to evolve away from that (and believe me, over the last decade, I've tried). You just need to redesign the internals if you want to get away from that. You can't get from any point A to any point B by evolving in small steps that are functional (not to mention backwards compatible!) all the way through.
I am very surprised about this statement. What did you tried for the paste decade and failed to do ? I hear some complaints since a week, but beside's David examples I didn't read any other precise use cases. We're looking through the build_ext use case, and we are making some improvement on the other thread. So why not doing this in other issues ? Let's discuss your use case. And if it means adding new options to run arbitrary commands like post/pre hooks to a given command, to avoid subclassing an existing command, let's do it. And let's drop the backward compat issues in these discussions, so we don't burn out in details. Tarek
On 2009-11-11 15:59 PM, Tarek Ziadé wrote:
On Wed, Nov 11, 2009 at 10:36 PM, Robert Kern<robert.kern@gmail.com> wrote: [..]
It does feel something like that. The build system is just one of the problems with distutils' internals, in my experience. You can think of the rest of distutils as a little application framework for command line utilities. I think this framework simply fails to provide in very fundamental ways, like the "extend commands by subclassing" design pattern. That choice makes it fundamentally difficult to combine extensions together. I really don't see a way to evolve away from that (and believe me, over the last decade, I've tried). You just need to redesign the internals if you want to get away from that. You can't get from any point A to any point B by evolving in small steps that are functional (not to mention backwards compatible!) all the way through.
I am very surprised by this statement.
What did you try over the past decade and fail to do? I have been hearing complaints for a week, but besides David's examples I didn't read any other precise use cases.
We're looking through the build_ext use case, and we are making some progress in the other thread. So why not do the same for the other issues?
Let's discuss your use case. And if it means adding new options to run arbitrary commands like post/pre hooks to a given command, to avoid subclassing an existing command, let's do it.
http://svn.scipy.org/svn/numpy/trunk/numpy/distutils/command/ All of it. Now consider that here we are also trying to play nicely with the setuptools extensions, and Pyrex, and David is working on integrating Cython support. To get to one real specific problem, let's consider build_src. build_src is a new subcommand in numpy.distutils that builds C extension sources from other files. We use this to hook in f2py's wrapper generator and other more ad hoc forms of generating wrappers. When build_ext uses --inplace, we need build_src to use --inplace, too, because it will often output some "final" products in addition to intermediate wrapper sources. In order to integrate this with setuptools' develop command (which invokes build_ext --inplace but not build_src --inplace because setuptools knows nothing about numpy.distutils), we need to create a subclass of setuptool's develop command that will reinitialize build_src with the appropriate option. Then we need to conditionally place the develop command into the set of command classes so as not to introduce a setuptools dependency on those people who don't want to use it. This is nuts. numpy.distutils shouldn't have to know anything about setuptools to accomplish this if the framework were properly designed. And this doesn't even get into the fact that many of the numpy.distutils command classes that are shared with setuptools are conditional subclasses and probably still buggily cobbled together. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
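To make the workaround concrete, a sketch of the kind of glue Robert describes might look like this. It is illustrative only, not the actual numpy.distutils code; it assumes setuptools' develop command (whose reinitialize_command accepts keyword overrides):

    # Sketch of the conditional-subclass dance described above; details are illustrative.
    try:
        from setuptools.command.develop import develop as _develop
    except ImportError:
        _develop = None  # setuptools absent: don't offer the command at all

    if _develop is not None:
        class develop(_develop):
            def install_for_development(self):
                # setuptools only propagates --inplace to build_ext, so the
                # build_src subcommand has to be reconfigured by hand first.
                self.reinitialize_command('build_src', inplace=1)
                _develop.install_for_development(self)

    # ...and register it only when setuptools is actually in play:
    # cmdclass = {}
    # if _develop is not None:
    #     cmdclass['develop'] = develop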
On Wed, Nov 11, 2009 at 11:22 PM, Robert Kern <robert.kern@gmail.com> wrote: [..]
To get to one real specific problem, let's consider build_src. build_src is a new subcommand in numpy.distutils that builds C extension sources from other files. We use this to hook in f2py's wrapper generator and other more ad hoc forms of generating wrappers. When build_ext uses --inplace, we need build_src to use --inplace, too, because it will often output some "final" products in addition to intermediate wrapper sources. In order to integrate this with setuptools' develop command (which invokes build_ext --inplace but not build_src --inplace because setuptools knows nothing about numpy.distutils), we need to create a subclass of setuptool's develop command that will reinitialize build_src with the appropriate option. Then we need to conditionally place the develop command into the set of command classes so as not to introduce a setuptools dependency on those people who don't want to use it.
This is nuts.
This clearly indicates that we should be able to extend build_ext behaviour without subclassing it. And having the ability to drive a specific compilation from within an Extension subclass could solve this issue.
numpy.distutils shouldn't have to know anything about setuptools to accomplish this if the framework were properly designed. And this doesn't even get into the fact that many of the numpy.distutils command classes that are shared with setuptools are conditional subclasses and probably still buggily cobbled together.
Same goes with "install", and I've proposed in the past the ability to run arbitrary commands as pre/post hooks for it. So we can "configure" this command instead of replacing it.
Robert Kern wrote:
With all respect to Greg Ward and the rest of the original distutils authors, it was a fantastic improvement over the state of affairs ten years ago, but we've learned a lot about application frameworks and about building software since then.
I think we knew more about it *before* then as well. Make has been around for a lot longer, but distutils ignores everything we've learned from it. -- Greg
Hi Greg, On Thu, 12 Nov 2009 10:02:18 +1300, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
What we need right now, I think, is some discussion about a new API, unconstrained by any considerations of backwards compatibility ....
A new API isn't so hard, but like anything it takes time and effort to do. Already this year, *a lot* of discussion went into such a thing. For example, doing metadata-based installs as an option alongside traditional installs. Backwards compatibility is going to be an issue that *does* need to be addressed, but even that I don't believe is so hard. There are two halves to "backwards" compatibility: the build side and the install side. It's confusing, I know. Where distutils/setup is confusing is on the build side, in the way that all the files have to be specified in order to be picked up. On the install side, setup.py is much simpler and hardly does anything at all except copy files in from the sources.txt. So compatibility there is easier. It's my opinion that a boilerplate setup.py could completely replace 9/10 user-written setup.py files.
When I say it's fundamentally broken, I'm really talking about the API. My idea of what an API for a build system should be like is more like make or scons, which slice the functionality up in a completely orthogonal direction to the way distutils does.
yes. Broken on the build side.
Maybe it would be possible to plug such a system in under the existing build_ext class.
For even that to happen, there needs to be some work done. As in fleshing out some code and examples and documentation. Here's a configuration file for an alternate build system that I'm working on at: http://bitbucket.org/djlyon/pypi-package-builder Any help thrown at getting this working would be welcomed. Regards David

----setup.cfg--------------------
[setup]
name = artistflair
version = 1.2
description = artistflair is a package written in spare time to show colour variations.

[dependencies]
packages = pyopengl
    objectrelationalmapper >= 1.6

[dependencies python<=2.4]
packages = lxml >= 2.5

[dependencies windows]
packages = win32com

[dependencies linux]
packages = pyodbc

[Application]
scripts = artisticflairgui.py

[configuration]
files = artisticflair.conf

[datafiles]
files = artisticdb.db
directory = db

[Documentation]
directory = docs
url = index.html

[preinstall]
script = checksystem.py

[postinstall linux]
script = builddocumentation.py

[uninstall]
script = uninstallcleanup.py
---------------------------------
On Wed, 11 Nov 2009 19:14:42 +1300, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Also, it seems to me that in this case, the basic architecture of distutils is already so full of mistakes that there just isn't an incremental way of getting to a better place, especially given the requirement of not breaking any existing setup.py scripts together with the fact that the API of distutils effectively consists of its *entire implementation*.
So while complete rewrites are best avoided *if possible*, I don't think we have a choice in this case.
Fwiw, I can't see any way to make that work. Even if I said that I didn't like distutils, which btw isn't true because my knowledge of it is so limited, it's just not practical to rewrite such a complex tool within any reasonable timeframe. Anyway, moving towards CPAN isn't about rewriting all of distutils, only a few key parts. Or restructuring. The part that shouldn't be rewritten (because the people who wrote it were too clever - and the degree of difficulty too high) is any C interface. That's an important part of distutils that is likely to end up worse rather than better. If there are candidates for 'evolution', they have to be in areas such as package creation (collecting all the files and putting them in a .tar.gz or .egg or .zip or .exe) for upload to PyPI. In CPAN, one wouldn't dream of there being so many possibilities. As I've said on python-dev, we should just revise and call a package an egg. In science, it wouldn't be tolerated to refer to a hydrogen atom by 6 different synonyms. But in Python we do. Then we wonder why people get confused... There's a big difference between a rewrite and an evolution. A rewrite isn't viable, but an evolution is.
From science, we even know that it only takes two generations to become nearly immune to strong radiation, as shown at Chernobyl. So when we talk about evolution, let's use it in the modern sense, not the Darwinian one.
Evolution is better than revolution is better than nothing happening at all. David
David Lyon wrote:
The parts that shouldn't be rewritten (because the people that wrote it were too clever - and the degree of difficulty to high) is any C interface.
Not exactly sure what you mean by that. If you mean the knowledge of how to call the C compiler to compile an extension module on various platforms, I agree that this is important knowledge that should be preserved. But if the only way to preserve it is to keep the actual existing distutils code that implements it, we have a problem, because that IMO is the part of distutils that *most* needs improvement. It needs to be broken out of the monolithic build_ext class somehow so that the build_ext process can be extended more selectively.
Evolution is better than revolution is better than nothing happening at all.
I don't agree that evolution is better in and of itself than revolution. They're both means to an end -- getting something better than we have now. The problem with distutils is that evolution doesn't seem to have worked. It has just grown randomly with no clear design and ended up painting itself into a corner. -- Greg
On Wed, Nov 11, 2009 at 10:25 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote: [..]
I don't agree that evolution is better in and of itself than revolution. They're both means to an end -- getting something better than we have now. The problem with distutils is that evolution doesn't seem to have worked. It has just grown randomly with no clear design and ended up painting itself into a corner.
We are making progress in the thread we are having with David C. in the build_ext refactoring. I am suggesting to keep the 'Build_ext refactoring' topic in that thread. Tarek
On Wed, Nov 11, 2009 at 3:08 AM, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote: [..]
What is important here is how to add new tools, without touching or impacting distutils in many places. In particular, what distutils expects from a tool should be clearly defined (right now it is implementation defined).
Yes, and in that case, it means writing a new compiler class. [...]
Distutils is not living only through setup.py. It has some public APIs imported and used by people.
I am aware of the usage of distutils: I don't think it has a public API, though. It has functions that people use, but it is far from clear what is public and what is private. Many things we do in numpy.distutils are clearly dependent on implementation details of distutils.
A public api doesn't have a "_" prefix. [..]
Otherwise, a new system would look nothing like distutils. One of the main arguments against a rewrite is that you will end up making the same mistakes, and old code is more complicated because it has been tested. But here, we know what a good design is like, as other languages have vastly superior solutions to this problem.
As far as compilation is concerned at least, the distutils knowledge is vastly overblown. First, most of it comes from autoconf on Unices. You have the MSVC tools knowledge, but that can easily be reused (in numscons, I added MSVC support from the Python 2.6 msvc9compiler; this was not difficult). Most other tools are rather obsolete - and if you break any API in Distribute there, you will most likely lose them as well anyway (I am thinking about the OS/2, Metrowerks kind of tools).
Again, I don't mean to say that working on distribute is a mistake, or criticize what you do in any way. I just don't think it will solve any significant issue for the scientific python community. But enough "stop-energy": at this point, I will just shut up, and continue working on my own idea - if they make sense, the scientific community is big enough so that the solution could be used there only, at least for some time.
I asked you for your use cases so we could work on changing things, but it's evident at this point that you don't want to use Distutils or you don't think it can evolve. I don't think the scientific community is so different from any other Python community in what it needs. And I don't think Distutils is a lost cause, as you seem to think. Tarek
Guido van Rossum <guido@python.org> writes:
On Fri, Nov 6, 2009 at 2:52 PM, David Lyon <david.lyon@preisshare.net> wrote:
So the packages on CPAN are typically of a higher quality, simply because they've been machine checked. I like that.
Speaking purely on hearsay, I don't believe that. In fact, I've heard plenty of laments about the complete lack of quality control on CPAN.
That's not inconsistent with CPAN having higher average quality than the average quality of PyPI packages. It could merely mean that, as a result of the awareness that CPAN packages are tested on upload, CPAN users *expect* higher quality, and complain louder than PyPI users when they find a low-quality package :-) I think buildbot-style test runs for PyPI packages would raise average package quality on PyPI. -- \ “Everything you read in newspapers is absolutely true, except | `\ for that rare story of which you happen to have first-hand | _o__) knowledge.” —Erwin Knoll | Ben Finney
I love how this created a flurry of speculation on what people who say "Python doesn't have a CPAN" mean. Wouldn't it be easier to *ask* them? :-) Just-wondering-ly -- Lennart Regebro: Python, Zope, Plone, Grok http://regebro.wordpress.com/ +33 661 58 14 64
On Nov 7, 2009, at 3:20 AM, Ben Finney wrote:
Guido van Rossum <guido@python.org> writes:
On Fri, Nov 6, 2009 at 2:52 PM, David Lyon <david.lyon@preisshare.net> wrote:
I think buildbot-style test runs for PyPI packages would raise average package quality on PyPI.
Please excuse the cross-post, but I wanted to make sure that all these "CPAN for Python" discussions got this message and I've lost track of which list which part of what discussion had occurred on. We are currently extending our distutils/Distribute test system to include installation of a broad range of packages as part of the pre-release process for a future release of Distribute and as part of our "smoke" test for distutils/Distribute. Eventually, the goal is to integrate this with our buildbot system, but that's a ways off. Our goal is to install a range of packages and, where practicable, actually run and record any errors with the packages' individual test suites. Right now, our "smoke" test only does Twisted and numpy. We've discussed how to collect test results from Twisted trial and we'll be working on similar things for other test runners (nose et al.). For Twisted, we're going to install and test both the current release version and an svn checkout from trunk. It would be an extension of that concept to install and test *all* packages from PyPI, but it would, obviously, take considerable horsepower (and time) to run such an exhaustive test (especially if we're talking about 2.4?, 2.5, 2.6, 2.7, and 3.1+). Right now I'm extending the configuration file for our smoke test to allow for various test runners (e.g. nose, twisted trial, etc.) so we can "smoke out" more installation problems and/or failed tests after installation. For the first pass, I'm just focusing on Twisted and trial, then numpy, then finding packages that support nose so that I can collect the data on what ran, what passed, and what didn't. I'm planning on collecting this all in a database and making some simple API so that it can be mined by very simple apps later. At the point where that infrastructure is in place, we could pretty easily mine the data to do all kinds of crazy things people have mentioned like:
* A ranking system of test coverage
* Complexity analysis
* Test coverage
* Run pylint, pyflakes, 2to3, whatever automated measurement tools over the code
* Send test failure messages to maintainers (maybe with opt-in in the new meta-data).
* Whatever!
We're actively working on this right now; anyone who wants to lend a hand is welcome to contact me off-list and we can talk about what types of things we need and where we could use a hand. All in all, I think this could be a big leap forward for the Python distribution ecosystem, whether or not we eventually write the PyPan I wished for as a new Perl refugee. Thanks, S
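A back-of-the-envelope sketch of the kind of harness being described, for readers who want to experiment (the package list, runner commands and database schema are all made up for illustration):

    # Illustrative only: run each package's test command and record the result.
    import sqlite3
    import subprocess

    PACKAGES = {                      # made-up examples of per-package runners
        'Twisted': ['trial', 'twisted'],
        'numpy':   ['python', '-c', 'import numpy; numpy.test()'],
    }

    db = sqlite3.connect('smoke.db')
    db.execute('CREATE TABLE IF NOT EXISTS results '
               '(package TEXT, returncode INTEGER, output TEXT)')

    for name, cmd in PACKAGES.items():
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
        output = proc.communicate()[0]
        db.execute('INSERT INTO results VALUES (?, ?, ?)',
                   (name, proc.returncode, output.decode('utf-8', 'replace')))

    db.commit()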
On Sat, Nov 7, 2009 at 9:30 AM, ssteinerX@gmail.com <ssteinerx@gmail.com> wrote:
On Nov 7, 2009, at 3:20 AM, Ben Finney wrote:
Guido van Rossum <guido@python.org> writes:
On Fri, Nov 6, 2009 at 2:52 PM, David Lyon <david.lyon@preisshare.net> wrote:
I think buildbot-style test runs for PyPI packages would raise average package quality on PyPI.
Please excuse the cross-post but I wanted to make sure that all these "CPAN for Python" discussions got this message and I've lost track of which list which part of what discussion had occurred on.
We are currently extending our distutils/Distribute test system to include installation of a broad range of packages as part of the pre-release process for a future release of Distribute and as part of our "smoke" test for distutils/Distribute. Eventually, the goal is to integrate this with our buildbot system but that's a ways off.
Who is "we"?
On Nov 7, 2009, at 10:08 AM, Jesse Noller wrote:
On Sat, Nov 7, 2009 at 9:30 AM, ssteinerX@gmail.com <ssteinerx@gmail.com
wrote: On Nov 7, 2009, at 3:20 AM, Ben Finney wrote:
Guido van Rossum <guido@python.org> writes:
On Fri, Nov 6, 2009 at 2:52 PM, David Lyon <david.lyon@preisshare.net
wrote:
I think buildbot-style test runs for PyPI packages would raise average package quality on PyPI.
Please excuse the cross-post but I wanted to make sure that all these "CPAN for Python" discussions got this message and I've lost track of which list which part of what discussion had occurred on.
We are currently extending our distutils/Distribute test system to include installation of a broad range of packages as part of the pre- release process for a future release of Distribute and as part of our "smoke" test for distutils/Distribute. Eventually, the goal is to integrate this with our buildbot system but that's a ways off.
Who is "we"?
We is the people working on Distribute/distutils. S
[ I'm posting this comment in reply to seeing this thread: * http://thread.gmane.org/gmane.comp.python.distutils.devel/11359 Which has been reposted around - and I've read that thread. I lurk on this list, in case anything comes up that I'd hope to be able to say something useful to. I don't know if this will be, but that's my reason for posting. If this is the wrong place, my apologies, I don't sub to distutils-sig :-/ ] On Sat, Nov 7, 2009 at 2:30 PM, ssteinerX@gmail.com <ssteinerx@gmail.com> wrote:
On Nov 7, 2009, at 3:20 AM, Ben Finney wrote:
Guido van Rossum <guido@python.org> writes:
On Fri, Nov 6, 2009 at 2:52 PM, David Lyon <david.lyon@preisshare.net> wrote:
[ lots of snippage ] ... All in all, I think this could be a big leap forward for the Python distribution ecosystem whether or not we eventually write the PyPan I wished for as a new Perl refugee.
Speaking as someone who left the Perl world for the Python world many years ago now, primarily due to working on one project, the thing I really miss about Perl is CPAN. It's not the fact that you know you do perl Makefile.PL && make && make test && make install. Nor the fact that it's trivial to set up a skeleton package setup that makes that work for you. It's not the fact that there's an installer that can download & track dependencies. The thing that makes the difference IMHO is two points:
* In a language which has a core ethos of "There's more than one way to do it", packaging is the one place where there is one, and only one, obvious way to do it. (Oddly, with Python and packaging this is flipped - do I as a random project use distutils? pip? setuptools? distribute? virtualenv?)
* It has a managed namespace, or perhaps better, a co-ordinated namespace.
CPAN may have lots of ills and bad aspects (I've never really trusted the auto-installer, having seen one too many people have their Perl installation as a whole upgraded due to a bug that was squashed 6-8 years ago), but these two points are pretty much killer. All the other aspects like auto download, upload, dependency tracking, auto doc extraction for the website, etc. really follow from the managed namespace. I realise that various efforts like easy_install & distribute & friends make that sort of step implicitly - since there can only be one http://pypi.python.org/pypi/flibble . But it's not quite the same - due to externally hosted packages. For more detail about this aspect: http://www.cpan.org/modules/04pause.html#namespace I'm really mentioning this because I didn't see it listed, and I really think that it's very easy to underestimate this aspect of CPAN. IMHO, it is what matters the most about CPAN. The fact that they've nabbed the CTAN idea of having an archive network for storing, mirroring and grabbing stuff from is, by comparison, /almost/ irrelevant IMHO. It is the sort of thing that leads to the DBI::DBD type of stuff being simple to use, because of the encouragement to talk and share a namespace. The biggest issue with this is retrofitting it to an existing world. Personal opinion, I hope it's useful, and going back into lurk mode (I hope :). If this annoys you, please just ignore it. Michael.
On Fri, Nov 6, 2009 at 6:53 PM, Guido van Rossum <guido@python.org> wrote:
I just found this comment on my blog. People have told me this in person too, so I believe it is real pain (even if the solution may be elusive and the suggested solutions may not work). But I don't know how to improve the world. Is the work on distutils-sig going to be enough? Or do we need some other kind of work in addition? Do we need more than PyPI?
I don't think PyPI has a structural problem. I do think the biggest issue is related to the fact that it's not really possible to describe a distribution and its dependencies today with plain Distutils (==stdlib). People can use Setuptools or Distribute, which are third-party tools, but then they are using different installation standards. Changes are being made in PEP 345 + PEP 376 to address this. The goal is to make sure Distutils provides what is required to describe dependencies, a standard installation format, and a way to query installed distributions. Now, there's one thing CPAN has and PyPI doesn't: a network of package repositories and an easy way to set up your own repository to serve distributions. IOW, one can set up one's own CPAN server, or add a server to an existing network. That's useful for example if you want to have your own private PyPI server and combine it with the public PyPI. In PyPI, everything is centralized, and while this has some benefits, it also makes it a single point of failure (SPF) when you want to download distributions. We have suffered from that problem in some subcommunities like Plone, which pulls hundreds of packages from PyPI to get built. If PyPI is down, there's nothing we can do unless we have a local cache of packages. CPAN is way better in that field. Work has started to fix this SPF: a mirroring standard, see PEP 381 - http://www.python.org/dev/peps/pep-0381/ In parallel, some server-side software now allows people to register and upload packages using Distutils, like they would do with PyPI. For instance, you can use the "register" and "upload" commands to push your distribution to plone.org. That was made possible after some changes to the .pypirc configuration file (which now allows people to define several servers). The next step is to add fail-over capabilities to tools like Pip or Distribute, and ways to merge several sources of distribution repositories: PyPI and any other server that implements the same protocol. Tarek
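For readers who haven't seen it, the multi-server .pypirc format Tarek refers to looks roughly like this (the second section name, URL and credentials are placeholders, not a recommendation):

    [distutils]
    index-servers =
        pypi
        plone

    [pypi]
    username: me
    password: secret

    [plone]
    repository: http://plone.org/products
    username: me
    password: secret

With a file like that in place, a command line such as "python setup.py sdist upload -r plone" should push the release to the second server instead of the default PyPI.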
This feels like a post relative to the catalog-sig too, or maybe even moreso than distutils-sig... Chris Guido van Rossum wrote:
I just found this comment on my blog. People have told me this in person too, so I believe it is real pain (even if the solution may be elusive and the suggested solutions may not work). But I don't know how to improve the world. Is the work on distutils-sig going to be enough? Or do we need some other kind of work in addition? Do we need more than PyPI?
--Guido
---------- Forwarded message ---------- From: dalloliogm <noreply-comment@blogger.com> Date: Fri, Nov 6, 2009 at 8:01 AM Subject: [Neopythonic] New comment on Python in the Scientific World. To: gvanrossum@gmail.com
dalloliogm has left a new comment on your post "Python in the Scientific World":
Python is suffering a lot in the scientific word, because it has not a CPAN-like repository.
PyPI is fine, but it is still far from the level of CPAN, CRAN, Bioconductor, etc..
Scientists who use programming usually have a lot of different interests and approaches, therefore it is really difficult to write a package that can be useful to everyone. Other programming language like Perl and R have repository-like structure which enable people to download packages easily, and to upload new ones and organize them withouth having to worry about having to integrate them into existing packages.
This is what is happening to biopython now: it is a monolitic package that it is supposed to work for any bioinformatic problem; but this is so general that to accomplish that you would need to add a lot of dependencies, to numpy, networkx, suds, any kind of library. However, since easy_install is not as ready yet as the counterparts in other languages, if the biopython developers add too many dependencies, nobody will be able to install it properly, and nobody will use it.
Posted by dalloliogm to Neopythonic at November 6, 2009 8:01 AM
-- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk
On Nov 6, 2009, at 12:53 PM, Guido van Rossum wrote:
I just found this comment on my blog. People have told me this in person too, so I believe it is real pain (even if the solution may be elusive and the suggested solutions may not work). But I don't know how to improve the world. Is the work on distutils-sig going to be enough? Or do we need some other kind of work in addition? Do we need more than PyPI?
In my experience, when users say this, they just mean "I tried easy_install and it broke". PyPI doesn't have some deep, fundamental, architectural issue that prevents it from working. The user experience of it is just buggy. Consider the difference between these two pages: http://docs.webfaction.com/software/perl.html http://docs.webfaction.com/software/python.html Note that the 'python' page is more than twice as long, lists a ton of different installation options, and includes a gigantic "troubleshooting" section that apparently isn't necessary for perl. Note also that the Perl page is just a series of steps describing how to invoke the one installation mechanism, but the Python page is a hodgepodge of qualified instructions describing different possible mechanisms you can try. It also appears that webfaction has modified the default environment's configuration to make their "troubleshooting" section *shorter* than it would have to be for more general Python software installation instructions. The default behavior of most Python installation mechanisms - to my knowledge, 'python setup.py install', 'easy_install', and 'pip', will all raise exceptions by default on UNIX-y platforms, unless you're root. On Windows (since a higher percentage of the user population walks around with admin rights all the time), the default invocations described by many project web pages will work if the installation is pure-python or if the author remembered to provide a Windows binary egg, but a common failure mode is "you don't have a compiler". Similarly, on a Mac, you have to have Xcode installed, although Python itself works fine if you don't, so it seems like you don't. Many of these tools *would* work by default with a small amount of configuration, a couple of environment variables, and clearer error messages that indicate (A) *that* you need to install a C compiler and (B) *where* you need to go to get a C compiler. One project that would help a lot is just a "easy python setup" documentation project that describes, as simply as possible, in large fonts, how to get a working Python setup that adheres to a few conventions. Just include the 2 lines of .bashrc and explain how to add them; don't debate the merits of ~/bin vs. ~/.local/bin vs. ~/opt/ bin (although come on, ~/.local/bin/ is _clearly_ the right name for it), just pick one for each platform and provide clear step-by-step instructions for getting it to work. "put this in your ~/.bashrc: <really big PRE tag with shell setup in it>. restart your shell." Anybody who has selected an alternate shell or done some other prior configuration will need to adjust their expectations, but we can worry about supporting unusual configurations when the community has a good answer for the default configuration. (Although this is a good reason to do this as documentation and not attempt to write an autoconfigurating tool: an autoconfigurating tool needs to understand every possible nuance of the environment, but advanced users can notice which parts of the short document might not apply to them.) I feel like I might be repeating this a bit much, but it's a really important point: many of the things I'm talking here are *not* about getting the code right as part of a Python tool, but in providing an easy, manageable way to integrate with _other_ tools that are outside of our collective control as Python package authors: the dynamic linker, the shell, the file manager and the C compiler (or lack thereof). 
By providing a default user-home-directory installation location, Python itself is already doing almost as much as it can; if easy_install started installing things into that location by default *without* any of this bootstrapping documentation (or a very, very carefully written tool to do the bootstrapping for you) then importing pure Python packages might work great but scripts would be broken and any external shared libraries required by Python modules (even if they built correctly) would just give you an ImportError. Once we have some kind of working consensus on this setup, the tools can change to support it: easy_install can default to installing things in the user's home directory in the case that (A) the environment is set up for it and (B) the user isn't an administrator. If the environment *isn't* set up, instead of spitting out twelve paragraphs explaining how really you should have read-write access to the location where your pth files are stored and a link into the middle of some dense technical reference documentation, it can just say "read this page" and link to something that says "HERE IS A .REG FILE THAT WILL FIX YOUR PYTHON. CLICK ON IT AND DO NOT ASK QUESTIONS." If such a document existed, it would also be good for PyPI to link to it prominently so that if a user comes across a package by way of a Google search that ends at PyPI, there is a clear set of instructions as to how to get it on their computer. There's also something that everyone on this list can do today: every python package author should have a clean user account, OS install, virtual machine, or whatever; some otherwise pristine environment where they try to follow their own installation instructions *without* any of their usually available tools or tricks for setting up a python development environment, and without administrator access. One thing that always strikes me about Python hackers is that we all have existing setups to support our own personal style of development, and so we are somewhat insulated from the new-user-experience pain. I think if everyone on this list did this regularly on a package with a dozen or so dependencies, the situation would rapidly improve regardless of my other recommendations :).
2009/11/7 Glyph Lefkowitz <glyph@twistedmatrix.com>:
One project that would help a lot is just a "easy python setup" documentation project that describes, as simply as possible, in large fonts, how to get a working Python setup that adheres to a few conventions. [...]
+1 Strong conventions would help a lot here, IMHO. At least as useful would be a similar document explaining how to write a Python package (contents of setup.py, doc/test subdirectories, registering on PyPI, what to upload - bdist_wininst, egg, source, ...) This would make the build/install experience much more consistent for new packages. Paul.
Am 06.11.09 18:53, schrieb Guido van Rossum:
I just found this comment on my blog. People have told me this in person too, so I believe it is real pain (even if the solution may be elusive and the suggested solutions may not work). But I don't know how to improve the world. Is the work on distutils-sig going to be enough? Or do we need some other kind of work in addition? Do we need more than PyPI? My 2 cents after reading and ignoring the whole thread:
- PyPI provides good functionality so far

What is annoying about PyPI:

- some package maintainers show a certain ignorance and arrogance by misusing PyPI
  - for uploading packages without metadata or with broken metadata
  - for uploading packages of doubtful quality
  - for uploading packages to PyPI as a replacement for a private egg server
- it supports too many different versioning schemes. Both the scheme supported by setuptools and the one proposed by Tarek in some PEP are totally over-engineered. A simple and *enforced* versioning scheme is what I want to see.
- no more external hosting of packages. If people want their packages listed on PyPI, they should be required to upload their packages to PyPI (no more issues with non-available external servers, no more issues with mirroring external servers, no more issues with wrong download URLs within package metadata)
- better checks on uploaded packages. A source code release should be made using the 'sdist' command. We don't need source eggs of a package for Python 2.4-2.6 containing Python source code only.

The solution for a better PyPI:

- more checks, more restrictions
- every package maintainer uploading something to PyPI should have a certain attitude that PyPI is a public resource where the content should meet certain quality criteria and where each package carries a certain responsibility to the Python community.

Andreas
Andreas Jung kirjoitti:
Am 06.11.09 18:53, schrieb Guido van Rossum:
I just found this comment on my blog. People have told me this in person too, so I believe it is real pain (even if the solution may be elusive and the suggested solutions may not work). But I don't know how to improve the world. Is the work on distutils-sig going to be enough? Or do we need some other kind of work in addition? Do we need more than PyPI?
My 2 cents after reading and ignoring the whole thread:
- PyPI provides a good functionality so far
What is annoying about PyPI:
- some package maintainers have a certain ignorance and arrogance by misusing PyPI
- for uploading packages without or broken metadata - for uploading packages of doubtful quality - for uploading packages to PyPI as a replacement for a private egg server
- it supports too many different versioning schemes. Both the scheme supported by setuptools and the one proposed by Tarek in some PEP are totally over-engineered. A simple and *enforced* versioning scheme is what I want to see.
- no more external hosting of packages. If people want their packages listed on Pypi, they should be required to upload their packages on PyPI (no more issues with non-available external server, no more issues with mirroring external servers, no more issues with wrong download URLs within package metadata)
- better checks on uploaded packages. A source code release should be made using the 'sdist' command. We don't need source eggs of a package for Python 2.4-2.6 containing Python source code only.
+5
The solution for a better PyPI:
- more checks, more restrictions
- every package maintainer uploading something to PyPI should have a certain attitude that PyPI is a public resource where the content should meet certain quality criteria and where each package carries a certain responsibility to the Python community.
+2
Andreas
On Sat, Nov 7, 2009 at 3:57 PM, Andreas Jung <lists@zopyx.com> wrote: [..]
- it supports too many different versioning schemes. Both the scheme supported by setuptools and the one proposed by Tarek in some PEP are totally over-engineered. A simple and *enforced* versioning scheme is what I want to see.
Unfortunately, as long as we have release candidates and development versions, we need a more complex scheme than MAJOR.MINOR.MICRO. If you can think of any way to simplify PEP 386, please help us. Last, we can encourage people to use it, but we can't enforce it: I know people who are happily using dates for their versions, and we can't forbid them to push their work to PyPI just because of that. We can try to educate them, but that's their pick in the end, I think. An enterprise PyPI could enforce it, but not our community PyPI imho [...]
The solution for a better PyPI:
- more checks, more restrictions - every package maintainer uploading something to PyPI should have a certain attitude that PyPI is a public resource where the content should met certain quality criteria and where each package has a certain responsibility to Python community.
More checks would be nice, so we can provide QA rates or something similar. I don't think we should enforce any policy whatsoever though at PyPI. We can't force people that upload distributions to comply with some strict QA rules imho (no binary distro allowed if no sdist is present for example). Tarek
Am 07.11.09 16:37, schrieb Tarek Ziadé:
On Sat, Nov 7, 2009 at 3:57 PM, Andreas Jung <lists@zopyx.com> wrote: [..]
- it supports too many different versioning schemes. Both the scheme supported by setuptools and the one proposed by Tarek in some PEP are totally over-engineered. A simple and *enforced* versioning scheme is what I want to see.
Unfortunately, as long as we have release candidates and development versions, we need a more complex scheme than MAJOR.MINOR.MICRO.
Do we need/want development releases on PyPI? At least not me. MAJOR.MINOR.MICRO.PICO + [a-c]1..N should be good enough.
Last, we can encourage people to use it, but we can't enforced it:
Of course we can..
I know people that are happily using dates for their versions, and we can't forbid them to push their work on pypi just because of that.
..one must not accept and support a whole zoo of private numbering schemes. Agree on a common and minimal standard and enforce the standard.
We can try to educate then, but that's their pick at the end I think.
Teaching is a good thing...
An enterprise PyPI could do enforce it, but not our community PyPI imho
"community" does not imply that we can not agree on certain rules and standards for PyPI - otherwise PyPI remains as it sometimes appears - an unflashed package toilet. Python as a quality programming language needs a package repository with some minimum standards - I completely disagree with "community" as a synonym for "we must make everyone happy". Andreas
On 03:56 pm, lists@zopyx.com wrote:
Am 07.11.09 16:37, schrieb Tarek Ziadé:
On Sat, Nov 7, 2009 at 3:57 PM, Andreas Jung <lists@zopyx.com> wrote: [..]
- it supports too many different versioning schemes. Both the scheme supported by setuptools and the one proposed by Tarek in some PEP are totally over-engineered. A simple and *enforced* versioning scheme is what I want to see. Unfortunately, as long as we have release candidates and development versions, we need a more complex scheme than MAJOR.MINOR.MICRO. Do we need/want development releases on PyPI? At least not me.
MAJOR.MINOR.MICRO.PICO + [a-c]1..N
should be good enough.
Please be considerate of the time of other people and read the previous threads on this list about versioning schemes before proposing a new one. If you have done this, then please make it evident by describing how your proposed scheme addresses the problems raised in previous discussions, or how those problems are not important/real/whatever. Thanks. Jean-Paul
On Sat, Nov 7, 2009 at 4:56 PM, Andreas Jung <lists@zopyx.com> wrote: [..]
Do we need/want development on PyPI? At least not me.
MAJOR.MINOR.MICRO.PICO + [a-c]1..N
should be good enough.
PEP 386 is about providing the version scheme so we can compare versions in Distutils when we want to know if a dependency is met (like what setuptools does). So it's wider than PyPI: people need to be able to compare development versions as well, so that, for example, zc.buildout can rely on it for your daily work. [...]
"community" does not imply that we can not agree on certain rules and standards for PyPI - otherwise PyPI remains as it sometimes appears - an unflashed package toilet. Python as a quality programming language needs a package repository with some minimum standards - I completely disagree with "community" as a synonym for "we must make everyone happy".
But the philosophy of Python is to provide a multi-paradigm language, I think, without forcing any strong rule like this (unlike Java, I guess). My mother (sorry, that's the example I have in mind) is using Python in her university math/statistics lab, and they don't really care about QA. But she might push her software to PyPI one day. She won't if it's rejected because she doesn't follow a version scheme, or pushes a binary release rather than a source one. It's good to have industrial-strength conventions, so we can build industrial-level applications, but I think we need to be careful about the entry ticket for PyPI. Wouldn't it be better to set these enforcements in a subcommunity like plone.org, where it would make a lot of sense to enforce QA for Plone packages? (plone.org has PyPI support) Tarek
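To make the comparison problem in this subthread concrete: the stdlib's existing distutils.version classes show why a richer, agreed-upon scheme (PEP 386) is needed - StrictVersion only understands N.N[.N] with a/b pre-releases, while LooseVersion accepts anything and therefore orders unrelated schemes arbitrarily:

    from distutils.version import LooseVersion, StrictVersion

    # StrictVersion orders a/b pre-releases before the final release...
    assert StrictVersion('1.0a1') < StrictVersion('1.0b2') < StrictVersion('1.0')

    # ...but rejects release candidates and dev versions outright.
    try:
        StrictVersion('1.0rc1')
    except ValueError:
        print('rc releases are not expressible with StrictVersion')

    # LooseVersion accepts date-based or ad hoc strings, but comparing across
    # schemes gives arbitrary answers.
    print(LooseVersion('2009.11.07') > LooseVersion('1.2'))  # True, numerically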
Am 07.11.09 17:13, schrieb Tarek Ziadé:
On Sat, Nov 7, 2009 at 4:56 PM, Andreas Jung <lists@zopyx.com> wrote: [..]
Do we need/want development on PyPI? At least not me.
MAJOR.MINOR.MICRO.PICO + [a-c]1..N
should be good enough.
PEP 386 is about providing the version scheme so we can compare versions in Distutils when we want to know if a dependency is met (like what setuptools does).
So its wider than PyPI : people need to be able to compare development versions as well. So for example, zc.buildout can rely on it for your daily work.
ACK on the need for a more complex versioning scheme in general (but we don't need to support all variants) - however, we don't need to support dev packages on PyPI; that's why a stronger version check should be enforced.
[...]
"community" does not imply that we can not agree on certain rules and standards for PyPI - otherwise PyPI remains as it sometimes appears - an unflashed package toilet. Python as a quality programming language needs a package repository with some minimum standards - I completely disagree with "community" as a synonym for "we must make everyone happy".
But the philosophy of Python is to provide a multi-paradigm language, I think, without forcing any strong rule like this (unlike Java, I guess).
My mother (sorry, that's the example I have in my mind) is using Python in her university math/statistics lab, and they don't really care about QA.
But she might push her software to PyPI one day. She won't if it's rejected because she doesn't follow a version scheme, or because she pushes a binary release rather than a source one.
I think your mother (and most others) are smart enough to understand and support a simple versioning scheme. To put it more pointedly: "community" does not mean anarchy. Package maintainers have a lot of freedom, but, as said, also a responsibility for their software - otherwise we can redeclare PyPI as a package t***** (I mentioned the word already).
It's good to have industrial-strength conventions, so we can build industrial-level applications, but I think we need to be careful about the barrier to entry for PyPI.
Wouldn't it be better to set these enforcements up in a subcommunity like plone.org, where it would make a lot of sense to enforce QA for Plone packages? (plone.org has PyPI support)
I don't care about subcommunities at this point. PyPI is a central resource for Python. It is essential for my daily work. It is essential for me that packages have reasonable metadata. It is essential for me that packages are available all the time. Certain quality levels and standards are especially essential to non-professional Python users and developers - nothing is more frustrating for those people than dealing with non-functional packages, undocumented packages, or packages of pre-alpha quality. Andreas
On Sat, 07 Nov 2009 07:37:37 -0800, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
The solution for a better PyPI:
- more checks, more restrictions - every package maintainer uploading something to PyPI should have the attitude that PyPI is a public resource whose content should meet certain quality criteria, and that each package carries a certain responsibility towards the Python community.
More checks would be nice, so we can provide QA ratings or something similar. I don't think we should enforce any policy whatsoever at PyPI, though. We can't force people who upload distributions to comply with some strict QA rules imho (no binary distro allowed if no sdist is present, for example).
I suggest that we check for valid metadata on the uploaded sdists at the least. If you visit http://pypm.activestate.com/ - most failed packages are due to the fact that the sdist uploaded by the author is missing certain files such as README.txt (which is read by setup.py) or setup.py/PKG-INFO itself. Without such quality policing, I can't see how tools like pip/easy_install could even install the package (let alone do it in a user-friendly way). -srid
On Mon, 09 Nov 2009 13:32:55 -0800, "Sridhar Ratnakumar" <sridharr@activestate.com> wrote:
I suggest that we check for valid metadata on the uploaded sdists at the least. If you visit http://pypm.activestate.com/ - most failed packages are due to the fact that the sdist uploaded by the author is missing certain files such as README.txt (which is read by setup.py) or setup.py/PKG-INFO itself. Without such quality policing, I can't see how tools like pip/easy_install could even install the package (let alone do it in a user-friendly way).
That may be out of the scope of distutils. Not sure. CPAN does do checks, and we've taken it upon ourselves to start writing something similar. Our target is to set it up to run daily, with package validation runs on a server farm. It's located at: http://bitbucket.org/djlyon/pypi-package-testbot/
It's a community issue, sure. And it affects everyone when something breaks because of some very minor fault. David
I suggest that we check for valid metadata on the uploaded sdists at the least. If you visit http://pypm.activestate.com/ - most failed packages are due to the fact that the sdist uploaded by the author is missing certain files such as README.txt (which is read by setup.py) or setup.py/PKG-INFO itself.
Unfortunately we can't run arbitrary code on PyPI. So if someone ships a broken setup.py, there's not much we can do unless we are able to run it in some kind of jail. Some work was started with Steve Steiner on that topic, using a buildbot. It's still experimental, because running an arbitrary setup.py can fail for many reasons. Another thing: once PEP 345 has the required changes (metadata fields with platform conditions), we will be able to do some checks, without having to run any code, on any field located in PKG-INFO. In any case, I am still not convinced that these checks should be forced on the PyPI side when the sdist is uploaded. I see this as a QA rating, because even if a project's setup.py is great, other things can be wrong in the project's code itself. Tarek
-- Tarek Ziadé | http://ziade.org | オープンソースはすごい! | 开源传万世,因有你参与
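As a rough sketch of the kind of static check described above - validating PKG-INFO inside an uploaded sdist without ever executing setup.py - something along these lines would work (the required-field list here is a simplified assumption, not the PEP 345 specification):

    # Sketch: validate PKG-INFO inside an sdist without running setup.py.
    # The required-field list is a simplified assumption, not PEP 345.
    import tarfile
    from email.parser import Parser

    REQUIRED_FIELDS = ("Metadata-Version", "Name", "Version", "Summary", "Author")

    def check_sdist(path):
        problems = []
        tar = tarfile.open(path)
        try:
            member = None
            for m in tar.getmembers():
                if m.name.endswith("PKG-INFO"):
                    member = m
                    break
            if member is None:
                return ["no PKG-INFO found in the sdist"]
            raw = tar.extractfile(member).read().decode("utf-8", "replace")
        finally:
            tar.close()
        # PKG-INFO uses RFC 822 style headers, so the email parser can read it.
        metadata = Parser().parsestr(raw)
        for field in REQUIRED_FIELDS:
            if not metadata.get(field):
                problems.append("missing or empty field: %s" % field)
        return problems

    # Example: report the problems found in a local sdist.
    # for issue in check_sdist("example-1.0.tar.gz"):
    #     print(issue)

Whether PyPI would reject the upload or merely surface the result as a QA rating is a separate decision; the check itself needs no code execution.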
At 03:57 PM 11/7/2009 +0100, Andreas Jung wrote:
- supports too many different versioning schemes. Both the scheme supported by setuptools and the one proposed by Tarek in some PEP are totally over-engineered. A simple and *enforced* versioning scheme is what I want to see.
- no more external hosting of packages. If people want their packages listed on PyPI, they should be required to upload their packages to PyPI (no more issues with unavailable external servers, no more issues with mirroring external servers, no more issues with wrong download URLs within package metadata)
Do note that at least these two requirements of yours are likely to be opposed by some with at least as much force as (if not more than) you are proposing them with.
On Nov 7, 2009, at 6:57 AM, Andreas Jung wrote:
- no more external hosting of packages. If people want their packages listed on PyPI, they should be required to upload their packages to PyPI (no more issues with unavailable external servers, no more issues with mirroring external servers, no more issues with wrong download URLs within package metadata)
Amen! Externally hosted packages are a royal PITA, and they make it so much harder to automatically install dependencies. This field should, at the least, be marked in the Distutils docs with a big "DEPRECATED: DO NOT USE" note.
One other really nice thing about the Perl packaging ecosystem is that their standard library is packaged! If there is a bug found in the Perl standard library, it's trivial to upgrade it to a newer release with a bug fix. For example, the recent little distutils snafu would have been a lot less painful for end users if we'd been able to get the bug fix with a simple: $ pip install --upgrade distutils In this respect, from an end user perspective, it really feels like you're getting hit with a stick: "Need that distutils fix? Hmm, well, OK, but you're going to have to wait another month until we do a full Python release, and then accept all these other unrelated changes if you want that ..." CPAN even informs you if there's a newer release of itself available, and suggests you might like to upgrade: There's a new CPAN.pm version (v1.9402) available! [Current version is v1.7602] You might want to try install Bundle::CPAN reload cpan without quitting the current session. It should be a seamless upgrade while we are running... Buildout has a similar upgrade notification feature, but I don't think pip or easy_install does? One nice thing about setuptools (or now distribute) not having been merged into the standard library is that they're easy to update to newer releases. As we push more of this code down into Distutils, we are making it more difficult to get updates :(
On 08.11.2009 at 00:18, Kevin Teague wrote:
One other really nice thing about the Perl packaging ecosystem is that their standard library is packaged!
If there is a bug found in the Perl standard library, it's trivial to upgrade it to a newer release with a bug fix. For example, the recent little distutils snafu would have been a lot less painful for end users if we'd been able to get the bug fix with a simple:
$ pip install --upgrade distutils
In this respect, from an end user perspective, it really feels like you're getting hit with a stick: "Need that distutils fix? Hmm, well, OK, but you're going to have to wait another month until we do a full Python release, and then accept all these other unrelated changes if you want that ..."
CPAN even informs you if there's a newer release of itself available, and suggests you might like to upgrade:
There's a new CPAN.pm version (v1.9402) available! [Current version is v1.7602] You might want to try install Bundle::CPAN reload cpan without quitting the current session. It should be a seamless upgrade while we are running...
Buildout has a similar upgrade notification feature, but I don't think pip or easy_install does? One nice thing about setuptools (or now distribute) not having been merged into the standard library is that they're easy to update to newer releases. As we push more of this code down into Distutils, we are making it more difficult to get updates :(
Oh, intriguing idea, has moving distutils out of Python core been considered before? Jannis
On Nov 7, 2009, at 3:32 PM, Jannis Leidel wrote:
Oh, intriguing idea, has moving distutils out of Python core been considered before?
To be clear, I'm not suggesting moving anything in or out of the standard library. Just taking what's in the standard library and packaging it up, and allowing releases of these packages to happen on PyPI (or letting people more easily include VCS checkouts of these packages in their Buildouts or Pip-outs). The main Python interpreter download would still include the same standard library packages (or whatever python-dev wants to be in there), and they would still be installed in the same way, just with the addition of .egg-info files to make them PEP 376 compliant (right now wsgiref in the standard library has an .egg-info, but it's an exception). It's been considered before (Chris Withers was recently asking for it), and perhaps before that. I was kvetching about this on the stdlib-sig last month, where I outlined some of the other problems that not having standard packaging for the standard library presents: http://mail.python.org/pipermail/stdlib-sig/2009-October/000721.html
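For illustration, here is a small sketch (assuming pkg_resources from setuptools/distribute is available) of how a tool discovers installed distributions through exactly this kind of .egg-info metadata - the visibility being described above for standard library packages:

    # Sketch: list the distributions installers can "see" via .egg-info metadata.
    # Assumes pkg_resources (setuptools / distribute) is installed.
    import pkg_resources

    for dist in pkg_resources.working_set:
        # project_name and version come from each distribution's metadata files;
        # stdlib packages without .egg-info simply never show up here.
        print("%s %s" % (dist.project_name, dist.version))

    # wsgiref ships an .egg-info in the standard library, so it is typically
    # the only stdlib module visible this way:
    try:
        print(pkg_resources.get_distribution("wsgiref"))
    except pkg_resources.DistributionNotFound:
        print("wsgiref metadata not found on this interpreter")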
On Nov 7, 2009, at 7:10 PM, Kevin Teague wrote: Being able to update individual modules within the standard library would absolutely rock my world and would have removed the urgency from 2.6.3/4 updates. Just update what's necessary. What a concept. Yes, please. The sooner the better. As long as the testing is at least as thorough, and reverting is very easy, why _wouldn't_ this be a good idea? I gotta say, all this talk of CPAN reminds me of when I first started using Python and was mystified/mortified at the total lack of something similar. I kind of got used to it but this is bringing back the itch in a major way... S
11/06/2009 08:53 PM, Guido van Rossum:
Python is suffering a lot in the scientific word, because it has not a CPAN-like repository.
One thing I think is important too: Perl has "helpers" that make it easy to build distribution packages. Python must have them too. For example, Debian-packaging what is on PyPI must be straightforward. (Same for RPM-based distributions and other systems.) -- IT Architect at Blueline/Gulfsat: System Administration, Research & Development +261 33 11 207 36
Rakotomandimby Mihamina <mihamina@gulfsat.mg> writes:
For example, Debian-packaging what is on PyPI must be straightforward. (Same for RPM-based distributions and other systems.)
Packaging for Debian is much more about following the policy, which deliberately involves human intervention and judgement, and can't very well be automated. RPM, as I understand it, is more lax and a simple RPM can be generated quite automatically. Can you give an example of what you mean: * How straightforward do you find performing the Debian packaging for a Perl package, and what tools do you use to do it? * Would you consider it sufficient for the same (or equivalent) process to apply for Debian packaging of a Python package? -- \ “If consumers even know there's a DRM, what it is, and how it | `\ works, we've already failed.” —Peter Lee, Disney corporation, | _o__) 2005 | Ben Finney
Ben Finney wrote:
Rakotomandimby Mihamina <mihamina@gulfsat.mg> writes:
For example, Debian-packaging what is on PyPI must be straightforward. (Same for RPM-based distributions and other systems.)
Packaging for Debian is much more about following the policy, which deliberately involves human intervention and judgement, and can't very well be automated. RPM, as I understand it, is more lax and a simple RPM can be generated quite automatically.
A .deb can also be automated up to a certain level. Even for packages like numpy, there is not much needed. Also, official Debian policy is mandatory for adoption into the official repositories - but if I need to push a simple .deb to colleagues for a new package of mine, I don't want/need to spend that much time. For example, if distutils were to support the --*dir options of autoconf, plus all the related metadata to tag files accordingly, it would make the task quite simple. Even for complex packages, this would make packaging and package updates simpler for the package maintainer. If I look at the debian dir for the numpy package (in Ubuntu), there is really not much needed: the .install files to tag which files to install for which package, plus adding Debian-specific files (README.debian and co). Most of it, if not all, could be removed with the corresponding distutils support (which would not be .deb-specific in any way, and would help RPM, .pkg, and even Windows packaging as well, through e.g. NSIS files). David
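As a rough illustration of how far project metadata alone could take a packaging helper (this assumes an already-extracted PKG-INFO file; the output is a deliberately minimal debian/control stub, nowhere near a policy-compliant package):

    # Rough illustration: map PKG-INFO metadata onto a minimal debian/control
    # stub. Assumes an extracted PKG-INFO file; this only shows the metadata
    # mapping, not real Debian packaging.
    from email.parser import Parser

    def control_stub(pkg_info_path):
        with open(pkg_info_path) as f:
            meta = Parser().parse(f)
        name = meta.get("Name", "unknown").lower()
        return (
            "Source: python-%s\n"
            "Section: python\n"
            "Priority: optional\n"
            "Maintainer: %s <%s>\n"
            "\n"
            "Package: python-%s\n"
            "Architecture: all\n"
            "Description: %s\n"
        ) % (name,
             meta.get("Author", "unknown"),
             meta.get("Author-email", "unknown@example.invalid"),
             name,
             meta.get("Summary", "no summary provided"))

    # print(control_stub("PKG-INFO"))

The file layout tagging David mentions (which installed file belongs to which binary package) is the part that still needs distutils-level metadata; the stub above only covers the descriptive fields.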
11/09/2009 09:57 AM, Ben Finney:
For example, Debian-packaging what is on PyPI must be straightforward. (Same for RPM-based distributions and other systems.) Packaging for Debian is much more about following the policy, which deliberately involves human intervention and judgement, and can't very well be automated.
As far as I know (I might know nothing ;)), and as far as I follow the debian-mentors ML, most of the "problems" are about filesystem hierarchy. Python packages are pretty clear on their FS hierarchy; normalizing it should not be that hard. Once that is clear, Debian packaging straightforwardness is 80% done.
Can you give an example of what you mean: * How straightforward do you find performing the Debian packaging for a Perl package, and what tools do you use to do it? * Would you consider it sufficient for the same (or equivalent) process to apply for Debian packaging of a Python package?
Give me just some time to try this on some examples. My tries might invalidate my assumptions, but I need to test it on a bunch of Perl modules to confirm - not just on 2-3. -- IT Architect at Blueline/Gulfsat: System Administration, Research & Development +261 33 11 207 36
Guido van Rossum <guido <at> python.org> writes:
I just found this comment on my blog. People have told me this in person too, so I believe it is real pain (even if the solution may be elusive and the suggested solutions may not work). But I don't know how to improve the world. Is the work on distutils-sig going to be enough? Or do we need some other kind of work in addition? Do we need more than PyPI?
--Guido
I didn't see this mentioned in the other comments: Sometimes easy_install just gets stuck with an IncompleteRead error or something like that. (I may be butchering the error's name, sorry.) Then some part of the html of the page gets printed up to a seemingly arbitrary point, and the install fails. I'm not sure what causes that, but I've seen it happening at random for at least a couple of months. Sometimes it works, sometimes it fails with the incomplete read thing, and then if you try again it will fail at the same point of the html page. Try it a few hours later, it might work then. Not nice. (I've seen it happen on both Linux and Windows.) Ram.
In line with this discussion, I found a document that details the aspects of CPAN that can be reused for writing packaging systems in other languages. The author says that over the years, people from at least the Python, Ruby, and Java communities have approached him or other core CPAN people to ask, basically, "How did we do it?" http://www.cpan.org/misc/ZCAN.html -srid On Fri, 06 Nov 2009 09:53:44 -0800, Guido van Rossum <guido@python.org> wrote:
I just found this comment on my blog. People have told me this in person too, so I believe it is real pain (even if the solution may be elusive and the suggested solutions may not work). But I don't know how to improve the world. Is the work on distutils-sig going to be enough? Or do we need some other kind of work in addition? Do we need more than PyPI?
--Guido
participants (36)
- Alex Grönholm
- Andreas Jung
- Ben Finney
- Bob Ippolito
- Brad Allen
- Chris Withers
- David Cournapeau
- David Cournapeau
- David Lyon
- exarkun@twistedmatrix.com
- Floris Bruynooghe
- Georg Brandl
- Glyph Lefkowitz
- Greg Ewing
- Guido van Rossum
- Ian Bicking
- Jannis Leidel
- Jeff Rush
- Jeremy Kloth
- Jesse Noller
- Kaelin Colclasure
- Kevin Teague
- Lennart Regebro
- Michael Sparks
- Milind Khadilkar
- P.J. Eby
- Paul Moore
- Pauli Virtanen
- Pauli Virtanen
- Rakotomandimby Mihamina
- Ram Rachum
- Robert Kern
- Sridhar Ratnakumar
- ssteinerX@gmail.com
- Tarek Ziadé
- Wolodja Wentland