
About sixteen months ago, I launched the SciPy Documentation Project and its Marathon. Dozens pitched in and now numpy docs are rapidly approaching a professional level. The "pink wave" ("Needs Review" status) is at 56% today! There is consensus among doc writers that much of the rest can be labeled in the "unimportant" category, so we're close to starting the review push (hold your fire, there is a web site mod to be done first).

We're also nearing the end of the summer, and it's time to look ahead. The path for docs is clear, but the path for SciPy is not. I think our weakest area right now is organization of the project. There is no consensus-based plan for improvement of the whole toward a stated goal, no centralized coordination of work, and no funded work focused on many of our weaknesses, notwithstanding my doc effort and what Enthought does for code.

I define success as popular adoption in preference to commercial packages. I believe in vote-with-your-feet: this goal will not be reached until all aspects of the package and its presentation to the world exceed those of our commercial competition. Scipy is now a grassroots effort, but that takes it only so far. Other projects, such as OpenOffice and Sage, don't follow this model and do produce quality products that compete with commercial offerings, at least on open-source platforms. Before we can even hope for that, we have to do the following:

- Docs
  - Rest of numpy reference pages reviewed and proofed or marked unimportant
  - Scipy reference pages
  - User manual for the whole toolstack
  - Multiple commercial books
- Packaging
  - Personal Package Archive or equivalent for every release of every OS for the full toolstack (There are tools that do this but we don't use them. NSF requires Metronome - http://nmi.cs.wisc.edu/ - for funding most development grants, so right now we're not even on NSF's radar.)
  - Track record of having the whole toolstack installation "just work" in a few command lines or clicks for *everyone*
  - Regular, scheduled releases of numpy and scipy
  - Coordinated releases of numpy, scipy, and stable scikits into PPA system
- Public communication
  - A real marketing plan
  - Executing on that plan
  - Web site geared toward multiple audiences, run by experts at that kind of communication
  - More webinars, conference booths, training, aimed at all levels
  - Demos, testimonials, topical forums, all showcased
- Code
  - A full design review for numpy 2.0
    - No more inconsistencies like median(), lacking "out", degrees option for angle functions?
    - Trimming of financial functions, maybe others, from numpy?
    - Package structure review (eliminate "fromnumeric"?)
    - Goal that this be the last breakage for numpy API (the real 1.0)
- Scipy
  - Is it maintainable? Should it be broken up?
  - Clear code addition path (or decide never to add more)
  - Docs (see above)
- Add-on packages
  - Both existence of and good indexing/integration/support for field-specific packages
  - Clearer development path for new packages
  - Central hosting system for packages (svn, mailing lists, web, build integration, etc.)
  - Simultaneous releases of stable packages along with numpy/scipy

I posted a basic improvement plan some years back. The core ideas have not changed; it is linked from the bottom of http://scipy.org/Developer_Zone. I chose our major weakness to begin with and started the doc project, using some money I could justify spending simply for the utility of docs for my own research. I funded the work of two doc coordinators, one each this summer and last. Looking at http://docs.scipy.org/numpy/stats/, you can see that when a doc coordinator was being paid (summers), work got done. When not, then not. Without publicly announcing what these guys made, I'll be the first to admit that it wasn't a lot.
Yet, those small sums bought a huge contribution to numpy through the work of several dozen volunteers and the major contributions of a few. My conclusion is that active and constant coordination is central to motivating volunteer work, and that without a salary we cannot depend on coordination remaining active.

On the other hand, I have heard Enthought's leaders bemoan the high cost of devoting employee time to this project, and the low returns available from selling support to universities and non-profit research institutes. Their leadership has moved us forward, particularly in the area of code, but has not provided the momentum necessary to carry us forward on all fronts. It is time for the public and education sectors to kick in some resources and organizational leadership. We are, after all, benefitting immensely.

Since the cost of employee time is not so high for us in the public and education sectors, I propose to continue hiring people like Stefan and David as UCF employees or contractors, and to expand to hiring others in areas like packaging and marketing, provided that funding for those hires can be found. However, my grant situation is no longer as rich as it has been the past two years, and the needs going forward are greater than in the past if we're now to tackle all the points above. So, I will not be hiring another doc guru from my research grants next year.

I am confident that others are willing to pitch in financially, but few will pitch in a full FTE, and we need several. We can (and will) set up a donations site, but donation sites tend to receive pizza money unless a sugar daddy comes along. Those benefitting most from the software, notably education, non-profit research, and government institutions, are *forbidden* from making donations by the terms of their grants. NSF doesn't give you money so you can give it away. We need to provide services they can buy on subcontract and a means for handling payments from them.
Selling support does not solve the problem, as that requires spending most of the income on servicing that particular client. Rather, we need to sell a chunk of documentation or the packaging of a particular release, and then provide the product not just to that client but to everyone. We can also propose directly for federal and corporate grant funds. I have spoken with several NASA and NSF program managers and with Google's Federal Accounts Representative, and the possibilities for funding are good. But I am not going to do this alone. We need a strong proposal team to be credible.

So, I am seeking a group that is willing to work with me to put up the infrastructure of a funded project, to write grant proposals, and to coordinate a financial effort. Members of this group must have a track record of funded grants, business success, foundation support, etc. We might call it the SciPy Foundation. It could be based at UCF, which has a low overhead rate and has infrastructure (like an HR staff), or it might be independent if we can find a good director willing to devote significant time for relatively low pay compared to what they can likely make elsewhere. I would envision hiring permanent coordinators for docs, packaging, and marketing communications. Enthought appears to have code covered by virtue of having hired Travis, Robert, etc.; how to integrate that with this effort is an open question but not a difficult one, I think, as code is our strongest asset at this point.

I invite discussion of this approach and the task list above on the scipy-dev@scipy.org mailing list. If you are seeing this post elsewhere, please reply only on scipy-dev@scipy.org.

If you are eligible to lead funding proposals and are interested in participating in grant writing and management activities related to work in our weak areas, please contact me directly.

Thanks,

--jh--
Prof. Joseph Harrington
Planetary Sciences Group
Department of Physics
MAP 414
4000 Central Florida Blvd.
University of Central Florida
Orlando, FL 32816-2385
jh@physics.ucf.edu
planets.ucf.edu
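The "degrees option for angle functions" item in the Code list can be illustrated with a small example; this is an editorial sketch against current NumPy, not something proposed in the thread. `np.angle` accepts a `deg` flag, while `np.arctan2` and trigonometric ufuncs such as `np.sin` work only in radians, so callers must convert with `np.degrees`/`np.radians` themselves:

```python
import numpy as np

# Complex values whose phase angles are 45, 180 and 90 degrees.
z = np.array([1 + 1j, -1 + 0j, 0 + 1j])

# np.angle takes a `deg` flag and can return degrees directly.
angles_deg = np.angle(z, deg=True)

# np.arctan2 computes the same phase from (imag, real) but has no such flag:
# it always returns radians, so the caller converts explicitly.
angles_deg_converted = np.degrees(np.arctan2(z.imag, z.real))

# The trigonometric ufuncs also accept radians only.
sin_30 = np.sin(np.radians(30.0))

print(angles_deg)
print(angles_deg_converted)
print(sin_30)
```

Both routes give the same angles; the inconsistency is only in which functions expose a degrees option.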

On Fri, Jul 31, 2009 at 12:06, Joe Harrington<jh@physics.ucf.edu> wrote:
Enthought appears to have code covered by virtue of having hired Travis, Robert, etc.;
Eh, what? We work on numpy and scipy in our spare time, just like everyone else. There are rare occasions when a client wants to fund a particular feature, or we need to fix a bug in the course of our work, but that's a far cry from having "code covered".

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco

Interesting... Now I'm curious to know how many others thought Enthought employees were paid to "keep the code covered"?

DG

_______________________________________________
Scipy-dev mailing list
Scipy-dev@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-dev

On Fri, Jul 31, 2009 at 1:37 PM, David Goldsmith <d_l_goldsmith@yahoo.com>wrote:
Interesting... Now I'm curious to know how many others thought Enthought employees were paid to "keep the code covered"?
I always figured they had to scramble to pay the bills. Making a small company go isn't easy-peasy. Chuck

Understood and agreed, but is your point that since code maintenance and generation would fall under the category of "capital," not "operations," consequently our default assumption as outsiders should be that they do not invest in it (except when "operations" necessitate it)?

DG

On Fri, Jul 31, 2009 at 2:13 PM, David Goldsmith <d_l_goldsmith@yahoo.com>wrote:
Understood and agreed, but is your point that since code maintenance and generation would fall under the category of "capital," not "operations," consequently our default assumption as outsiders should be that they do not invest in it (except when "operations" necessitate)?
I believe they host the svn servers and pay for the bandwidth, so that is a significant investment. They also hire folks from the community who, after all, need to make a living. As to direct investment in code development I think Robert covered it. But I don't know much about what Enthought does, so if you want a definitive statement you will need to ask them. Chuck

Robert wrote:
On Fri, Jul 31, 2009 at 12:06, Joe Harrington<jh@physics.ucf.edu> wrote:
Enthought appears to have code covered by virtue of having hired Travis, Robert, etc.;
Eh, what? We work on numpy and scipy in our spare time, just like everyone else. There are rare occasions when a client wants to fund a particular feature, or we need to fix a bug in the course of our work, but that's a far cry from having "code covered".
Then please accept my profusest apologies! Eric mentioned to me that Enthought had paid significantly for scipy development, and I thought that meant a portion of developers' time. Perhaps this was just in the past.

--jh--

On Fri, Jul 31, 2009 at 16:04, Joe Harrington<jh@physics.ucf.edu> wrote:
Then please accept my profusest apologies! Eric mentioned to me that Enthought had paid significantly for scipy development, and I thought that meant a portion of developers' time. Perhaps this was just in the past.
Still do; it's just not part of our daily duties and is usually focused on what we need, not general maintenance. Not to mention the infrastructure support.

Robert Kern

I am going to play the devil's advocate here -- I'm not in this to make enemies; I just have some sincere questions.

Joe Harrington wrote:
I define success as popular adoption in preference to commercial packages. I believe in vote-with-your-feet: this goal will not be reached until all aspects of the package and its presentation to the world exceed those of our commercial competition. Scipy is now a grass roots effort, but that takes it only so far. Other projects, such as OpenOffice and Sage, don't follow this model and do produce quality products that compete with commercial offerings, at least on open-source platforms. Before we can even hope for that, we have to do the following:
<snip>
- Public communication
  - A real marketing plan
  - Executing on that plan
  - Web site geared toward multiple audiences, run by experts at that kind of communication
  - More webinars, conference booths, training, aimed at all levels
  - Demos, testimonials, topical forums, all showcased
A thing OpenOffice.org and Sage both have is a very clear sense of direction and a clearly stated goal. SciPy might also have that for all I know, but I must admit I haven't understood what it is in the past year of following the SciPy and NumPy lists and reading the SciPy site. But I have seen email threads asking what the SciPy goal is, without any clear resolution (?).

The website says this: "SciPy is open-source software for mathematics, science, and engineering." Which of course says nothing at all.

Someone asked me what SciPy is the other day, and while I more or less "know" when I'd try to look in SciPy for an algorithm (instead of going to, say, R, or netlib.org, or whatever), I was more or less forced to say that it is a "dumping ground for various algorithms people have found useful, the common link being that they are either written in Python or wrapped for Python". That's probably an unfair description -- the point is: if one needs to formulate a two- or three-liner about SciPy, what would it be? Is it a goal to reimplement stuff in SciPy that's (for instance) already thriving in the open source R community, or is that not a goal? And so on.

You might feel this is going off-topic, but I somehow feel that a very clear sense of direction is paramount when talking of these issues -- just look at the Sage project.

Dag Sverre

On Sat, Aug 1, 2009 at 5:57 AM, Dag Sverre Seljebotn<dagss@student.matnat.uio.no> wrote:
A thing OpenOffice.org and Sage both have is a very clear sense of direction and a clearly stated goal.
SciPy might also have that for all I know, but I must admit I haven't understood what it is in the past year following the SciPy and NumPy lists, and reading the SciPy site. But I have seen email threads asking what the SciPy goal is, without any clear resolution (?).
The website says this: "SciPy is open-source software for mathematics, science, and engineering."
Which of course says nothing at all. Someone asked me what SciPy is the other day, and while I more or less "know" when I'd try to look in SciPy for an algorithm (instead of going to, say, R, or netlib.org, or whatever), I was more or less forced to say that it is a "dumping ground for various algorithms people have found useful, with the link being them being either written in Python or wrapped for Python".
I think scipy is pretty much the same as a collection of Matlab toolboxes: either enhanced basic numerical algorithms (linalg, special, optimize, interpolate, sparse, fft, spatial) or toolboxes with wider applicability (stats including cluster, odr and maxentropy, signal, ndimage+stsci?). This misses weave. Which algorithms are actually included, and some of the structure, still reflect the "dumping ground for various algorithms people have found useful", and some parts don't look very used. There is still a lot of cleaning and testing to do, but the description by analogy to Matlab toolboxes is pretty accurate, if a description by analogy is allowed. E.g., to understand more of scipy.signal, I started to read the help for Matlab's signal toolbox. That's my impression of scipy after working my way through some parts of it in the last year.
That's probably an unfair description -- the point is: If one needs to formulate a two- or three-liner about SciPy, what would it be? Is it a goal to reimplement stuff in SciPy that's (for instance) already thriving in the open source R community, or is that not a goal? And so on.
For stats, I consider Matlab, and maybe GAUSS for econometrics, as the benchmark, not the coverage of a specialized language/package like R; but I'm no statistician and I don't personally know anyone who uses R.

Josef

Dag wrote:
I have seen email threads asking what the SciPy goal is, without any clear resolution (?).
How's this for a goal/mission statement (for SciPy, IDL, and Matlab):

(The toolstack) is a professional-quality numerical computation and visualization environment that supports convenient handling of numerical arrays, provides a rich set of basic tools and algorithms for science and engineering, and supports a variety of both general and discipline-specific application software. It is easy for numerically savvy teens to learn, but rich enough to support the most complex of professional applications. It can be run both non-interactively and interactively, with the latter featuring both GUI and rich command-line interfaces. It comes with full documentation, is easy to install and run on all popular platforms, has a strong online user community spanning all disciplines, and has commercial support and consulting.

For SciPy, I'd replace the part after the last comma with "is free and open-source, supports cloud computing, and has options for commercial user support and consulting." One could add to the list of general features, such as symbolic manipulation, parallel processing, etc., but it's already getting long. For SciPy, some of this, of course, is not yet true, which is the point of the current thread.

Another way of looking at it: for me, SciPy is a replacement for IDL that improves on it in some areas. No more, but no less. That doesn't say what it *is*, since it just begs the question, "what is IDL", but it does identify the space I'd like to see SciPy occupy. SciPy now occupies most of the space IDL occupied for me, except for a few crucial areas. The main one is that enough of my colleagues use IDL that I can exchange code with them. A code written in an interpreted language that your colleague does not use is not useful to them. If it's not useful to them, then the interest in your contribution is that much smaller. So, my goal is to make SciPy (the toolstack, not the package) *to them* be what IDL is to them today.
That is a lot more than what IDL is to me, since I have more of a knack for computers than most of my colleagues. They need a one-touch install, hold-your-hand docs, GUIs, and so forth. They are also less interested in the linguistic improvements of Python over IDL. Or, they are until they really get coding, which is long after they make the decision to give it a spin. This is a good thing in a way, since it means that once they try it, they *really* like it. Most current SciPy users, I think, are savvy enough about computers that we can work around the shortcomings, but the next round of adopters will always be less savvy than the last, on the whole, hence the need for better and lower-level docs, professional packaging on all platforms, etc.

--jh--

On Aug 1, 2009, at 12:20 PM, Joe Harrington wrote:
Dag wrote:
I have seen email threads asking what the SciPy goal is, without any clear resolution (?).
For me, SciPy is a replacement for IDL that improves on it in some areas. No more, but no less.
I have been using python, numpy and matplotlib for a few years as part of my astronomy research. While I find numpy and matplotlib extremely useful, scipy just doesn't seem to help me much. I think the problem is that it is very unfocused. To me scipy is not a replacement for IDL; it is a python implementation of Numerical Recipes, but because of its lack of focus it has become very chaotic. So far I have only found use for optimize.leastsq and spatial.KDTree from scipy. Packages like pyfits, pyraf, AstLib, etc. take care of the more astronomy-related problems.

So I would personally like to see scipy become a package that binds the numpy package to the more field-specific packages, by providing numerical methods that are broadly applicable in many fields (e.g. least-squares minimization, a KDTree implementation, Runge-Kutta and other types of integration schemes, differential equation solvers and so on). Making scipy into a tool for science and engineering is, in my opinion, too broad a goal. Making it into a set of tools that are usable in many fields, and thus supporting development of field-specific packages, is, again in my opinion, the way to go. It narrows the focus and makes the project more self-contained.

Cheers
Tommy Grav

Associate Researcher @ Dept. of Physics and Astronomy
Johns Hopkins University
tgrav@pha.jhu.edu (410) 516-7683
http://web.mac.com/tgrav/Astronomy/Welcome.html
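The broadly applicable methods Tommy lists (least-squares minimization, a KD-tree, Runge-Kutta integration) already map onto SciPy routines. The sketch below is an editorial illustration, not code from the thread: it uses `scipy.optimize.leastsq`, `scipy.spatial.KDTree`, and `scipy.integrate.solve_ivp` with its default explicit Runge-Kutta method `RK45`. Note that `solve_ivp` was added in SciPy 1.0, well after this discussion, and the data and models here are synthetic, chosen only for illustration:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import leastsq
from scipy.spatial import KDTree

rng = np.random.default_rng(0)

# Least-squares minimization: fit y = a*x + b to synthetic noisy data.
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

def residuals(params, x, y):
    """Residuals of the straight-line model; leastsq minimizes their sum of squares."""
    a, b = params
    return y - (a * x + b)

params, ier = leastsq(residuals, x0=[1.0, 0.0], args=(x, y))
a_fit, b_fit = params  # ier in (1, 2, 3, 4) signals convergence

# KD-tree: the three nearest neighbours of a query point among random 2-D points.
points = rng.uniform(size=(1000, 2))
tree = KDTree(points)
dist, idx = tree.query([0.5, 0.5], k=3)

# Runge-Kutta integration: exponential decay dy/dt = -k*y with the RK45 solver.
def decay(t, y, k=0.5):
    return -k * y

sol = solve_ivp(decay, (0.0, 4.0), [1.0], method="RK45", rtol=1e-8, atol=1e-10)
y_end = sol.y[0, -1]  # analytic value is exp(-0.5 * 4)

print(a_fit, b_fit)
print(idx)
print(y_end, np.exp(-2.0))
```

Each routine is generic: none of it knows anything about astronomy, which is the kind of field-neutral layer Tommy argues scipy should focus on.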

--- On Sat, 8/1/09, Tommy Grav <tgrav@mac.com> wrote:
Making scipy into a tool for science and engineering is, in my opinion, too broad a goal. Making it into a set of tools that are usable in many fields, and thus supporting development of field-specific packages, is, again in my opinion, the way to go.
Please clarify what you see as the difference between these two - to me, on the surface of it, your goal statement is no more "focused" nor "self-contained" than Joe's. Perhaps if you clarify what you see as the differences, we all may discover that your vision and Joe's actually aren't that far apart. DG
It narrows the focus and makes the project more self contained.

On Aug 1, 2009, at 5:56 PM, David Goldsmith wrote:
Please clarify what you see as the difference between these two - to me, on the surface of it, your goal statement is no more "focused" nor "self-contained" than Joe's. Perhaps if you clarify what you see as the differences, we all may discover that your vision and Joe's actually aren't that far apart.
I don't think that Joe and I are that far apart either. My point (very badly formulated) was that trying to make scipy a replacement for IDL or Matlab is, in my opinion, not the right goal. IDL in particular has a lot of field-specific code available in it. I would like to see a structure where scipy provides the underlying code needed by many fields (like the Numerical Recipes codes) but stays away from providing field-specific code. Also, scipy should not venture into GUIs or provide an interactive environment like IDL (there are other packages that provide this).

Just my opinion
Tommy Grav

Joe Harrington wrote:
Dag wrote:
I have seen email threads asking what the SciPy goal is, without any clear resolution (?).
For me, SciPy is a replacement for IDL that improves on it in some areas. No more, but no less. That doesn't say what it *is*, since it just begs the question, "what is IDL", but it does identify the space I'd like to see SciPy occupy. It occupies most of the space IDL occupied for me now, except for a few crucial areas. The main one is that enough of my colleagues use it that I can exchange codes with them. A code written in an interpreted language that your colleague does not use is not useful to them. If it's not useful to them, then the interest in your contribution is that much smaller. So, my goal is to make SciPy (the toolstack, not the package) *to them* be what IDL is to them today. That is a lot more than what IDL is to me, since I have more of a knack for computers than most of my colleagues. They need a one-touch install, hold-your-hand docs, GUIs, and so forth. They are also less interested in the linguistic improvements of Python over IDL. Or, they are until they really get coding, which is long after they make the decision to give it a spin. This is a good thing in a way, since it means that once they try it, they *really* like it. Most current SciPy users, I think, are savvy enough about computers that we can work around the shortcomings, but the next round of adopters will always be less savvy than the last, on the whole, hence the need for better and lower-level docs, professional packaging on all platforms, etc.
I really, really want what you seem to want too. BUT, I'll continue my criticism, in the hope that something may come out of it. What you mention above seems to be A LOT of work (in particular "professional packaging on all platforms"), and, as others have mentioned, partly in conflict with the way people tend to view SciPy currently, and so on.

As you say, it is indeed the whole stack that is important. Still, part of what you write seems to be an effort to do what many are doing already:

- EPD
- Sage (currently maths focused, but it does bundle SciPy and integrating it better would )
- SPD (Sage without some of the math libs)
- Python(x,y)

These all bundle SciPy, but also set up the whole stack, and can focus on the whole picture. Are you saying that you just want to do it better than these, through a foundation? Wouldn't it be better to direct any funding through one of these existing candidates?

This post I've written on the Sage list is very related and is about SciPy vs. Sage: http://groups.google.com/group/sage-devel/msg/78e2a2032042d35b

The parent thread is a bit long but has lots of related material: http://groups.google.com/group/sage-devel/browse_thread/thread/bef2010f45984...

Dag Sverre

On Sat, Aug 01, 2009 at 07:49:21PM +0200, Dag Sverre Seljebotn wrote:
As you say it is indeed the whole stack that is important. Still, part of what you write seems to be an effort to do what many are doing already:
- EPD
- Sage (currently maths focused, but it does bundle SciPy and integrating it better would )
- SPD (Sage without some of the math libs)
- Python(x,y)
These all bundle SciPy, but also sets up the whole stack, and can focus on the whole picture.
Are you saying that you just want to do it better than these, through a foundation? Wouldn't it be better to direct any funding through one of these existing candidates?
This post I've written on the Sage list is very related and is about SciPy vs. Sage: http://groups.google.com/group/sage-devel/msg/78e2a2032042d35b
I am jumping into this discussion (something that I have been trying to avoid, because such discussions are very hard to drive to a useful point). I'll try to write a clear e-mail, to the point, however, as the previous discussion you are pointing to does not reflect my needs.

On the various use cases and users
===================================

I think that the discussion on the Sage mailing list, and a few points of the last e-mails I have seen on this mailing list, miss a very important point for many users of the scipy stack that I see around me: we want a tool, or a set of tools, to build our own entry points. We want more than an IDE like Matlab or Mathematica. We want to be able to use the tools separately: to do data mining on server logs, to build custom applications for e.g. medical image analysis, or to control a physics experiment (there are a lot of talks at the scipy conference this year on this). Most of the scipy users are "even more applied than applied math" (golly, this sounds almost dirty ;> ).

Building a reusable stack is why we need the tools to be broken up into separate features. Scipy as a community and an umbrella project may benefit from an IDE, like Matlab, or a web interface like the amazing one Sage has, but we don't want to bundle these features with the core numerical tools of scipy.

Now this might actually concern only a fraction of users. Many users (including me) mostly use the scipy tool stack as a Matlab/Mathematica replacement. However, these users are not the main code contributors. If somebody develops an algorithm he wants to ship or to share, chances are he wants it bound not to a heavy platform, but rather to a light core (hey, numpy is even shipped by default on Mac OS X and many Linux distributions nowadays).
An integrated environment as an entry point
===========================================

Besides building a good set of tools and their documentation, we need to address two separate issues to make life easier for users: building an integrated environment (what I call an entry point) and building distributions. It is tempting to do both at the same time; however, I think that if we collapse the two problems, we are going in the wrong direction: I want to be able to reuse the underlying technology of the integrated environment, for instance to build an astronomy-specific IDE, and I want to be able to contribute modules to it even if those modules are not distributed together.

Like many people, my working environment is IPython. It suits my needs, and I get scientific results using it. However, I can see that it is not the best solution to guide a beginner. Inspired by Matlab, IDL or Mathematica, we have been dreaming of having an IDE for a long while. Last year, Enthought paid me to start work on making IPython GUI-friendly, to plug one of the missing bricks for assembling the tool stack into an IDE. I have been unable to work on this for a year, as it is not a priority for my research, but the effort lives on in the IPython repository, and it would be great to see an IDE built upon it, improving it along the way.

An IDE for easy scientific development with Python would bring together tools such as a shell, easy access to documentation, and an editor (reinventing any one of these components might not be necessary). There is EPDLab, which is being developed in the ETS repository. I love the technology stack that it is built upon (ETS provides good tools for building GUIs, and IPython provides a very handy and powerful command line), and I am thus full of hope for EPDLab. I can see, however, that people might be afraid of using it, let alone contributing to it, as it bears strong Enthought branding.
This is a pity, because in this case we have the chance of having a company's interest lying in the same direction as the community's.

For a web environment, the Sage notebook is amazing. Unfortunately last time I looked, it was GPL licensed, which renders it improper for my use, as the tools we use at the lab must be BSD, in order to be able to build (eventually) medical imaging products from them one day.

But, from a more pragmatic point of view, the simplest thing to do to make it easier for a beginner to get started would be to improve the documentation on the web. I am not thinking of the documentation of specific packages, but more of describing how things fit together: giving the workflow, and pointing to the various main packages used for different things. We already have a lot of material on the webpages, but this material is not as 'sexy' as it could be, and not as to-the-point as possible. Sure, this is a lot of work too.

Building standard distributions
===============================

I am a huge fan of distributions. Every large applied lab I know ends up building a distribution mechanism. Without standard distributions, not only can we not reuse each other's efforts to distribute, but we also have huge friction in reusing each other's tools: installing on your computer may be easy, but if you have to worry whether your non-technical users will succeed in installing a tool, you start wondering whether you want to rely on the tool, or whether you are going to reimplement it.

However, the other side of the problem is that distributions could end up developing tools that make use of the tight integration they provide to solve numerical or usability problems more quickly, while locking the users into the distribution. If I want to integrate an algorithm developed by another lab into a medical imaging platform, I cannot afford to drag in Sage, just as I cannot afford R or Matlab, as they are too big a dependency.
An IDE that works only on a distribution is not one that I will rely on for teaching. This is why I believe that every single piece of code in a distribution should be usable outside of this distribution (and I applaud the SPD effort started by Ondrej and the Sage guys).

Concrete suggestions to ease the progress
=========================================

Of course providing a consistent environment is a hard problem, but hey, this is a problem many of us face. I believe that we are making progress with many encouraging projects such as Sage, EPD, Python(x,y), or SPD. Establishing scientific environments in Python is an ambitious project; there will not be a one-size-fits-all solution, and having many different approaches is healthy, as long as we keep it friendly and learn from all the efforts. I strongly believe that we will be getting more and more satisfactory solutions in the next few years.

Specifically, I would love to see an official umbrella project for BSD-licensed tools for building scientific projects with Python. As the "scipy" name is well branded (through the website, and the conference), we could call this the 'scipy project'. I would personally like to limit wheel reinvention and have preferred solutions for the various bricks (I am thinking of the unfortunate Chaco versus Matplotlib situation, where I have to depend on both libraries, which complement each other).

Back to the scipy foundation idea
=================================

The idea of the scipy foundation is an idea that has been floating around for a while. If it is manned by a variety of people who express the wishes and needs of users and developers of the scipy ecosystem, it could be a great thing. But I see two road blocks: first, as Robert points out, telling somebody what to do will not achieve anything. I am already way too busy scratching my own itches. Second, who will find the time to take care of this?

And now, I have to catch up on sleep.

Gaël

On Sun, Aug 2, 2009 at 7:52 AM, Gael Varoquaux<gael.varoquaux@normalesup.org> wrote:
Back to the scipy foundation idea ==================================
The idea of the scipy foundation is an idea that has been floating around for a while. If it is manned by a variety of people who express the wishes and needs of users and developers of the scipy ecosystem, it could be a great thing. But I see two road blocks: first, as Robert points out, telling somebody what to do will not achieve anything.
Having a foundation does not, by itself, mean telling people what to do. It is just a way to have a single point of entry for people who want to interact with the community, and to have the legal right to collect money.
I am already way too busy scratching my own itches. Second, who will find the time to take care of this?
There is an inherent amount of bureaucracy involved with those things, but it does not always have to be done by the same people, and I think rotation works better for this than for code.

David

On Sat, Aug 1, 2009 at 4:52 PM, Gael Varoquaux<gael.varoquaux@normalesup.org> wrote: [...]
For a web environment, the Sage notebook is amazing. Unfortunately last time I looked, it was GPL licensed, which renders it improper for my use, as the tools we use at the lab must be BSD, in order to be able to build (eventually) medical imaging products from them one day.
Actually, in this thread: http://groups.google.com/group/sage-devel/browse_thread/thread/65ca1e0489a0a... most (if not all) contributors to the Sage notebook agreed to release their code as BSD. The same goes for William, who is positive about licensing the build system as BSD too. So we can get a lot done by working on these things together with Sage.

Ondrej

On Mon, Aug 03, 2009 at 03:32:31PM -0600, Ondrej Certik wrote:
On Sat, Aug 1, 2009 at 4:52 PM, Gael Varoquaux<gael.varoquaux@normalesup.org> wrote: [...]
For a web environment, the Sage notebook is amazing. Unfortunately last time I looked, it was GPL licensed, which renders it improper for my use, as the tools we use at the lab must be BSD, in order to be able to build (eventually) medical imaging products from them one day.
Actually, in this thread:
http://groups.google.com/group/sage-devel/browse_thread/thread/65ca1e0489a0a...
most (if not all) contributors to the Sage notebook agreed to release their code as BSD.
The same goes for William, who is positive about licensing the build system as BSD too. So we can get a lot done by working on these things together with Sage.
I can see that a lot of good things are coming out of Sage (the current Cython development frenzy was clearly helped by the needs of Sage). It is really nice to see our community (I am talking in the sense of a scientific Python community, agnostic of tools and distribution) growing. Cheers to these guys, that notebook is really amazing!

Gaël

On Mon, Aug 3, 2009 at 3:36 PM, Gael Varoquaux<gael.varoquaux@normalesup.org> wrote:
On Mon, Aug 03, 2009 at 03:32:31PM -0600, Ondrej Certik wrote:
On Sat, Aug 1, 2009 at 4:52 PM, Gael Varoquaux<gael.varoquaux@normalesup.org> wrote: [...]
For a web environment, the Sage notebook is amazing. Unfortunately last time I looked, it was GPL licensed, which renders it improper for my use, as the tools we use at the lab must be BSD, in order to be able to build (eventually) medical imaging products from them one day.
Actually, in this thread:
http://groups.google.com/group/sage-devel/browse_thread/thread/65ca1e0489a0a...
most (if not all) contributors to the Sage notebook agreed to release their code as BSD.
The same goes for William, who is positive about licensing the build system as BSD too. So we can get a lot done by working on these things together with Sage.
I can see that a lot of good things are coming out of Sage (the current Cython development frenzy was clearly helped by the needs of Sage). It is really nice to see our community (I am talking in the sense of a scientific Python community, agnostic of tools and distribution) growing.
Cheers to these guys, that notebook is really amazing!
Yep. And Cython has a BSD-like (Apache) license too, so I think that for these basic tools that everyone needs (cython/notebook/build infrastructure) Sage is not against BSD at all.

Ondrej

2 cents from an outsider who thought about contributing to scipy/scikits (but didn't (yet)):

I think it is a good idea to make scipy easy to use for beginners. However, after reading this thread, I have the impression that the goal is not to provide state of the art algorithms but rather to make Scipy as popular as possible by putting money and effort into the "marketing" of Scipy. Don't get me wrong, I think there are some good reasons why a project should strive for a large user base. Some of the best projects are popular. Alas, correlation does not imply causality.

I, for instance, would rather see more effort to get state of the art algorithms implemented in Scipy, because that's something that would make a real difference in my research work. Of course, targeting the "clueless Matlab" users is quite pointless if that is what you are after.

IMHO the way to go is to convince experts to implement their research prototypes as part of scipy. Then you really get some "killer applications". I could name a few people who are coding some cool state of the art algorithms but waste so much time because they started coding directly in C++. In the meantime, they could have implemented the algorithms in Python _and_ in C++. If scipy had something really good that Matlab etc. do not have: guess what ppl would do....

What would you need to get experts to contribute to scipy instead of hacking their prototypes in Matlab or C++? I can't speak for everyone, so I'll just say what I think (and feel): I would instantly start "contributing research prototypes" to scipy if scipy offered:
1) an easy, modular and flexible build system (fortran, c, c++, D, swig, boost:python, cython,...)
2) very low entry barrier for possible contributors: a simple checkout, then ./manage.py startapp mycoolmodule and everything is ready to go ("Start coding in 5 minutes!")
3) a distributed version control system (e.g. git). SVN really scares me off...
4) standardized unit tests
5) automated documentation generation

Then I could simply
1) fork the master branch
2) ./manage.py startapp mycoolmodule
3) adjust the config files that were written in ./scipy/mycoolmodule/config.py
4) start coding
5) share the experimental code with collaborators or interested users who are not afraid to use experimental code
6) eventually, when the project has matured, hope that it gets included in the master branch

Hope that made sense,
Sebastian

On Mon, Aug 3, 2009 at 11:54 PM, Ondrej Certik<ondrej@certik.cz> wrote:
On Mon, Aug 3, 2009 at 3:36 PM, Gael Varoquaux<gael.varoquaux@normalesup.org> wrote:
On Mon, Aug 03, 2009 at 03:32:31PM -0600, Ondrej Certik wrote:
On Sat, Aug 1, 2009 at 4:52 PM, Gael Varoquaux<gael.varoquaux@normalesup.org> wrote: [...]
For a web environment, the Sage notebook is amazing. Unfortunately last time I looked, it was GPL licensed, which renders it improper for my use, as the tools we use at the lab must be BSD, in order to be able to build (eventually) medical imaging products from them one day.
Actually, in this thread:
http://groups.google.com/group/sage-devel/browse_thread/thread/65ca1e0489a0a...
most (if not all) contributors to the Sage notebook agreed to release their code as BSD.
The same goes for William, who is positive about licensing the build system as BSD too. So we can get a lot done by working on these things together with Sage.
I can see that a lot of good things are coming out of Sage (the current Cython development frenzy was clearly helped by the needs of Sage). It is really nice to see our community (I am talking in the sense of a scientific Python community, agnostic of tools and distribution) growing.
Cheers to these guys, that notebook is really amazing!
Yep. And Cython has a BSD-like (Apache) license too, so I think that for these basic tools that everyone needs (cython/notebook/build infrastructure) Sage is not against BSD at all.
Ondrej _______________________________________________ Scipy-dev mailing list Scipy-dev@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-dev

On Tue, Aug 04, 2009 at 10:00:26AM +0200, Sebastian Walter wrote:
I, for instance, would rather see more effort to get state of the art algorithms implemented in Scipy, because that's something that would make a real difference in my research work.
On this front, we are hiring a talented engineer to work on machine learning in scipy, via scikits.learn. We already have the algorithms; it is a question of QAing them, integrating them into the scikit, writing docs and making releases.

Gaël

Sebastian Walter wrote:
2 cents from an outsider who thought about contributing to scipy/scikits (but didn't (yet)):
I think it is a good idea to make scipy easy to use for beginners. However, after reading this thread, I have the impression that the goal is not to provide state of the art algorithms but rather to make Scipy as popular as possible by putting money and effort into the "marketing" of Scipy. Don't get me wrong, I think there are some good reasons why a project should strive for a large user base. Some of the best projects are popular. Alas, correlation does not imply causality.
I, for instance, would rather see more effort to get state of the art algorithms implemented in Scipy, because that's something that would make a real difference in my research work. Of course, targeting the "clueless Matlab" users is quite pointless if that is what you are after.
One point which has not been mentioned concerning a Matlab-like environment (maybe it is obvious and everyone implicitly acknowledges it): Mathworks is a 30-year-old company, with more than 1000 people today. Building something like Matlab, with a good GUI and top-notch documentation, takes a huge amount of resources, of which the 'useful' code is only a fraction. I of course don't know the details of Matlab's implementation, but I know that for music-oriented software (which needs a good UI to sell well, and has non-trivial computational requirements, so the comparison is not totally stupid), the graphical code is 80% of the code. This ratio is consistent with the big open source audio software as well (Ardour, Rosegarden). Worse, being cross-platform makes the problem much more difficult. In the music software market, Mac OS X is rarely ignored (~40-50% of the market, I believe), so people need to support two platforms, and that's really a lot of work. For scientific software, I think you can go the non-native route for the graphical toolkit, though.

Also, very few open source projects are successful as far as good GUIs are concerned (I don't want to enter into a debate here, but there are good documents/studies on this topic). You need a financial incentive for this, so only projects backed by big companies have managed to pull it off.

IOW, I am pretty pessimistic about being a 'matlab' clone. We should rather shoot for what makes numpy/scipy better (extensibility, cross platform, actual language, etc...), because really, Matlab will always be a much better Matlab than us. Price and licensing are not good enough to justify migration - if what you want is a free Matlab clone, why not use Octave or Scilab anyway?

That does NOT mean that we should not aim at making the software more accessible. I am (and I guess other developers are) definitely interested in a more product-like, integrated stack, to make the barrier of entry lower.
I, for example, am really tired of the installation problems consistently reported. I feel like we cover Mac OS X and Windows pretty well now, but the Linux situation is still dreadful. I have a few ideas on how to improve the situation, but they all require quite a bit of work/infrastructure. I hope that soon, the scenario "I see this cool python script on the internet, it requires this numpy/scipy thing, can I try it in 2 minutes?" will be a reality.
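As a rough illustration of the "can I try it in 2 minutes?" scenario, a script can at least check up front whether numpy and scipy are available and say so plainly, instead of failing with a traceback deep inside the code. This is a hypothetical helper built only on the standard library's `importlib.util.find_spec`; the function names are made up, and it does nothing about the underlying Linux packaging problem.

```python
import importlib.util


def missing_dependencies(names):
    """Return the top-level module names from `names` that cannot be found.

    importlib.util.find_spec locates a top-level module without importing
    it, so this check is cheap and side-effect free.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


def requirements_message(names):
    """Human-readable summary for the person trying the script."""
    missing = missing_dependencies(names)
    if missing:
        return "missing: " + ", ".join(missing)
    return "all requirements found"


if __name__ == "__main__":
    # Fail early with a clear message instead of an ImportError later on.
    print(requirements_message(["numpy", "scipy"]))
```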
Then you really get some "killer applications". I could name a few people who are coding some cool state of the art algorithms but waste so much time because they started coding directly in C++. In the meantime, they could have implemented the algorithms in Python _and_ in C++. If scipy had something really good that Matlab etc. do not have: guess what ppl would do....
Yes, there are a lot of people who still don't know that there are languages outside Fortran, C and C++. In my field, I still see some people who implement parsers in C...
1) an easy, modular and flexible build system (fortran, c, c++, D, swig, boost:python, cython,...)
you mean like numscons :) ? Adding D support to numscons should be easy. For example, I added initial cython support in a couple of minutes during the cython talk at SciPy08, adding new languages is relatively easy thanks to scons.
2) very low entry barrier for possible contributors: a simple checkout, then ./manage.py startapp mycoolmodule and everything is ready to go ( "Start coding in 5 minutes!")
there are various pieces to enable this (in place build, develop command of setuptools, virtualenv/pip/easy_install), but yes, the situation is kind of messy. For scikits, that's not so difficult - you should be able to implement a trivial scikit by copying the scikits.example package and starting from there. One problem is that it is technically impossible to build in place and test in one go because of a nose limitation ATM (for some reason, nose fails to import a package if it is in the current directory).
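Sebastian's `./manage.py startapp mycoolmodule` is essentially a project-template generator, close in spirit to David's suggestion of copying the scikits.example package. The following is a minimal sketch of what such a command might generate; `startapp` and the file layout (package, `config.py`, `tests` subpackage, README stub) are hypothetical and are not part of scipy, numscons or scikits.example.

```python
import os
import textwrap


def startapp(root, name):
    """Create a minimal module skeleton at root/name (hypothetical layout).

    Loosely mirrors the proposed "./manage.py startapp mycoolmodule":
    a package, a config file, a tests subpackage and a README stub.
    Returns the created paths relative to root/name, sorted.
    """
    if not name.isidentifier():
        raise ValueError(f"{name!r} is not a valid Python package name")
    files = {
        "__init__.py": f'"""{name}: experimental module."""\n',
        "config.py": "# Build settings for this module (hypothetical).\nSOURCES = []\n",
        "README.txt": f"{name}\n{'=' * len(name)}\n",
        os.path.join("tests", "__init__.py"): "",
        os.path.join("tests", f"test_{name}.py"): textwrap.dedent(f"""\
            import {name}


            def test_has_docstring():
                assert {name}.__doc__
            """),
    }
    pkg = os.path.join(root, name)
    for rel, content in files.items():
        path = os.path.join(pkg, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        # Mode "x" refuses to overwrite an existing module by accident.
        with open(path, "x") as fh:
            fh.write(content)
    return sorted(files)
```

A real tool would also need to generate the build configuration (numscons or distutils), which is where most of the complexity David describes lives; here `config.py` is only a placeholder.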
3) a distributed version control system (e.g. git). SVN really scares me off...
That's a sensitive issue, I think we should avoid starting this one here :) Needless to say, you can use git-svn - several core developers use it for numpy/scipy dev, and we distribute an official import: http://projects.scipy.org/numpy/browse_git At least I have not touched svn for numpy/scipy development for > 6 months now, except to check releases when I tag them.
4) standardized unit tests
What do you mean exactly here? We use nose for testing; what do you consider "non standard"?
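For a newcomer, "standardized" tests mostly means knowing the conventions the runner expects. A small sketch, assuming nose's default discovery (test modules and functions named with a `test` prefix, plain `assert` statements); `trapezoid` is an illustrative function, not scipy's integrator, and `run_tests` is a toy stand-in added only so the snippet runs without nose installed.

```python
import math


def trapezoid(y, dx=1.0):
    """Composite trapezoidal rule for equally spaced samples (illustrative)."""
    if len(y) < 2:
        return 0.0
    return dx * (sum(y) - 0.5 * (y[0] + y[-1]))


# By default nose picks up functions like these (names with a "test"
# prefix) from test modules; a bare assert is all a test needs.
def test_trapezoid_exact_for_linear():
    y = [2.0 * x for x in range(5)]  # f(x) = 2x sampled at x = 0, 1, ..., 4
    assert trapezoid(y) == 16.0      # integral of 2x over [0, 4]


def test_trapezoid_sine():
    n = 1001
    dx = math.pi / (n - 1)
    y = [math.sin(i * dx) for i in range(n)]
    assert abs(trapezoid(y, dx) - 2.0) < 1e-5  # integral of sin over [0, pi]


def run_tests(namespace):
    """Toy stand-in for a runner: call every callable named test_*."""
    ran = []
    for name, obj in sorted(namespace.items()):
        if name.startswith("test_") and callable(obj):
            obj()
            ran.append(name)
    return ran


if __name__ == "__main__":
    print(run_tests(dict(globals())))
```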
5) automated documentation generation
It is almost automated now - but an example for scikits is missing in the example package :) cheers, David
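The "almost automated" documentation is generated from docstrings. The sketch below shows the numpydoc docstring layout that numpy and scipy use (section titles underlined with dashes, an Examples section that doubles as a doctest). `moving_sum` and the `docstring_sections` helper are hypothetical; they only illustrate that the format is regular enough for tools such as Sphinx with the numpydoc extension to process.

```python
import inspect


def moving_sum(values, width):
    """
    Sum of each run of `width` consecutive elements.

    Parameters
    ----------
    values : sequence of numbers
        Input samples.
    width : int
        Window length, at least 1.

    Returns
    -------
    list
        One sum per complete window.

    Examples
    --------
    >>> moving_sum([1, 2, 3, 4], 2)
    [3, 5, 7]
    """
    return [sum(values[i:i + width]) for i in range(len(values) - width + 1)]


def docstring_sections(obj):
    """Return the section titles of a numpydoc-style docstring.

    A title is a line immediately followed by a dashed underline of the
    same length, following the numpydoc layout.
    """
    lines = inspect.getdoc(obj).splitlines()
    return [title for title, rule in zip(lines, lines[1:])
            if title and set(rule) == {"-"} and len(rule) == len(title)]


if __name__ == "__main__":
    import doctest

    print(docstring_sections(moving_sum))
    doctest.run_docstring_examples(moving_sum, {"moving_sum": moving_sum})
```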

On Tue, Aug 4, 2009 at 10:35 AM, David Cournapeau<david@ar.media.kyoto-u.ac.jp> wrote:
Sebastian Walter wrote:
2 cents from an outsider who thought about contributing to scipy/scikits (but didn't (yet)):
I think it is a good idea to make scipy easy to use for beginners. However, after reading this thread, I have the impression that the goal is not to provide state of the art algorithms but rather to make Scipy as popular as possible by putting money and effort into the "marketing" of Scipy. Don't get me wrong, I think there are some good reasons why a project should strive for a large user base. Some of the best projects are popular. Alas, correlation does not imply causality.
I, for instance, would rather see more effort to get state of the art algorithms implemented in Scipy, because that's something that would make a real difference in my research work. Of course, targeting the "clueless Matlab" users is quite pointless if that is what you are after.
One point which has not been mentioned concerning a Matlab-like environment (maybe it is obvious and everyone implicitly acknowledges it): Mathworks is a 30-year-old company, with more than 1000 people today.
Building something like Matlab, with a good GUI and top-notch documentation, takes a huge amount of resources, of which the 'useful' code is only a fraction. I of course don't know the details of Matlab's implementation, but I know that for music-oriented software (which needs a good UI to sell well, and has non-trivial computational requirements, so the comparison is not totally stupid), the graphical code is 80% of the code. This ratio is consistent with the big open source audio software as well (Ardour, Rosegarden). Worse, being cross-platform makes the problem much more difficult. In the music software market, Mac OS X is rarely ignored (~40-50% of the market, I believe), so people need to support two platforms, and that's really a lot of work. For scientific software, I think you can go the non-native route for the graphical toolkit, though.
Also, very few open source projects are successful as far as good GUIs are concerned (I don't want to enter into a debate here, but there are good documents/studies on this topic). You need a financial incentive for this, so only projects backed by big companies have managed to pull it off.
IOW, I am pretty pessimistic about being a 'matlab' clone. We should rather shoot for what makes numpy/scipy better (extensibility, cross platform, actual language, etc...), because really, Matlab will always be a much better Matlab than us. Price and licensing are not good enough to justify migration - if what you want is a free Matlab clone, why not use Octave or Scilab anyway?
That does NOT mean that we should not aim at making the software more accessible. I am (and I guess other developers are) definitely interested in a more product-like, integrated stack, to make the barrier of entry lower. I, for example, am really tired of the installation problems consistently reported. I feel like we cover Mac OS X and Windows pretty well now, but the Linux situation is still dreadful. I have a few ideas on how to improve the situation, but they all require quite a bit of work/infrastructure. I hope that soon, the scenario "I see this cool python script on the internet, it requires this numpy/scipy thing, can I try it in 2 minutes?" will be a reality.
Then you really get some "killer applications". I could name a few people who are coding some cool state of the art algorithms but waste so much time because they started coding directly in C++. In the meantime, they could have implemented the algorithms in Python _and_ in C++. If scipy had something really good that Matlab etc. do not have: guess what ppl would do....
Yes, there are a lot of people who still don't know that there are languages outside Fortran, C and C++. In my field, I still see some people who implement parsers in C...
1) an easy, modular and flexible build system (fortran, c, c++, D, swig, boost:python, cython,...)
you mean like numscons :) ? Adding D support to numscons should be easy. For example, I added initial cython support in a couple of minutes during the cython talk at SciPy08, adding new languages is relatively easy thanks to scons.
2) very low entry barrier for possible contributors: a simple checkout, then ./manage.py startapp mycoolmodule and everything is ready to go ( "Start coding in 5 minutes!")
there are various pieces to enable this (in place build, develop command of setuptools, virtualenv/pip/easy_install), but yes, the situation is kind of messy. For scikits, that's not so difficult - you should be able to implement a trivial scikit by copying the scikits.example package and starting from there.
One problem is that it is technically impossible to build in place and test in one go because of a nose limitation ATM (for some reason, nose fails to import a package if it is in the current directory).
3) a distributed version control system (e.g. git). SVN really scares me off...
That's a sensitive issue, I think we should avoid starting this one here :) Needless to say, you can use git-svn - several core developers use it for numpy/scipy dev, and we distribute an official import:
http://projects.scipy.org/numpy/browse_git
At least I have not touched svn for numpy/scipy development for > 6 months now, except to check releases when I tag them.
4) standardized unit tests
What do you mean exactly here? We use nose for testing; what do you consider "non standard"?
5) automated documentation generation
It is almost automated now - but an example for scikits is missing in the example package :)
Just enumerating what I think would be useful to attract high-quality contributors. I'm aware that scipy already has a lot of these features (which is nice). But it would be even nicer to have a really low entry barrier and a framework that guides you to write good (and documented) code with extensive unit tests, just like the big web frameworks (Django, RoR, ...). It has to be a win-win situation for both the community and the developer.
cheers,
David

At this point I think the question becomes: do we let the (clear) fact that there is not a single set of priorities for where SciPy should be headed (which I do not see as a bad thing at this stage) get in the way of the community moving on *some* proposal (e.g., Joe's, with mods) for *some* "not-for-profit entity" (e.g., a "SciPy Foundation," the original topic of this thread) that will function as an institutional resource for furthering whichever priorities for SciPy should bubble to the surface? In other words, this thread is diverging (into territory necessary to discuss, yes), but can we at least agree (a semi-rhetorical question because I think the answer is clearly "yes") that something along the lines of a "SciPy Foundation" would be useful, certainly for helping us move SciPy where we want it to go, but perhaps also for helping us decide where as well?

DG

--- On Tue, 8/4/09, Sebastian Walter <sebastian.walter@gmail.com> wrote:
From: Sebastian Walter <sebastian.walter@gmail.com>
Subject: Re: [SciPy-dev] SciPy Foundation
To: "SciPy Developers List" <scipy-dev@scipy.org>
Date: Tuesday, August 4, 2009, 2:25 AM
On Tue, Aug 4, 2009 at 10:35 AM, David Cournapeau<david@ar.media.kyoto-u.ac.jp> wrote:
Sebastian Walter wrote:
2 cents from an outsider who thought about contributing to scipy/scikits (but didn't (yet)):
I think it is a good idea to make scipy easy to use for beginners. However, after reading this thread, I have the impression that the goal is not to provide state of the art algorithms but rather to make Scipy as popular as possible by putting money and effort into the "marketing" of Scipy. Don't get me wrong, I think there are some good reasons why a project should strive for a large user base. Some of the best projects are popular. Alas, correlation does not imply causality.
I, for instance, would rather see more effort to get state of the art algorithms implemented in Scipy, because that's something that would make a real difference in my research work. Of course, targeting the "clueless Matlab" users is quite pointless if that is what you are after.
One point which has not been mentioned concerning a Matlab-like environment (maybe it is obvious and everyone implicitly acknowledges it): Mathworks is a 30-year-old company, with more than 1000 people today.
Building something like Matlab, with a good GUI and top-notch documentation, takes a huge amount of resources, of which the 'useful' code is only a fraction. I of course don't know the details of Matlab's implementation, but I know that for music-oriented software (which needs a good UI to sell well, and has non-trivial computational requirements, so the comparison is not totally stupid), the graphical code is 80% of the code. This ratio is consistent with the big open source audio software as well (Ardour, Rosegarden). Worse, being cross-platform makes the problem much more difficult. In the music software market, Mac OS X is rarely ignored (~40-50% of the market, I believe), so people need to support two platforms, and that's really a lot of work. For scientific software, I think you can go the non-native route for the graphical toolkit, though.
Also, very few open source projects are successful as far as good GUIs are concerned (I don't want to enter into a debate here, but there are good documents/studies on this topic). You need a financial incentive for this, so only projects backed by big companies have managed to pull it off.
IOW, I am pretty pessimistic about being a 'matlab' clone. We should rather shoot for what makes numpy/scipy better (extensibility, cross platform, actual language, etc...), because really, Matlab will always be a much better Matlab than us. Price and licensing are not good enough to justify migration - if what you want is a free Matlab clone, why not use Octave or Scilab anyway?
That does NOT mean that we should not aim at making the software more accessible. I am (and I guess other developers are) definitely interested in a more product-like, integrated stack, to make the barrier of entry lower. I, for example, am really tired of the installation problems consistently reported. I feel like we cover Mac OS X and Windows pretty well now, but the Linux situation is still dreadful. I have a few ideas on how to improve the situation, but they all require quite a bit of work/infrastructure. I hope that soon, the scenario "I see this cool python script on the internet, it requires this numpy/scipy thing, can I try it in 2 minutes?" will be a reality.
Then you really get some "killer applications". I could name a few people who are coding some cool state of the art algorithms but waste so much time because they started coding directly in C++. In the meantime, they could have implemented the algorithms in Python _and_ in C++. If scipy had something really good that Matlab etc. do not have: guess what ppl would do....
Yes, there are a lot of people who still don't know that there are languages outside Fortran, C and C++. In my field, I still see some people who implement parsers in C...
1) an easy, modular and flexible build system (fortran, c, c++, D, swig, boost:python, cython,...)
you mean like numscons :) ? Adding D support to numscons should be easy. For example, I added initial cython support in a couple of minutes during the cython talk at SciPy08, adding new languages is relatively easy thanks to scons.
2) very low entry barrier for possible contributors: a simple checkout, then ./manage.py startapp mycoolmodule and everything is ready to go ( "Start coding in 5 minutes!")
there are various pieces to enable this (in place build, develop command of setuptools, virtualenv/pip/easy_install), but yes, the situation is kind of messy. For scikits, that's not so difficult - you should be able to implement a trivial scikit by copying the scikits.example package and starting from there.
One problem is that it is technically impossible to build in place and test in one go because of a nose limitation ATM (for some reason, nose fails to import a package if it is in the current directory).
3) a distributed version control system (e.g. git). SVN really scares me off...
That's a sensitive issue, I think we should avoid starting this one here :) Needless to say, you can use git-svn - several core developers use it for numpy/scipy dev, and we distribute an official import:
http://projects.scipy.org/numpy/browse_git
At least I have not touched svn for numpy/scipy development for > 6 months now, except to check releases when I tag them.
4) standardized unit tests
What do you mean exactly here? We use nose for testing; what do you consider "non-standard"?
5) automated documentation generation
It is almost automated now - but an example for scikits is missing in the example package :)
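On point 4, one reading of "standardized" is simply following nose's conventions: it collects module-level functions whose names start with `test` from files named like `test_*.py`, and a failing plain `assert` counts as a test failure. A minimal sketch of such a module, where `trapz_uniform` is a hypothetical stand-in for real package code (it is not a numpy or scipy function):

```python
"""test_integrate_sketch.py: a nose-style test module (illustrative only)."""
import math


def trapz_uniform(values, dx):
    """Hypothetical helper: trapezoidal rule on uniformly spaced samples."""
    if len(values) < 2:
        raise ValueError("need at least two samples")
    # Interior samples get weight 1, the two endpoints weight 1/2.
    return dx * (sum(values) - 0.5 * (values[0] + values[-1]))


def test_constant_function():
    # The integral of 1 over [0, 1] is 1.
    assert math.isclose(trapz_uniform([1.0] * 11, 0.1), 1.0)


def test_linear_function_is_exact():
    # The trapezoidal rule is exact for linear functions: integral of x on [0, 1] is 0.5.
    xs = [i / 10 for i in range(11)]
    assert math.isclose(trapz_uniform(xs, 0.1), 0.5)


def test_too_few_samples():
    # A single sample cannot define an interval, so the helper must refuse it.
    try:
        trapz_uniform([1.0], 0.1)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

Running `nosetests` in the directory holding this file would collect the three `test_*` functions; numpy and scipy also expose a `test()` function that runs their own suites through the same kind of runner.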
Just enumerating what I think would be useful to attract high quality contributors. I'm aware that scipy already has a lot of these features (which is nice). But it would be even nicer to have a really low entry barrier and a framework that guides you to write good (and documented) code with extensive unit tests, just like the big web frameworks (Django, RoR, ...). It has to be a win-win situation for both the community and the developer.
cheers,
David _______________________________________________ Scipy-dev mailing list Scipy-dev@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-dev

On Tue, Aug 4, 2009 at 13:53, David Goldsmith<d_l_goldsmith@yahoo.com> wrote:
At this point I think the question becomes: do we let the (clear) fact that there is not a single set of priorities for where SciPy should be headed (which I do not see as a bad thing at this stage) get in the way of the community moving on *some* proposal (e.g., Joe's, with mods) for *some* "not-for-profit entity" (e.g., a "SciPy Foundation," the original topic of this thread) that will function as an institutional resource for furthering whichever priorities for SciPy should bubble to the surface? In other words, this thread is diverging (into territory necessary to discuss, yes), but can we at least agree (a semi-rhetorical question because I think the answer is clearly "yes") that something along the lines of a "SciPy Foundation" would be useful, certainly for helping us move SciPy where we want it to go, but perhaps also for helping us decide where as well?
Perhaps a new name would be in order. I think a lot of the disagreement in vision arises from the fact that a number of the very good ideas about how to encourage the use of Python in the sciences, which could be implemented by the people involved in SciPy-the-project, are being conflated with scipy-the-package. Things like IDEs and GUIs and applications do not fit into scipy-the-package as it currently exists, and changing scipy-the-package such that they do fit in deteriorates what scipy-the-package is good at now.
Personally, I see scipy-the-package as something very close in spirit to what GSL is to C: a library of quality numerical algorithms useful to science and engineering. scipy-the-package is not everything that is required to advance Python's use in the sciences. It can't be. A single Python package is the wrong technology for delivering all of that functionality.
I think we need to step back and question the question itself. Perhaps we should not be asking "where should scipy(-the-package) be heading?" but "what do we need to do to advance Python's use in the sciences?" I don't think a Foundation helps the former much, but I do think the latter would be an excellent mission for one. scipy-the-package is a component of what the Foundation might work on, but I think it would make a huge mistake if it fixated on scipy-the-package and assumed that all of the work it does needs to be jammed into scipy-the-package.
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

I fully agree with your analysis Robert. I had this discussion with Eric, and he did mention that it would be useful if the name was reminiscent of 'SciPy', because it is a highly visible name. Should we have a BOF on that at the SciPy conference? Mailing list discussions tend to go in a circle.
On Tue, Aug 04, 2009 at 02:37:01PM -0500, Robert Kern wrote:
Perhaps a new name would be in order. I think a lot of the disagreement in vision arises from the fact that a number of the very good ideas about how to encourage the use of Python in the sciences, which could be implemented by the people involved in SciPy-the-project, are being conflated with scipy-the-package. Things like IDEs and GUIs and applications do not fit into scipy-the-package as it currently exists, and changing scipy-the-package such that they do fit in deteriorates what scipy-the-package is good at now.
Personally, I see scipy-the-package as something very close in spirit to what GSL is to C: a library of quality numerical algorithms useful to science and engineering. scipy-the-package is not everything that is required to advance Python's use in the sciences. It can't be. A single Python package is the wrong technology for delivering all of that functionality.
I think we need to step back and question the question itself. Perhaps we should not be asking "where should scipy(-the-package) be heading?" but "what do we need to do to advance Python's use in the sciences?" I don't think a Foundation helps the former much, but I do think the latter would be an excellent mission for one. scipy-the-package is a component of what the Foundation might work on, but I think it would make a huge mistake if it fixated on scipy-the-package and assumed that all of the work it does needs to be jammed into scipy-the-package.

On Tue, Aug 4, 2009 at 14:41, Gael Varoquaux<gael.varoquaux@normalesup.org> wrote:
I fully agree with your analysis Robert.
I had this discussion with Eric, and he did mention that it would be useful if the name was reminiscent of 'SciPy', because it is a highly visible name.
Should we have a BOF on that at the SciPy conference? Mailing list discussions tend to go in a circle.
We could get a bikeshed, some paint, and some brushes. Everyone who wants to contribute an idea must paint it on the bikeshed. I like it. Anyways, it could probably even be called the SciPy Foundation as long as the introductory material was very explicit about its relationship to scipy-the-package and the founding members use language carefully. Tricky, but doable. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

On Tue, Aug 4, 2009 at 14:45, Robert Kern<robert.kern@gmail.com> wrote:
We could get a bikeshed, some paint, and some brushes. Everyone who wants to contribute an idea must paint it on the bikeshed.
In their preferred color, of course. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

On Tue, Aug 4, 2009 at 1:48 PM, Robert Kern<robert.kern@gmail.com> wrote:
On Tue, Aug 4, 2009 at 14:45, Robert Kern<robert.kern@gmail.com> wrote:
We could get a bikeshed, some paint, and some brushes. Everyone who wants to contribute an idea must paint it on the bikeshed.
In their preferred color, of course.
Maybe everyone should bring a bike too. Ondrej

On Tue, Aug 4, 2009 at 3:28 PM, Ondrej Certik <ondrej@certik.cz> wrote:
On Tue, Aug 4, 2009 at 1:48 PM, Robert Kern<robert.kern@gmail.com> wrote:
On Tue, Aug 4, 2009 at 14:45, Robert Kern<robert.kern@gmail.com> wrote:
We could get a bikeshed, some paint, and some brushes. Everyone who wants to contribute an idea must paint it on the bikeshed.
In their preferred color, of course.
Maybe everyone should bring a bike too.
It would be nice if the hotels would offer rental bikes. As is, the bike stores are far enough away that getting the bike and dropping it off on departure is too much hassle. Chuck

On Tue, Aug 04, 2009 at 03:47:21PM -0600, Charles R Harris wrote:
On Tue, Aug 4, 2009 at 3:28 PM, Ondrej Certik <ondrej@certik.cz> wrote:
On Tue, Aug 4, 2009 at 1:48 PM, Robert Kern<robert.kern@gmail.com> wrote:
On Tue, Aug 4, 2009 at 14:45, Robert Kern<robert.kern@gmail.com> wrote:
We could get a bikeshed, some paint, and some brushes. Everyone who wants to contribute an idea must paint it on the bikeshed.
In their preferred color, of course.
Maybe everyone should bring a bike too.
It would be nice if the hotels would offer rental bikes. As is, the bike stores are far enough away that getting the bike and dropping it off on departure is too much hassle.
The good news is that you'll find a bikeshed to shelter your bike when you get there. Actually, I suggest some people bring bikesheds too, as I am unsure of my favorite color. Gaël

Hi Joe, On Sat, Aug 1, 2009 at 2:06 AM, Joe Harrington<jh@physics.ucf.edu> wrote:
I define success as popular adoption in preference to commercial packages. I believe in vote-with-your-feet: this goal will not be reached until all aspects of the package and its presentation to the world exceed those of our commercial competition. Scipy is now a grass roots effort, but that takes it only so far. Other projects, such as OpenOffice and Sage, don't follow this model and do produce quality products that compete with commercial offerings, at least on open-source platforms.
I am not sure OpenOffice is a good example, but I share the sentiment that something is missing in the organization of the community. I think it is very important to keep in mind that in any open source project, telling people what to do does not work well. Not everybody will share the same goals or be interested in scipy in the same way, etc. So any structure should help people do what they want for scipy's sake, but above all, should not alienate anyone who would have worked on scipy otherwise. It may just be rhetoric, but saying "it would be nice for scipy to have this goal" instead of "we should do this" matters IMHO.
Some of the things I am missing:
- no quantifiable feedback from users: if we want to work on a set of features, we cannot prioritize. Likewise, we have very few statistics on usage, platforms, etc. OTOH, this is often hard to obtain for open source projects.
- a scipy foundation: several times already, I have been asked privately to add some feature to scipy, generally things which take a few hours max, in exchange for some money. It is too much of a hassle to set things up to get money for a few hours of work, and frankly, for a few hours, I would prefer to ask people to give money to a scipy foundation instead. Something like the R foundation (http://www.r-project.org/foundation/main.html). A foundation with a legal status would make the situation much easier w.r.t. donations, I believe. It should not be that hard to set up.
- website: I think the root of the problem is the lack of a dedicated person for it, ideally a person with design skills, to design a coherent graphic "chart" (visual style guide; not sure about the exact English word), etc. I don't know how to get volunteers for this: it seems like many projects manage to have such volunteers.
About the more particular points you raised:
- Packaging
  - Personal Package Archive or equivalent for every release of every OS for the full toolstack (There are tools that do this but we don't use them. NSF requires Metronome - http://nmi.cs.wisc.edu/ - for funding most development grants, so right now we're not even on NSF's radar.)
  - Track record of having the whole toolstack installation "just work" in a few command lines or clicks for *everyone*
  - Regular, scheduled releases of numpy and scipy
  - Coordinated releases of numpy, scipy, and stable scikits into PPA system
The problem with packaging is that it is hard to do well, yet has no technically challenging part in it. And it usually does not fall under "scratching one's itch", because once you know how to build the software, you are done and usually want to start using the damn thing. Worse, it needs to be done every time (every release). So this is fundamentally different from doc: having done great packaging work for version N is useless after N+1 is out. It does not make sense to pay someone to do it once. Having some infrastructure would help: for example, something which automatically builds packages on a set of supported platforms. It has to be 100% automatic, so that pushing one button gets you the sources, builds the package, installs it, and tests it. This costs money and time to set up.
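A minimal sketch of the "one button" idea, assuming each step (fetch, build, install, test) can be expressed as a shell command. It is not Metronome or any existing build farm: the `echo` commands are placeholders, and `run_on(platform, cmd)` is a hypothetical hook for executing a command on a given build machine. The pipeline stops at the first failing step, so a broken build is never installed or tested.

```python
import subprocess
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]       # (step name, shell command)
Log = List[Tuple[str, int]]  # (step name, exit status), in execution order

# Placeholder commands: a real setup would substitute its own
# checkout/build/install/test invocations for each platform.
EXAMPLE_STEPS: List[Step] = [
    ("fetch", "echo fetching sources"),
    ("build", "echo building package"),
    ("install", "echo installing into a scratch prefix"),
    ("test", "echo running the test suite"),
]


def run_local(cmd: str) -> int:
    """Run a command through the local shell and return its exit status."""
    return subprocess.run(cmd, shell=True).returncode


def one_button(steps: List[Step], run: Callable[[str], int] = run_local) -> Tuple[bool, Log]:
    """Run the steps in order, stopping at the first non-zero exit status."""
    log: Log = []
    for name, cmd in steps:
        status = run(cmd)
        log.append((name, status))
        if status != 0:
            return False, log
    return True, log


def nightly(platforms: List[str], steps: List[Step],
            run_on: Callable[[str, str], int]) -> Dict[str, Tuple[bool, Log]]:
    """Repeat the whole pipeline on every platform (run_on is a hypothetical remote runner)."""
    return {p: one_button(steps, lambda cmd, p=p: run_on(p, cmd)) for p in platforms}


if __name__ == "__main__":
    # Demo with a fake runner that pretends the test step fails on win32.
    def fake_run_on(platform: str, cmd: str) -> int:
        return 1 if platform == "win32" and "test suite" in cmd else 0

    for platform, (ok, log) in nightly(["linux-x86_64", "win32"], EXAMPLE_STEPS, fake_run_on).items():
        print(platform, "OK" if ok else f"FAILED at {log[-1][0]}")
```

Collecting the per-platform results in one dictionary is what would make a nightly "does the stack still build and pass everywhere" report cheap to publish.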
- Public communication
  - A real marketing plan
  - Executing on that plan
  - Web site geared toward multiple audiences, run by experts at that kind of communication
  - More webinars, conference booths, training, aimed at all levels
  - Demos, testimonials, topical forums, all showcased
Concerning communication with users, I think that the mailing lists do not work well. They are OK for development, but they kinda suck for helping average users. Since I have been working on the dark side for numpy/scipy (Windows), I have been regularly using stackoverflow to ask about some obscure Windows stuff. stackoverflow is a mix between a FAQ and Wikipedia. It works extremely well, and the user experience is way above anything I have seen in this vein. Something like this for scipy/numpy would be extremely useful, I believe. It is vastly superior to a ML or wiki for focused problems ("how to do this in matlab", "how to install on this linux distribution", etc.). As an example of usage, R recently used the main website so that the N most upvoted R questions would be answered by R core developers (during an R conference, I believe). This all feels much better than ML to me (again, as far as average user usage is concerned, not for developer communication). One website handles the whole user community, with no need for complicated forum rules and all (everything works with search and tags). Stackoverflow works without any fixed hierarchy for many times more participants than we will ever have, and much broader topics than ours. They will soon have a dedicated solution for custom websites using the same stack - maybe something can be worked on as an open source project. David

[Replying only on scipy-dev, per the original post.] David wrote:
I think it is very important to keep in mind that in any open source project, telling people what to do does not work well. Not everybody will share the same goals or be interested in scipy in the same way, etc. So any structure should help people do what they want for scipy's sake, but above all, should not alienate anyone who would have worked on scipy otherwise. It may just be rhetoric, but saying "it would be nice for scipy to have this goal" instead of "we should do this" matters IMHO.
I think (hope!) that everyone understands that anything posted here is a personal opinion and that none of us feels we are in a position to give orders. Nobody is boss or supervisor to the whole list. When I write, "We need...," of course I am writing "It is my opinion that we need," etc., but that gets tedious both to write and to read. Visions should be bold. That said, there do need to be goals, standards, etc. Those do translate into telling people what to do. I think the key point is that it must be the community, not any individual, that does the telling. For example, we are engaged in a discussion of a plan I floated. The list I posted is "my plan", but already we've added code to the funding umbrella and no doubt there will be more changes (I fully expected Robert Kern to flip out about my suggestion to remove functions from numpy...maybe he didn't read that far...I expect to lose that one.:-). I think that once it's the community's plan, we can say no to contributions that don't fit, that conflict with others, that are too slow or insufficient, and so on, because we will have the critical mass to replace those contributions with ones the community thinks are better. We see this already with the vigilant rejection of change requests to the numpy API and the review comment system on the doc wiki. We can and have to say no occasionally, to maintain our direction and our standards. We just have to be careful about it and make sure it is based on established community goals and norms, not one person's random opinion. More on some of your other points later... --jh--

On 07/31/09 22:36, Joe Harrington wrote:
About sixteen months ago, I launched the SciPy Documentation Project and its Marathon. Dozens pitched in and now numpy docs are rapidly approaching a professional level. The "pink wave" ("Needs Review" status) is at 56% today! There is consensus among doc writers that much of the rest can be labeled in the "unimportant" category, so we're close to starting the review push (hold your fire, there is a web site mod to be done first).
We're also nearing the end of the summer, and it's time to look ahead. The path for docs is clear, but the path for SciPy is not. I think our weakest area right now is organization of the project. There is no consensus-based plan for improvement of the whole toward a stated goal, no centralized coordination of work, and no funded work focused on many of our weaknesses, notwithstanding my doc effort and what Enthought does for code.
Thank you for your efforts! I believe I will be able to help this effort in various ways over the next few years from India as part of a large government grant. I do not have the time to discuss it here at the moment but I will be at SciPy09 and would love to discuss it there in person. I will also be talking briefly about our overall goals there. Specifically see: http://conference.scipy.org/abstract?id=13 regards, prabhu

I've finally had time to look at all the replies to this thread. There were dozens, so rather than quoting and responding to everyone individually, I'll summarize. The short version is that due to an early misunderstanding, we spent a lot of bandwidth generating agreement that masqueraded as dissent! In the end, I think we have general agreement (and no specific dissent) to the idea of an organization dedicated to development of scientific tools in Python and gathering and disbursing funds to that end. We even agree on our major priorities. So, I propose that we move forward with planning. There's a BoF proposal at the end.
Here's the longer version:
1. Objection: The mission statement stuffs too much into one package. The scipy package doesn't need a GUI! (Long post by Gael 2009-08-01 22:52:16, shorter one by Robert 2009-08-04 19:37:01, many others.)
My apologies to these fine gentlemen and others who discussed on this threadlet, but this was a bit of a bandwidth waster since I started my proposed mission statement with "(The toolstack)", not "SciPy" or "scipy". Of course nobody would go to such lengths just for one package, nor propose stuffing so much into it that exists elsewhere and is in wide use already (GUIs, interactive shells, etc.). We're talking broadly about scientific use of Python.
Robert proposed a name change to avoid such ambiguity. SciPython? SciPyStack? Py4Sci? Scientific Python is taken. I really prefer SciPy, as it has branding already, but perhaps SciPyStack is ok informally. I think we're stuck with SciPy for formal docs, web site, etc., just like JPL (which has not studied the propulsion of jets or rocket engines for decades). What this means is:
a. POSTERS BE CLEAR: specify package or toolstack when you talk about scipy. Use "SciPy" for the toolstack and "scipy" for the package, but don't rely on that alone. (note: I did this!)
b. RESPONDENTS BE CAREFUL: double-check what the poster wrote before replying if it's about "scipy" or "SciPy".
2. It's important for the package structure to be light.
Yes! I am not proposing to change the package structure at all. People need to be able to pick and choose, and it needs to be light for many reasons, such as OLPC.
However, as a practical matter, I know of *nobody* who is a heavy user and who does not install a significant number of packages. We install about 15 python-related packages now for our group. It has become a nightmare that takes my very experienced system manager, an Ubuntu developer with a PhD in computer science, several days. Basically, if you want everything current (e.g., to get recent docs in numpy, or HDF libraries that actually work), it is hard to do a consistent build without doing a lot of patching. Clearly, most potential users cannot tolerate that, or even do it.
So, I would like to see packaging *coordination* such that a monolithic install is as trivial for the user as it is to install one package. From my discussion with hundreds of users who are sitting on the sidelines in my discipline alone, this and docs are essentially what they are waiting for. Done right, I think most of the relevant package authors would welcome the opportunity to coordinate (but I don't speak for them). Exactly what and how is a matter to discuss but let's get the overall project structure settled first.
3. This is going to be a lot of work, particularly IDEs and GUIs! I don't want to burn out or hurt my career.
People should not burn out or hurt their careers on service projects! It's the first rule of academia. There are tons of workers who would happily contribute small bits if they were served in nice-sized chunks and integrated by someone when finished. There are lots more willing to work for pay, or even partial pay. This proposal is a way of moving to that model, which might also be called "many hands make light work". I think the doc project proves the viability of the paid-coordinator model. For IDEs and GUIs, there are good starts already. With enough momentum, we can directly fund development to provide something better.
4. Why not use Sage/EPD/etc.?
Those solve the monolithic packaging problem, usually inelegantly but that's the only way to do it today. There is plenty broken in our own house before we even get to the monolithic packaging problem, like missing documentation, code cleanups, API stabilization/rationalization, and getting packages to build together for all platforms.
Once that's done, Sage's and our goals might well merge. Still, Sage has its own focus, and it is not scientific modeling and data analysis. EPD focuses on Windows. My ideal would be that our much-improved packaging makes rolling a monolithic distro for a particular purpose much easier, in some cases as easy as publishing a meta-package that pulls in what you want as dependencies. Then STScI can release an astronomy distro, someone else can release a neuroscience distro, and Sage can release their thing for math, all benefitting from a toolstack that builds cleanly together.
5. Packaging is hard. What we need is (long description of packaging needs)...
What we need is fully-automated builds that populate PPAs on all platforms for every version and a nightly snapshot of every package, and tests run nightly that show they still work together. There is a tool that does this. It was funded by the US National Science Foundation and is required for applicants to many of their grant programs. It is called metronome, formerly NMI Build and Test Suite: http://nmi.cs.wisc.edu/
At last count, they build on 46 platforms. Getting there will be hard. That is what money is for.
Story: When I was a freshman in 1984, there was a free student computing system at MIT called Multics that was run by a student group. Your account had a certain amount of "money", which it charged for CPU usage and printing. When you ran out, you asked for more "money" and got it for free. There was a sign on the door to the group's office that said, "If you need more money, use the request-extension command." But, it was done with funky colors and words going every which way, and to me it initially read, "If you need more, use money, the request-extension command." I've looked at money in a different light ever since...
Gael Varoquaux 2009-08-01 22:52:16 GMT writes:
Specifically, I would love to see an official umbrella project for BSD-licensed tools for building scientific projects with Python. As the "scipy" name is well branded (through the website, and the conference), we could call this the 'scipy project'. I would personally like to limit wheel reinvention and have preferred solutions for the various bricks (I am thinking of the unfortunate Chaco versus Matplotlib situation, where I have to depend on both libraries that complement each other).
This is exactly what I am proposing. Pretty much everything else in the message was based on a misunderstanding of my intent about package vs. toolstack. I would not limit it to BSD-licensed tools, but would want that to continue to be a requirement for the core stuff, and likely for grants we would write. In other words, if a benefactor came along wanting to give some cash to a field-specific project that was under GPL, fine, I'd be glad to funnel their money to the developers.
First, as Robert points out, telling somebody what to do will not achieve anything. I am already way too busy scratching my own itches.
It works well if you are paying them. What is amazing (witness the doc project) is that if just one person is paid to organize an area, lots of people flock to the project and pitch in doing small tasks. Not everyone. Not even most people. But enough. That is what the funding is for. It's the request_extension command! Specifically, extension of effort on the part of someone who would otherwise find other uses for their time.
Second, who will find the time to take care of this?
I've been doing it since Spring 2008 for the doc project. Hopefully a few others will join me so we can write some grants, start a funding organization, and launch something more permanent and far-reaching. I've proposed a BoF on this topic. I immodestly think it could be the most important one of the meeting. I propose Thursday at 8:30 (I think that 2.5 hours for dinner is too much and that we should start the BoFs much earlier, like at 7:30, so we can do an early and a late set of BoFs. I'm not sure what the reception is. Is it dinner? Or just a delay in the start of dinner?). Alternatively, we can do it Friday over lunch, though that depends on getting some box lunches.
Proposed format:
Organization, Funding, and Future Direction of SciPy (coordinator: Joe Harrington, sergeant-at-arms: David Goldsmith)
I'd like to spend a strict 10 minutes on each of these, cutting off discussion and moving on after each item. After all 6 items, we can continue discussion on any item:
* What are our long-term goals?
* What are our current strengths and weaknesses?
* How is our current Steering Committee/grass-roots model working?
* What would we do with funding? Would it require a change in how the community operates?
* How can we get funding?
* In the large, how should we proceed?
--jh--

Hi Joe, just one quick comment: I really think that you cannot use the scipy name without inevitably creating misunderstandings down the line. It is crazy in my mind to rely on upper/lower case alone to differentiate 2 different "objects". I do not like the package/toolstack distinction either. For one thing, you may get more confusion from non-native English speakers than you really wish! Why not Py4Science? It does convey the ultimate goal of this effort, and I have only seen it in the context of ipython and matplotlib: the first hit from google is http://ipython.scipy.org/moin/Py4Science which was a practical workshop on python usage for scientific work (I think the content still lives in matplotlib SVN). Anyway, my two cents..... Johann
Joe Harrington wrote:
I've finally had time to look at all the replies to this thread. There were dozens, so rather than quoting and responding to everyone individually, I'll summarize. The short version is that due to an early misunderstanding, we spent a lot of bandwidth generating agreement that masqueraded as dissent! In the end, I think we have general agreement (and no specific dissent) to the idea of an organization dedicated to development of scientific tools in Python and gathering and disbursing funds to that end. We even agree on our major priorities. So, I propose that we move forward with planning. There's a BoF proposal at the end.
Here's the longer version:
1. Objection: The mission statement stuffs too much into one package. The scipy package doesn't need a GUI! (Long post by Gael 2009-08-01 22:52:16, shorter one by Robert 2009-08-04 19:37:01, many others.)
My apologies to these fine gentlemen and others who discussed on this threadlet, but this was a bit of a bandwidth waster since I started my proposed mission statement with "(The toolstack)", not "SciPy" or "scipy". Of course nobody would go to such lengths just for one package, nor propose stuffing so much into it that exists elsewhere and is in wide use already (GUIs, interactive shells, etc.). We're talking broadly about scientific use of Python.
Robert proposed a name change to avoid such ambiguity. SciPython? SciPyStack? Py4Sci? Scientific Python is taken. I really prefer SciPy, as it has branding already, but perhaps SciPyStack is ok informally. I think we're stuck with SciPy for formal docs, web site, etc, just like JPL (which has not studied the propulsion of jets or rocket engines for decades). What this means is:
a. POSTERS BE CLEAR: specify package or toolstack when you talk about scipy. Use "SciPy" for the toolstack and "scipy" for the package, but don't rely on that alone. (note: I did this!)
b. RESPONDENTS BE CAREFUL: double-check what the poster wrote before replying if it's about "scipy" or "SciPy".
2. It's important for the package structure to be light.
Yes! I am not proposing to change the package structure at all. People need to be able to pick and choose, and it needs to be light for many reasons, such as OLPC.
However, as a practical matter, I know of *nobody* who is a heavy user and who does not install a significant number of packages. We install about 15 python-related packages now for our group. It has become a nightmare that takes my very experienced system manager, an Ubuntu developer with a PhD in computer science, several days. Basically, if you want everything current (e.g., to get recent docs in numpy, or HDF libraries that actually work), it is hard to do a consistent build without doing a lot of patching. Clearly, most potential users cannot tolerate that, or even do it.
So, I would like to see packaging *coordination* such that a monolithic install is as trivial for the user as it is to install one package. From my discussion with hundreds of users who are sitting on the sidelines in my discipline alone, this and docs are essentially what they are waiting for. Done right, I think most of the relevant package authors would welcome the opportunity to coordinate (but I don't speak for them). Exactly what and how is a matter to discuss but let's get the overall project structure settled first.
3. This is going to be a lot of work, particularly IDEs and GUIs! I don't want to burn out or hurt my career.
People should not burn out or hurt their careers on service projects! It's the first rule of academia. There are tons of workers who would happily contribute small bits if they were served in nice-sized chunks and integrated by someone when finished. There are lots more willing to work for pay, or even partial pay. This proposal is a way of moving to that model, which might also be called "many hands make light work". I think the doc project proves the viability of the paid-coordinator model. For IDEs and GUIs, there are good starts already. With enough momentum, we can directly fund development to provide something better.
4. Why not use Sage/EPD/etc.?
Those solve the monolithic packaging problem, usually inelegantly, but that's the only way to do it today. There is plenty broken in our own house before we even get to the monolithic packaging problem, like missing documentation, code cleanups, API stabilization/rationalization, and getting packages to build together for all platforms.
Once that's done, Sage's and our goals might well merge. Still, Sage has its own focus, and it is not scientific modeling and data analysis. EPD focuses on Windows. My ideal would be that our much-improved packaging makes rolling a monolithic distro for a particular purpose much easier, in some cases as easy as publishing a meta-package that pulls in what you want as dependencies. Then STScI can release an astronomy distro, someone else can release a neuroscience distro, and Sage can release their thing for math, all benefitting from a toolstack that builds cleanly together.
5. Packaging is hard. What we need is (long description of packaging needs)...
What we need is fully automated builds that populate PPAs on all platforms for every version and a nightly snapshot of every package, plus nightly tests showing that they still work together. There is a tool that does this. It was funded by the US National Science Foundation and is required for applicants to many of their grant programs. It is called Metronome, formerly the NMI Build and Test Suite:
At last count, they build on 46 platforms. Getting there will be hard. That is what money is for.
Story: When I was a freshman in 1984, there was a free student computing system at MIT called Multics that was run by a student group. Your account had a certain amount of "money", which it charged for CPU usage and printing. When you ran out, you asked for more "money" and got it for free. There was a sign on the door to the group's office that said, "If you need more money, use the request-extension command." But it was written in funky colors with words going every which way, and to me it initially read, "If you need more, use money, the request-extension command." I've looked at money in a different light ever since...
Gael Varoquaux 2009-08-01 22:52:16 GMT writes:
Specifically, I would love to see an official umbrella project for BSD-licensed tools for building scientific projects with Python. As the "scipy" name is well branded (through the website, and the conference), we could call this the 'scipy project'. I would personally like to limit wheel reinvention and have preferred solutions for the various bricks (I am thinking of the unfortunate Chaco versus Matplotlib situation, where I have to depend on both libraries that complement each other).
This is exactly what I am proposing. Pretty much everything else in the message was based on a misunderstanding of my intent about package vs. toolstack. I would not limit it to BSD-licensed tools, but would want that to continue to be a requirement for the core stuff, and likely for grants we would write. In other words, if a benefactor came along wanting to give some cash to a field-specific project that was under GPL, fine, I'd be glad to funnel their money to the developers.
First, as Robert points out, telling somebody what to do will not achieve anything. I am already way too busy scratching my own itches.
It works well if you are paying them. What is amazing (witness the doc project) is that if just one person is paid to organize an area, lots of people flock to the project and pitch in doing small tasks. Not everyone. Not even most people. But enough. That is what the funding is for. It's the request-extension command! Specifically, extension of effort on the part of someone who would otherwise find other uses for their time.
Second, who will find the time to take care of this?
I've been doing it since Spring 2008 for the doc project. Hopefully a few others will join me so we can write some grants, start a funding organization, and launch something more permanent and far-reaching.
I've proposed a BoF on this topic. I immodestly think it could be the most important BoF of the meeting. I propose Thursday at 8:30 (I think that 2.5 hours for dinner is too much and that we should start the BoFs much earlier, like at 7:30, so we can do an early and a late set of BoFs. I'm not sure what the reception is: is it dinner, or just a delay in the start of dinner?). Alternatively, we can do it Friday over lunch, though that depends on getting some box lunches. Proposed format:
Organization, Funding, and Future Direction of SciPy (coordinator: Joe Harrington, sergeant-at-arms: David Goldsmith)
I'd like to spend a strict 10 minutes on each of these, cutting off discussion and moving on after each item. After all 6 items, we can continue discussion on any item:
* What are our long-term goals?
* What are our current strengths and weaknesses?
* How is our current Steering Committee/grass-roots model working?
* What would we do with funding? Would it require a change in how the community operates?
* How can we get funding?
* In the large, how should we proceed?
--jh--
_______________________________________________
Scipy-dev mailing list
Scipy-dev@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-dev

On Sun, Aug 16, 2009 at 09:40, Joe Harrington<jh@physics.ucf.edu> wrote:
EPD focuses on Windows.
Uh, no.
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

Maybe a confusion with Python(x,y)? I believe it has a very short release cycle (at least in the recent past), which seems to be what Joe would like to see happening for the toolstack, so that many different flavors of science can easily have an updated set of packages that meet their needs. I guess EPD has a much longer release cycle.
Johann
Participants (14):
- Charles R Harris
- Dag Sverre Seljebotn
- David Cournapeau
- David Cournapeau
- David Goldsmith
- Gael Varoquaux
- Joe Harrington
- Johann Cohen-Tanugi
- josef.pktd@gmail.com
- Ondrej Certik
- Prabhu Ramachandran
- Robert Kern
- Sebastian Walter
- Tommy Grav