Hey everyone,

In preparation for doc-day tomorrow, I've been thinking about scikits and its relationship to scipy. There seem to be three distinct purposes of scikits:

1) A place for specialized tool boxes that build on numpy and/or scipy that live under a common name-space, but are too specialized to live in scipy itself.
2) A place for GPL or similarly-licensed tools so that people can understand what they are getting and appropriate use can be made.
3) A place for modularly-installed scipy tools.

Given these three purposes, it seems that we should have three name-spaces:

1) scikits
2) ??? perhaps scigpl
3) scipy (why not use the same namespace --- there is a technological hurdle that may prevent egg distribution of these until we fix it. But, I think we can fix it and would rather not invent another name-space for something that should be scipy).

This idea occurred to me when I was thinking about "tagging" modules in scikits so users could understand the tool. At that point it seemed obvious that we should just have different name-spaces, which would promote the needed clarity.

Ideas?

-Travis O.
On Dec 27, 2007 8:34 PM, Travis E. Oliphant <oliphant@enthought.com> wrote:
Hey everyone,
In preparation for doc-day tomorrow, I've been thinking about scikits and its relationship to scipy. There seem to be three distinct purposes of scikits:
1) A place for specialized tool boxes that build on numpy and/or scipy that live under a common name-space, but are too specialized to live in scipy itself. 2) A place for GPL or similarly-licensed tools so that people can understand what they are getting and appropriate use can be made. 3) A place for modularly-installed scipy tools.
Given these three purposes, it seems that we should have three name-spaces:
1) scikits 2) ??? perhaps scigpl
Loses points for ugly :)
3) scipy (why not use the same namespace --- there is a technological hurdle that may prevent egg distribution of these until we fix it. But, I think we can fix it and would rather not invent another name-space for something that should be scipy).
I don't like the idea of the base scipy package being a moving target. Much like the standard library (for any given python version) is a known quantity, I'd like scipy to be the same.

Rather than rehash it, I'm just going to copy here for the public discussion the reply I sent in our private chat on this topic. I think it states my view clearly, and others can then disagree. I'm pasting it in full so it reads normally, even if you've already addressed the domain-specific aspects above.

Cheers, f.

##########

What about domain-specific functionality, for example? I think that it's important that 'scipy version x.y' is a known, fixed quantity, so that installing it means having a well-defined set of tools. But over time, I can foresee lots of domain-specific functionality that is scipy-based being developed, and I simply don't think it's realistic (for many reasons) to pull all of it into scipy itself.

Much like Matlab has 'toolboxes' and Mathematica has a similar concept, I think there's value for the users in having a well-defined location where they can find lots of extra tools that are related to scipy, but somewhat independent of it in their development. The scikits would all honor similar naming conventions and documentation, we could have a centralized page listing them so they are easy to find, and users could add (via namespace packages) their own scikits without necessarily having write privileges over the central scipy directory.

Basically in my mind the distinction is not "we did a poor job modularising scipy" but rather:

- scipy: core library with large amounts of Fortran (as much of netlib as is reasonable) and functionality that can be reasonably considered to be of wide appeal. All of it BSD-compatible.

- scikits: toolkits under a single umbrella namespace, easy to find (we can provide tools for this), with unified naming, coding, documentation and example conventions. Domain-specific codes go here, as well as GPL or patent-encumbered codes (but still open source). [Edit: I'm not sure if *any* patent-encumbered code is really a good idea, so perhaps this last sentence should be removed].

In addition, scikits could be the staging area for new projects to be developed until they mature a bit, for eventual inclusion into scipy itself. This would give us a monitoring mechanism to ensure that a contributor is developing a package according to the scipy standards of naming, quality, documentation, etc. while allowing the developer to proceed at his own pace without locking into the scipy release schedule. Eventually if a project turns out to work very well and is deemed of full general interest, it can be folded into scipy itself (like what happened to ElementTree or optik in the stdlib, for example). This way developers can also get users to follow their own release schedule, without the problems we have today with the sandbox (scikits should be available via eggs, so users can easily grab and update scikits they're interested in).

For the above scipy.foo discussion, if foo==clustering, it probably belongs in scipy itself (people in all disciplines use that), but a DNA sequence analysis tool that finds clustering patterns directly operating on standard bioinformatics formats should probably be a scikit.

I don't know about the others, but I find the above distinction reasonably clear and useful in practice. But perhaps I'm totally missing the mark.

Cheers, f
Fernando Perez wrote:
On Dec 27, 2007 8:34 PM, Travis E. Oliphant <oliphant@enthought.com> wrote:
Given these three purposes, it seems that we should have three name-spaces:
1) scikits 2) ??? perhaps scigpl
Loses points for ugly :)
True. Are there any other names:

How about:

sci_restrict
sci_no_bsd
sci_gpl
scifree (as a nod to Free Software Foundation -- although there may be unintentional negative double entendre).
scifsf

The point is that it is very useful for users to be able to know that scikits and scipy and scipydev have a BSD or similar license, but "scifree" is GPL-like and creates possible encumbrances for people who use it in their code bases.
3) scipy (why not use the same namespace --- there is a technological hurdle that may prevent egg distribution of these until we fix it. But, I think we can fix it and would rather not invent another name-space for something that should be scipy).
I don't like the idea of the base scipy package being a moving target. Much like the standard library (for any given python version) is a known quantity, I'd like scipy to be the same.
I'm fine with calling #3 something like scipydev. The point is that it would be good if there were some way for a developer of a Scientific ToolKit to indicate their intention when they develop it and for others to do so as well. -Travis
On Dec 27, 2007 10:00 PM, Travis E. Oliphant <oliphant@enthought.com> wrote:
Fernando Perez wrote:
On Dec 27, 2007 8:34 PM, Travis E. Oliphant <oliphant@enthought.com> wrote:
Given these three purposes, it seems that we should have three name-spaces:
1) scikits 2) ??? perhaps scigpl
Loses points for ugly :)
True. Are there any other names:
How about:
sci_restrict sci_no_bsd sci_gpl scifree (as a nod to Free Software Foundation -- although there may be unintentional negative double entendre). scifsf
gscipy?
The point is that it is very useful for users to be able to know that scikits and scipy and scipydev have a BSD or similar license, but "scifree" is GPL-like and creates possible encumbrances for people who use it in their code bases.
I certainly agree on the value of making the bsd/gpl distinction very clear to any new user.
3) scipy (why not use the same namespace --- there is a technological hurdle that may prevent egg distribution of these until we fix it. But, I think we can fix it and would rather not invent another name-space for something that should be scipy).
I don't like the idea of the base scipy package being a moving target. Much like the standard library (for any given python version) is a known quantity, I'd like scipy to be the same.
I'm fine with calling #3 something like scipydev. The point is that it would be good if there were some way for a developer of a Scientific ToolKit to indicate their intention when they develop it and for others to do so as well.
What bothers you about using scikits for standalone packages, even if some of them might eventually become part of scipy proper at some point in the future? I'm not sure I see it...

To summarize my take on it at this point, after your input, I'd have this layout:

- scipy: fixed package (NOT namespace). All BSD. All components should have fairly broad appeal to a wide audience of scientific users, and code should be reasonably mature (we could argue about how true that is today, but let's not :)

- gscipy: extensions to scipy that carry GPL restrictions. This would allow us to better integrate things like GSL, GMP, etc.

- scikits: domain-specific toolkits and other self-contained packages that might be at some point candidates for scipy, but aren't yet mature enough to be included in the core. License can be BSD or GPL, per-package. It's a namespace package, so users can install only the components they want and update each independently.

Seems clean enough for me...

Cheers, f
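As a rough illustration of what "namespace package" means in the layout above, here is a minimal sketch, assuming the standard-library `pkgutil.extend_path` mechanism rather than whatever scikits itself uses. The namespace name `demo_scikits`, the sub-packages `alpha` and `beta`, and the helper functions are all hypothetical and exist only for this example. Two separately "installed" directories each ship one component, yet both import under the same top-level name.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Every distribution ships this same __init__.py for the shared namespace.
# extend_path() scans sys.path and merges all directories with this name.
NAMESPACE_INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_dist(root, namespace, subpackage, body):
    """Create one hypothetical distribution: <root>/<namespace>/<subpackage>/."""
    pkg_dir = Path(root) / namespace / subpackage
    pkg_dir.mkdir(parents=True)
    (Path(root) / namespace / "__init__.py").write_text(NAMESPACE_INIT)
    (pkg_dir / "__init__.py").write_text(body)


def install_demo():
    """Install two independent components and import both via one namespace."""
    base = Path(tempfile.mkdtemp())
    for sub in ("alpha", "beta"):
        root = base / f"dist_{sub}"
        make_dist(root, "demo_scikits", sub, f"NAME = {sub!r}\n")
        # Each component lives in its own directory on sys.path,
        # as if it had been installed (and could be updated) separately.
        sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    alpha = importlib.import_module("demo_scikits.alpha")
    beta = importlib.import_module("demo_scikits.beta")
    return alpha.NAME, beta.NAME


if __name__ == "__main__":
    print(install_demo())
```

The point of the sketch is only that neither component needs write access to the other's directory, which is the property the scikits bullet relies on.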
On 28/12/2007, Fernando Perez <fperez.net@gmail.com> wrote:
- scikits: domain-specific toolkits and other self-contained packages that might be at some point candidates for scipy, but aren't yet mature enough to be included in the core. License can be BSD or GPL, per-package. It's a namespace package, so users can install only the components they want and update each independently.
I think I agree with this - not that I'm a developer, but I hope you won't mind a user's opinion: it's really hard to tell whether a package will eventually become part of scipy or not. Many packages might start out domain-specific but as they mature and flesh out they might generalize and come to be recognized as generally useful. If domain-specific and immature packages are lumped together, no one is forced to make the decision on whether some package will eventually be of general applicability when it will someday be finished. I suppose alternatively, all immature packages could go in one namespace, to be moved to either scipy proper or a domain-specific namespace when they mature. But how often do open-source packages really reach a stable state? Many packages are useful for a sufficiently specific domain even when quite immature; the maturation process usually includes some generalization. Anne
On Dec 27, 2007 10:23 PM, Anne Archibald <peridot.faceted@gmail.com> wrote:
On 28/12/2007, Fernando Perez <fperez.net@gmail.com> wrote:
- scikits: domain-specific toolkits and other self-contained packages that might be at some point candidates for scipy, but aren't yet mature enough to be included in the core. License can be BSD or GPL, per-package. It's a namespace package, so users can install only the components they want and update each independently.
I think I agree with this - not that I'm a developer, but I hope you won't mind a user's opinion: it's really hard to tell whether a
Thanks for your input, which I think provides a useful perspective on the above. But I'd like to make sure that something is clear: - user opinions are *always* welcome and encouraged on these lists. I myself hardly count as a numpy/scipy developer, but that has never stopped me from opening my big mouth, and it shouldn't stop anyone else. This is a community where the lines between users and developers are deliberately blurry and fluid: we expect anyone using these tools to one day be able to contribute something as a developer, and the next day to need help on the lists (Robert Kern excepted :). If at any point a comment of mine made it appear otherwise, I apologize, as it was certainly not my intent. The whole point of working in this type of environment is that contributions are accepted because of their intrinsic value, not because of whose name is behind them. *Credit* (in commit logs, credit files, etc) goes with names as is the standard tradition in academia, but hopefully we'll always acknowledge a good idea regardless of where it comes from. - Having said the above, there are users who have earned massive amounts of good karma due to stellar contributions on these mailing lists, and you are certainly at the very top of that group. Even more reason to always voice your opinion! Cheers, f
Anne Archibald wrote:
On 28/12/2007, Fernando Perez <fperez.net@gmail.com> wrote:
- scikits: domain-specific toolkits and other self-contained packages that might be at some point candidates for scipy, but aren't yet mature enough to be included in the core. License can be BSD or GPL, per-package. It's a namespace package, so users can install only the components they want and update each independently.
I think I agree with this - not that I'm a developer, but I hope you won't mind a user's opinion: it's really hard to tell whether a package will eventually become part of scipy or not.
I certainly love users' opinions, as I think developers usually get things wrong because it is easy to forget what being a user is like.
For certain cases, it is true that whether or not something is general purpose changes over time. But right now it is already the case that we have scikits that should be going into scipy and my big question is why they are not already there.

Nothing I have heard alleviates the problem that namespace clarity is designed to address:

* for users it will be much saner if common things go into scipy so that we don't end up with more and more ways to do common things like optimization
* however, for developers there is no real incentive to move things from scikits into scipy if everything is just lumped together into scikits
* there is also no simple way for an outside user to understand whether something in scikits is really slated for inclusion in SciPy or not

There really is a difference between the kinds of things that Fernando is lumping into scikits. What prompted me to ask for new namespaces is precisely that I was thinking of proposing "tags" to go along with the packages. But namespaces seem like a much better idea than adding a new layer called "tags" for the same namespace.

-Travis
Fernando Perez wrote:
On Dec 27, 2007 10:00 PM, Travis E. Oliphant <oliphant@enthought.com> wrote:
Fernando Perez wrote:
On Dec 27, 2007 8:34 PM, Travis E. Oliphant <oliphant@enthought.com> wrote:
Given these three purposes, it seems that we should have three name-spaces:
1) scikits 2) ??? perhaps scigpl
Loses points for ugly :)
True. Are there any other names:
How about:
sci_restrict sci_no_bsd sci_gpl scifree (as a nod to Free Software Foundation -- although there may be unintentional negative double entendre). scifsf
gscipy?
+1
The point is that it is very useful for users to be able to know that scikits and scipy and scipydev have a BSD or similar license, but "scifree" is GPL-like and creates possible encumbrances for people who use it in their code bases.
I certainly agree on the value of making the bsd/gpl distinction very clear to any new user.
I'm fine with calling #3 something like scipydev. The point is that it would be good if there were some way for a developer of a Scientific ToolKit to indicate their intention when they develop it and for others to do so as well.
What bothers you about using scikits for standalone packages, even if some of them might eventually become part of scipy proper at some point in the future? I'm not sure I see it... To summarize my take on it at this point, after your input, I'd have this layout:
I think the problem is one of "getting lost" in the scikits. I'd like there to be a way for a developer of a scikit to signal their intention from the start. When looking at the sandbox, I could not tell in many cases what the intent was. I'd rather have the developer of the project be clear about it from the get go.

I can see there being hundreds of scikits, and trying to coordinate effort between developers trying to get something into scipy at some point is difficult if there is not a way to signal the intention up front.

Also, having a name like scipydev tells everybody what the purpose of the project is. Right now, for example, I have no idea why delaunay, openopt, audiolab, and learn are scikits. They do not seem domain specific to me. But, then again, perhaps the developers don't want to put their packages into scipy. If that is the case, then I'd like that to be clear up front and use that to help fix whatever issues are causing scipy to be "unattractive" to a developer of a module with obvious wide-spread appeal.
- scipy: fixed package (NOT namespace). All BSD. All components should have fairly broad appeal to a wide audience of scientific users, and code should be reasonably mature (we could argue about how true that is today, but let's not :)
- gscipy: extensions to scipy that carry GPL restrictions. This would allow us to better integrate things like GSL, GMP, etc.
- scikits: domain-specific toolkits and other self-contained packages that might be at some point candidates for scipy, but aren't yet mature enough to be included in the core. License can be BSD or GPL, per-package. It's a namespace package, so users can install only the components they want and update each independently.
Seems clean enough for me...
Hmm... It looks like we have a subtle difference about what should be in scikits. I would not put any GPL code there once there is a scipy and gscipy. If you want scikits to be a place for the "maturation" process (which is not unreasonable), then there should be gscikits as well so that it is always clear. If I understand correctly, you are arguing that scikits should be both domain specific and a "staging" area and that these roles don't need to be decided on upfront. I'm concerned that if the roles are not decided upon, nothing will ever move into scipy and there will be a whole bunch of disconnected scikits that do very much the same kinds of things (optimization, interpolation, loading files of various formats, etc.) and really should be in scipy, but with no incentive or push to actually get them there, because moving from scikits to scipy offers no benefit to the developer. If instead we restrict scikits to "domain specific" tools and target more general purpose tools for scipy but allow them to be staged and developed at their own pace using the scipydev name-space, then the tools that really should go into scipy will be named that way from the beginning, and the developer incentive will be to get the scipydev off their name as well as getting into the scipy package. It will also make it easier for SciPy developers to understand the intent of "abandoned" projects if things that are being developed are not lost in things that will never be included in SciPy. I think my concern stems from what is there now (in the sandbox) and why much of it has not moved into scipy already. I don't think just moving it all to scikits will fix those things and will still make me and others developing SciPy have to sift through potentially hundreds of scikits to determine the intent. -Travis O.
On Dec 27, 2007 11:12 PM, Travis E. Oliphant <oliphant@enthought.com> wrote:
Fernando Perez wrote:
gscipy?
+1
OK. Settled?
What bothers you about using scikits for standalone packages, even if some of them might eventually become part of scipy proper at some point in the future? I'm not sure I see it... To summarize my take on it at this point, after your input, I'd have this layout:
I think the problem is one of "getting lost" in the scikits. I'd like there to be a way for a developer of a scikit to signal their intention from the start. When looking at the sandbox, I could not tell in many cases what the intent was. I'd rather have the developer of the project be clear about it from the get go.
I can see there being hundreds of scikits, and trying to coordinate effort between developers trying to get something into scipy at some point is difficult if there is not a way to signal the intention up front.
Also, having a name like scipydev tells everybody what the purpose of the project is. Right now, for example, I have no idea why delaunay, openopt, audiolab, and learn are scikits. They do not seem domain specific to me. But, then again, perhaps the developers don't want to put their packages into scipy. If that is the case, then I'd like that to be clear up front and use that to help fix whatever issues are causing scipy to be "unattractive" to a developer of a module with obvious wide-spread appeal.
I think I'd answer to your concern here with Anne's recent post. I very much like her argument as to why certain tools may naturally evolve out from a domain-specific one into something more general over time. I realize that you are frustrated by the mess the sandbox became, but I think we shouldn't let that influence our decisions right now. I view that mess more as a historical accident due to lack of guided project management, than an intrinsic flaw of the naming conventions. I think that if we have clear guidelines we agree on, the problem will be naturally avoided.
Hmm... It looks like we have a subtle difference about what should be in scikits. I would not put any GPL code there once there is a scipy and gscipy. If you want scikits to be a place for the "maturation" process (which is not unreasonable), then there should be gscikits as well so that it is always clear.
If I understand correctly, you are arguing that scikits should be both domain specific and a "staging" area and that these roles don't need to be decided on upfront.
Correct (cf Anne's post for more on that view).
I'm concerned that if the roles are not decided upon, nothing will ever move into scipy and there will be a whole bunch of disconnected scikits that do very much the same kinds of things (optimization, interpolation, loading files of various formats, etc.) and really should be in scipy, but with no incentive or push to actually get them there, because moving from scikits to scipy offers no benefit to the developer.
If instead we restrict scikits to "domain specific" tools and target more general purpose tools for scipy but allow them to be staged and developed at their own pace using the scipydev name-space, then the tools that really should go into scipy will be named that way from the beginning, and the developer incentive will be to get the scipydev off their name as well as getting into the scipy package.
It will also make it easier for SciPy developers to understand the intent of "abandoned" projects if things that are being developed are not lost in things that will never be included in SciPy.
I think my concern stems from what is there now (in the sandbox) and why much of it has not moved into scipy already. I don't think just moving it all to scikits will fix those things and will still make me and others developing SciPy have to sift through potentially hundreds of scikits to determine the intent.
Honestly I think the sandbox problem won't reoccur (at least not as badly). I also think that asking developers to commit to 'core scipy' from the get-go may be too much in the beginning, while the suggestion "make a scikit out of it, and if it works after a while and it makes sense, it can be moved into the core where its release cycle will get locked into the rest" may be a bit less intimidating. I also happen not to like a whole lot the idea of yet another namespace: scipy, gscipy and scikits seems enough to me, and a fourth scipydev (and possibly a fifth gscikits) really feels like overkill to me. I think I've stated where I differ from you on this one, so I won't belabor it too much further. I'm not really trying to force the issue, and ultimately if you really prefer the extra namespace I can live with it. Perhaps others can provide their perspective as well... Cheers, f
Also, having a name like scipydev tells everybody what the purpose of the project is. Right now, for example, I have no idea why delaunay, openopt, audiolab, and learn are scikits. They do not seem domain specific to me. But, then again, perhaps the developers don't want to put their packages into scipy. If that is the case, then I'd like that to be clear up front and use that to help fix whatever issues are causing scipy to be "unattractive" to a developer of a module with obvious wide-spread appeal.
I can speak about openopt and learn (but in the future for the latter). For one, I don't think my code is good enough to be in scipy.

For openopt, I don't know how it fits in scipy; it depends on how well dmitrey did the branches on the additional external solvers. Although my code is not domain specific (generic optimizers), it is not as easy to use as a simple call to a function (but far more powerful IMHO). Besides, it might use an additional matrix library in the future for some modified decomposition.

As for learn, I may put some of my manifold learning stuff in it, and it uses ctypes or SWIG intensively as well as a matrix library (a lot is still done in C++) and it depends on my generic optimizers.

So perhaps all that is not my code would fit in Scipy, but I'd like some additional thoughts about it ;).

Matthieu
--
French PhD student
Website : http://matthieu-brucher.developpez.com/
Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn : http://www.linkedin.com/in/matthieubrucher
Matthieu Brucher wrote:
Also, having a name like scipydev tells everybody what the purpose of the project is. Right now, for example, I have no idea why delaunay, openopt, audiolab, and learn are scikits. They do not seem domain specific to me. But, then again, perhaps the developers don't want to put their packages into scipy. If that is the case, then I'd like that to be clear up front and use that to help fix whatever issues are causing scipy to be "unattractive" to a developer of a module with obvious wide-spread appeal.
I can speak about openopt and learn (but in the future for the latter). For one, I don't think my code is good enough to be in scipy. For openopt, I don't know how it fits in scipy; it depends on how well dmitrey did the branches on the additional external solvers. Although my code is not domain specific (generic optimizers), it is not as easy to use as a simple call to a function (but far more powerful IMHO). Besides, it might use an additional matrix library in the future for some modified decomposition. As for learn, I may put some of my manifold learning stuff in it, and it uses ctypes or SWIG intensively as well as a matrix library (a lot is still done in C++) and it depends on my generic optimizers.
Thanks for the input. This is the kind of information I was looking for.

Here's my current proposal (which is very close to Fernando's I think with one nuance).

scipy --- core facilities

gscikits --- For GPL encumbered packages regardless of origin or destiny.

scikits --- For BSD third-party packages. These may be packages with wide-spread appeal with a different calling convention than scipy or packages that the developers are not done with or just want to keep their own release cycles. Code may come out of here for inclusion into scipy, but it will do so using:

scipy-somepackage (imports as scipy.somepackage but is distributed separately) --- Packages that will soon be released with scipy but for now are being distributed alone because they need a faster release cycle. These packages involve the input of SciPy developers more than a scikits package might.
So perhaps all that is not my code would fit in Scipy, but I'd like some additional thoughts about it ;).
It sounds like your code would live in scikits and then if parts should be taken into scipy then they would be through the scipy-somepackage approach. -Travis
Travis E. Oliphant wrote:
gscikits --- For GPL encumbered packages regardless of origin or destiny.
I think this is a misnomer. There are also LGPL-, MPL-, CPL-, CeCILL-, OPL-, etc-encumbered packages, too. I don't see a good reason to not let these packages use the scikits namespace. Every package has its own license; scikits is not a package. It's just not that confusing. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Dec 28, 2007 1:56 AM, Robert Kern <robert.kern@gmail.com> wrote:
gscikits --- For GPL encumbered packages regardless of origin or destiny.
I think this is a misnomer. There are also LGPL-, MPL-, CPL-, CeCILL-, OPL-, etc-encumbered packages, too. I don't see a good reason to not let these packages use the scikits namespace. Every package has its own license; scikits is not a package. It's just not that confusing.
Personally, I don't see why different licenses would necessitate different namespaces either. IMO separating BSD scikits from everything else would needlessly confuse new users and diminish the overall 'scikit' mind share.

Is it not fair to say that the distinctions among the various licenses are completely unimportant to the vast majority of the audience scikits is supposed to address?

Furthermore, does anyone want to police this sort of policy should 100s of scikits be developed?

Travis E. Oliphant wrote:
The point is that it is very useful for users to be able to know that scikits and scipy and scipydev have a BSD or similar license, but "scifree" is GPL-like and creates possible encumbrances for people who use it in their code bases.
I doubt that anyone who's legitimately concerned about such matters is going to trust that scikit.foo is actually BSD code without verifying it personally. -- Nathan Bell wnbell@gmail.com
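The per-package verification described in the messages above can be sketched in a few lines. This is only an illustration under stated assumptions: the package names and license strings in `HYPOTHETICAL_PACKAGES` are invented, the marker list simply mirrors the license families Robert Kern names (GPL, LGPL, MPL, CPL, CeCILL, OPL), and plain substring matching is a crude stand-in for actually reading each license.

```python
# License families named in the thread as carrying restrictions.
# "LGPL" is covered because it contains "GPL".
RESTRICTED_MARKERS = ("GPL", "MPL", "CPL", "CECILL", "OPL")

# Hypothetical package -> declared license mapping, for illustration only.
HYPOTHETICAL_PACKAGES = {
    "scikits.foo": "BSD",
    "scikits.bar": "GPL v2",
    "scikits.baz": "LGPL",
    "scikits.qux": "MIT",
}


def is_restricted(license_name):
    """Return True if the declared license mentions one of the markers."""
    name = license_name.upper()
    return any(marker in name for marker in RESTRICTED_MARKERS)


def split_by_license(packages):
    """Split package names into (unrestricted, restricted), sorted by name."""
    unrestricted, restricted = [], []
    for package, license_name in sorted(packages.items()):
        target = restricted if is_restricted(license_name) else unrestricted
        target.append(package)
    return unrestricted, restricted


if __name__ == "__main__":
    free, encumbered = split_by_license(HYPOTHETICAL_PACKAGES)
    print("unrestricted:", free)
    print("restricted:", encumbered)
```

A listing like this works from per-package metadata within a single namespace, which is the alternative to encoding the license in the namespace name; it still depends on each package declaring its license accurately.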
Nathan Bell wrote:
On Dec 28, 2007 1:56 AM, Robert Kern <robert.kern@gmail.com> wrote:
gscikits --- For GPL encumbered packages regardless of origin or destiny.
I think this is a misnomer. There are also LGPL-, MPL-, CPL-, CeCILL-, OPL-, etc-encumbered packages, too. I don't see a good reason to not let these packages use the scikits namespace. Every package has its own license; scikits is not a package. It's just not that confusing.
Think of it as GPL-inspired scikits, then. I did not mean for gscikits to only include the GPL itself. Sure, scikits is not a package, but it is a "namespace" and that has meaning. The point is what "meaning" do you want it to have. I see great value in a clear separation between GPL-inspired licenses and other licenses.
If these are all awash in the scikits namespace, then it is going to be more difficult for people who would like to be able to use scikits packages but cannot use the GPL to know what they can and can't use.
Personally, I don't see why different licenses would necessitate different namespaces either. IMO separating BSD scikits from everything else would needlessly confuse new users and diminish the overall 'scikit' mind share.
It is the 'mind share' I'm interested in as well. Right now it is pretty clear that scipy is BSD (or similarly) licensed. It would be productive if that same kind of advertising were available for scikits.
Is it not fair to say that the distinctions among the various licenses are completely unimportant to the vast majority of the audience scikits is supposed to address?
All I'm saying is that the distinction between the licenses that impose restrictions on what you do with your own code that depends on them and licenses that don't do that is important enough to warrant a name-space division.
Furthermore, does anyone want to police this sort of policy should 100s of scikits be developed?
It is much easier to "police" if all you have to do is change the name of the package it gets installed in.
Travis E. Oliphant wrote:
The point is that it is very useful for users to be able to know that scikits and scipy and scipydev have a BSD or similar license, but "scifree" is GPL-like and creates possible encumbrances for people who use it in their code bases.
I doubt that anyone who's legitimately concerned about such matters is going to trust that scikit.foo is actually BSD code without verifying it personally.
Sure, but that same person is more likely not going to touch *any* of scikits if there is GPL code released in its name-space (even if they are "separate" packages and especially if they are actually hosted on the same svn tree). It gets too murky for people who need to care to spend the time figuring it out --- they'll just go buy the off-the-shelf solution even if it isn't as good in some technical sense. -Travis
On Dec 28, 2007 3:56 AM, Travis E. Oliphant <oliphant@enthought.com> wrote:
do you want it to have. I see great value in a clear separation between GPL-inspired licenses and other licenses.
So does Richard M. Stallman :) Most users don't care about OSS politics.
If these are all awash in the scikits namespace, then it is going to be more difficult for people who would like to be able to use scikits packages but cannot use the GPL to know what they can and can't use.
And therefore a majority of users should be inconvenienced for a small minority?
It is the 'mind share' I'm interested in as well. Right now it is pretty clear that scipy is BSD (or similarly) licensed. It would be productive if that same kind of advertising were available for scikits.
To some, not many.
All I'm saying that the distinction between the licenses that impose restrictions on what you do with your own code that depends on them and licenses that don't do that is important enough to warrant a name-space division.
This simply isn't true. I don't use 'gapt-get' or 'gsourceforge'. I'd be a bit irritated if I had to fire up gapt-get for X but had to use apt-get for Y. Not to mention the potential problem of collisions (scikits.foo and gscikits.foo).
Furthermore, does anyone want to police this sort of policy should 100s of scikits be developed?
It is much easier to "police" if all you have to do is change the name of the package it gets installed in.
So you are going to audit the contents of scikits periodically? Even if scikits has hundreds/thousands of packages and contributors?
Sure, but that same person is more likely not going to touch *any* of scikits if there is GPL code released in its name-space (even if they are "separate" packages and especially if they are actually hosted on the same svn tree). It gets too murky for people who need to care to spend the time figuring it out --- they'll just go buy the off-the-shelf solution even if it isn't as good in some technical sense.
Who are these people? I can't imagine a business taking 'our word' for it and blindly using a scikit in their product. The first time GPL code slips into a scikit the distinction will become meaningless and untrustworthy. The problem I have with 'gscikits' is that if the scikit idea actually takes off then it will require an army of people to keep scikits free of encumbered code. We cannot trust casual contributors to do their homework and keep GPL out of their scikits. The distinction is only as good as the certifier. -- Nathan Bell wnbell@gmail.com
On 28/12/2007, Travis E. Oliphant <oliphant@enthought.com> wrote:
All I'm saying that the distinction between the licenses that impose restrictions on what you do with your own code that depends on them and licenses that don't do that is important enough to warrant a name-space division.
I think this is the key point. You think there are lots of these people, and they are important. Others think there are not many of these people and making them work harder is fine.
I wonder if the difference of opinions is largely a difference of ideas on what non-BSD licenses allow.
In particular, you talk about "restrictions on what you do with your own code".
My interpretation is that if I am writing some scientific code and I want to work with numpy/scipy/scikits/what have you, I may do one of two things:
* Write python code that simply imports some packages and uses functions/classes from them.
* Extract and modify source code from the library to produce a version of numpy/scipy/the scikit that can do more.
As I understand the notion of "derived work", the latter is a derived work of the library and so the GPL (for example) forces me to release my modifications under the GPL. But (as I understand it), the former is *not* a derived work of the scikit, and so my code can be under any license I wish. Is this correct?
I realize there are packaging issues - if I want to make one tidy executable that includes my code plus python plus all libraries I use I may need to provide some source code. This does not seem unduly troublesome.
Anne
Anne Archibald wrote:
On 28/12/2007, Travis E. Oliphant <oliphant@enthought.com> wrote:
All I'm saying that the distinction between the licenses that impose restrictions on what you do with your own code that depends on them and licenses that don't do that is important enough to warrant a name-space division.
I think this is the key point. You think there are lots of these people, and they are important. Others think there are not many of these people and making them work harder is fine.
I wonder if the difference of opinions is largely a difference of ideas on what non-BSD licenses allow.
In particular, you talk about "restrictions on what you do with your own code".
My interpretation is that if I am writing some scientific code and I want to work with numpy/scipy/scikits/what have you, I may do one of two things:
* Write python code that simply imports some packages and uses functions/classes from them.
* Extract and modify source code from the library to produce a version of numpy/scipy/the scikit that can do more.
As I understand the notion of "derived work", the latter is a derived work of the library and so the GPL (for example) forces me to release my modifications under the GPL. But (as I understand it), the former is *not* a derived work of the scikit, and so my code can be under any license I wish. Is this correct?
There is no case law to decide this. The law itself is unclear. However, the FSF considers both instances an area where you must release the whole code under a GPL-compatible license. Since the FSF is the author of the GPL, most programmers who release their code under the GPL follow this interpretation as well. The programmer's interpretation is the most important one (since at least US law holds as most important the shared understanding of the license by the licensor and the licensee); the FSF's interpretation is only important because it is the most common. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Travis E. Oliphant wrote:
All I'm saying that the distinction between the licenses that impose restrictions on what you do with your own code that depends on them and licenses that don't do that is important enough to warrant a name-space division.
I think it's very important. However, I think that a different namespace package is entirely the wrong tool to use. "License: BSD" and "License: GPL" is the correct tool to use. -- Robert Kern
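To make the metadata approach concrete, here is a minimal Python sketch, assuming each package declares its license in its distribution metadata. It is not something proposed in the thread: it relies on the standard library's `importlib.metadata`, which postdates this discussion, and the helper names `declared_license` and `is_copyleft` as well as the one-entry marker list are hypothetical.

```python
from importlib import metadata

# Hypothetical marker list for illustration only; matching "GPL" also
# catches LGPL and AGPL. Real license detection needs far more care.
COPYLEFT_MARKERS = ("GPL",)


def is_copyleft(license_text):
    """Return True if the declared license text mentions a copyleft marker."""
    lowered = license_text.lower()
    return any(marker.lower() in lowered for marker in COPYLEFT_MARKERS)


def declared_license(dist_name):
    """Collect the License field and license classifiers a distribution declares."""
    meta = metadata.metadata(dist_name)
    parts = [meta.get("License") or ""]
    classifiers = meta.get_all("Classifier") or []
    parts += [c for c in classifiers if c.startswith("License ::")]
    return "; ".join(p for p in parts if p)
```

A user who cannot take on GPL-like code could call `is_copyleft(declared_license(name))` for each installed distribution before depending on it. As others in the thread point out, the answer is only as trustworthy as the metadata the package author supplied.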
There are lots of scikit attributes important to users:
- license
- current maturity status
- development rate
So creating lots of *sci* classes just to separate scikits by license is not a good idea (especially taking into account the many different license classes). BTW, some scikits are BSD-licensed.
I think it should be something like a database, like PyPI. scipy should have a function (like "aptitude install pylab") to list all available scikits (maybe via some kind of search, specifying parameters of interest such as isOpenSourse, isOSIApproved, keywords, etc.), then fetch and install the required scikits from the internet. Maybe it makes sense to run the function after scipy installation (or to print a text message about the function when scipy installation finishes).
Regards, D.
I would like to propose asking the user (after scipy installation) whether they want to install some more scikits, for example:
you can select some scikits packages you want to install:
          ver   date        license  description
1. learn  0.1   2007-12-01  GPL      machine learn
2. pylab  0.4b  2007-11-01  LGPL     matlab-Python bridge
... etc
enter your choice (comma- or space-separated)
It will require a working internet connection. Alternatively (and/or optionally), the user can specify a directory containing scikits files, transferred as tar.gz files or from an svn repository, without a working internet connection. Also, the user may be interested in installing a scikit directly from the svn repository (latest snapshot), so that possibility should also be taken into account.
I guess only a few scikits should be offered (those that are already rather good, or for which no good alternative is available for now). When a scikit goes out of support, because a newer and more powerful one has appeared, because support was dropped, or for any other reason, its name should be removed from the list of offerings.
Regards, D.
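The catalog idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ScikitEntry` class, the `search` and `parse_choice` helpers, and the in-memory `CATALOG` are hypothetical names, and the two entries are simply the ones from the example listing, not real package data. A real tool would fetch the catalog from an index such as PyPI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScikitEntry:
    """One row of the hypothetical scikits catalog."""
    name: str
    version: str
    date: str
    license: str
    description: str


# The two example rows from the listing; not real package data.
CATALOG = [
    ScikitEntry("learn", "0.1", "2007-12-01", "GPL", "machine learn"),
    ScikitEntry("pylab", "0.4b", "2007-11-01", "LGPL", "matlab-Python bridge"),
]


def search(catalog, license=None, keyword=None):
    """Return entries matching an exact license and/or a keyword."""
    results = []
    for entry in catalog:
        if license is not None and entry.license != license:
            continue
        haystack = (entry.name + " " + entry.description).lower()
        if keyword is not None and keyword.lower() not in haystack:
            continue
        results.append(entry)
    return results


def parse_choice(text):
    """Parse a comma- or space-separated selection such as '1, 2'."""
    return [int(token) for token in text.replace(",", " ").split()]
```

Keeping the license as a searchable field, rather than encoding it in the namespace, lets the same catalog answer other questions too (maturity, release date) without adding a namespace per attribute.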
On Dec 28, 2007 11:21 AM, dmitrey <dmitrey.kroshko@scipy.org> wrote:
I would like to propose asking the user (after scipy installation) whether they want to install some more scikits, for example:
you can select some scikits packages you want to install:
          ver   date        license  description
1. learn  0.1   2007-12-01  GPL      machine learn
2. pylab  0.4b  2007-11-01  LGPL     matlab-Python bridge
... etc
enter your choice (comma- or space-separated)
It will require a working internet connection. Alternatively (and/or optionally), the user can specify a directory containing scikits files, transferred as tar.gz files or from an svn repository, without a working internet connection.
This should not be turned on by default. ./setup.py install just needs to install scipy and that's it. No user interaction, no internet connection. Ondrej
On Fri, Dec 28, 2007 at 01:42:05AM -0600, Travis E. Oliphant wrote:
scikits --- For BSD third-party packages. These may be packages with wide-spread appeal with a different calling convention than scipy or packages that the developers are not done with or just want to keep their own release cycles. Code may come out of here for inclusion into scipy, but it will do so using:
scipy-somepackage (imports as scipy.somepackage but is distributed separately) --- Packages that will soon be released with scipy but for now are being distributed alone because they need a faster release cycle. These packages involve the input of SciPy developers more than a scikits package might.
I think this may cause some confusion: how would you explain to users that, while they have the latest scipy installed, they cannot import a sub-package under the root scipy namespace? If the package was located under scipy.scikit.somepackage, that would be a bit better, but I recall that Robert said it would be troublesome to implement. As for the scikits, I'd like to see them all under one roof, regardless of their licenses (they were created to support non-BSD licensing, IIRC). Regards Stéfan
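The mechanics behind "imports as scipy.somepackage but is distributed separately" can be illustrated with a small, self-contained Python sketch. It uses `pkgutil.extend_path` from the standard library to merge two separately installed directories into one package. The `demo_sci` name and the helper functions are hypothetical, and this is not necessarily the mechanism scipy or scikits packaging used.

```python
import importlib
import os
import sys
import tempfile

# Written into every copy of the shared package's __init__.py so that
# Python merges all directories with that package name found on sys.path.
NAMESPACE_INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_distribution(root, namespace, subpackage, body):
    """Create root/namespace/subpackage as if installed by its own distribution."""
    ns_dir = os.path.join(root, namespace)
    sub_dir = os.path.join(ns_dir, subpackage)
    os.makedirs(sub_dir)
    with open(os.path.join(ns_dir, "__init__.py"), "w") as f:
        f.write(NAMESPACE_INIT)
    with open(os.path.join(sub_dir, "__init__.py"), "w") as f:
        f.write(body)


def demo():
    """Install two hypothetical subpackages separately and import both."""
    core = tempfile.mkdtemp()
    extra = tempfile.mkdtemp()
    make_distribution(core, "demo_sci", "core", "NAME = 'core'\n")
    make_distribution(extra, "demo_sci", "somepackage", "NAME = 'somepackage'\n")
    sys.path[:0] = [core, extra]
    importlib.invalidate_caches()
    first = importlib.import_module("demo_sci.core")
    second = importlib.import_module("demo_sci.somepackage")
    return first.NAME, second.NAME
```

If the second directory is never installed, `import demo_sci.somepackage` fails even though `demo_sci` itself imports fine, which is exactly the situation a user with "the latest scipy" would need explained.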
In preparation for doc-day tomorrow, I've been thinking about scikits and its relationship to scipy. There seem to be three distinct purposes of scikits:
1) A place for specialized tool boxes that build on numpy and/or scipy that live under a common name-space, but are too specialized to live in scipy itself. 2) A place for GPL or similarly-licensed tools so that people can understand what they are getting and appropriate use can be made. 3) A place for modularly-installed scipy tools.
Given these three purposes. It seems that we should have three name-spaces:
From working with lots of end users, I get the feeling that just having numpy and scipy is complicated enough. I completely understand why numpy and scipy are separate packages, but from a user's perspective it _is_ complicated. I get lots of questions from users who can't find such and such functions in numpy - when it is in scipy.
The addition of scikits complicates the landscape even further - especially because most users don't care about the seemingly subtle differences between BSD/GPL licenses. I fear that having multiple namespaces under scikits just complicates things even further. Even if scikits is a single download, a user would potentially have to search through 5 top-level namespaces (numpy, scipy, scikits, gscipy) to find something. This is made worse, given the fact that the names don't really reflect the actual content of the packages. Having two packages whose sole difference is the licenses used by subpackages is a horrible situation. Things like that are "implementation details" from a user's perspective and we shouldn't be exposing those things as part of our "public API."
Because of these complexities, I think in many cases people will keep things out of scikits and just release things as standalone projects. For scikits to be a success, I think its purpose has to be dead clear and its organization has to be dead simple and focused on what users will experience when they sit down to use it. I think a single top-level namespace works best for this.
Brian
1) scikits 2) ??? perhaps scigpl 3) scipy (why not use the same namespace --- there is a technological hurdle that may prevent egg distribution of these until we fix it. But, I think we can fix it and would rather not invent another name-space for something that should be scipy).
This idea occurred to me, when I was thinking about "tagging" modules in scikits so users could understand the tool. But, at that point it seemed obvious that we should just have different name-spaces which would promote the needed clarity.
Ideas.
-Travis O.
Brian Granger wrote:
In preparation for doc-day tomorrow, I've been thinking about scikits and its relationship to scipy. There seem to be three distinct purposes of scikits:
1) A place for specialized tool boxes that build on numpy and/or scipy that live under a common name-space, but are too specialized to live in scipy itself. 2) A place for GPL or similarly-licensed tools so that people can understand what they are getting and appropriate use can be made. 3) A place for modularly-installed scipy tools.
Given these three purposes. It seems that we should have three name-spaces:
From working with lots of end users, I get the feeling that just having numpy and scipy is complicated enough. I completely understand why numpy and scipy are separate packages, but from a user's perspective it _is_ complicated. I get lots of questions from users who can't find such and such functions in numpy - when it is in scipy.
Very good. This I can agree with. I would love to see fewer namespaces. Originally, I thought scikits was just supposed to be a place for GPL-like packages to go. But, it would appear that others see other purposes for it. Apparently nobody agrees that we need to keep the scikits namespace from being a mine-field of GPL-like code. I don't care enough about it to continue arguing.
I don't buy the arguments that allowing a scipy-somepackage to be distributed separately is going to cause unwarranted confusion. In fact, I would like it to cause a little bit of confusion as to why it doesn't come installed with scipy already (although I suspect that in many distributions it will be delivered) because that puts pressure to get scipy-somepackage into scipy itself when its user base grows.
So, I'm going to encourage that by moving some of the sandbox packages to that style of install and not into the scikits name-space. Jarrod has a document listing the packages that are targeted for inclusion into scipy. These should be distributed as scipy-somepackage (or scipy_somepackage).
-Travis
participants (10):
- Anne Archibald
- Brian Granger
- dmitrey
- Fernando Perez
- Matthieu Brucher
- Nathan Bell
- Ondrej Certik
- Robert Kern
- Stefan van der Walt
- Travis E. Oliphant