Re: [SciPy-dev] Some concerns on Scipy development
Hey Pearu,

I've thought about this a little lately also. There is a philosophical difference on packaging among the scientific developers. Some wish for small, single-purpose, stand-alone packages that are installed one by one. Others wish for a single "standard library" of scientific tools that, once installed, is a one-stop shop for a large number of scientific algorithms. There are benefits to both. However, I come down squarely in the second camp. A monolithic package is easier to install for end users, and it solves compatibility issues (such as SciPy changing the behavior of Numeric in some places). I believe the existence of such a package is required before there can be a mass conversion of engineers and scientists to Python as their tool of choice for daily tasks. This is the goal of SciPy.

That said, the monolithic nature does pose some problems, occasionally complicating development and following the CVS, as Jochen has pointed out. Also, some may want to use some modules (weave, integrate, linalg) outside of scipy. This is useful in cases where you want to minimize install size or use something like py2exe (which currently doesn't work with SciPy). We should facilitate separating out packages when it is convenient, but not when it requires duplication of code or a lot of extra work. Perhaps re-organizing the architecture can make it convenient more often and come closer to making both camps happy.

I agree that, when possible, it is nice to develop packages independent of SciPy -- that is how weave was developed. Later it was folded into SciPy, but it still runs separately. The new build structure (with separate setup_xxx.py files for each sub-package) implemented several months ago was developed, in part, to facilitate this sort of thing. Weave was easy in this respect because it doesn't need many numeric capabilities.
So, I think this is a worthy goal for *some* of the modules (notably the ones people are discussing such as integrate, linalg, etc), with one caveat. These modules need access to some functions provided by scipy and will need to import at least one extra module. Scanning linalg, the needed functions are amax, amin, triu, etc. and a handful of functions subsumed from Numeric as well as some constants from scipy.limits. I consider it a bad idea to replicate these functions across multiple modules because of the maintenance issues associated with duplicate code. I don't want to go down that path.

However, one thought is to make the idea of "levels" more explicit. We could define a package called "scipy_lite" or "scipy_level0" that would subsume Numeric and add the helper functions that are often used. It would not reference other scipy modules. This package would live in the scipy development tree, but would install as a separate package. So scipy_level0 would sit next to scipy in the site-packages directory. scipy_level0 would be easy to build without major dependencies -- much like Numeric. It would hold fastumath and maybe a few other extension modules, but it would be predominantly python code. The linalg, integrate, etc modules would import scipy_level0 instead of scipy. This way, people only have to port scipy_level0 instead of the whole of scipy if they want to use integrate in their package.

I can't imagine much dissent about this approach by the people wanting single packages. If you're willing to install Numeric and linalg on your own, then you should be willing to install the scipy_level0 package. Installing scipy_level0 outside the scipy package has some precedent since we've already done this with scipy_distutils and scipy_test. I'd rather leave it as a sub-directory of scipy, but pulling it out is necessary because of the way that python handles (or doesn't handle...) imports within packages -- that is if we want to make it easy to use sub-modules of scipy separately.

So this is what the site-packages view of scipy would be:

site-packages
    scipy_distutils
    scipy_test
    scipy_level0
        subsumes and customizes Numeric
        handy.py
        misc.py
        scimath.py
        Matrix.py (?)
        fastumath.so (pyd)
        etc.
    scipy
        subsume scipy_base
        everything else

In regards to higher level modules that use fft, svd, and other complex algorithms, they are just gonna have to import scipy.

This requires some discussion before we make the change. It's also gonna require someone to step up and implement the change -- though it probably isn't a major effort.

eric

--
Eric Jones <eric at enthought.com>
Enthought, Inc. [www.enthought.com and www.scipy.org]
(512) 536-1057
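A minimal sketch of the layering rule in this proposal, assuming a hand-written dependency map. The package names come from the message; the LEVELS numbers, the IMPORTS edges, and the layering_violations helper are hypothetical and only illustrate the idea that scipy_level0 should never import from the packages built on top of it.

```python
# Hypothetical level assignment: lower numbers are closer to the base.
LEVELS = {
    "Numeric": 0,
    "scipy_level0": 1,
    "linalg": 2,
    "integrate": 2,
    "scipy": 3,
}

# Hypothetical import edges: package -> packages it imports.
IMPORTS = {
    "scipy_level0": ["Numeric"],
    "linalg": ["scipy_level0"],
    "integrate": ["scipy_level0"],
    "scipy": ["scipy_level0", "linalg", "integrate"],
}


def layering_violations(imports, levels):
    """Return sorted (importer, imported) pairs where a package reaches upward."""
    bad = []
    for importer, targets in imports.items():
        for target in targets:
            # An import is a violation when the target sits on a higher level.
            if levels[target] > levels[importer]:
                bad.append((importer, target))
    return sorted(bad)


if __name__ == "__main__":
    print(layering_violations(IMPORTS, LEVELS))
```

With the edges above nothing is reported; adding "scipy" to the imports of scipy_level0 would be flagged, which is the situation the proposal is meant to rule out.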
Hi Eric and Travis. On Tue, 26 Mar 2002, eric wrote:
I've thought about this a little lately also. There is a philosophical difference on packaging among the scientific developers. Some wish for small, single-purpose, stand-alone packages that are installed one by one. Others wish for a single "standard library" of scientific tools that, once installed, is a one-stop shop for a large number of scientific algorithms. There are benefits to both. However, I come down squarely in the second camp. A monolithic package is easier to install for end users, and it solves compatibility issues (such as SciPy changing the behavior of Numeric in some places). I believe the existence of such a package is required before there can be a mass conversion of engineers and scientists to Python as their tool of choice for daily tasks. This is the goal of SciPy.
I have been at peace with this goal of SciPy for a long time. In my concerns I was not trying to propose to change this general goal in any way. Instead, I was concerned with the internal structure of SciPy and to see if we could ease the SciPy development and make it more robust for the future. One efficient way to achieve that would be to require that internal modules in SciPy would be as independent as possible. A good measure for this independence is that a particular module can be installed as a standalone. Note that I am not proposing this because I would like to use these modules as standalone modules myself (or any other party), but only to strengthen SciPy by making it more robust internally. By doing this, it does not mean that the main goal of SciPy is somehow threatened, it will be still a monolithic package for end-users. Just its internal structure will be modular and less sensitive to adding new modules or reviewing some if needed in future. Now about the question whether SciPy parts can be completely independent? I think this can never be achieved in principle, nor is it desired, but it is a good ideal to follow *whenever* it is possible (and not just a nice thing to do as you say) and, indeed, can be practical for other projects, and all that for the sake of SciPy's own success. <snip>
So, I think this is a worthy goal for *some* of the modules (notably the ones people are discussing such as integrate, linalg, etc), with one caveat. These modules need access to some functions provided by scipy and will need to import at least one extra module. Scanning linalg, the needed functions are amax, amin, triu, etc. and a handful of functions subsumed from Numeric as well as some constants from scipy.limits. I consider it a bad idea to replicate these functions across multiple modules because of the maintenance issues associated with duplicate code. I don't want to go down that path.
Me neither. However your statement that these modules necessarily need access to scipy functions, is a bit exaggerated. In general, there are several ways how the same functionality can be implemented, and it is my experience that linalg2 can be implemented without the scipy dependence and that also without replicating any code. In fact, using high-level scipy convenience functions in linalg2, which is supposed to provide highly efficient and yet user-friendly (yes, both goals can be achieved at the same time!) algorithms, is not good because scipy functions are inefficient due to their general-purpose nature, and the initial wins in performance are lost. Therefore low level modules like linalg, integrate, etc must be carefully implemented even if it takes more time and seemingly direct Python hooks could be applied.
So this is what the site-packages view of scipy would be:
site-packages
    scipy_distutils
    scipy_test
    scipy_level0
        subsumes and customizes Numeric
        handy.py
        misc.py
        scimath.py
        Matrix.py (?)
        fastumath.so (pyd)
        etc.
    scipy
        subsume scipy_base
        everything else
This looks like a positive plan to me. Any other candidates for naming scipy_level0? It reflects too much the internals of SciPy but will contain very useful general purpose functions, I assume, to be useful more widely. How about scipy_base? Another idea would be then to move scipy_test inside scipy_base (and dropping its scipy_ prefix). Since scipy_base would be mostly pure Python, it should be feasible. (Later, be not surprised if I will question the naming of handy.py and misc.py, but I am not ready for that yet ...;-)
In regards to higher level modules that use fft, svd, and other complex algorithms, they are just gonna have to import scipy.
+2
This requires some discussion before we make the change. It's also gonna require someone to step up and implement the change -- though it probably isn't a major effort.
It may be a good idea to release 0.2 before such a change. If it works out nicely, then 0.3 could follow quickly. Regards, Pearu
At 07:59 2002-03-27 -0600, Pearu wrote:
One efficient way to achieve that would be to require that internal modules in SciPy would be as independent as possible. A good measure for this independence is that a particular module can be installed as a standalone.
I think this is very sound thinking. We always run into situations sooner or later when we need to change some parts of a system. The more dependence between modules we have, the more pain from such a change. "O What a tangled web we weave..."
Note that I am not proposing this because I would like to use these modules as standalone modules myself (or any other party), but only to strengthen SciPy by making it more robust internally.
Right. But if it's also possible to accommodate those who for some reason can't or won't use the entire thing, it's still an additional benefit. I've only used the plotting parts of SciPy so far, and I'm likely to make simple applications in the future, and distribute them using py2exe or the McMillan installer. I prefer such apps to be as small as possible, and I wouldn't see any benefit in bundling code my users don't (or at least shouldn't) need.

--
Magnus Lycka, Thinkware AB
Alvans vag 99, SE-907 50 UMEA, SWEDEN
phone: int+46 70 582 80 65, fax: int+46 70 612 80 65
http://www.thinkware.se/ mailto:magnus@thinkware.se
Hey Pearu,
I've thought about this a little lately also. There is a philosophical difference on packaging among the scientific developers. Some wish for small, single-purpose, stand-alone packages that are installed one by one. Others wish for a single "standard library" of scientific tools that, once installed, is a one-stop shop for a large number of scientific algorithms. There are benefits to both. However, I come down squarely in the second camp. A monolithic package is easier to install for end users, and it solves compatibility issues (such as SciPy changing the behavior of Numeric in some places). I believe the existence of such a package is required before there can be a mass conversion of engineers and scientists to Python as their tool of choice for daily tasks. This is the goal of SciPy.
I have been at peace with this goal of SciPy for a long time. In my concerns I was not trying to propose to change this general goal in any way. Instead, I was concerned with the internal structure of SciPy and to see if we could ease the SciPy development and make it more robust for the future.
Right. I didn't think you were -- I just wanted to note the difference of opinions on this and explain where SciPy fit in the picture.
One efficient way to achieve that would be to require that internal modules in SciPy would be as independent as possible. A good measure for this independence is that a particular module can be installed as a standalone.
I agree and think your suggestion to move as far as possible in this direction is good. But, I also don't think dependence on a single package is too much of a price to pay. There is already some difference in scipy_base/scipy_lite (whatever it is called) and Numeric's behavior. We need to import this instead of Numeric directly to ensure current and future linalg, etc. modules comply with the expected behavior in SciPy. Also, scipy_base has many convenience functions that will be helpful in other places.
Note that I am not proposing this because I would like to use these modules as standalone modules myself (or any other party), but only to strengthen SciPy by making it more robust internally.
I'm actually a consumer in this case. I'd like to use modules outside of SciPy on occasion, and want to make it as easy as possible within the SciPy framework. Witness weave. It seems like the scipy_base concept accomplishes this. If you're willing to include Numeric as a requirement, adding scipy_base shouldn't be an issue.
By doing this, it does not mean that the main goal of SciPy is somehow threatened, it will be still a monolithic package for end-users. Just its internal structure will be modular and less sensitive to adding new modules or reviewing some if needed in future.
Again, I agree -- I think we are on the same page.
Now about the question whether SciPy parts can be completely independent? I think this can never be achieved in principle, nor is it desired, but it is a good ideal to follow *whenever* it is possible (and not just a nice thing to do as you say) and, indeed, can be practical for other projects, and all that for the sake of SciPy's own success.
<snip>
So, I think this is a worthy goal for *some* of the modules (notably the ones people are discussing such as integrate, linalg, etc), with one caveat. These modules need access to some functions provided by scipy and will need to import at least one extra module. Scanning linalg, the needed functions are amax, amin, triu, etc. and a handful of functions subsumed from Numeric as well as some constants from scipy.limits. I consider it a bad idea to replicate these functions across multiple modules because of the maintenance issues associated with duplicate code. I don't want to go down that path.
Me neither. However your statement that these modules necessarily need access to scipy functions, is a bit exaggerated. In general, there are several ways how the same functionality can be implemented, and it is my experience that linalg2 can be implemented without the scipy dependence and that also without replicating any code.
This may be the case. Please let us know what you have in mind. Travis has implemented a lot of stuff that uses functions that are currently in scipy and will be in scipy_lite. The linalg interfaces to solve, expm, etc. may not currently be the most efficient, but, by all reports, they are working pretty well and address many problems. I'm sure we will need to rework the interface some -- I personally see the need for an lu_factor and lu_solve method that are thinly layered over getrf and getrs for efficiency. I'm sure there are other places that linear algebra gurus could point out. Waiting for the perfect interface though, makes people like Jochen who is waiting on a (somewhat) stable release continue to wait. If the only problem is efficiency, I say we get a release based on the current interface out there, and solve the efficiency issues in the next release. One other note. I do not see the interface of a 0.2 package set in stone. Users are considered "early adopters." If there is good reason to change the interface between 0.2 and 0.3 then we should do it. When we get up in the .6 or .7 range, then we should be more careful about changes. But for now, like f2py, the changes are OK. Perhaps we should start a thread discussing the SciPy linear algebra interface. Would this be helpful?
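A minimal sketch of the factor-once, solve-many interface mentioned above, assuming plain Python lists instead of Numeric arrays. The names lu_factor and lu_solve come from the message; the signatures, the return layout, and the pure-Python elimination are assumptions standing in for thin wrappers over getrf and getrs, not the interface that was eventually adopted.

```python
def lu_factor(a):
    """Factor a square matrix (list of rows) using partial pivoting.

    Returns (lu, piv). lu stores L below the diagonal (unit diagonal implied)
    and U on and above it; piv[k] is the row that was swapped with row k.
    """
    n = len(a)
    lu = [[float(v) for v in row] for row in a]
    piv = list(range(n))
    for k in range(n):
        # Choose the largest remaining entry in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        piv[k] = p
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]          # multiplier, stored in place of L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, piv


def lu_solve(factors, b):
    """Solve a x = b for one right-hand side, reusing factors from lu_factor."""
    lu, piv = factors
    n = len(lu)
    x = [float(v) for v in b]
    # Apply the recorded row swaps to the right-hand side.
    for k in range(n):
        p = piv[k]
        if p != k:
            x[k], x[p] = x[p], x[k]
    # Forward substitution with the unit lower-triangular factor.
    for i in range(n):
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    # Back substitution with the upper-triangular factor.
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


if __name__ == "__main__":
    factors = lu_factor([[0, 2], [1, 1]])
    print(lu_solve(factors, [4, 3]))
    print(lu_solve(factors, [2, 1]))
```

The efficiency argument is that the factorization cost is paid once in lu_factor, and each additional right-hand side only costs the two triangular solves in lu_solve.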
In fact, using high-level scipy convenience functions in linalg2, which is supposed to provide highly efficient and yet user-friendly (yes, both goals can be achieved at the same time!) algorithms, is not good because scipy functions are inefficient due to their general-purpose nature, and the initial wins in performance are lost.
Some can be made efficient. Some will be less so. I'm more worried about getting a working version out that (hopefully) can be made efficient in the future than I am in optimizing it right now. If we want to make changes to linalg, let's discuss specifics.
Therefore low level modules like linalg, integrate, etc must be carefully implemented even if it takes more time and seemingly direct Python hooks could be applied.
So this is what the site-packages view of scipy would be:
site-packages
    scipy_distutils
    scipy_test
    scipy_level0
        subsumes and customizes Numeric
        handy.py
        misc.py
        scimath.py
        Matrix.py (?)
        fastumath.so (pyd)
        etc.
    scipy
        subsume scipy_base
        everything else
This looks like a positive plan to me.
Any other candidates for naming scipy_level0? It reflects too much the internals of SciPy but will contain very useful general purpose functions, I assume, to be useful more widely. How about scipy_base?
scipy_base is fine with me.
Another idea would be then to move scipy_test inside scipy_base (and dropping its scipy_ prefix). Since scipy_base would be mostly pure Python, it should be feasible.
Good idea. The current "packagization" of scipy_test was a complete hack to get around limitations in distutils. scipy_base is a much better home for it.
(Later, be not surprised if I will question the naming of handy.py and misc.py, but I am not ready for that yet ...;-)
Funny you should mention that. misc.py was my utility module. handy.py was Travis O.'s. We both thought they should be merged into an appropriately named module in the move to scipy_base. Pick a name.
In regards to higher level modules that use fft, svd, and other complex algorithms, they are just gonna have to import scipy.
+2
This requires some discussion before we make the change. It's also gonna require someone to step up and implement the change -- though it probably isn't a major effort.
It may be a good idea to release 0.2 before such a change. If it works out nicely, then 0.3 could follow quickly.
We could do that. I think the change isn't that difficult. Travis O. has already structured the code in a way that is pretty much equivalent to the scipy_base idea. His level0 functions/modules can be moved over into scipy_base plus fastumath, limits, scipy_test (others?). Creating scipy_base now solves the problem of where to put fastumath, which doesn't have a good home. The issue that needs more thought is the NaN functions. They should also go over there, but they are part of cephes, and the entire "special" package should not be moved (I don't think...). This needs the most thought. After making the scipy_base package, the find/replaces need to be done in appropriate modules.

I'd lean toward trying to get the scipy_base idea in this release. If it looks like too much disruption though, we'll push it to 0.3. Perhaps April 5th is too ambitious to fit all this in. I'd like to try though.

eric
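A rough sketch of the find/replace step mentioned above, assuming it only touches whole-line "from scipy import ..." statements. The MOVED set and the retarget_imports helper are hypothetical; the three names in MOVED are the ones this thread lists as needed by linalg, and the real list of functions moved to scipy_base would have to come from the code itself.

```python
import re

# Hypothetical subset of names assumed to live in scipy_base after the move.
MOVED = frozenset(["amax", "amin", "triu"])

# Matches one whole "from scipy import a, b, c" line (not "from scipy.linalg ...").
_FROM_SCIPY = re.compile(
    r"^([ \t]*)from[ \t]+scipy[ \t]+import[ \t]+(.+)$", re.MULTILINE
)


def retarget_imports(source, moved=MOVED, new_package="scipy_base"):
    """Point "from scipy import ..." lines at new_package when every name moved."""

    def fix(match):
        indent, names_text = match.group(1), match.group(2)
        names = [name.strip() for name in names_text.split(",")]
        if all(name in moved for name in names):
            return "%sfrom %s import %s" % (indent, new_package, ", ".join(names))
        # Leave mixed or unknown imports for a human to sort out.
        return match.group(0)

    return _FROM_SCIPY.sub(fix, source)


if __name__ == "__main__":
    example = (
        "from scipy import amax, triu\n"
        "from scipy import special\n"
        "from scipy.linalg import eig\n"
    )
    print(retarget_imports(example))
```

Lines that import anything outside the moved set are deliberately left alone, since splitting them needs a decision about where the remaining names belong.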
Hi, On Wed, 27 Mar 2002, eric wrote:
(Later, be not surprised if I will question the naming of handy.py and misc.py, but I am not ready for that yet ...;-)
Funny you should mention that. misc.py was my utility module. handy.py was Travis O.'s. We both thought they should be merged into an appropriately named module in the move to scipy_base. Pick a name.
I think we should pick many names here...

Some months back I made a quick reference card on scipy functions and their dependencies for myself. It follows below. Note that it aims not to be complete or updated but to give some perspective.

A quick look on this map shows to me a rather high scattering of related functions in different modules. In the following messages I shall be more specific to give some starting ideas on re-factoring this stuff. Please, feel free to draw your own conclusions so that overlapping ideas can be collected and applied.

Pearu

-----------------------------------
scipy:
  __init__:
    import Numeric,os,sys,fastumath,string
    from helpmod import help, source
    from Matrix import Matrix as Mat
    defines:
      Inf,inf,NaN,nan
      somenames2all,names2all,modules2all,objects2all
  helpmod:
    import inspect,types,sys,os
    defines:
      split_line,makenamedict,help,source
  handy:
    import Numeric,types,cPickle,sys,scipy,re
    from Numeric import *
    from fastumath import *
    defines:
      ScalarType
      nd_grid,grid
      concatenator,r_,c_
      index_exp
      disp,logspace,linspace
      fix,mod,fftshift,ifftshift,fftfreq,cont_ft,r1array
      r2array,who,objsave,objload,isscalar,toeplitz,
      hankel,real_if_close,sort_complex,poly,polyint,
      polyder,polyval,polyadd,polysub,polymul,polydiv,
      deconvolve,poly1d,select
  misc:
    import scipy.special
    from types import IntType,ComplexType
    defines:
      real,imag,iscomplex,isreal,array_iscomplex,array_isreal
      isposinf,isneginf,nan_to_num,logn,log2,lena,
      histogram,trim_zeros,atleast_1d,atleast_2d,atleast_3d,
      vstack,hstack,column_stack,dstack,replace_zero_by_x_arrays,
      array_split,split,hsplit,vsplit,dsplit
      x_array_kind,x_array_precision,x_array_type
      x_common_type
  limits:
    import Numeric
    defines:
      toChar,toInt8,toInt16,toInt32,toFloat32,toFloat64
      epsilon,tiny
      float_epsilon,float_tiny,float_min,float_max,
      float_precision,float_resolution
      double_epsilon,double_tiny
      double_min,double_max,double_precision,double_resolution
  data_store:
    import dumb_shelve
    import string,os
    defines:
      load,save,create_module,create_shelf
  dumb_shelve:
    from shelve import Shelf
    import zlib
    from cStringIO import StringIO
    import cPickle
    defines:
      DbfilenameShelf,open
  dumbdbm_patched:
    defines:
      open
  basic:
    from Numeric import *
    import Matrix,copy
    from handy import isscalar, mod
    from fastumath import *
    defines:
      eye,tri,diag,fliplr,flipud,rot90,tril,triu,amax,amin
      ptp,mean,median,std,cumsum,prod,cumprod,diff
      cov,corrcoef,squeeze,sinc,angle,unwrap,allMat
  basic1a:
    import Numeric,fastumath,types,handy
    from scipy import diag,special,r1array,hstack
    from scipy.linalg import eig
    import scipy.stats as stats
    from Numeric import *
    from fastumath import *
    defines:
      find_non_zero,roots,factorial,comb,rand,randn
  ...
-------------------------------------------
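A reference card like the one above could in principle be generated rather than written by hand. The following is a minimal sketch, assuming the module source parses with the interpreter running the script (sources from 2002 may not); module_card is a hypothetical helper, and the sample source only borrows a few names from the card for illustration.

```python
import ast


def module_card(source):
    """Return (imports, defines) found at the top level of a module's source."""
    imports, defines = [], []
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            # "import a, b" contributes every listed module name.
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # "from m import ..." contributes the module; "." marks a bare relative import.
            imports.append(node.module or ".")
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defines.append(node.name)
        elif isinstance(node, ast.Assign):
            # Only simple "name = value" targets are recorded.
            for target in node.targets:
                if isinstance(target, ast.Name):
                    defines.append(target.id)
    return imports, defines


if __name__ == "__main__":
    sample = (
        "import Numeric, types\n"
        "from fastumath import *\n"
        "epsilon = 1e-16\n"
        "def linspace(a, b, n):\n"
        "    return None\n"
        "class poly1d:\n"
        "    pass\n"
    )
    print(module_card(sample))
```

The source is only parsed, never executed, so modules with heavy or missing dependencies can still be listed.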
Hi again,

Below I just outline some ideas on cleaning scipy up, without taking into account actual implementation details of these functions. The analysis is very raw and the suggestions may only show possible directions. Currently I find it quite difficult to decide where a particular function should go because the overall structure really needs to be refactored and an overall purpose of each part should be summarized. Can someone lay out a more detailed vision of scipy parts and structure? I am a bit lost right now, maybe due to the late hour here ...

Pearu

On Wed, 27 Mar 2002 pearu@scipy.org wrote:
----------------------------------- scipy: __init__: defines: Inf,inf,NaN,nan somenames2all,names2all,modules2all,objects2all
Eric mentioned the problem with NaN stuff. I have not looked into it yet..
helpmod: defines: split_line,makenamedict,help,source
There is also helper.py that defines help2,help3,etc. Merge helper.py and helpmod.py.
handy: defines: ScalarType
nd_grid,grid concatenator,r_,c_ index_exp
these functions could be factored somewhere.
disp
find a better place..
logspace,linspace
ditto
fix,mod,fftshift,ifftshift,fftfreq,cont_ft,
fft stuff could go into a not-yet-existing transform module. I have in mind implementing other transforms as well, like hilbert, etc., that are based on fft.
r1array r2array
find or make a place ..
who
find or make a place ..
objsave,objload
related to data_store??
isscalar,
toeplitz, hankel
should go into Matrix
real_if_close,sort_complex,
find or make a place
poly,polyint, polyder,polyval,polyadd,polysub,polymul,polydiv, deconvolve,poly1d,select
collect poly,polyint,... into a separate polynomial module?
misc: defines: real,imag
Put into scimath.py?
iscomplex,isreal,array_iscomplex,array_isreal isposinf,isneginf
find or make a place
nan_to_num,logn,log2
put into scimath.py?
lena
?? do something with it
histogram,trim_zeros,atleast_1d,atleast_2d,atleast_3d, vstack,hstack,column_stack,dstack,replace_zero_by_x_arrays,
Looks like stuff for a separate module. Any relation to basic.py?
array_split,split,hsplit,vsplit,dsplit
find or make a place
x_array_kind,x_array_precision,x_array_type x_common_type
find a place
limits: defines: toChar,toInt8,toInt16,toInt32,toFloat32,toFloat64 epsilon,tiny float_epsilon,float_tiny,float_min,float_max, float_precision,float_resolution double_epsilon,double_tiny double_min,double_max,double_precision,double_resolution
That looks ok to me.
data_store:
  import dumb_shelve
  import string,os
  defines:
    load,save,create_module,create_shelf
dumb_shelve:
  from shelve import Shelf
  import zlib
  from cStringIO import StringIO
  import cPickle
  defines:
    DbfilenameShelf,open
dumbdbm_patched:
  defines:
    open
Merge data_store, dumb_shelve, dumbdbm_patched to reduce number of files. Or make a separate package if this stuff will be extended.
basic: defines: eye,tri,diag,fliplr,flipud,rot90,tril,triu,amax,amin ptp,mean,median,std,cumsum,prod,cumprod,diff cov,corrcoef,squeeze,sinc,angle,unwrap,allMat
looks like MLab
basic1a: defines: find_non_zero,roots,factorial,comb,rand,randn
"PP" == pearu <pearu@scipy.org> writes:
>> So this is what the site-packages view of scipy would be:
>>
>> site-packages scipy_distutils scipy_test scipy_level0 subsumes
>> and customizes Numeric handy.py misc.py scimath.py Matrix.py
>> (?) fastumath.so (pyd) etc. scipy subsume scipy_base
>> everything else

PP> This looks like a positive plan to me.

PP> Any other candidates for naming scipy_level0? It reflects too
PP> much the internals of SciPy but will contain very useful
PP> general purpose functions, I assume, to be useful more widely.
PP> How about scipy_base?

FWIW, scipy_base sounds much better.

prabhu
Participants (4)
- eric
- Magnus Lyckå
- pearu@scipy.org
- Prabhu Ramachandran