
On Mon, Jan 5, 2015 at 4:08 AM, Konrad Hinsen <konrad.hinsen@fastmail.net> wrote:
--On 5 January 2015 08:43:45 +0000 Sturla Molden <sturla.molden@gmail.com> wrote:
To me it seems that algorithms in scientific papers and books are described in various forms of pseudo-code.
That's indeed what people do when they write a paper about an algorithm. But many if not most algorithms in computational science are never published in a specific article. Very often, a scientific article gives only an outline of a method in plain English. The only full documentation of the method is the implementation.
Perhaps we need a notation which is universal and eternal, like the language of mathematics. But I am not sure Python could or should try to be that "scripting" language.
Neither Python nor any other programming language was designed for that task, and none of them is really a good fit. But today's de facto situation is that programming languages fulfill the role of algorithmic specification languages in computational science. And I don't expect this to change rapidly, in particular because to the best of my knowledge there is no better choice available at the moment.
I wrote an article on this topic that will appear in the March 2015 issue of "Computing in Science and Engineering". It concludes that for now, a simple Python script is probably the best you can do for an executable specification of an algorithm. However, I also recommend not using big libraries such as NumPy in such scripts.
I also think it is reasonable to ask whether journals should require code that serves as algorithmic documentation to be written in some ISO standard language like C or Fortran 90. The behavior of Python and NumPy is not dictated by standards, and as such it is no better than pseudo-code.
True, but the ISO specifications of C and Fortran have so many holes ("undefined behavior") that they are not really much better for the job. And again, we can't ignore the reality of the de facto use today: there are no such requirements or even guidelines, so Python scripts are often the best we have as algorithmic documentation.
Matlab is more "well defined" than numpy; numpy has many more features. I think, if you want a runnable Python script as algorithmic documentation, then it will be necessary, and in most cases relatively easy, to stick to the "stable" basic features. The same holds for a library: if we want to minimize compatibility problems, then we shouldn't use features that are likely to be a moving target.

One of the issues is whether we want to write "safe" or "fancy" code. (Fancy code might be, or will be, faster with a specific version.) For example, in most of my use cases, having a view or a copy of an array makes a difference to the performance but not to the results. I didn't participate in the `diagonal` debate because I don't have a strong opinion and don't use it with an assignment; there is an explicit np.fill_diagonal that works in place. Whether an array operation returns a view or a copy has never had a clear-cut answer; there are too many functions that "return views if possible". When our (statsmodels) code's correctness depends on whether something is a view or a copy, then we make sure and write the matching unit tests.

In other cases, the behavior of numpy in edge cases like empty arrays is still in flux; we usually try to avoid relying on implicit behavior. Dtypes are a mess in terms of code compatibility. Matlab is much nicer: it's all just doubles. Now pandas and numpy are making object arrays popular and introducing new things like datetime dtypes, and users expect that a program written a while ago can handle them.

A related compatibility issue is Python 2 versus Python 3. For non-string-manipulation scientific code, the main constraints are to avoid version-specific features and to decide when to use lists versus iterators for range, zip, and map. Other than that, it looks much simpler to me than expected.

Overall I think the current policy of incremental changes in numpy works very well. Statsmodels needs a few minor adjustments in each version.
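The view-versus-copy distinction Josef describes can be made concrete with a short sketch (a minimal illustration, not code from the original discussion): basic slicing shares memory with the original array, fancy indexing copies, and np.fill_diagonal sidesteps the question for diagonals by working in place.

```python
import numpy as np

# Basic slicing returns a view: writing through it changes the original.
a = np.arange(6).reshape(2, 3)
view = a[:, :2]
view[0, 0] = 99
assert a[0, 0] == 99      # the change is visible in `a`

# Fancy (integer-array) indexing returns a copy instead.
copy = a[:, [0, 1]]
copy[0, 0] = -1
assert a[0, 0] == 99      # `a` is untouched

# np.fill_diagonal modifies its argument in place, so code that sets
# a diagonal need not depend on whether a.diagonal() is a view or a copy.
b = np.zeros((3, 3))
np.fill_diagonal(b, 1.0)
assert b[1, 1] == 1.0
```

Unit tests like these assertions are exactly how code whose correctness depends on view/copy semantics can pin the behavior down, as Josef notes statsmodels does.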
But most of those are for cases where numpy became more strict or where we relied on specific behavior in edge cases, as far as I remember.

One problem with accumulating changes for a larger version change like numpy 2, 3, or 4 is deciding which changes would require it. Most changes will break some code, if that code requires or uses some exotic or internal behavior. If we want to be strict, then we don't change the policy but change the version numbers: instead of 1.8 and 1.9 we would have numpy 18 and numpy 19. However, from my perspective none of the recent changes were fundamental enough.

BTW: Stata versions its scripts. Each script can declare which version of Stata it was written for, but I have no idea how they handle the compatibility issues. It looks to me like it would be far too much work to do something like this in an open source project.

Legacy cleanups, like the removal of Numeric compatibility in numpy or of weave (and maxentropy) in scipy, have been announced for a long time, and eventually all legacy code needs to run in a legacy environment. But that's a different issue from developing numpy and the current scientific Python packages, which need the improvements. It is always possible just to "freeze" a package, with its own frozen Python and frozen versions of its dependencies.

Josef
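The alternative to freezing that many libraries actually use is gating on the runtime version of a dependency. A minimal sketch (the helper name and version strings are illustrative, not from the thread):

```python
def version_tuple(version_string):
    """Parse the (major, minor) pair from a version string like '1.9.2'."""
    return tuple(int(part) for part in version_string.split(".")[:2])

# A library would compare against the installed numpy, e.g.:
#   import numpy as np
#   NUMPY_GE_1_9 = version_tuple(np.__version__) >= (1, 9)
# and then branch to old or new behavior where the two versions differ.
assert version_tuple("1.9.2") >= (1, 9)
assert version_tuple("1.8.0") < (1, 9)
```

This is roughly the inverse of Stata's approach: instead of each script declaring the language version it targets, the library adapts itself to whatever version it finds at import time.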
Konrad.
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion