[Numpy-discussion] The future of ndarray.diagonal()

josef.pktd at gmail.com
Mon Jan 5 10:48:55 EST 2015


On Mon, Jan 5, 2015 at 4:08 AM, Konrad Hinsen <konrad.hinsen at fastmail.net>
wrote:

> --On 5 January 2015 08:43:45 +0000 Sturla Molden <sturla.molden at gmail.com>
> wrote:
>
> > To me it seems that algorithms in scientific papers and books are
> > described in various forms of pseudo-code.
>
> That's indeed what people do when they write a paper about an algorithm.
> But many if not most algorithms in computational science are never
> published in a specific article. Very often, a scientific article gives
> only an outline of a method in plain English. The only full documentation
> of the method is the implementation.
>
> > Perhaps we need a notation
> > which is universal and ethernal like the language mathematics. But I am
> > not sure Python could or should try to be that "scripting" language.
>
> Neither Python nor any other programming language was designed for that task, and
> none of them is really a good fit. But today's de facto situation is that
> programming languages fulfill the role of algorithmic specification
> languages in computational science. And I don't expect this to change
> rapidly, in particular because to the best of my knowledge there is no
> better choice available at the moment.
>
> I wrote an article on this topic that will appear in the March 2015 issue
> of "Computing in Science and Engineering". It concludes that for now, a
> simple Python script is probably the best you can do for an executable
> specification of an algorithm. However, I also recommend not using big
> libraries such as NumPy in such scripts.
>
> > I also think it is reasonable to ask if journals should require code as
> > algorithmic documentation to be written in some ISO standard language
> like
> > C or Fortran 90. The behavior of Python and NumPy is not dictated by
> > standards, and as such is no better than pseudo-code.
>
> True, but the ISO specifications of C and Fortran have so many holes
> ("undefined behavior") that they are not really much better for the job.
> And again, we can't ignore the reality of the de facto use today: there are
> no such requirements or even guidelines, so Python scripts are often the
> best we have as algorithmic documentation.
>

Matlab is more "well defined" than numpy; numpy has too many features.

I think if you want a runnable Python script as algorithmic documentation,
then it will be necessary, and in most cases relatively easy, to stick to the
"stable" basic features.
The same holds for a library: if we want to minimize compatibility problems,
then we shouldn't use features that are likely to be a moving target.
One of the issues is whether we want to write "safe" or "fancy" code.
(Fancy code might be, or will be, faster with a specific version.)
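
As a rough sketch of what I mean by "safe" versus "fancy" (a made-up example,
not code from any particular package): a moving window sum written with plain
indexing, versus the same thing built on stride tricks, which leans on
memory-layout internals.

    import numpy as np
    from numpy.lib.stride_tricks import as_strided

    x = np.arange(10.0)

    # "safe": only basic indexing and sum, boring but stable
    safe = np.array([x[i:i + 3].sum() for i in range(len(x) - 2)])

    # "fancy": overlapping windows via strides, fast but relying on
    # implementation details that are easier to get wrong across versions
    windows = as_strided(x, shape=(len(x) - 2, 3),
                         strides=(x.strides[0], x.strides[0]))
    fancy = windows.sum(axis=1)

    assert np.allclose(safe, fancy)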

For example, in most of my use cases, getting a view or a copy of an array
makes a difference for performance but not for the results. I didn't
participate in the `diagonal` debate because I don't have a strong opinion
and don't use it with assignment. There is an explicit np.fill_diagonal that
works in place.
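
Roughly, the in-place alternative I mean (just a sketch; what assigning into
diagonal() does depends on the numpy version):

    import numpy as np

    a = np.zeros((3, 3))

    # explicit and unambiguous: modifies `a` in place
    np.fill_diagonal(a, 5.0)

    # by contrast, whether this affects `a` depends on the numpy version:
    # diagonal() used to return a copy and now returns a (read-only) view
    d = a.diagonal()
    # d[0] = 1.0   # may warn, raise, or silently leave `a` unchanged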

Whether arrays are views or copies never sounded like a question with a
clear-cut answer; there are too many functions that "return a view if
possible".
When the correctness of our (statsmodels) code depends on whether something
is a view or a copy, we usually make sure and write the matching unit tests.
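
Something like this is what I mean by a matching unit test (the helper is
made up, not actual statsmodels code):

    import numpy as np

    def upper_triangle(x):
        # made-up helper whose correctness requires returning a copy
        return np.triu(x).copy()

    def test_upper_triangle_is_copy():
        x = np.arange(9.0).reshape(3, 3)
        u = upper_triangle(x)
        # check that we really got a copy, not a view into x
        assert not np.may_share_memory(u, x)
        u[0, 0] = -1.0
        assert x[0, 0] == 0.0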

In other cases, the behavior of numpy in edge cases like empty arrays is
still in flux; we usually try to avoid relying on implicit behavior.
Dtypes are a mess (in terms of code compatibility). Matlab is much nicer:
it's all just doubles. Now pandas and numpy are making object arrays popular
and are introducing strange things like datetime dtypes, and users think a
program written a while ago can handle them.
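
A small illustration of the kind of thing I mean (my own example): code
written for plain float arrays has no chance with object or datetime64 input.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    np.sqrt(x)                           # fine: float64 all the way

    obj = np.array([1.0, 2.0, None])     # object dtype, e.g. from pandas
    # np.sqrt(obj)   # raises, even though the first two elements are floats

    dates = np.array(['2015-01-05'], dtype='datetime64[D]')
    # dates.mean()   # fails; code written before datetime64 existed
    #                # cannot be expected to handle it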

A related compatibility issue is Python 2 versus Python 3: for scientific
code that doesn't do string manipulation, the main constraint is to avoid
version-specific features and to decide when to use lists versus iterators
for range, zip and map. Other than that, it looks much simpler to me than
expected.
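
For example (nothing fancy, just the pattern I mean): be explicit wherever an
actual list is needed, and the same code runs on both versions.

    import numpy as np

    idx = list(range(5))           # a list on py2; on py3, range is lazy
    pairs = list(zip(idx, idx))    # zip and map return iterators on py3

    # where only iteration is needed, the lazy form works on both versions
    total = sum(i * i for i in range(5))

    x = np.asarray(idx, dtype=float)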


Overall I think the current policy of incremental changes in numpy works
very well. Statsmodels needs a few minor adjustments for each version, but
most of those are for cases where numpy became more strict or where we relied
on a specific behavior in edge cases, AFAIR.

One problem with accumulating changes for a larger version jump like numpy 2,
3 or 4 is deciding which changes would require it. Most changes will break
some code, if that code requires or uses some exotic or internal behavior.
If we want to be strict, then we don't change the policy but only the version
numbers: instead of 1.8 and 1.9 we have numpy 18 and numpy 19.
However, from my perspective none of the recent changes were fundamental
enough for that.

BTW: Stata versions its scripts. Each script can declare which version of
Stata it was written for, but I have no idea how they handle the
compatibility issues.  It looks to me like it would be way too much work to
do something like this in an open source project.
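
The closest cheap approximation I can think of (just a sketch; this is
nothing like what Stata actually does internally) would be a script that
declares what it was written against and warns otherwise:

    import warnings
    from distutils.version import LooseVersion

    import numpy as np

    WRITTEN_FOR_NUMPY = "1.9"      # hypothetical declaration in the script

    if LooseVersion(np.__version__) >= LooseVersion("1.10"):
        warnings.warn("script was written for numpy %s, running %s"
                      % (WRITTEN_FOR_NUMPY, np.__version__))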

Legacy cleanups, like the removal of Numeric compatibility in numpy or of
weave (and maxentropy) in scipy, have been announced for a long time, and
eventually all legacy code needs to run in a legacy environment. But that's a
different issue from developing numpy and the current scientific Python
packages, which need the improvements.
It is always possible just to "freeze" a package, with its own frozen Python
and frozen versions of the dependencies.

Josef






>
> Konrad.
>