[Numpy-discussion] Args for ones, zeros, rand, eye, ones, empty (possible 1.0 change?)

Mon Jul 3 10:19:29 EDT 2006

Travis Oliphant wrote:
> Hmmm.....   One thing that bothers me is that it seems that those 
> arguing *against* this behavior are relatively long-time users of Python 
> while those arguing *for* it are from what I can tell somewhat new to 
> Python/NumPy.  I'm not sure what this means.
>
> Is the current behavior a *wart* you get used to or a clear *feature* 
> providing some gain in programming efficiency?
>
> If new users complain about something, my tendency is to listen openly 
> to the discussion and then discourage the implementation only if there 
> is a clear reason to.
>
> With this one, I'm not so sure of the clear reason.   I can see that 
> "program parsers" would have a harder time with a "flexible" calling 
> convention.  But, in my calculus, user-interface benefit outweighs 
> programmer benefit (all things being equal).
>
> It's easy as a programmer to get caught up in a very rigid system when 
> many users want flexibility.
>
> I must confess that I don't like looking at ones((5,5)) either.  I much 
> prefer ones(5,5) or even ones([5,5]). 
>   
I'm not sure why more people don't write the last version, ones([5,5]). 
It's significantly easier to read since the mixed brackets stand out 
better. If one buys into Guido's description of what tuples and lists 
are for, then it's also more appropriate, since a shape is a homogeneous, 
variable length sequence (as opposed to a fixed length heterogeneous 
collection, which would be more appropriately represented by a tuple). 
Not everyone buys into that, and some use tuples as essentially immutable 
lists, but the list form is easier to read in any event.

I also find:

  ones([5, 5], dt)

clearer than:

   ones(5, 5, dt)

or, more dramatic, consider: ones([dx, dy], dt) versus ones(dx, dy, dt). 
Brrrr!
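
To make the comparison concrete, here is a minimal sketch of the list
form with an explicit dtype. The values of dx, dy and dt are made up for
illustration; the tuple form is shown only to confirm that both spellings
produce the same array.

```python
import numpy as np

# Illustrative values only; nothing in the thread fixes these.
dx, dy = 3, 4
dt = np.float32

# Shape as a list, dtype as a separate argument: the brackets mark
# where the shape ends and the dtype begins.
a = np.ones([dx, dy], dt)

# The tuple spelling is equivalent; only the readability differs.
b = np.ones((dx, dy), dt)

print(a.shape, a.dtype)
```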

A side note: one reason I'm big on specifying the dtype is that 
according to Kahan (many papers available at 
http://www.cs.berkeley.edu/~wkahan/) the single best thing you can do to 
check that an implementation is numerically sane is to examine the 
results at different precisions (say float32 and float64 since they're 
commonly available) and verify that the results don't go off the rails. 
Kahan is oft quoted by Tim Peters who seems to have one of the better 
grasps of the fiddly aspects of floating point in the Python community, 
so I give his views a lot of weight. Since I try to program with this in 
mind, at least for touchy numerical code, I end up parameterizing things 
based on dtype anyway. Then it's relatively easy to check that things 
are behaving simply by changing the dtype that is passed in.

> But perhaps what this shows is something I've heard Robert discuss 
> before that has not received enough attention.   NumPy really has at 
> least two "users"  1) application developers and 2) prototype developers 
> (the MATLAB crowd for lack of a better term). 
>
> These two groups usually want different functionality (and in reality 
> most of us probably fall in both groups on different occasions).  The 
> first clamors for more rigidity and conformity even at the expense of 
> user interfaces.  These people usually want
>
> 1) long_but_explanatory_function names
> 2) rigid calling syntax
>   
One might also call it consistent calling syntax, but yes, I'd put myself 
in this group, and having consistent calling syntax catches many errors 
that would otherwise pass silently. It's also easier to figure out 
what's going on when coming back to functions written in the distant 
past if there is only one calling syntax.
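
A small sketch of the kind of error I have in mind. flexible_ones is a
hypothetical wrapper written only for this illustration (it is not part
of NumPy); it guesses that integer arguments are dimensions and anything
else is the dtype.

```python
import numpy as np

def flexible_ones(*args):
    """Hypothetical flexible signature: ints are dims, the rest is dtype."""
    dims = [a for a in args if isinstance(a, int)]
    rest = [a for a in args if not isinstance(a, int)]
    dtype = rest[0] if rest else float
    return np.ones(dims, dtype)

# With the single, strict signature the second positional argument is
# the dtype, so forgetting the brackets is reported immediately.
try:
    np.ones(5, 5)
    strict_raised = False
except TypeError:
    strict_raised = True

# The flexible version accepts the same call without complaint, whether
# or not a 5x5 array is what the caller intended.
loose = flexible_ones(5, 5)

print(strict_raised, loose.shape)
```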

> 3) strict name-space control
>   
This, combined with the more explanatory names mentioned in 1, helps make 
it easier to decipher code written in yesteryear.

> The second group which wants to get something prototyped and working 
> quickly wants
>
> 1) short easy-to-remember names
> 2) flexible calling syntax
> 3) all-in-one name-space control
>
> My hypothesis is that when you first start with NumPy (especially with a 
> MATLAB  or IDL history) you seem to start out in group 2 and stay there 
> for quick projects. Then, as code-size increases and applications get 
> larger, you become more like group 1. 
>
> I think both groups have valid points to make and both styles can be 
> useful at one point or another. Perhaps, the solution is the one I have 
> barely begun to recognize, that others of you have probably already seen.
>
> A) Make numpy a good "library" for group 1.
> B) Make a shallow "name-space" (like pylab) with the properties of group 2.
>
> Perhaps a good solution is to formalize the discussion and actually 
> place in NumPy a name-space package much like Bill has done privately.
>   
+1

-tim