[Numpy-discussion] Args for rand and randn: call for a vote

Wed Jul 12 00:05:35 EDT 2006

Travis Oliphant wrote:
> So, I'm opposed to getting rid of the *args based syntax.   My feelings 
> are weaker regarding adding the capability to rand and randn to accept a 
> tuple.  I did test it out and it does seem feasible to add this feature 
> at the cost of an additional comparison.  I know Robert is opposed to it 
> but I'm not sure I understand completely why. 
> 
> Please correct me if I'm wrong, but I think it has something to do with 
> making the random_sample and standard_normal functions irrelevant and 
> unnecessary combined with my hypothesis that Robert doesn't like the 
> *args-style syntax.  Therefore, adding support to rand and randn for 
> tuples, would make them the default random-number generators and there 
> would be proliferating code that was "harder to read" because of the 
> different usages.

My opposition has much to do with my natural orneriness, I suppose. However, for 
most of this argument I've never seen a reason why rand() ought to be considered 
a peer of ones() and zeros() and thus have any need to be "consistent" in this 
one respect with them. I've always considered ones() and zeros() to be 
fundamental constructors of basic arrays of an arbitrary dtype that (most 
likely) you'll be mangling up immediately. I've seen rand() and all of the other 
RandomState methods as simply functions that return arrays. In that respect, 
they have more in common with arange() than anything else. However, I've come to 
realize that because rand() is exposed in the top-level namespace, other people 
probably *are* seeing it as another fundamental constructor of arrays (albeit of 
a single dtype). When I was writing the package, I never even considered that 
some of its functions might be imported into the top-level namespace. I'm 
somewhat more sympathetic to the confusion caused by the rand()-as-constructor 
viewpoint, now. That's why my currently preferred compromise position is to 
remove rand() from numpy.* and leave it in numpy.random.*.
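
For concreteness, the two calling conventions at issue look roughly like this.
This is only a sketch, assuming a current NumPy in which rand() and randn() are
reached through numpy.random; the shapes are arbitrary illustrations.

```python
import numpy as np

# ones() and zeros() take the shape as a single tuple argument.
a = np.ones((3, 4))
b = np.zeros((3, 4))

# rand() and randn() take the dimensions as separate positional arguments.
c = np.random.rand(3, 4)
d = np.random.randn(3, 4)

# random_sample() and standard_normal() follow the tuple convention.
e = np.random.random_sample((3, 4))
f = np.random.standard_normal((3, 4))
```

All six calls return arrays of the same shape; only the way the shape is
spelled differs.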

I have also found that those functions just aren't convenient in "real" code. 
They usually come into play when I need one or lots of arbitrary, but 
non-degenerate-with-high-probability arrays to use as test or demo input to 
another piece of code. Otherwise, I probably need something more sophisticated. 
Manually shifting and scaling standard variates makes the code more verbose 
without making it more comprehensible.
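
As a rough illustration of the shifting and scaling in question (the parameter
values below are made up for the example, and the shapes are arbitrary), compare
the manual form with the numpy.random functions that take the distribution
parameters directly:

```python
import numpy as np

mu, sigma = 10.0, 2.5   # illustrative values, not from the discussion

# Manual shift and scale of standard normal variates.
x = mu + sigma * np.random.randn(3, 4)

# The same distribution requested directly; size is a shape tuple.
y = np.random.normal(mu, sigma, size=(3, 4))

# Uniform on [lo, hi): manual version versus the dedicated function.
lo, hi = -1.0, 1.0
u = lo + (hi - lo) * np.random.rand(3, 4)
v = np.random.uniform(lo, hi, size=(3, 4))
```

The second form of each pair states the intended distribution in the call
itself rather than leaving the reader to decode the arithmetic.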

I do indeed dislike functions that try to be clever in determining what the user 
wanted by the number and types of arguments provided. Even if they are easy to 
implement robustly (as in this case), I think that it makes explaining the 
function more complicated. It also makes the same function be called in two very 
different ways; I find this break in consistency rather more egregious than two 
different functions, used for different purposes in different circumstances, 
being called in two different ways. In my experience, it's much more likely to 
cause confusion.
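
To make the objection concrete, here is a sketch of what such a "clever" rand()
might look like. The name polymorphic_rand is hypothetical and this is not the
implementation Travis tested; it only assumes that a lone tuple argument should
be read as the shape.

```python
import numpy as np

def polymorphic_rand(*args):
    """Hypothetical rand() accepting either rand(3, 4) or rand((3, 4))."""
    # The one extra comparison: a single tuple argument is taken as the shape.
    if len(args) == 1 and isinstance(args[0], tuple):
        shape = args[0]
    else:
        shape = args
    return np.random.random_sample(shape)

r1 = polymorphic_rand(3, 4)     # *args style
r2 = polymorphic_rand((3, 4))   # tuple style
```

Both calls return a (3, 4) array, which is exactly the point: the docstring now
has to explain two calling conventions for one function.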

I'm trudging my way through a mountain of someone else's C++ code at 
the moment. There are good uses of C++'s function overloading, for example to do 
type polymorphism on the same number of arguments when templates don't quite cut 
it. However, if it is abused to create fundamentally different ways to call the 
same function, I've found that the readability drops rapidly. I have not been a 
happy camper.

I have yet to see a good exposition of the actual *problems* the current 
situation is causing. Most of the argument in favor of changing anything has 
been a call for consistency. However, consistency can never be an end in itself. 
It must always be considered a means to achieve a comprehensible, flexible, 
usable API. But it's only one of several means to that goal, and even that goal 
must be weighed against other goals, too.

When real problems were discussed, I didn't find the solution to fit those 
problems. Alan said that it was an annoyance (at the very least) to have to 
teach students that rand() is called differently from ones(). The answer here is 
to not teach rand(); use the functions that follow the one convention that you 
want to teach. Also teach the students to read docstrings so if they come across 
rand() in their copious spare time, they know what's what.

Another almost-compelling-to-me real problem was someone saying that they always 
had to pause when writing rand() and think about what the calling convention 
was. My answer is similar: don't use rand(). This problem, as described, is a 
"write-only" problem; if you want consistency in the calling convention between 
ones() et al. and your source of random numbers, it's there. You might run into 
uses of rand() in other people's code, but it'll never confuse you as to what 
the calling convention is. However, I do think that the presence of rand() in 
the numpy.* namespace is probably throwing people off. They default to rand() 
and chafe against its API even though they'd be much happier with the 
tuple-based API.

But mostly my opposition follows from this observation: making rand() cleverly 
handle its argument tuple *does not solve the problem*. A polymorphic rand() 
will *still* be inconsistent with ones() and zeros() because ones() and zeros() 
won't be polymorphic, too. And they can't be really; the fact that they take 
other arguments besides the shape tuple makes the implementation and use idioms 
rather harder than for rand(). And mark my words, if we make rand() polymorphic, 
we will get just as many newbies coming to the list asking why ones(3, 4) 
doesn't work. I've already described how I feel that this would just trade one 
inconsistency (between different functions) for another (between different uses 
of the same function). I find the latter much worse.
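
A short sketch of why ones(3, 4) cannot simply follow suit, assuming a current
NumPy: the second positional argument of ones() is the dtype, so the 4 is not
read as a dimension at all.

```python
import numpy as np

try:
    np.ones(3, 4)          # the 4 lands in the dtype slot
    failed = False
except TypeError:
    failed = True

# The shape must be a single tuple, leaving room for dtype and friends.
ok = np.ones((3, 4), dtype=int)
```

Here the first call raises a TypeError, while the tuple form returns a (3, 4)
integer array.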

To summarize: while I'd rather just leave things as they are, I realize that 
feeling is more from spite than anything else, and I'm above that. Mostly. I 
*do* think that the brouhaha will rapidly diminish if rand() and randn() are 
simply removed from numpy.* and left in numpy.random.* where they belong. While 
I'm sure that some who have been pushing for consistency the hardest will still 
grumble a bit, I strongly suspect that no one would have thought to complain if 
this had been the configuration from the very beginning.

Okay, now I think I've officially spent more time on this email than I ever did 
using or implementing rand().

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco