[Numpy-discussion] Args for rand and randn: call for a vote
Robert Kern
robert.kern at gmail.com
Wed Jul 12 00:05:35 EDT 2006
Travis Oliphant wrote:
> So, I'm opposed to getting rid of the *args based syntax. My feelings
> are weaker regarding adding the capability to rand and randn to accept a
> tuple. I did test it out and it does seem feasible to add this feature
> at the cost of an additional comparison. I know Robert is opposed to it
> but I'm not sure I understand completely why.
>
> Please correct me if I'm wrong, but I think it has something to do with
> making the random_sample and standard_normal functions irrelevant and
> unnecessary combined with my hypothesis that Robert doesn't like the
> *args-style syntax. Therefore, adding support to rand and randn for
> tuples, would make them the default random-number generators and there
> would be proliferating code that was "harder to read" because of the
> different usages.
My opposition has much to do with my natural orneriness, I suppose. However, for
most of this argument I've never seen a reason why rand() ought to be considered
a peer of ones() and zeros() and thus have any need to be "consistent" in this
one respect with them. I've always considered ones() and zeros() to be
fundamental constructors of basic arrays of an arbitrary dtype that (most
likely) you'll be mangling up immediately. I've seen rand() and all of the other
RandomState methods as simply functions that return arrays. In that respect,
they have more in common with arange() than anything else. However, I've come to
realize that because rand() is exposed in the top-level namespace, other people
probably *are* seeing it as another fundamental constructor of arrays (albeit of
a single dtype). When I was writing the package, I never even considered that
some of its functions might be imported into the top-level namespace. I'm
somewhat more sympathetic to the confusion caused by the rand()-as-constructor
viewpoint, now. That's why my currently preferred compromise position is to
remove rand() from numpy.* and leave it in numpy.random.*.
I also have found that I just don't find those functions convenient in "real"
code. They usually come into play when I need one or lots of arbitrary, but
non-degenerate-with-high-probability arrays to use as test or demo input to
another piece of code. Otherwise, I probably need something more sophisticated.
Manually shifting and scaling standard variates makes the code more verbose
without making it more comprehensible.
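
A minimal sketch of the verbosity in question, assuming hypothetical parameters
mu, sigma and n and an arbitrarily chosen seed. It contrasts shifting and
scaling the output of randn() by hand with asking numpy.random for the
distribution directly:

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration.
mu, sigma, n = 10.0, 2.5, 5

# Manual shift-and-scale of standard normal variates.
rng = np.random.RandomState(0)  # arbitrary seed
manual = mu + sigma * rng.randn(n)

# The same distribution requested directly by name.
rng = np.random.RandomState(0)
direct = rng.normal(loc=mu, scale=sigma, size=n)

print(manual.shape, direct.shape)
```

Both arrays are drawn from a normal distribution with mean mu and standard
deviation sigma; the second form states that intent in the call itself.
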
I do indeed dislike functions that try to be clever in determining what the user
wanted by the number and types of arguments provided. Even if they are easy to
implement robustly (as in this case), I think that it makes explaining the
function more complicated. It also makes the same function be called in two very
different ways; I find this break in consistency rather more egregious than two
different functions, used for different purposes in different circumstances,
being called in two different ways. In my experience, it's much more likely to
cause confusion.
I'm currently trudging my way through a mountain of someone else's C++ code.
There are good uses of C++'s function overloading, for example to do
type polymorphism on the same number of arguments when templates don't quite cut
it. However, if it is abused to create fundamentally different ways to call the
same function, I've found that the readability drops rapidly. I have not been a
happy camper.
I have yet to see a good exposition of the actual *problems* the current
situation is causing. Most of the argument in favor of changing anything has
been a call for consistency. However, consistency can never be an end in itself.
It must always be considered a means to achieve a comprehensible, flexible,
usable API. But it's only one of several means to that goal, and even that goal
must be weighed against other goals, too.
When real problems were discussed, I didn't find the solution to fit those
problems. Alan said that it was an annoyance (at the very least) to have to
teach students that rand() is called differently from ones(). The answer here is
to not teach rand(); use the functions that follow the one convention that you
want to teach. Also teach the students to read docstrings so if they come across
rand() in their copious spare time, they know what's what.
Another almost-compelling-to-me real problem was someone saying that they always
had to pause when writing rand() and think about what the calling convention
was. My answer is similar: don't use rand(). This problem, as described, is a
"write-only" problem; if you want consistency in the calling convention between
ones() et al. and your source of random numbers, it's there. You might run into
uses of rand() in other people's code, but it'll never confuse you as to what
the calling convention is. However, I do think that the presence of rand() in
the numpy.* namespace is probably throwing people off. They default to rand()
and chafe against its API even though they'd be much happier with the
tuple-based API.
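
For reference, a short sketch of the two calling conventions side by side,
using the functions as they exist in numpy.random (the shape (3, 4) is just an
example):

```python
import numpy as np

# rand() takes the dimensions as separate positional arguments...
a = np.random.rand(3, 4)

# ...while random_sample() takes one shape tuple, like ones() and zeros().
b = np.random.random_sample((3, 4))
c = np.ones((3, 4))

# The same split exists for standard normal variates.
d = np.random.randn(3, 4)
e = np.random.standard_normal((3, 4))

print(a.shape, b.shape, c.shape, d.shape, e.shape)
```

Anyone who wants the ones()-style convention for random numbers can stick to
random_sample() and standard_normal() and never touch rand() or randn().
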
But mostly my opposition follows from this observation: making rand() cleverly
handle its argument tuple *does not solve the problem*. A polymorphic rand()
will *still* be inconsistent with ones() and zeros() because ones() and zeros()
won't be polymorphic as well. And they can't be, really; the fact that they take
other arguments besides the shape tuple makes the implementation and use idioms
rather harder than for rand(). And mark my words, if we make rand() polymorphic,
we will get just as many newbies coming to the list asking why ones(3, 4)
doesn't work. I've already described how I feel that this would just trade one
inconsistency (between different functions) for another (between different uses
of the same function). I find the latter much worse.
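
A rough sketch of what such a polymorphic rand() might look like, and of the
inconsistency that would remain. The wrapper polymorphic_rand is hypothetical
and written only for illustration; it is not the implementation that was
tested or proposed on the list:

```python
import numpy as np

def polymorphic_rand(*args):
    """Hypothetical wrapper accepting rand(3, 4) or rand((3, 4))."""
    # One extra check: a single tuple argument is treated as the shape.
    if len(args) == 1 and isinstance(args[0], tuple):
        args = args[0]
    return np.random.rand(*args)

x = polymorphic_rand(3, 4)    # *args style
y = polymorphic_rand((3, 4))  # tuple style

# ones() stays tuple-only: its second positional argument is the dtype,
# so ones(3, 4) is rejected rather than read as a shape.
try:
    np.ones(3, 4)
    ones_accepts_args = True
except TypeError:
    ones_accepts_args = False

print(x.shape, y.shape, ones_accepts_args)
```

The wrapper is easy enough to write, but ones(3, 4) still fails, so the
mismatch between the two families of functions does not go away.
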
To summarize: while I'd rather just leave things as they are, I realize that
feeling is more from spite than anything else, and I'm above that. Mostly. I
*do* think that the brouhaha will rapidly diminish if rand() and randn() are
simply removed from numpy.* and left in numpy.random.* where they belong. While
I'm sure that some who have been pushing for consistency the hardest will still
grumble a bit, I strongly suspect that no one would have thought to complain if
this had been the configuration from the very beginning.
Okay, now I think I've officially spent more time on this email than I ever did
using or implementing rand().
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco