random.boolean or bernoulli

I am yet a novice Python user, and not a strong statistician to boot, but I had an idea about how to enhance the 'random' module with one or two new functions for generating boolean values instead of floating-point values.
This idea has a lot of flexibility in how it might be implemented, but I propose two new functions for the random module that might be implemented in Python as follows:
    def boolean():
        return choice((True, False))
    def bernoulli(p):
        return random() <= p
It's true that, since both of these functions have very simple short-statement implementations, they might be unnecessary baggage, but it occurred to me that their implementation might be consistent with the rest of the module.
~eblume

Erich Blume, 20.04.2011 23:37:
I am yet a novice python user, and not a strong statistician to boot, but I had an idea about how to enhance the 'random' module with one or two new functions for generating boolean values instead of floating-point values.
This idea has a lot of flexibility in how it might be implemented, but I propose two new functions for the random module that might be implemented in python as follows
def boolean(): return choice((True,False))
I like this one. It reads well in code:
    if random.boolean(): ...
It reads less well with a from-import:
    if boolean(): ...
But that's just a matter of renaming while importing it:
    from random import boolean as random_choice
    if random_choice(): ...
def bernoulli(p): return random() <= p
This seems less obvious:
    if random.bernoulli(0.5): ...
Who's Random Bernoulli anyway?
Stefan

On Apr 20, 2011, at 8:49 PM, Stefan Behnel wrote:
Erich Blume, 20.04.2011 23:37:
I am yet a novice python user, and not a strong statistician to boot, but I had an idea about how to enhance the 'random' module with one or two new functions for generating boolean values instead of floating-point values.
This idea has a lot of flexibility in how it might be implemented, but I propose two new functions for the random module that might be implemented in python as follows
def boolean(): return choice((True,False))
I like this one. It reads well in code:
if random.boolean(): ...
The traditional way to spell it is:
    if random() < 0.5: ...
Raymond

Raymond Hettinger, 21.04.2011 05:55:
On Apr 20, 2011, at 8:49 PM, Stefan Behnel wrote:
Erich Blume, 20.04.2011 23:37:
I am yet a novice python user, and not a strong statistician to boot, but I had an idea about how to enhance the 'random' module with one or two new functions for generating boolean values instead of floating-point values.
This idea has a lot of flexibility in how it might be implemented, but I propose two new functions for the random module that might be implemented in python as follows
def boolean(): return choice((True,False))
I like this one. It reads well in code:
if random.boolean(): ...
The traditional way to spell it is:
if random() < 0.5: ...
When I see constructs like this, my first thought is "Is there an off-by-one error here?", which then distracts my reading. It obviously wouldn't even matter here, since the randomness properties of random() are likely not good enough to see any difference, but that's a second thought to me. It starts off by getting in my way.
Stefan
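For readers with the same off-by-one worry, here is a minimal sketch of why `<` is the comparison that matches the intended probability. It assumes only the documented behaviour that random() returns a float in the half-open interval [0.0, 1.0). The name bernoulli is the one proposed in this thread, not an existing function in the random module.

```python
from random import random

def bernoulli(p):
    # random() lies in [0.0, 1.0), so `random() < p` is True with
    # probability p (up to float granularity). With `<=`, p == 0.0 could
    # still return True in the rare case random() yields exactly 0.0.
    return random() < p

# With `<` the edge cases are exact: 0.0 never succeeds, 1.0 always does.
assert not any(bernoulli(0.0) for _ in range(1000))
assert all(bernoulli(1.0) for _ in range(1000))
```

In practice the difference between `<` and `<=` is far too small to observe, as noted above; the point is only that the strict comparison makes the boundary cases unambiguous.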

On Apr 20, 2011, at 2:37 PM, Erich Blume wrote:
I am yet a novice python user, and not a strong statistician to boot, but I had an idea about how to enhance the 'random' module with one or two new functions for generating boolean values instead of floating-point values.
ISTM, it would be better if you first gained some experience using the module as-is.
This idea has a lot of flexibility in how it might be implemented, but I propose two new functions for the random module that might be implemented in python as follows
def boolean(): return choice((True,False))
def bernoulli(p): return random() <= p
It's true that since both of these functions have very simple short-statement implementations that it might be unnecessary baggage,
I agree that they are unnecessary baggage. AFAICT, other languages have avoided adding this sort of thing. We already have randrange(), so this is just an inflexible specialization. It is better to propose ideas that substantially increase the power of the module, not ones that offer trivial respellings.
Raymond
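For reference, a short sketch of how a fair coin flip can already be spelled with existing functions. randrange() is the one named above; getrandbits() is an addition of this sketch and was not mentioned in the thread. The variable names are illustrative only.

```python
from random import getrandbits, randrange

# Two existing ways to get a fair coin flip as a bool.
flip_a = bool(randrange(2))    # randrange(2) returns 0 or 1
flip_b = bool(getrandbits(1))  # one random bit: 0 or 1
```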

On 4/21/2011 1:11 AM, Raymond Hettinger wrote:
It's true that since both of these functions have very simple short-statement implementations that it might be unnecessary baggage,
I agree that they are unnecessary baggage. AFAICT, other languages have avoided adding this sort of thing. We already have randrange(), so this is just an inflexible specialization. It is better to propose ideas that substantially increase the power of the module, not ones that offer trivial respellings.
Well put. -1 from me also.
-- Terry Jan Reedy

Erich Blume wrote:
def bernoulli(p): return random() <= p
When I use a function like this I've been calling it chance(), which seems less jargony.
def boolean(): return choice((True,False))
Since this is equal to chance(0.5), I'm not sure it's worth it. A chance of exactly 50% seems like a rather special case.
-- Greg

On Thu, Apr 21, 2011 at 8:46 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Erich Blume wrote:
def bernoulli(p): return random() <= p
When I use a function like this I've been calling it chance(), which seems less jargony.
def boolean(): return choice((True,False))
Since this is equal to chance(0.5), I'm not sure it's worth it. A chance of exactly 50% seems like a rather special case.
What about just:
    def chance(n=0.5): return random() < n
?
As for "unnecessary baggage": about half the random module (randrange, randint, uniform, choice) could be viewed as "unnecessary baggage", but the ability to cleanly specify what one wants from the random module through these functions, instead of having to deal with the raw "random floating point from 0.0 to 1.0" as in all other languages, is what probably made me start using Python about 10 years ago.
js -><-
-- Greg
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
http://mail.python.org/mailman/listinfo/python-ideas

Joao S. O. Bueno writes:
What about just:
def chance(n=0.5): return random() < n
n=0.5 just isn't that special. Anyway, EIBTI. Also, I almost certainly wouldn't bother to use random.chance() if it existed. I'd simply use application-specific definitions like
    def random_gender(): return 'female' if random() < p else 'male'
(it's quite rare that I actually want True and False as the values of chance()-like functions).
As for "unnecessary baggage": about half the random module (randrange, randint, uniform, choice) could be viewed as "unnecessary baggage"
Sure, this is always in the eye of the beholder. My taste is that the module gets it about right. I've never needed the full functionality of randrange, but I can imagine others using it fairly frequently, and if I did need it it's cheaper to look it up than to code it up. I do often use "choice" and somewhat less often "shuffle". These are somewhat tedious to implement. OTOH, to a mathematician, random() immediately makes one want to ask, "what distribution?" So uniform() really is needed.
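A sketch of the application-specific style described in this message, assuming the probability is passed in as a parameter rather than read from a free variable p as in the one-liner. The parameter name p_female and its default value are hypothetical.

```python
from random import random

def random_gender(p_female=0.5):
    # p_female is a hypothetical parameter introduced for this sketch;
    # the one-liner in the thread reads a free variable named p instead.
    return 'female' if random() < p_female else 'male'
```

The function returns the domain values directly, so the caller never handles an intermediate True or False.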

Stephen J. Turnbull wrote:
I'd simply use application-specific definitions like
def random_gender(): return 'female' if random() < p else 'male'
The benefit of chance() is that it saves you from having to think "Should that be < p or > p?", and readers of the code thinking "Now does that mean a probability of p or 1-p?" The answers to those questions might be second nature to you, but it's not necessarily so for others.
-- Greg

On Apr 21, 2011, at 8:30 AM, Joao S. O. Bueno wrote:
What about just:
def chance(n=0.5): return random() < n
Come on people. This is junk and bad design. Besides being unnecessary, trivial, opaque, and slow, it has other issues like:
* no range checking for n<0 or n>1
* bad choice of argument name (p or x is used for probability, while n is typically an integer representing a count)
* not obvious that it returns a boolean
* not obvious that a uniform distribution is presumed
* a name that will have different interpretations for different people and may not make sense in a given context
* no parallels in other languages (even Excel doesn't have this)
* it presumes that our users are not very bright and are in dire need of the language being dumbed down
I'm amazed (and a little appalled) that the python-ideas crowd would entertain adding this to a mature module like random. Guido has had twenty years to put something like this in the module (I believe he was the original writer) and likely didn't do so for a good reason. Even stats packages don't seem to include anything this mundane. There needs to be some effort to not make modules unnecessarily fat and to limit feature creep except for tools that greatly improve expressive power.
Raymond
P.S. Bernoulli isn't even jargon; it's a person's name. A Bernoulli trial just means that events are independent. It doesn't imply anything about a distribution or population of possible result values.
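To make two of the listed objections concrete (range checking and argument name), here is a sketch of what a version addressing them might look like. It is an illustration only: chance remains a hypothetical name from this thread, and raising ValueError for out-of-range input is an assumption of the sketch, not something anyone in the thread specified.

```python
from random import random

def chance(p):
    """Return True with probability p (hypothetical helper, not in the stdlib)."""
    # Reject values outside [0, 1]; NaN also fails this comparison.
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1, got %r" % (p,))
    # random() is uniform on [0.0, 1.0), so this is True with probability p.
    return random() < p
```

Whether a helper like this belongs in the standard library is exactly what the thread disputes; the sketch only shows that the range check and naming are cheap to get right in user code.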

On 4/21/11 1:46 PM, Raymond Hettinger wrote:
P.S. Bernoulli isn't even jargon; it's a person's name. A Bernoulli trial just means that events are independent. It doesn't imply anything about a distribution or population of possible result values.
Actually, it is the canonical name of a particular discrete probability distribution. *If* one cared to add it, it would be a perfectly fine name for it, though "bernoullivariate" might fit better with the other named distributions.
http://en.wikipedia.org/wiki/Bernoulli_distribution
-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

Robert Kern wrote:
On 4/21/11 1:46 PM, Raymond Hettinger wrote:
P.S. Bernoulli isn't even jargon; it's a person's name. A Bernoulli trial just means that events are independent. It doesn't imply anything about a distribution or population of possible result values.
Actually, it is the canonical name of a particular discrete probability distribution. *If* one cared to add it, it would be a perfectly fine name for it, though "bernoullivariate" might fit better with the other named distributions.
I don't think those suggested functions are really missing from the random module. Distributions that are missing (at least AFAIK) are these two important discrete distributions:
One which returns integers distributed according to the binomial distribution B(n,p):
http://en.wikipedia.org/wiki/Binomial_distribution
(The Bernoulli distribution is a special case (B(1,p)).)
The other important discrete distribution missing is the Poisson distribution Pois(lambda):
http://en.wikipedia.org/wiki/Poisson_distribution
Both are often used in simulations and test data generators for real world discrete events.
In the past we've used Ivan Frohne's fine rv module for these, but it would be great if they could be added to the standard random module (taking benefit of the much better performance this provides):
http://svn.scipy.org/svn/scipy/tags/pre_numeric/Lib/stats/rv.py
Perhaps a nice project for a GSoC student...
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Apr 21 2011)
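A rough pure-Python sketch of the two distributions named in this message, built only on random(). The function names binomial and poisson are hypothetical, and this is not the implementation used by the rv module linked above. The binomial sampler sums Bernoulli trials, and the Poisson sampler uses the multiplication method usually attributed to Knuth; both are simple rather than fast.

```python
from math import exp
from random import random

def binomial(n, p):
    # B(n, p) as the number of successes in n independent Bernoulli(p)
    # trials. Cost is O(n) per sample.
    return sum(random() < p for _ in range(n))

def poisson(lam):
    # Multiply uniform variates until the product drops to exp(-lam) or
    # below; the number of extra factors needed is Poisson(lam).
    # Expected cost is O(lam), and exp(-lam) underflows for large lam,
    # so this is only suitable for small rates.
    limit = exp(-lam)
    k = 0
    product = random()
    while product > limit:
        k += 1
        product *= random()
    return k
```

With n = 1, binomial(1, p) reduces to the Bernoulli case discussed earlier in the thread, returning 0 or 1.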

Robert Kern wrote:
On 4/21/11 1:46 PM, Raymond Hettinger wrote:
P.S. Bernoulli isn't even jargon;
Actually, it is the canonical name of a particular discrete probability distribution. *If* one cared to add it, it would be a perfectly fine name for it, though "bernoullivariate" might fit better with the other named distributions.
The problem is that
    if bernoullivariate(0.5): do_something()
doesn't help to make the code any clearer to someone who isn't steeped in statistical theory, whereas chance() has a suggestive meaning to just about everyone.
-- Greg

On Thu, Apr 21, 2011 at 12:30:07PM -0300, Joao S. O. Bueno wrote:
On Thu, Apr 21, 2011 at 8:46 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote: [snip all this crap]
I find this idea to be more of a subroutine than a function. They're essentially the same but the principle is different. If you would find a random bool and random boyardee or whatever function useful in your program, you're free to implement it for yourself. That doesn't mean we should add it to the stdlib, though.
participants (10)
- Erich Blume
- Greg Ewing
- Joao S. O. Bueno
- M.-A. Lemburg
- Raymond Hettinger
- Robert Kern
- Stefan Behnel
- Stephen J. Turnbull
- Terry Reedy
- Westley Martínez