[ python-Bugs-1460340 ] random.sample can raise KeyError

SourceForge.net noreply at sourceforge.net
Wed Mar 29 12:07:45 CEST 2006

Bugs item #1460340, was opened at 2006-03-29 00:05
Message generated for change (Comment added) made by phr
You can respond by visiting: 

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.3
Status: Closed
Resolution: Fixed
Priority: 5
Submitted By: paul rubin (phr)
Assigned to: Nobody/Anonymous (nobody)
Summary: random.sample can raise KeyError

Initial Comment:
I have only tested this in 2.3 and the relevant code in
random.py has changed in the current svn branch, but
from inspection it looks to me like the bug may still
be there.

If you initialize a dictionary as follows:




raises KeyError most times that you call it.


>Comment By: paul rubin (phr)
Date: 2006-03-29 10:07

Logged In: YES 

Actually the previous comment is wrong too; 99% of the time,
sample(a,1) will return None since that's the value
connected to every key in the dictionary, i.e. it's
population[j] for every j.  The other 1% of the time, the
dict gets converted to a list, and the sample returns a key
from the dict rather than a value, which is certainly wrong.
 And you can see how the probabilities are still messed up
if the values in the dict are distinct.

I think it's ok to give up on dicts, but some warning should
about it be added to the manual unless dict-like things
somehow get detected in the code.  It would be best to test
for the object really being a sequence, but I don't know if
such a test exists.  Maybe one should be added.

I'll leave it to you guys to reopen this bug if appropriate.


Comment By: paul rubin (phr)
Date: 2006-03-29 09:46

Logged In: YES 

I don't think the fix at 43421 is quite right, but I can't
easily test it in my current setup.  Suppose

a = dict.fromkeys(range(99) + ['x'])
b = random.sample(a,1)

99% of the time, there's no KeyError and b gets set to [j]
where j is some random integer.

1% of the time, there's a KeyError, random.sample is called
recursively, and the recursive call returns [some integer j]
99% of the time, and returns ['x'] 1% of the time.

So in total, ['x'] gets returned .01% of the time instead of
1% of the time.

I think it's better to not set result[i]=population[j]
inside the loop.  Instead, just build up the selected set
until it has enough indices; then try to make a result list
using those indices, and if there's a KeyError, convert the
population to a list and use the same selection set to make
the results.

gbrandl also correctly points out that a dict is not a
sequence type, so maybe it's ok to just punt on dicts.  But
it's obvious from the code comments that somebody once
wanted dicts to work, and it's reasonable for people to want
sets to work.


Comment By: Raymond Hettinger (rhettinger)
Date: 2006-03-29 09:17

Logged In: YES 

Checked-in a fix.
See revision 43420 and 43421.


Comment By: Georg Brandl (gbrandl)
Date: 2006-03-29 09:11

Logged In: YES 

random.sample is documented to take a sequence as its first
argument. A dict is not a sequence type.

I think you want
random.sample(a.keys(), 3)
or, for large dicts,
random.sample(a.iterkeys(), 3)


You can respond by visiting: 

More information about the Python-bugs-list mailing list