Multiple disjoint sample sets?
roy at panix.com
Fri Jan 11 15:15:29 CET 2013
I have a list of items. I need to generate n samples of k unique items
each. I not only want each sample set to have no repeats, but I also
want to make sure the sets are disjoint (i.e. no item repeated between
sets).

random.sample(items, k) will satisfy the first constraint, but not the
second. Should I just do random.sample(items, k*n), and then split the
resulting big list into n pieces? Or is there some more efficient way?
len(items) = 5,000,000
n = 10
k = 100,000
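
For reference, a minimal sketch of the "sample k*n, then split" idea from the question. The function name disjoint_samples and the rng parameter are hypothetical, introduced only for illustration. It assumes the entries of items are distinct and that n*k <= len(items); random.sample picks by position, so duplicate values in items could still show up in more than one set.

```python
import random

def disjoint_samples(items, n, k, rng=random):
    """Return n lists of k items each, with no item shared between lists.

    Hypothetical helper: assumes the entries of `items` are distinct.
    """
    if n * k > len(items):
        raise ValueError("need n * k <= len(items)")
    # A single draw without replacement makes all n * k picks unique.
    pool = rng.sample(items, n * k)
    # Consecutive, non-overlapping slices of that draw are pairwise disjoint.
    return [pool[i * k:(i + 1) * k] for i in range(n)]

if __name__ == "__main__":
    # Sizes from the question: 5,000,000 items, n = 10, k = 100,000.
    items = list(range(5000000))
    samples = disjoint_samples(items, 10, 100000)
    print(len(samples), len(samples[0]))
```

With the sizes above this draws 1,000,000 of the 5,000,000 items in one call, so the work is a single pass of sampling plus n slices. Whether something faster exists is left open here.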