[Tutor] (No Subject)

Paul Sidorsky paulsid@shaw.ca
Thu, 21 Feb 2002 12:12:19 -0700

kevin parks wrote:

> so i am starting with something like this (I am at a PC room with no
> interpreter so if i make a typo forgive me)... my version...
> was i headed down the right path here?

At a glance your version looks fine.

> someone said that the subtraction is not
> needed and the comparison with 1.0 can then be eliminated.  By putting largest
> weights first, the remaining number of comparisons is minimized:
> list1 = [('three', 0.5), ('two', 0.75)] # else 'one'
[snip again]
> I don't understand this since then our last choice has to be hardwired in that
> second return statement. I am guessing that the 0.75 is a typo and was supposed
> to be a 0.25

Looks like it.  I like this solution a little better, except that the
hardcoding doesn't seem to be necessary.  If the probabilities add up to
1.0, the last item will always be picked when the loop reaches it.  I
suppose it might be hardcoded to guard against floating-point rounding
errors, but all you really need to do is make the last probability in
the list 1.0 and then it will always be picked anyhow.  Of course, that
may make things less clear.
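In other words, with cumulative thresholds and the last one pinned at
1.0, no hardcoded fallback is needed.  A quick sketch of that idea (the
weights follow the example above; the pick() name is my own):

```python
import random

# Cumulative thresholds, largest weight first; the last threshold is
# pinned at 1.0 so the final item is always reachable without a
# hardcoded fallback.  (Weights follow the example being discussed.)
choices = [('three', 0.5), ('two', 0.75), ('one', 1.0)]

def pick(choices):
    x = random.random()  # always strictly less than 1.0
    for name, threshold in choices:
        if x < threshold:
            return name
```

Since random.random() always returns a value strictly less than 1.0,
the loop is guaranteed to return before falling off the end.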

> I am not yet able to get my head around this one but i put it through danny's histogram()
> and it seemed to work.  

Yeah, I find that one rather hard to understand as well.
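One way to get your head around any of these pickers is to tally their
output, much as histogram() does.  A rough stand-in for that check (the
picker below is just an illustration, reusing the same weights):

```python
import random
from collections import Counter

# An illustrative picker to tally - the same cumulative-threshold
# idea, with the same weights as the example being discussed.
def weighted_pick():
    x = random.random()
    for name, threshold in (('three', 0.5), ('two', 0.75), ('one', 1.0)):
        if x < threshold:
            return name

random.seed(0)
counts = Counter(weighted_pick() for _ in range(10000))
for name, n in counts.most_common():
    print("%-6s %5d %s" % (name, n, '*' * (n // 250)))
```

Over 10000 picks the bar for 'three' should come out roughly twice as
long as the others, which is a quick sanity check that the weights are
being respected.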

> Which approach is better (more robust, faster, all-purpose,
> efficient, etc.)

Your version should be fine.  It may not be the fastest but it seems
reasonable to me and it is fairly easy to understand.

However, if I were going to do this kind of thing a lot I would want to
use a class.  One advantage of a class is that it can do all of the
maintenance work upon addition/deletion, which makes picking faster. 
Presumably all of the adding will be done at the start and everything
after that is just picking, so ideally picking should be the fastest
operation.

This sounded like a fun morning project so I came up with the class
below.  Note that deletion is supported but is relatively inefficient. 
The assumption is that deletions normally aren't needed for this type of
operation so it doesn't need to be fast.  Also this class isn't as
robust as it should be; it's just a starting point.


import random

class WeightedList:
    def __init__(self):
        self._total = 0.0
        self._items = []

    def add(self, obj, pct):
        # Probabilities here are fractions of 1.0, so check against
        # 1.0 rather than 100.0 (with a little slack for float error).
        if self._total + pct > 1.0 + 1e-9:
            raise ValueError("Probabilities would exceed 1.0")
        self._items.append((obj, self._total, self._total + pct))
        self._total = self._total + pct

    def pick(self):
        if not self._items:
            raise ValueError("Nothing in weighted list")
        x = random.uniform(0, 1)
        for item in self._items:
            if item[1] <= x < item[2]:
                return item[0]
        # Force return of an item even if probabilities don't sum
        # to 1.0 - this may be undesirable for some applications.
        return item[0]

    def remove(self, index):
        # Slow by design; call reassign() afterwards to rebuild the
        # ranges, since this leaves a hole in them.
        del self._items[index]

    def reassign(self):
        self._total = 0.0
        newitems = []
        for item in self._items:
            pct = item[2] - item[1]
            newitems.append((item[0], self._total, self._total + pct))
            self._total = self._total + pct
        self._items = newitems

    def debugprint(self):
        i = 0
        for item in self._items:
            print "[%i] %s, %.3f-%.3f (%.1f%%)" % (i, item[0],
                item[1], item[2], (item[2] - item[1]) * 100.0)
            i = i + 1

if __name__ == "__main__":
    wl = WeightedList()
    wl.add("Red", 0.1)
    wl.add("Yellow", 0.4)
    wl.add("Green", 0.3)
    wl.remove(1)        # drop Yellow...
    wl.reassign()       # ...then rebuild the ranges
    wl.add("Blue", 0.6)
    wl.debugprint()
    for i in range(10):
        print wl.pick()


Here's a sample output from the test:

[0] Red, 0.000-0.100 (10.0%)
[1] Green, 0.100-0.400 (30.0%)
[2] Blue, 0.400-1.000 (60.0%)
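
Incidentally, because the ranges the class stores are cumulative and
sorted, the linear scan in pick() could be swapped for a binary search
using the standard bisect module.  A sketch of the idea, with
illustrative stand-ins for the class's internal lists:

```python
import bisect
import random

# Stand-ins for the class's internals: cumulative upper bounds in
# ascending order, plus a parallel list of the items themselves.
bounds = [0.1, 0.4, 1.0]          # Red 0.0-0.1, Green 0.1-0.4, Blue 0.4-1.0
items = ["Red", "Green", "Blue"]

def fast_pick():
    x = random.uniform(0, 1)
    i = bisect.bisect_right(bounds, x)    # index of first bound > x
    return items[min(i, len(items) - 1)]  # clamp if bounds don't reach 1.0
```

For a handful of items the linear scan is fine, but with many items
this turns an O(n) pick into an O(log n) one.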


Have fun!

Paul Sidorsky                                          Calgary, Canada
paulsid@shaw.ca                        http://members.shaw.ca/paulsid/