histogram type thingy for (unique) dict items

Mon Mar 29 21:01:03 EST 2004

Hi. I've been banging my head against this one for a while and have
asked around, and I am throwing it out there in the hope that someone
can shed some light on what has turned out to be a tough problem for me
(though I am getting closer).

I have been mucking with a lot of data in a dictionary that looks like:

events = { (2, 1, 0) : [8, 3, 5, 4], 
     (2, 7, 0) : [4, 3, 2, 2], 
     (2, 14, 0) : [8, 3, 5, 4], 
     (2, 18, 0) : [10, 2, 8, 7], 
     (2, 20, 0) : [10, 0, 5, 7], 
     (2, 22, 0) : [10, 2, 8, 7], 
     (2, 24, 0) : [7, 9, 3, 8], 
     (2, 28, 0) : [10, 0, 5, 7], 
     (2, 29, 0) : [10, 11], 
     (2, 30, 0) : [8, 3, 5, 4], 
     (2, 32, 0) : [5, 0, 10, 7], 
     (2, 34, 0) : [8, 3, 7, 9], 
     (2, 36, 0) : [5, 4, 3, 1], 
     (2, 36, 1) : [5, 4, 3, 1, 7], # GNA
     (2, 37, 0) : [0, 8, 2, 4, 9, 10, 1], 
     (2, 37, 1) : [0, 8, 2, 4, 9, 10, 1, 6], # GNA
     (2, 39, 0) : [8, 10, 1, 9], 
     (2, 39, 1) : [8, 10, 1, 9, 7], # GNA
     (2, 41, 0) : [2, 0, 3, 1], 
     (2, 41, 1) : [2, 0, 3, 1, 6], # GNA
    # ~~~~~~~~~~~~~~~~~~~~ page 3 ~~~~~~~~~~~~~~~~~~~~
     (3, 43, 0) : [3, 2, 4], 
     (3, 44, 0) : [0, 8, 2, 4, 9, 10, 1], 
     (3, 44, 1) : [0, 8, 2, 4, 9, 10, 1, 6] } # GNA

pages and pages of it, this is just a tiny slice...

The tuple (key) represents a point in time (page, line, event) and the
lists are my values for that time. It happens that many times have the
same data values [that is, the list items are the same]. What I want to
do now is take each _unique_ value list (there are only 120 or so of
them, as my unique.py function reports) and find out where it occurs, so
that I get a kind of histogram: each value list together with a list of
all the places it occurs.

A further wrinkle (the one I am really stuck on) is that I want just one
sorted value list per input set, not several different entries for what
is the same set in a different order. As it stands the entries get split
up, so not all my time/value pairs are accounted for properly. Ideally
all the keys (locations/times) that fit the same values would be grouped
together, so that the input of:

foo = { (5, 138, 1) : [ 0, 2, 7 ], 
	(7, 264, 1) : [ 0, 2, 7 ], 
	(9, 367, 0) : [ 0, 2, 7 ], 
	(5, 156, 1) : [ 0, 7, 2 ], 
	(8, 315, 1) : [ 0, 7, 2 ], 
	(8, 317, 1) : [ 0, 7, 2 ] }

would give me the [unique sorted value] (0, 2, 7) with all the
applicable locations:

(0, 2, 7) --> [(7, 264, 1), (5, 138, 1), (5, 156, 1), (8, 315, 1),
               (8, 317, 1), (9, 367, 0)]
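
A minimal sketch of the grouping described above, assuming a Python
version that has the built-in set() and sorted(). The names normalize
and group_by_values are made up for illustration. The idea is to turn
each value list into a sorted tuple of its distinct items and use that
tuple as the key of a new dictionary:

```python
def normalize(values):
    """Return a sorted tuple of the distinct items in values."""
    return tuple(sorted(set(values)))


def group_by_values(events):
    """Map each normalized value tuple to the sorted list of keys
    (locations) where it occurs."""
    groups = {}
    for location, values in events.items():
        groups.setdefault(normalize(values), []).append(location)
    for locations in groups.values():
        locations.sort()
    return groups


foo = {(5, 138, 1): [0, 2, 7],
       (7, 264, 1): [0, 2, 7],
       (9, 367, 0): [0, 2, 7],
       (5, 156, 1): [0, 7, 2],
       (8, 315, 1): [0, 7, 2],
       (8, 317, 1): [0, 7, 2]}

groups = group_by_values(foo)
for values, locations in sorted(groups.items()):
    print(values, "-->", locations)
```

Because [0, 2, 7] and [0, 7, 2] both normalize to (0, 2, 7), all six
locations end up under a single key. Tuples are used because lists
cannot be dictionary keys.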

Instead I get the locations split across separate entries for the sets:
    (0, 2, 7)
    (0, 7, 2)
    (2, 0, 7)
    (2, 7, 0)
    (7, 0, 2)

etc., though for my purposes these are the same and I want them all to
be together.

I know that if it were a list on input I could do this:

    # Now sort each individual set, but we don't want to have duplicate
    # elements either, so we call unique here too
    for each_event2 in all_events2:
        ndupes = unique(each_event2)
        ndupes.sort()
        outlist.append(ndupes)
    # find only the unique sets
    unique_items = unique(outlist)
    unique_item_count = 1
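
The unique() function used above lives in unique.py, which is not shown
here. As a rough stand-in, assuming all it has to do is drop repeated
items while keeping first-seen order, something like the sketch below
would let the snippet run. The sample data in all_events2 is made up for
illustration:

```python
def unique(items):
    """Return the distinct items of a list, keeping first-seen order.

    Hypothetical stand-in for the unique.py helper. It compares with ==
    instead of hashing, so it also works on a list of lists.
    """
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


# Made-up sample input: three spellings of one set, plus another set.
all_events2 = [[0, 7, 2], [2, 0, 7], [0, 2, 7, 7], [10, 11]]

outlist = []
for each_event2 in all_events2:
    ndupes = unique(each_event2)   # drop duplicate elements
    ndupes.sort()                  # put the set in a canonical order
    outlist.append(ndupes)

# find only the unique sets
unique_items = unique(outlist)
print(unique_items)
```

The linear "not in" test is slow for long lists, but with only 120 or
so unique sets that should not matter much.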

but I am not sure how to handle this in the dictionary scenario, since
in this case I am almost treating values as keys and vice versa, and I
need only the unique sorted items, so that all the times that show up
for

    (0, 2, 7)
    (0, 7, 2)
    (2, 0, 7)
    (2, 7, 0)
    (7, 0, 2) ...

are together and accounted for (ideally, I'd like to count the items
too).

In other words I want a sort of histogram of each unique event telling
me at what keys it happens and how many of each unique event there are.
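
To make the counting part concrete, here is a small sketch. It assumes
the events have already been collected into a dictionary that maps each
sorted value tuple to its list of locations. The name report is
hypothetical, and the sample groups are worked out by hand from a few
entries of the events slice above:

```python
def report(groups):
    """Return (count, values, locations) rows, most frequent set first."""
    rows = []
    for values, locations in groups.items():
        rows.append((len(locations), values, sorted(locations)))
    # Sort by descending count, then by the value tuple to break ties.
    rows.sort(key=lambda row: (-row[0], row[1]))
    return rows


# Hand-built from the events slice: each sorted value tuple with the
# (page, line, event) keys where that set of values occurs.
groups = {(3, 4, 5, 8): [(2, 1, 0), (2, 14, 0), (2, 30, 0)],
          (2, 7, 8, 10): [(2, 18, 0), (2, 22, 0)],
          (2, 3, 4): [(2, 7, 0)]}

rows = report(groups)
line_num = 1
for count, values, locations in rows:
    print("%d\t%s\t%d\t%s" % (line_num, values, count, locations))
    line_num = line_num + 1
```

The count is simply the length of each location list, so no separate
counter is needed.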

I've searched http://aspn.activestate.com/ASPN/Mail/Browse/Threaded/python-Tutor
and the cookbook and I don't see anything, except stuff like
http://aspn.activestate.com/ASPN/Python/Cookbook/Recipe/52306
and a google search turns up some crazy, incomprehensible, very advanced
bells-and-whistles graphics things that are a heck of a lot more than I
can wrap my pea-sized brain around and way more than I need.

So, if anyone has such a beast or wants to help me get started I would
be grateful. As you can see, my code is getting more and more tangled
up:

def histo(input):
    events = input.copy()
    all_events = events.values()[:]   # make new lists to be safe.
    outlist = []
    # Now sort each individual set, but we don't want to have duplicate
    # elements either so we call unique here too
    for each_event in all_events:
        ndupes = unique(each_event)
        ndupes.sort()
        outlist.append(ndupes)
    # find only the unique sets
    unique_items = unique(outlist)
    unique_item_count = 1
    line_num = 1
    print '=' * 62, '\n  -+ ---------- +- sets & their locations', 
    print '... (', len(unique_items), ') events :\n', '-' * 62
    print "line#\tset\tLocations:"
    n = {}
    for key in events.keys():
        try:
            n[tuple(events[key])].append(key)
        except KeyError:
            n[tuple(events[key])] = [key]
    nsortkeys = n.keys()        # first make a copy of the keys
    nsortkeys.sort()            # now sort that copy in place
    foobar = n.keys()
    print foobar
    foo2 = unique(foobar)
    for xx in foo2:
        print 'xx = ', xx
    for key, value in n.items():
        print line_num,"\t", str(key), "\t", str(value)
        line_num = line_num + 1
####

cheers,
kevin