Tough sorting problem: or, I'm confusing myself
Peter Otten
__peter__ at web.de
Fri Apr 16 09:14:46 EDT 2010
david jensen wrote:
> Hi all,
>
> I'm trying to find a good way of doing the following:
>
> Each n-tuple in combinations( range( 2 ** m ), n ) has a corresponding
> value n-tuple (call them "scores" for clarity later). I'm currently
> storing them in a dictionary, by doing:
>
> ----
> res={}
> for i in itertools.combinations( range( 2**m ) , n):
> res[ i ] = getValues( i ) # getValues() is computationally
> expensive
> ----
>
> For each (n-1)-tuple, I need to find the two numbers that have the
> highest scores versus them. I know this isn't crystal clear, but
> hopefully an example will help: with m=n=3:
>
> Looking at only the (1, 3) case, assuming:
> getValues( (1, 2, 3) ) == ( -200, 125, 75 ) # this contains the
> highest "other" score, where 2 scores 125
> getValues( (1, 3, 4) ) == ( 50, -50, 0 )
> getValues( (1, 3, 5) ) == ( 25, 300, -325 )
> getValues( (1, 3, 6) ) == ( -100, 0, 100 ) # this contains the
> second-highest, where 6 scores 100
> getValues( (1, 3, 7) ) == ( 80, -90, 10 )
> getValues( (1, 3, 8) ) == ( 10, -5, -5 )
>
> I'd like to return ( (2, 125), (6, 100) ).
>
> The most obvious (to me) way to do this would be not to generate the
> res dictionary at the beginning, but just to go through each
> combinations( range( 2**m), n-1) and try every possibility... this
> will test each combination n times, however, and generating those
> values is expensive. [e.g. (1,2,3)'s scores will be generated when
> finding the best possibilities for (1,2), (1,3) and (2,3)]
>
> What I'm doing now is ugly, and i think is where i'm confusing myself:
>
> ----
> best2={}
> for i in itertools.combinations( range( 2**m), n-1):
> scorelist=[]
> for j in range( 2**m ):
> if j not in i:
> k=list(i)
> k.append(j)
> k=tuple(sorted(k)) #gets the key for looking up the
> scores in res
> scorelist.append((j,res[k][k.index(j)]))
> best2[i]=sorted(scorelist,key=lambda x: -x[1])[:2]
> ----
>
> Am I missing an obviously better way?
After some tinkering I came up with
def calculate(getvalues):
best2 = defaultdict(list)
for t in combinations(range(2**m), n):
values = getvalues(t)
for i, k in enumerate(t):
best2[t[:i] + t[i+1:]].append((k, values[i]))
return dict((k, nlargest(2, v, key=itemgetter(1))) for k, v in
best2.iteritems())
which is concise but slower than your code with Raymond's improvements. I
then tried inlining the nlargest() operation:
def calculate_inlined(getvalues):
best2 = {}
for t in combinations(range(2**m), n):
values = getvalues(t)
for i, k in enumerate(t):
key = t[:i] + t[i+1:]
value = values[i]
if key not in best2:
best2[key] = [(k, value), (None, None)]
else:
a, b = best2[key]
if value > a[1]:
best2[key] = [(k, value), a]
elif value > b[1]:
best2[key] = [a, (k, value)]
return best2
This gives a speed boost (I measured with m=n=5) but is both messy and
brittle. I've got a hunch that there is a better way; I just don't see it at
the moment...
Peter
More information about the Python-list
mailing list