[Tutor] I can't believe this needs to be this complex

Dick Moores rdm at rcblue.com
Sat Aug 2 15:27:21 CEST 2008


At 05:02 AM 8/2/2008, Kent Johnson wrote:
>On Sat, Aug 2, 2008 at 6:07 AM, Dick Moores <rdm at rcblue.com> wrote:
> > I'm pretty new to Python's dictionaries, but I had a need for a function
> > that would find the values in a dict that have more than one key each.
>
> From your sample output it appears that you want not just the values,
>but a list of (value, keys) pairs for which there are more than one
>key. It is easy to just build a reverse dict and filter its items:
>
>In [15]: from collections import defaultdict
>In [16]: rev=defaultdict(list)
>In [18]: for k, v in d.iteritems():
>     rev[v].append(k)
>
>In [20]: [ [v, keys] for v, keys in rev.iteritems() if len(keys) > 1 ]
>
>Out[20]:
>[[1, ['a', 'e', 'g']],
>  [2, ['b', 'f', 'i', 'h']],
>  [4, ['d', 'j']],
>  ['U.S. Senator', ['John McCain', 'Barack Obama']],
>  [56, [45, 55]]]
>
>This also has the advantage of making only two passes over the data so
>its performance should be O(n). Your solution and Andre's make one or
>more passes over the data for each data element so they will have
>O(n*n) performance meaning for a large dict they could be very slow.

Wow, the genius' genius appears!

You're using some Python tools I didn't know about. More study!

I made a function of your code and time tested it against mine, using 
as d that dict of colors at <http://py77.python.pastebin.com/f796752ff>.

Yours is 73 times faster!

In [12]: run -t -N10 fcn_values_dupe_keys.py
Time was 0.297 seconds
Time was 0.328 seconds
Time was 0.344 seconds
Time was 0.328 seconds
Time was 0.313 seconds
Time was 0.344 seconds
Time was 0.39 seconds
Time was 0.297 seconds
Time was 0.312 seconds
Time was 0.297 seconds

IPython CPU timings (estimated):
Total runs performed: 10
   Times :      Total       Per run
   User  : 3.3092253345 s, 0.33092253345 s.
   System:        0.0 s,        0.0 s.

In [13]: run -t -N10 kent1.py
Time was 0 seconds
Time was 0 seconds
Time was 0 seconds
Time was 0 seconds
Time was 0 seconds
Time was 0.015 seconds
Time was 0 seconds
Time was 0 seconds
Time was 0 seconds
Time was 0 seconds

IPython CPU timings (estimated):
Total runs performed: 10
   Times :      Total       Per run
   User  : 0.044969961266 s, 0.0044969961266 s.
   System:        0.0 s,        0.0 s.

BTW Kent, I'm going to take this opportunity to ask you about 
"System" in the IPython timing results. It's always zero for the code 
I time. What's an example of code that would have System be greater 
than zero? And what's the distinction between User and System? (I'm 
using Win XP, if that's relevant.)

Thanks,

Dick



More information about the Tutor mailing list