[Tutor] Sorting a dictionary on a value in a list.

Lawrence Wickline lawrence.wickline at gmail.com
Thu Dec 4 19:48:54 CET 2008


Thanks for the help. I think I got it.

As for line counts, I expect it to process hundreds of thousands of
lines per run, possibly a million or more. I haven't done a full run
yet, but it has been running acceptably fast on my test files.

I ended up putting it into a main function and adding:

if __name__ == "__main__":
    main()
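
For reference, here is a minimal sketch of what that restructuring might
look like, folding in the suggestions quoted below (a dictionary passed as
a parameter instead of a global, "in" instead of has_key(), no rstrip()).
The names update_totals and process and the output format in main() are
placeholders I made up for illustration, not the actual script; the guard
above would then call main().

```python
import sys


def update_totals(totals, filename, nbytes, bytes_sent):
    """Add bytes_sent to the running total stored for filename."""
    if filename in totals:
        bytes_sent += totals[filename][1]
    totals[filename] = [nbytes, bytes_sent]


def process(lines):
    """Build {filename: [bytes, total_bytes_sent]} from tab-separated lines."""
    totals = {}
    for line in lines:
        words = line.split('\t')
        # index 0 is the URI, 3 is bytes, 7 is bytes_sent
        update_totals(totals, words[0], words[3], int(words[7]))
    return totals


def main():
    # Output format here is an assumption, chosen only for the example.
    totals = process(sys.stdin)
    for filename, (nbytes, bytes_sent) in totals.items():
        print('%s\t%s\t%d' % (filename, nbytes, bytes_sent))
```

Keeping the dictionary local to process() also makes it easy to try the
logic on a small list of lines without touching stdin.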



On Dec 3, 2008, at 5:42 PM, Kent Johnson wrote:

> On Wed, Dec 3, 2008 at 7:58 PM, Lawrence Wickline
> <lawrence.wickline at gmail.com> wrote:
>
>> how would I sort on bytes sent?
>
> You can't actually sort a dictionary; what you can do is sort the  
> list of items.
>
> In this case each item will be a tuple
> (filename, (bytes, bytes_sent))
> and dict.items() will be a list of such tuples.
>
> The best way to sort a list is to make a key function that extracts a
> key from a list item, then pass that to the list sort() method. In
> your case, you want to extract the second element of the second
> element, so you could use the function
> def make_key(item):
>     return item[1][1]
>
> Then you can make a sorted list with
> sorted(dict.items(), key=make_key)
>
>> how would I make this more efficient?
>
> It looks pretty good to me. A few minor notes below.
>
>> code:
>>
>> # Expect as input:
>> #      URI, 1,return_code,bytes,referer,ip,time_taken,bytes_sent,ref_dom
>> # index 0  1       2       3      4    5      6           7        8
>>
>> import sys
>>
>>
>> dict = {}
>
> Don't use dict as the name of a variable; it shadows the built-in
> dict type.
>
>> def update_dict(filename, bytes, bytes_sent):
>>  # Build and update our dictionary adding total bytes sent.
>>  if dict.has_key(filename):
>>      bytes_sent += dict[filename][1]
>>      dict[filename] = [bytes, bytes_sent]
>>  else:
>>      dict[filename] = [bytes, bytes_sent]
>
> If you really want to squeeze every bit of speed,
> filename in dict
> is probably faster than
> dict.has_key(filename)
> and you might also try using a try / except block instead of has_key().
> You could also try passing dict as a parameter, that might be faster
> than having it as a global.
>
> None of these will matter unless you have many thousands of lines of
> input. How many lines do you have? How long does it take to process?
>
>> # input comes from STDIN
>> for line in sys.stdin:
>>  # remove leading and trailing whitespace and split on tab
>>  words = line.rstrip().split('\t')
>
> rstrip() removes only trailing white space. It is not needed since you
> don't use the last field anyway.
>
>>  file = words[0]
>>  bytes = words[3]
>>  bytes_sent = int(words[7])
>>  update_dict(file, bytes, bytes_sent)
>
> If you put all this into a function it will run a little faster.
>
> Kent
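
To make the sorting advice above concrete, here is a small sketch of
sorting the items on bytes sent with a key function. The sample data and
the name bytes_sent_key are made up for illustration; reverse=True is my
own addition to put the largest totals first.

```python
def bytes_sent_key(item):
    # item is (filename, [bytes, bytes_sent]); sort on bytes_sent
    return item[1][1]


# Made-up sample data, for illustration only.
totals = {
    'a.html': ['100', 300],
    'b.html': ['250', 50],
    'c.html': ['75', 900],
}

# sorted() returns a new list of (key, value) pairs; the dictionary
# itself is left unchanged.
by_bytes_sent = sorted(totals.items(), key=bytes_sent_key, reverse=True)

for filename, (nbytes, bytes_sent) in by_bytes_sent:
    print('%s\t%d' % (filename, bytes_sent))
```

Dropping reverse=True gives ascending order instead.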


