[Tutor] Sorting a dictionary on a value in a list.
Lawrence Wickline
lawrence.wickline at gmail.com
Thu Dec 4 19:48:54 CET 2008
Thanks for the help; I think I've got it.
As far as lines go, I believe it will be processing hundreds of
thousands of lines, if not a million or more, per run. I haven't
gotten to do a full run yet, but it has been running acceptably fast on my
test files.
I ended up putting it into a main function and adding:
if __name__ == "__main__":
    main()
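
For reference, a minimal sketch of what that restructuring might look like. It assumes Python 3 and the tab-separated input layout described in the quoted code below. The names summarize and totals are made up for illustration, and the guard shown above would go at the bottom of the file.

```python
import sys


def summarize(lines):
    """Return {filename: [bytes, total_bytes_sent]} for tab-separated lines."""
    totals = {}  # local name, so the built-in dict is not shadowed
    for line in lines:
        words = line.split("\t")
        filename, nbytes, bytes_sent = words[0], words[3], int(words[7])
        if filename in totals:
            # Add the running total already stored for this file.
            bytes_sent += totals[filename][1]
        totals[filename] = [nbytes, bytes_sent]
    return totals


def main():
    totals = summarize(sys.stdin)
    # Print one line per file, smallest total bytes sent first.
    for filename, (nbytes, bytes_sent) in sorted(
        totals.items(), key=lambda item: item[1][1]
    ):
        print(filename, nbytes, bytes_sent, sep="\t")
```

Keeping the parsing in summarize() means it can be tried on a small list of strings without touching stdin.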
On Dec 3, 2008, at 5:42 PM, Kent Johnson wrote:
> On Wed, Dec 3, 2008 at 7:58 PM, Lawrence Wickline
> <lawrence.wickline at gmail.com> wrote:
>
>> how would I sort on bytes sent?
>
> You can't actually sort a dictionary; what you can do is sort the
> list of items.
>
> In this case each item will be a tuple
>     (filename, (bytes, bytes_sent))
> and dict.items() will be a list of such tuples.
>
> The best way to sort a list is to make a key function that extracts a
> key from a list item, then pass that to the list sort() method. In
> your case, you want to extract the second element of the second
> element, so you could use the function
>     def make_key(item):
>         return item[1][1]
>
> Then you can make a sorted list with
>     sorted(dict.items(), key=make_key)
>
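
A small sketch of the key-function approach described above, run on made-up data. The dictionary name totals and its contents are assumptions for illustration only; reverse=True is the standard sorted() argument for largest-first order.

```python
# Hypothetical sample data: filename -> (bytes, bytes_sent)
totals = {
    "a.html": ("100", 500),
    "b.html": ("200", 1500),
    "c.html": ("50", 20),
}


def make_key(item):
    # item is (filename, (bytes, bytes_sent)); pick out bytes_sent.
    return item[1][1]


# Ascending by bytes sent.
by_bytes_sent = sorted(totals.items(), key=make_key)

# Largest first, which is often what a traffic report wants.
largest_first = sorted(totals.items(), key=make_key, reverse=True)
```

The dictionary itself is left unchanged; sorted() returns a new list of (filename, (bytes, bytes_sent)) tuples.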
>> how would I make this more efficient?
>
> It looks pretty good to me. A few minor notes below.
>
>> code:
>>
>> # Expect as input:
>> # URI, 1,return_code,bytes,referer,ip,time_taken,bytes_sent,ref_dom
>> # index 0 1 2 3 4 5 6 7 8
>>
>> import sys
>>
>>
>> dict = {}
>
> Don't use dict as the name of a variable; it shadows the built-in
> dict type.
>
>> def update_dict(filename, bytes, bytes_sent):
>>     # Build and update our dictionary adding total bytes sent.
>>     if dict.has_key(filename):
>>         bytes_sent += dict[filename][1]
>>         dict[filename] = [bytes, bytes_sent]
>>     else:
>>         dict[filename] = [bytes, bytes_sent]
>
> If you really want to squeeze out every bit of speed,
>     filename in dict
> is probably faster than
>     dict.has_key(filename)
> and you might also try using a try / except block instead of has_key().
> You could also try passing dict as a parameter; that might be faster
> than having it as a global.
>
> None of these will matter unless you have many thousands of lines of
> input. How many lines do you have? How long does it take to process?
>
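
A rough sketch of the two alternatives to has_key() mentioned above, with the dictionary passed in as a parameter rather than used as a global. The function names and sample rows are made up for illustration. Note that has_key() no longer exists in Python 3, so "in" is the only membership spelling there. Which variant is faster would need to be measured on real input.

```python
def update_with_in(totals, filename, nbytes, bytes_sent):
    # Membership test with "in" instead of has_key().
    if filename in totals:
        bytes_sent += totals[filename][1]
    totals[filename] = [nbytes, bytes_sent]


def update_with_try(totals, filename, nbytes, bytes_sent):
    # Try the lookup first; a missing key raises KeyError.
    try:
        bytes_sent += totals[filename][1]
    except KeyError:
        pass  # first time this filename is seen
    totals[filename] = [nbytes, bytes_sent]


totals_a, totals_b = {}, {}
rows = [("a.html", "100", 10), ("b.html", "200", 5), ("a.html", "100", 7)]
for name, nbytes, sent in rows:
    update_with_in(totals_a, name, nbytes, sent)
    update_with_try(totals_b, name, nbytes, sent)
```

Both versions build the same dictionary. The try / except form tends to suit input where most filenames repeat, since the exception path is then rarely taken.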
>> # input comes from STDIN
>> for line in sys.stdin:
>>     # remove leading and trailing whitespace and split on tab
>>     words = line.rstrip().split('\t')
>
> rstrip() removes only trailing white space. It is not needed since you
> don't use the last field anyway.
>
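
To illustrate the point about rstrip(), here is a short sketch using a made-up input line in the nine-field layout from the header comment. The field values are invented.

```python
# Hypothetical line: two leading spaces, nine tab-separated fields, newline.
line = "  /index.html\t1\t200\t512\t-\t10.0.0.1\t3\t2048\texample.com\n"

stripped_right = line.rstrip()  # trailing newline removed, leading spaces kept
stripped_both = line.strip()    # both ends removed

# Splitting without rstrip(): only the last field keeps the newline.
words = line.split("\t")
bytes_sent = int(words[7])      # field 7 is unaffected
last_field = words[8]           # still ends with "\n"
```

Since the script only reads fields 0, 3 and 7, the newline left on field 8 does no harm.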
>>     file = words[0]
>>     bytes = words[3]
>>     bytes_sent = int(words[7])
>>     update_dict(file, bytes, bytes_sent)
>
> If you put all this into a function it will run a little faster.
>
> Kent