[Tutor] pruning and ordering a list of lists

William O'Higgins Witteman hmm at woolgathering.cx
Sat Mar 24 02:20:10 CET 2007


On Fri, Mar 23, 2007 at 05:31:51PM -0700, Bob Gailer wrote:
>William O'Higgins Witteman wrote:
>>I have a list of lists derived from a log file that I want to create a
>>summary of, but I am not sure of an approach to do what I need.
>>
>>Here's a sample of the data:
>>
>>[["user1","18/Mar/2007:07:52:38 -0400"],
>> ["user1","18/Mar/2007:07:52:40 -0400"],
>> ["user2","18/Mar/2007:07:52:42 -0400"],
>> ["user3","18/Mar/2007:07:52:42 -0400"],
>> ["user2","18/Mar/2007:07:52:43 -0400"]]
>>
>>What I want as output is something like this:
>>
>>[["first user alphabetically","most recent timestamp for this user"],
>> ["second user alphabetically","most recent timestamp for this user"],
>> ...]
>>
>>Can anyone suggest an approach for this?  Thanks.
>>  
># following code is untested
># assume your data is in variable log:
>userData = {} # setup a dictionary to collect latest timestamp for each user
>for user, timestamp in log:
> if user not in userData or timestamp > userData[user]:
>   # note that we need a way to compare timestamps
>   # the current representation does not support this
>   userData[user] = timestamp
>userData2 = sorted(userData.items()) # list of (user, timestamp) pairs, sorted by user
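
Bob's outline can be completed along the following lines. This is a minimal sketch, assuming Python 3.2 or later, where the `%z` directive of `datetime.strptime` accepts offsets such as `-0400`; the names `LOG_FORMAT` and `latest_per_user` are hypothetical. Parsing is what makes the timestamps comparable: as plain strings, "02/Apr/2007..." sorts before "18/Mar/2007..." even though it is later. The function returns the list of lists in the shape the question asked for.

```python
from datetime import datetime

# Hypothetical name for the log's timestamp format; %z parses the "-0400" offset.
LOG_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def latest_per_user(log):
    """Return [[user, latest_timestamp], ...] sorted by user name."""
    latest = {}  # user -> (parsed datetime, original timestamp string)
    for user, stamp in log:
        parsed = datetime.strptime(stamp, LOG_FORMAT)
        # Keep this entry if it is the first for the user or strictly newer.
        if user not in latest or parsed > latest[user][0]:
            latest[user] = (parsed, stamp)
    # sorted() over a dict iterates its keys, i.e. the user names.
    return [[user, latest[user][1]] for user in sorted(latest)]


log = [["user1", "18/Mar/2007:07:52:38 -0400"],
       ["user1", "18/Mar/2007:07:52:40 -0400"],
       ["user2", "18/Mar/2007:07:52:42 -0400"],
       ["user3", "18/Mar/2007:07:52:42 -0400"],
       ["user2", "18/Mar/2007:07:52:43 -0400"]]

print(latest_per_user(log))
```

Keeping the original string alongside the parsed value means the output shows the timestamps exactly as they appeared in the log.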

Thank you.  I found a similar solution myself while waiting.  I was
stuck thinking of the output as a list of lists, but once I thought of
it as a dictionary the solution came much more easily.  Here's the
code, including timestamp conversions:

#!/usr/bin/python

import time

# Log timestamp format; the "-0400" GMT offset is matched literally and discarded
LOGFORMAT = "%d/%b/%Y:%H:%M:%S -0400"

def userlists(usertimepairs, op):
  """Write each user's most recent timestamp to the open file op as CSV."""

  userandtoptimes = {}  # user -> latest time, in seconds since the epoch
  for user, stamp in usertimepairs:
    user = user.lower()
    thistime = time.mktime(time.strptime(stamp, LOGFORMAT))
    if user not in userandtoptimes or thistime > userandtoptimes[user]:
      userandtoptimes[user] = thistime

  #debug print(userandtoptimes)

  # Output to CSV file, one "user,timestamp" line per user, sorted by user
  for user in sorted(userandtoptimes):
    timestamp = time.strftime("%d/%b/%Y:%H:%M:%S",
                              time.localtime(userandtoptimes[user]))
    op.write(user + "," + timestamp + "\n")
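
If the summary should be alphabetical, as the original question asked, the output step can sort the keys; the standard `csv` module also takes care of quoting if a user name ever contains a comma. A minimal sketch, where `write_user_csv` is a hypothetical helper and `op` is assumed to be an already-open text file (here an in-memory `io.StringIO` stands in for it):

```python
import csv
import io


def write_user_csv(userandtoptimes, op):
    """Write one "user,timestamp" row per user, in alphabetical order.

    userandtoptimes maps user name -> timestamp string; op is any open
    text file (or file-like object). Both are assumptions for this sketch.
    """
    writer = csv.writer(op, lineterminator="\n")
    for user in sorted(userandtoptimes):
        writer.writerow([user, userandtoptimes[user]])


# Example with an in-memory file standing in for a real output file
buf = io.StringIO()
write_user_csv({"user2": "18/Mar/2007:07:52:43",
                "user1": "18/Mar/2007:07:52:40"}, buf)
print(buf.getvalue())
```

When writing to a real file in Python 3, open it with `newline=""` as the `csv` documentation recommends.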

The times are not exact because the GMT offset is discarded, but they
are good enough, and converting to seconds since the epoch makes the
comparisons much simpler.  Thanks again.
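
How much the discarded offset matters depends on whether the log ever mixes offsets. A minimal sketch, assuming Python 3.2 or later (for `%z`) and two made-up timestamps with different offsets, shows how dropping the offset can reverse their order, while parsing it gives true seconds since the epoch. `calendar.timegm` is used for the offset-free case so the result does not depend on the machine's local time zone.

```python
import calendar
import time
from datetime import datetime

a = "18/Mar/2007:08:00:00 -0400"   # made-up example: 12:00 UTC
b = "18/Mar/2007:11:00:00 +0000"   # made-up example: 11:00 UTC, an hour earlier

# Dropping the offset (the last 6 characters, " -0400") compares wall-clock
# times only; timegm treats those fields as UTC, independent of local zone.
naive_a = calendar.timegm(time.strptime(a[:-6], "%d/%b/%Y:%H:%M:%S"))
naive_b = calendar.timegm(time.strptime(b[:-6], "%d/%b/%Y:%H:%M:%S"))
print(naive_b > naive_a)   # b wrongly looks later

# Keeping the offset yields real seconds since the epoch.
fmt = "%d/%b/%Y:%H:%M:%S %z"
epoch_a = datetime.strptime(a, fmt).timestamp()
epoch_b = datetime.strptime(b, fmt).timestamp()
print(epoch_a - epoch_b)   # a is really 3600 seconds later
```

With a single fixed offset, as in this log, both approaches order the entries the same way.
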
-- 

yours,

William

