[Tutor] sorting objects on two attributes

Michael H. Goldwasser goldwamh at slu.edu
Mon Mar 3 17:56:54 CET 2008



Hello Eric,

  Your basic outlook is fine, but you can do it much more efficiently
  with a single sort.   Here's the way I'd approach the task (untested):

  # --------------------------------------------------
  # first compute the latest date for each id group; uses O(n) time
  newest = {}
  for q in queryset:
      id,date = q.object_id, q.submit_date
      if id not in newest or date > newest[id]:
          newest[id] = date

  # now sort based on the following decorator as a key
  data.sort(reverse=True, key=lambda x: (newest[x.object_id], x.object_id, x.submit_date))
  # --------------------------------------------------

  In essence, you compute the max date within each group, but your
  approach (i.e., building explicit sublists and then repeatedly
  calling max on those sublists) is far more time-consuming than the
  above dictionary based approach.

  Note well that using a tuple as a decorator key is more efficient
  than calling sort separately for each subgroup.   The
  lexicographical order of the following

    (newest[x.object_id], x.object_id, x.submit_date)

  should produce the order that you desire. The reason for the middle
  entry is to ensure that items groups by object_id in the case that
  two different groups achieve the same maximum date.  It wasn't clear
  to me whether you wanted elements within groups from oldest to
  newest or newest to oldest. I believe that the code I give produces
  the ordering that you intend, but you may adjust the sign of the
  decorate elements if necessary.

With regard,
Michael

       +-----------------------------------------------
       | Michael Goldwasser
       | Associate Professor
       | Dept. Mathematics and Computer Science
       | Saint Louis University
       | 220 North Grand Blvd.
       | St. Louis, MO 63103-2007


On Monday March 3, 2008, Eric Abrahamsen wrote: 

>    I have a grisly little sorting problem to which I've hacked together a  
>    solution, but I'm hoping someone here might have a better suggestion.
>    
>    I have a list of objects, each of which has two attributes, object_id  
>    and submit_date. What I want is to sort them by content_type, then by  
>    submit_date within content_type, and then sort each content_type block  
>    according to which block has the newest object by submit_date. (This  
>    sequence of sorting might not be optimal, I'm not sure). I'm actually  
>    creating a list of recent comments on blog entries for a python-based  
>    web framework, and want to arrange the comments according to blog  
>    entry (content_type), by submit_date within that entry, with the  
>    entries with the newest comments showing up on top.
>    
>    I don't believe a single cmp function fed to list.sort() can do this,  
>    because you can't know how two objects should be compared until you  
>    know all the values for all the objects. I'd be happy to be proven  
>    wrong here.
>    
>    After some false starts with dictionaries, here's what I've got.  
>    Queryset is the original list of comments (I'm doing this in django),  
>    and it returns a list of lists, though I might flatten it afterwards.  
>    It works, but it's ghastly unreadable and if there were a more  
>    graceful solution I'd feel better about life in general:
>    
>    
>    def make_com_list(queryset):
>        ids = set([com.object_id for com in queryset])
>        xlist = [[com for com in queryset if com.object_id == i] for i in  
>    ids]
>        for ls in xlist:
>            ls.sort(key=lambda x: x.submit_date)
>        xlist.sort(key=lambda x: max([com.submit_date for com in x]),  
>    reverse=True)
>        return xlist
>    
>    I'd appreciate any hints!
>    
>    Thanks,
>    Eric
>    



More information about the Tutor mailing list