[Tutor] sorting objects on two attributes
Michael H. Goldwasser
goldwamh at slu.edu
Mon Mar 3 17:56:54 CET 2008
Hello Eric,
Your basic outlook is fine, but you can do it much more efficiently
with a single sort. Here's the way I'd approach the task (untested):
# --------------------------------------------------
# first compute the latest date for each id group; uses O(n) time
newest = {}
for q in queryset:
id,date = q.object_id, q.submit_date
if id not in newest or date > newest[id]:
newest[id] = date
# now sort based on the following decorator as a key
data.sort(reverse=True, key=lambda x: (newest[x.object_id], x.object_id, x.submit_date))
# --------------------------------------------------
In essence, you compute the max date within each group, but your
approach (i.e., building explicit sublists and then repeatedly
calling max on those sublists) is far more time-consuming than the
above dictionary based approach.
Note well that using a tuple as a decorator key is more efficient
than calling sort separately for each subgroup. The
lexicographical order of the following
(newest[x.object_id], x.object_id, x.submit_date)
should produce the order that you desire. The reason for the middle
entry is to ensure that items groups by object_id in the case that
two different groups achieve the same maximum date. It wasn't clear
to me whether you wanted elements within groups from oldest to
newest or newest to oldest. I believe that the code I give produces
the ordering that you intend, but you may adjust the sign of the
decorate elements if necessary.
With regard,
Michael
+-----------------------------------------------
| Michael Goldwasser
| Associate Professor
| Dept. Mathematics and Computer Science
| Saint Louis University
| 220 North Grand Blvd.
| St. Louis, MO 63103-2007
On Monday March 3, 2008, Eric Abrahamsen wrote:
> I have a grisly little sorting problem to which I've hacked together a
> solution, but I'm hoping someone here might have a better suggestion.
>
> I have a list of objects, each of which has two attributes, object_id
> and submit_date. What I want is to sort them by content_type, then by
> submit_date within content_type, and then sort each content_type block
> according to which block has the newest object by submit_date. (This
> sequence of sorting might not be optimal, I'm not sure). I'm actually
> creating a list of recent comments on blog entries for a python-based
> web framework, and want to arrange the comments according to blog
> entry (content_type), by submit_date within that entry, with the
> entries with the newest comments showing up on top.
>
> I don't believe a single cmp function fed to list.sort() can do this,
> because you can't know how two objects should be compared until you
> know all the values for all the objects. I'd be happy to be proven
> wrong here.
>
> After some false starts with dictionaries, here's what I've got.
> Queryset is the original list of comments (I'm doing this in django),
> and it returns a list of lists, though I might flatten it afterwards.
> It works, but it's ghastly unreadable and if there were a more
> graceful solution I'd feel better about life in general:
>
>
> def make_com_list(queryset):
> ids = set([com.object_id for com in queryset])
> xlist = [[com for com in queryset if com.object_id == i] for i in
> ids]
> for ls in xlist:
> ls.sort(key=lambda x: x.submit_date)
> xlist.sort(key=lambda x: max([com.submit_date for com in x]),
> reverse=True)
> return xlist
>
> I'd appreciate any hints!
>
> Thanks,
> Eric
>
More information about the Tutor
mailing list