space-efficient top-N algorithm
Christos TZOTZIOY Georgiou
DLNXPEGFQVEB at spammotel.com
Sun Feb 9 08:43:28 EST 2003
On Sun, 09 Feb 2003 18:56:06, rumours say that David Garamond
<lists at zara.6.isreserved.com> might have written:
>Is there a more space-efficient algorithm to produce a list of top N
>recurring items from a large collection? I've thought about sorting the
>large log files externally, but reluctant to do this because of the
>large size of the files and the need to frequently produce reports while
>new logs arrive.
A solution using heapq (for ordering while reading) + sets (for
uniqueness) might be appropriate. You just have to del heap[N:] once in
a while to avoid excess memory consumption.
--
TZOTZIOY, I speak England very best,
Real email address: 'dHpvdEBzaWwtdGVjLmdy\n'.decode('base64')
More information about the Python-list
mailing list