Large amount of files to parse/organize, tips on algorithm?

Paul Rubin http
Tue Sep 2 20:28:39 CEST 2008

cnb <circularfunc at> writes:
> For each file I construct a list of reviews and then for each new file
> I merge the reviews so that in the end I have a list of reviewers and
> for each reviewer all their reviews.
> What is the fastest way to do this?

Scan through all the files sequentially, emitting records like

(movie, reviewer, review)

Then use an external sort utility to sort/merge that output file
on whichever of the 3 columns you need; sorting on the reviewer
column puts each reviewer's reviews next to each other.  Beats
writing code.
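
A rough sketch of how that could look, in case it helps. Everything named
here is made up for illustration: write_records, sort_by_column and
reviews_by_reviewer are hypothetical helpers, the tab-separated layout is
my assumption, and you still have to produce the (movie, reviewer, review)
tuples from your own file format. The sort step assumes a POSIX sort
command on your PATH.

```python
import itertools
import operator
import os
import subprocess


def clean(field):
    # Assumption: one record per line, tab-separated, so collapse any
    # tabs/newlines/runs of whitespace inside a field to single spaces.
    return " ".join(str(field).split())


def write_records(records, out_path):
    """Write (movie, reviewer, review) tuples, one per line."""
    with open(out_path, "w", encoding="utf-8") as out:
        for movie, reviewer, review in records:
            out.write("\t".join(clean(f) for f in (movie, reviewer, review)))
            out.write("\n")


def sort_by_column(src, dst, column):
    """Let the external sort utility do the work (column is 1-based)."""
    key = "-k%d,%d" % (column, column)
    # LC_ALL=C gives plain byte-order comparison, so equal keys are
    # guaranteed to end up adjacent regardless of the user's locale.
    env = dict(os.environ, LC_ALL="C")
    subprocess.run(["sort", "-t", "\t", key, "-o", dst, src],
                   check=True, env=env)


def reviews_by_reviewer(sorted_path):
    """Yield (reviewer, [(movie, review), ...]) from a file sorted on column 2."""
    with open(sorted_path, encoding="utf-8") as f:
        rows = (line.rstrip("\n").split("\t") for line in f)
        for reviewer, group in itertools.groupby(rows, key=operator.itemgetter(1)):
            yield reviewer, [(movie, review) for movie, _, review in group]


# Hypothetical usage, given an iterable `records` built from your files:
#   write_records(records, "records.tsv")
#   sort_by_column("records.tsv", "by_reviewer.tsv", 2)
#   for reviewer, reviews in reviews_by_reviewer("by_reviewer.tsv"):
#       ...
```

Since the sorted file is read one line at a time, only one reviewer's
reviews need to be in memory at once, which is the point of pushing the
sort out to an external tool.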
