Large amount of files to parse/organize, tips on algorithm?

Tue Sep 2 20:02:03 CEST 2008

> On Tue, 02 Sep 2008 09:48:32 -0700, cnb wrote:
> > I have a bunch of files consisting of movie reviews.
> > For each file I construct a list of reviews and then for each new file I
> > merge the reviews so that in the end I have a list of reviewers and for
> > each reviewer all their reviews.
> > What is the fastest way to do this?
> Use the timeit module to find out.
> > 1. Create one file with reviews, open the next file and for each review
> > see if the reviewer exists, then add the review, else create a new reviewer.
> > 2. Create all the separate files with reviews, then mergesort them?
> The answer will depend on whether you have three reviews or three
> million, whether each review is twenty words or twenty thousand words,
> and whether you have to do the merging once only or over and over again.
I merge once. Each review has 3 fields: date, rating, customerid. In
total I'll be parsing between 10K and 100K, eventually 450K reviews.
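
For a one-time merge of this size, option 1 can be sketched with a dictionary keyed on customerid, so that checking whether a reviewer exists is a hash lookup rather than a list scan. This is only a rough sketch: the line layout "customerid,rating,date" and the names parse_review and merge_reviews are assumptions made for illustration, not something taken from the actual files.

```python
from collections import defaultdict


def parse_review(line):
    # Assumed layout (hypothetical): "customerid,rating,date"
    customer_id, rating, date = line.strip().split(",")
    return customer_id, int(rating), date


def merge_reviews(sources):
    """Group reviews from several sources by customer id.

    `sources` is an iterable of iterables of lines, for example
    open file objects or lists of strings.
    """
    reviews_by_customer = defaultdict(list)
    for lines in sources:
        for line in lines:
            if not line.strip():
                continue  # skip blank lines
            customer_id, rating, date = parse_review(line)
            # A missing key gets an empty list automatically, so the
            # "add review, else create reviewer" branches collapse into one.
            reviews_by_customer[customer_id].append((date, rating))
    return reviews_by_customer


# Made-up sample data standing in for two review files.
sample_files = [
    ["1488844,3,2005-09-06\n", "822109,5,2005-05-13\n"],
    ["1488844,4,2005-10-01\n"],
]
merged = merge_reviews(sample_files)
```

With real files, the inner lists would be replaced by open file objects. Each review is handled once, so a few hundred thousand reviews of three short fields should fit comfortably in memory, though timing it on the real data is still the way to be sure.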

