Large amount of files to parse/organize, tips on algorithm?

cnb circularfunc at
Tue Sep 2 20:02:03 CEST 2008

On Sep 2, 7:06 pm, Steven D'Aprano <st... at REMOVE-THIS-> wrote:
> On Tue, 02 Sep 2008 09:48:32 -0700, cnb wrote:
> > I have a bunch of files consisting of movie reviews.
> > For each file I construct a list of reviews, and then for each new file I
> > merge the reviews, so that in the end I have a list of reviewers and, for
> > each reviewer, all their reviews.
> > What is the fastest way to do this?
> Use the timeit module to find out.
> > 1. Create one file with reviews, open the next file and for each review see
> > if the reviewer exists, then add the review, else create a new reviewer.
> > 2. Create all the separate files with reviews, then mergesort them?
> The answer will depend on whether you have three reviews or three
> million, whether each review is twenty words or twenty thousand words,
> and whether you have to do the merging once only or over and over again.
> --
> Steven

I merge only once. Each review has 3 fields: date, rating, and customer ID. In
total I'll be parsing between 10K and 100K reviews, eventually 450K.
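
For a one-time merge of this size, option 1 can be sketched with a dictionary keyed by customer ID, so each review is appended in constant time and no sorting is needed. This is only a minimal sketch: the on-disk format is not given in the thread, so the comma-separated "date,rating,customerid" line layout, the sample values, and the names parse_review and merge_reviews are all assumptions made for illustration.

```python
from collections import defaultdict


def parse_review(line):
    # Assumed line layout: "date,rating,customerid" (not stated in the thread).
    date, rating, customer_id = line.strip().split(",")
    return date, int(rating), customer_id


def merge_reviews(line_sources):
    # Map each customer ID to the list of (date, rating) pairs they wrote.
    reviews_by_customer = defaultdict(list)
    for lines in line_sources:
        for line in lines:
            if not line.strip():
                continue  # skip blank lines
            date, rating, customer_id = parse_review(line)
            reviews_by_customer[customer_id].append((date, rating))
    return reviews_by_customer


# Made-up sample data: two in-memory "files"; open file objects work the same way.
file_a = ["2005-09-06,3,1488844", "2005-05-13,5,822109"]
file_b = ["2005-10-19,4,1488844"]
merged = merge_reviews([file_a, file_b])
```

Each review is touched once, so at 450K reviews of three short fields the whole result should fit comfortably in memory. Timing this against a sort-based merge with the timeit module, as suggested above, would settle which is faster on the real data.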
