[Tutor] Getting total counts (Steven D'Aprano)

Alan Gauld alan.gauld at btinternet.com
Sun Oct 3 02:04:30 CEST 2010

<aeneas24 at priest.com> wrote

> I can get the code you wrote to work on my toy data, but my real
> input data is actually contained in 10 files that are about 1.5 GB
> each--when I try to run the code on one of those files, everything 
> freezes.

For those kinds of volumes I'd go for a SQL database every time!
(SQLite might be OK, but I'd be tempted to go to something even
beefier, like MySQL, PostgreSQL or Firebird.)
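
As a rough illustration of the database route, here is a minimal sketch
using the sqlite3 module from the standard library. The table name
(items), the column names (category, rating) and the sample rows are
all made up for the example, since the real file layout isn't shown in
this thread. The point is only that rows are streamed into the database
in batches, so the whole file never has to sit in memory, and the
counting is left to SQL.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for one of the large CSV files.
SAMPLE = io.StringIO(
    "category,rating\n"
    "books,5\n"
    "books,3\n"
    "music,5\n"
)

def load_csv(conn, csv_file, batch_size=10000):
    """Stream rows from csv_file into SQLite in fixed-size batches."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (category TEXT, rating INTEGER)"
    )
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO items VALUES (?, ?)", batch)
            batch.clear()
    if batch:  # insert whatever is left over
        conn.executemany("INSERT INTO items VALUES (?, ?)", batch)
    conn.commit()

def total_counts(conn):
    """Return {category: number of rows}, computed by the database."""
    cur = conn.execute(
        "SELECT category, COUNT(*) FROM items GROUP BY category"
    )
    return dict(cur.fetchall())

conn = sqlite3.connect(":memory:")  # pass a filename for real data
load_csv(conn, SAMPLE)
counts = total_counts(conn)
print(counts)
```

For the real files you would open each one with open(name, newline="")
and pass the file object to load_csv() in place of SAMPLE.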

> To solve this, I tried just having the data write to a different csv 
> file:

For huge data volumes, sequential files like CSV are always going
to be slow. You need random access, and a full-blown database
will probably be the best bet IMHO.
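
To make "random access" a bit more concrete, here is a small sketch,
again with sqlite3 and with an invented table and invented column
names. An index on each column you want to search by lets the database
jump to the matching rows instead of reading the table from top to
bottom. The EXPLAIN QUERY PLAN output at the end should mention the
index, although its exact wording differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (category TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("books", 5), ("books", 3), ("music", 5), ("film", 1)],
)

# One index per column we expect to look things up by.
conn.execute("CREATE INDEX idx_items_category ON items (category)")
conn.execute("CREATE INDEX idx_items_rating ON items (rating)")

# Look up by category ...
by_category = conn.execute(
    "SELECT rating FROM items WHERE category = ? ORDER BY rating",
    ("books",),
).fetchall()

# ... or by rating, without rereading every row ourselves.
by_rating = conn.execute(
    "SELECT category FROM items WHERE rating = ? ORDER BY category",
    (5,),
).fetchall()

# Ask SQLite how it intends to run the first query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT rating FROM items WHERE category = ?",
    ("books",),
).fetchall()

print(by_category)
print(by_rating)
for step in plan:
    print(step[-1])  # the last column holds the readable description
```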

> But my guess is that converting from one CSV to another isn't
> going to be as efficient as creating a shelve database.

A shelve is fine for very simple lookups, but it's still basically
a flat file. And the minute you need to access by anything
other than the single key, you are back to sequential processing.
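
A short sketch of that limitation, using the standard shelve module.
The keys and the stored dictionaries are invented for the example.
Fetching by key is a direct lookup, but finding entries by something
stored inside the values means walking over every item.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "counts")  # hypothetical shelf name
    with shelve.open(path) as db:
        # Shelve keys must be strings; values can be any picklable object.
        db["books"] = {"count": 2, "best_rating": 5}
        db["music"] = {"count": 1, "best_rating": 5}
        db["film"] = {"count": 1, "best_rating": 1}

        # Access by the single key: a direct lookup.
        direct = db["books"]["count"]

        # Access by anything else: every entry has to be read and tested.
        scanned = sorted(
            key for key, value in db.items() if value["best_rating"] == 5
        )

print(direct)
print(scanned)
```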


Alan Gauld
Author of the Learn to Program web site
