[Tutor] Getting total counts (Steven D'Aprano)

Alan Gauld alan.gauld at btinternet.com
Sun Oct 3 02:04:30 CEST 2010


<aeneas24 at priest.com> wrote

> I can get the code you wrote to work on my toy data, but my real
> input data is actually contained in 10 files that are about 1.5 GB
> each--when I try to run the code on one of those files, everything 
> freezes.

For those kinds of volumes I'd go for a SQL database every time!
(SQLite might be OK but I'd be tempted to go to something even
beefier, like MySQL, PostgreSQL or Firebird)

> To solve this, I tried just having the data write to a different csv 
> file:

For huge data volumes, sequential files like CSV are always going
to be slow, because finding any one record means reading everything
before it. You need random access, and a full-blown database
will probably be the best bet IMHO.
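
As a rough illustration of the database route, here is a minimal sketch
using the sqlite3 module from the standard library. The table name
"records", the two columns "word" and "category", and the helper names
load_csv and total_counts are all made up for the example, since I don't
know your real layout. The idea is to stream the CSV into the table in
batches, so the whole file never sits in memory, and then let SQL do
the counting.

```python
import csv
import io
import sqlite3


def load_csv(conn, csv_file, batch_size=10000):
    """Stream rows from an open CSV file into SQLite in batches.

    Assumes (hypothetically) that the first two columns of each row
    are a word and a category; adjust to the real data layout.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (word TEXT, category TEXT)"
    )
    batch = []
    for row in csv.reader(csv_file):
        if len(row) < 2:
            continue  # skip blank or short lines
        batch.append(row[:2])
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO records VALUES (?, ?)", batch)
            batch = []
    if batch:
        # insert whatever is left over after the last full batch
        conn.executemany("INSERT INTO records VALUES (?, ?)", batch)
    # the index is what gives you fast access by word later on
    conn.execute("CREATE INDEX IF NOT EXISTS idx_word ON records (word)")
    conn.commit()


def total_counts(conn):
    """Return a dict mapping each word to its number of rows."""
    cursor = conn.execute(
        "SELECT word, COUNT(*) FROM records GROUP BY word"
    )
    return dict(cursor)


if __name__ == "__main__":
    # toy data standing in for one of the big files
    sample = io.StringIO("apple,fruit\ncarrot,veg\napple,fruit\n")
    conn = sqlite3.connect(":memory:")  # use a filename for real data
    load_csv(conn, sample)
    print(total_counts(conn))
```

For the real files you would pass a filename to sqlite3.connect() and
call load_csv once per input file with the file opened via
open(name, newline=""). The same SQL should carry over to MySQL or
PostgreSQL with only the connection code changing.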

> But my guess is that converting from one CSV to another isn't
> going to be as efficient as creating a shelve database.

A shelve is fine for very simple lookups but it's still basically
a flat file. And the minute you need to access by anything
other than the single key you are back to sequential processing.
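
To make that concrete, here is a small sketch with the standard shelve
module. The keys, the "category" values and the function names are
invented for the example. Looking up by key is direct, but asking
"which keys have this value?" forces a walk over every entry.

```python
import os
import shelve
import tempfile


def build_shelf(path, records):
    """Store (key, category) pairs in a shelf at the given path."""
    with shelve.open(path) as db:
        for key, category in records:
            db[key] = category


def lookup(path, key):
    """Direct access by the single key: this is what shelve is good at."""
    with shelve.open(path) as db:
        return db.get(key)


def keys_in_category(path, category):
    """Access by anything else: every entry has to be visited in turn."""
    with shelve.open(path) as db:
        return sorted(k for k in db if db[k] == category)


if __name__ == "__main__":
    # hypothetical toy data; the shelf lives in a throwaway directory
    path = os.path.join(tempfile.mkdtemp(), "demo_shelf")
    build_shelf(path, [("apple", "fruit"), ("carrot", "veg"),
                       ("pear", "fruit")])
    print(lookup(path, "carrot"))
    print(keys_in_category(path, "fruit"))
```

With a few entries the scan in keys_in_category is harmless, but on
gigabytes of data it is the same sequential pass you were trying to
get away from.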

HTH,

-- 
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/



