Processing huge datasets

Thomas Guettler guettli at thomas-guettler.de
Mon May 10 10:36:48 EDT 2004


On Mon, 10 May 2004 12:00:03 +0000, Anders Søndergaard wrote:

> Hi,
> 
> I'm trying to process a large filesystem (+20 million files) and keep the
> directories along with summarized information about the files (sizes,
> modification times, newest file and the like) in an instance hierarchy
> in memory. I read the information from a Berkeley Database.
> 
> I'm keeping it in a Left-Child-Right-Sibling instance structure, that I
> operate on recursively.
> 
> First I banged my head on the recursion limit, which could luckily be
> adjusted.
> Now I simply get MemoryError.
> 
> Is there a clever way of processing huge datasets in Python?
> How would a smart Python programmer approach the problem?
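The recursion limit the poster ran into can be side-stepped entirely by
walking the left-child/right-sibling tree with an explicit stack instead of
recursive calls. A minimal sketch — `Node`, `iter_nodes`, and the attribute
names are hypothetical stand-ins for the poster's own classes:

```python
# Left-child/right-sibling tree node. Each node points to its leftmost
# child and to its next sibling; this is a stand-in for the poster's class.
class Node:
    def __init__(self, name, size=0):
        self.name = name
        self.size = size
        self.child = None     # leftmost child
        self.sibling = None   # next sibling

def iter_nodes(root):
    """Depth-first traversal using an explicit stack, so no recursion
    depth limit applies no matter how deep the directory tree is."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push the sibling first so the child is visited before it.
        if node.sibling is not None:
            stack.append(node.sibling)
        if node.child is not None:
            stack.append(node.child)

# Example: summarize sizes over the whole tree without recursing.
root = Node('/')
root.child = Node('usr', 10)
root.child.sibling = Node('var', 5)
total = sum(n.size for n in iter_nodes(root))  # 15
```

The stack holds at most one pending sibling per level, so memory use stays
proportional to the tree depth, not the number of files.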

Hi Anders,

I use ZODB. It stores a persistent object hierarchy on disk and loads
objects into memory only when they are accessed, so the whole tree never
has to fit in RAM at once.
http://zope.org/Wikis/ZODB/FrontPage/guide/index.html

 HTH,
  Thomas



