python and very large data sets???

David Bolen db3l at fitlinxx.com
Thu Apr 25 16:54:08 EDT 2002


holger krekel <pyth at devel.trillke.net> writes:

> On Thu, Apr 25, 2002 at 07:29:27AM -0700, Rad wrote:
(...)
> > having been learning Python in the last few weeks
> > I kind of like it + I was encouraged by your posts and I'm now pretty
> > convinced that Python2.2 is the way to go. 
> 
> Be careful. Some people pointed out that in the end it might
> be a task which a database can handle more reliably. 

It's likely not an either/or scenario - even if a database is used
there will probably be some Python code driving its use.

I'm not sure I follow all of the database suggestions though.  While
it's true that some correlation is needed amongst the data, the
requirements really seem to point to largely linear scans of the data
followed by some reporting.  Loading the data into a database means
scanning it once anyway, so that step alone is unlikely to prove much
better than just scanning the data in the first place.
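
To make "linear scan followed by reporting" concrete, here is a rough
sketch.  It assumes things the thread doesn't state: a tab-delimited
text file with one record per line, and a report that is just a tally
of one field.  The name scan_counts and its parameters are mine, purely
for illustration.

```python
from collections import Counter


def scan_counts(path, field=0, sep="\t"):
    """Make one linear pass over a delimited text file, tallying one field.

    Assumed layout (not from the original data): one record per line,
    fields separated by `sep`.
    """
    counts = Counter()
    with open(path, "r", encoding="utf-8") as f:
        # Iterating the file object reads one line at a time, so memory
        # use depends on the number of distinct values, not the file size.
        for line in f:
            parts = line.rstrip("\n").split(sep)
            if len(parts) > field:
                counts[parts[field]] += 1
    return counts
```

Only the tally lives in memory, so the size of the input file matters
far less than the number of distinct values being reported on.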

However, I suppose what might prove useful is to scan the data and
insert only a small portion into the database (perhaps the keys needed
for searching/joining and a filename/offset pointer to the disk
record) so you can then use the natural joining operation of the
database for queries.  But given the 2GB of memory on the machine in
question I have a feeling that tracking the keys during processing
right in Python is still very viable.
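
A minimal sketch of tracking keys in Python, under assumptions of my
own: records are newline-terminated lines and the key is the first
tab-separated field.  build_index and fetch are hypothetical names, not
anything from the thread.

```python
def build_index(path, sep=b"\t"):
    """Scan a file once, mapping each key to the byte offsets of its records.

    Assumed layout (illustrative only): one record per line, with the
    key as the first `sep`-separated field.
    """
    index = {}
    with open(path, "rb") as f:
        offset = 0
        # Binary mode keeps len(line) equal to the bytes consumed, so
        # the running total is a valid seek position for each record.
        for line in f:
            key = line.split(sep, 1)[0].rstrip(b"\r\n")
            index.setdefault(key, []).append(offset)
            offset += len(line)
    return index


def fetch(path, offset):
    """Read back the single record that starts at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.readline()
```

Only keys and integer offsets are held in memory; the full records stay
on disk until fetch() is called.  The same (key, filename, offset)
tuples could just as well be inserted into a database table if the
joins get more involved.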

--
-- David
-- 
/-----------------------------------------------------------------------\
 \               David Bolen            \   E-mail: db3l at fitlinxx.com  /
  |             FitLinxx, Inc.            \  Phone: (203) 708-5192    |
 /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150     \
\-----------------------------------------------------------------------/


