[Tutor] Optimal solution for dealing with huge databases in Python

johnf jfabiani at yolo.com
Thu Jan 25 19:20:36 CET 2007


On Wednesday 24 January 2007 19:22, Shadab Sayani wrote:
> Hi,
> I am working on a biodatabases project. The data I need to deal with is in
> the 100s of GB. I am using a PostgreSQL backend and the SQLAlchemy ORM. I
> need to read the bio data files, parse them, and then store them in the
> database, and I am in the process of storing them. I used the session/flush
> approach in SQLAlchemy. Initially I flushed every query immediately; later
> I realised that the queries are independent of each other and so started
> flushing 3-5 lakh (300,000-500,000) INSERT queries at a time. This improved
> the response time, but memory started overflowing. I then released
> not-in-use memory with the del statement in Python, but that did not help
> much, as del can only free part of the memory. I need to increase the 3-5
> lakh figure to a much larger one to get a real-time response; otherwise my
> estimate is that it will take a year just to insert the data into the
> database. On the PostgreSQL side I also turned off WAL. Please suggest a
> viable solution for handling such enormous data from Python. Is there a
> better solution than SQLAlchemy? Any solution that speeds up my program is
> highly appreciated.
> Thanks and Regards,
>   Shadab.
>

To tell the truth I know nothing about the SQLAlchemy ORM.  But I do know how 
to load large amounts of data into PostgreSQL.  You use the 'COPY ... FROM' 
command.  I once had 2 GB of data and was surprised how fast it loaded the 
data.  Two things to remember:
1.  no indexes on the table while loading (add them after the load)
2.  you have to format the data correctly - see the PostgreSQL docs and the
    example below
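
For example, COPY's default text format is one tab-separated row per line,
with \N for NULL values.  Assuming a made-up table sequences(id, name, seq),
the server-side load would look roughly like:

    COPY sequences (id, name, seq) FROM '/path/to/sequences.tsv';

with each line of sequences.tsv looking something like this (tabs written as
<TAB> here):

    1<TAB>BRCA1<TAB>ATGGATTTATCTGCT
    2<TAB>TP53<TAB>\N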

BTW I use psycopg for my connection
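
Here is a minimal sketch of the same idea driven from Python, assuming
psycopg2 (the connection string, table, and file names are placeholders I
made up, not Shadab's actual setup):

import psycopg2

# Placeholder connection details - adjust for the real database.
conn = psycopg2.connect("dbname=biodb user=postgres")
cur = conn.cursor()

# Stream the tab-separated file straight into the table with COPY;
# this avoids the per-row overhead of issuing individual INSERTs.
with open("sequences.tsv") as f:
    cur.copy_expert("COPY sequences (id, name, seq) FROM STDIN", f)

conn.commit()
cur.close()
conn.close()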
-- 
John Fabiani

