[Tutor] Optimal solution in dealing with huge databases in python
johnf
jfabiani at yolo.com
Thu Jan 25 19:20:36 CET 2007
On Wednesday 24 January 2007 19:22, Shadab Sayani wrote:
> Hi,
> I am working on a bio-databases project. The data I need to deal with runs
> to hundreds of GB. I am using a PostgreSQL backend with the SQLAlchemy ORM.
> I need to read the bio data files, parse them, and store them in the
> database, and I am in the middle of loading them now. I used the
> session/flush mechanism in SQLAlchemy. Initially I flushed after every
> insert; later I realised the inserts are independent of each other, so I
> started flushing 3-5 lakh (300,000-500,000) inserts at a time. That
> improved the response time, but now memory overflows. I then tried
> releasing unused objects with Python's del statement, but that did not
> help much, since del can only free part of the memory. I need to raise
> that 3-5 lakh batch size much higher to get acceptable performance;
> otherwise my estimate is that it will take a year just to insert the
> data. On the PostgreSQL side I also turned off WAL. Please suggest a
> viable way to handle such enormous data from Python. Is there a better
> solution than SQLAlchemy? Any solution that speeds up my program is
> highly appreciated.
> Thanks and Regards,
> Shadab.
To tell the truth, I know nothing about the SQLAlchemy ORM, but I do know how
to load large amounts of data into Postgres: use the 'COPY ... FROM' command.
I once had 2 GB of data and was surprised how fast it loaded. Two things to
remember:
1. no indexes - drop them before the load and recreate them afterwards
2. you have to format the data correctly - see the Postgres docs
BTW, I use psycopg for my connection.
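
For concreteness, here is a minimal sketch of a COPY-based load from Python
using psycopg2. The table name (sequences), its columns, and the connection
settings are made-up placeholders, not anything from the original post; the
real schema would come from whatever the bio data files contain.

import io
import psycopg2

# Connection details are assumptions; substitute your own.
conn = psycopg2.connect("dbname=biodb user=postgres")
cur = conn.cursor()

def load_batch(records):
    """Stream one batch of parsed records into Postgres via COPY FROM STDIN.

    COPY's text format is one row per line, columns separated by tabs,
    with \\N for NULL; tabs, newlines, and backslashes inside values must
    be escaped as described in the Postgres docs.
    """
    buf = io.StringIO()
    for accession, organism, seq in records:
        buf.write('\t'.join((accession, organism, seq)) + '\n')
    buf.seek(0)
    # copy_from sends the whole buffer in a single COPY command, avoiding
    # the per-statement overhead of individual INSERTs.
    cur.copy_from(buf, 'sequences', sep='\t',
                  columns=('accession', 'organism', 'seq'))
    conn.commit()

load_batch([('AB000001', 'E. coli', 'ATGGCG'),
            ('AB000002', 'S. cerevisiae', 'ATGTTT')])

Dropping the indexes before the load and recreating them afterwards, plus
batching many rows into each COPY, is what makes this dramatically faster
than row-at-a-time INSERTs.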
--
John Fabiani