[Tutor] Optimal solution in dealing with huge databases in Python

Alan Gauld alan.gauld at btinternet.com
Fri Jan 26 09:40:40 CET 2007


"Shadab Sayani" <shadabsayani at yahoo.com> wrote
>  
> I got your point.But before inserting data I need to store it 
> into  a file in a format supported by postgresql.Wont this 
> operation incur a  performance hit as it includes writing 
> to a file which is on disk?

Unless your data is already in a format the database 
understands, you will have to reformat it before loading it.
There are basically two options:
1) read the unformatted data piece by piece, reformat 
it and load it into the database item by item.
2) read the unformatted data and write it to an 
intermediate file in a format supported by the 
database, then load the formatted data in bulk.
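
As a rough illustration of option 1 (the file name 
raw_data.txt, the table and column names, and the 
psycopg2 driver are all assumptions, not details from 
your setup), it looks something like this:

    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=me")  # assumed connection details
    cur = conn.cursor()

    for line in open("raw_data.txt"):
        # reformat one record at a time (here: split a comma-separated line)
        fields = line.strip().split(",")
        cur.execute("INSERT INTO mytable (name, value) VALUES (%s, %s)",
                    (fields[0], fields[1]))

    conn.commit()
    cur.close()
    conn.close()

Every row costs a round trip to the database, which is 
what makes this slow for millions of records.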

The second approach is nearly always faster than 
the first for large data sets, due to several factors: 
per-row transaction overhead in the first approach, 
caching behaviour, the availability of bulk-load 
optimisations in the database itself, and so on.

Writing to a flat file is much faster than writing to a 
database, and reformatting data is a complex business. 
Python is good at complex processing and at writing 
flat files. SQL is good at writing to databases but poor 
at complex processing. So use Python for its 
strengths and SQL for its strengths, and you get 
optimal results.
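
As a sketch of option 2 (again, the file names, the 
table layout and the use of psycopg2 are assumptions): 
Python does the complex reformatting into a 
tab-delimited flat file, and the database does one bulk 
load, for example via PostgreSQL's COPY command or 
psycopg2's copy_from():

    import psycopg2

    # Step 1: Python reformats the raw data into a tab-delimited flat file.
    dst = open("formatted.txt", "w")
    for line in open("raw_data.txt"):
        fields = line.strip().split(",")   # whatever reformatting you need
        dst.write("\t".join(fields) + "\n")
    dst.close()

    # Step 2: one bulk load of the whole file.
    conn = psycopg2.connect("dbname=mydb user=me")  # assumed connection details
    cur = conn.cursor()
    cur.copy_from(open("formatted.txt"), "mytable", sep="\t")
    conn.commit()
    conn.close()

You could equally run COPY mytable FROM 'formatted.txt' 
from psql; either way the database sees one bulk 
operation instead of one INSERT per row.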

HTH,

Alan G



