[Tutor] Optimal solution in dealing with huge databases inpython
Alan Gauld
alan.gauld at btinternet.com
Thu Jan 25 18:23:48 CET 2007
"Shadab Sayani" <shadabsayani at yahoo.com> wrote
> Thank you very much for immediate response.I didnt get
> the point of loading the data using SQL.What does that
> mean?
It means writing a SQL file that can then be run from
the database SQL prompt. I don't know what that means
in PostGres terms since I use Oracle mainly. But in Oracle
it would involve typing
$ sqlplus loadfile.sql
And sqlplus would execute the SQL commands in loadfile.sql.
Loadfile.sql would in turn contain lots of SQL commands like:
INSERT into CUSTOMER
VALUES ( 'fred', 'bloggs', 45, '01773-987321');
One thing to note is that due to caching issues you might find
it works better if you keep the individual loader files fairly small
- say 1000 or so records each. On other databases (DB2 for
example) very big data files seem to be faster, it just depends on
how the internal SQL engine works.
> Does it mean Bulk file load.If so then there will
> I/O performance hit right.
There will aleays be a big I/O load if you are loading data
into a database, at some stage it must get onto the disk!
> Writing the parsed tokens in the csv format in a file.
I would use Python to translate the CSV data into SQL
command like the INSERT statement above.
> Do have any idea about the C api for Postgresql and
> some documentation to use it?
Nope. I don't use Postgres.
There are usually two levels of API, the one that
allows you to write SQL commands from C. That's not
much advantage over the Pytthon DBAPI, and the one
that bypasses SQL to allow you to access the raw data
on the disk, thats the fast one. But it does require a good
knowledge of how the underlying data structuires are
stored within the files. Oracle, DB2, Informix etc will have
that documented somlewhere but open source databases
may require you to read the source! As I say its fast,
but very, very risky.
Alan G.
More information about the Tutor
mailing list