About Databases...

EP EP at zomething.com
Sat Mar 12 14:08:33 CET 2005

andrea_gavana at tin.it wrote:

> Hello NG,
>
>    I am still quite a newbie with Python (I intensely use wxPython, anyway).
> I would like to know what are, in your opinions, the best/faster databases
> that I could use in Python (and, of course, I should be able to "link" everything
> with a wxPython GUI)? Specifically, I work on Reservoir Simulation, and
> usually I have to store a discrete/huge amount of data (it depends on the
> oil field). As you may have understood, I know almost NOTHING about databases
> ;-)

"... use in Python..."

If you have vast amounts of data you will probably benefit from hooking into an RDBMS (relational database management system) that runs "outside" Python.  This is easy to do in Python if you use a module that already exists for that purpose.
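
To give a rough idea of what "hooking into" a database looks like from Python: most database modules follow the same DB-API pattern (connect, get a cursor, execute SQL, fetch rows).  The sketch below uses the sqlite3 module from the standard library purely as a stand-in so that it runs without a server; the table and column names (well_pressure, well, day, psi) are invented for the example.

```python
import sqlite3

# An in-memory SQLite database stands in for a real server (assumption for this sketch).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table: one pressure reading per well per day.
cur.execute("CREATE TABLE well_pressure (well TEXT, day INTEGER, psi REAL)")
cur.executemany(
    "INSERT INTO well_pressure (well, day, psi) VALUES (?, ?, ?)",
    [("A-1", 1, 3050.0), ("A-1", 2, 3042.5), ("B-7", 1, 2890.0)],
)
conn.commit()

# Ask the database for just the rows we want instead of loading everything.
cur.execute(
    "SELECT day, psi FROM well_pressure WHERE well = ? ORDER BY day",
    ("A-1",),
)
rows = cur.fetchall()
print(rows)
conn.close()
```

A module for another RDBMS would differ mainly in the connect() call and in its parameter marker.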

A dedicated RDBMS can perform operations very fast (it is specialized for that purpose); however, we can make it run quite slowly if we use it poorly, especially on large amounts of data.  There is a lot of piecemeal information online about how best to use databases (either a specific RDBMS or in general), but some theoretical background helps as well.  I have "Database Design Using Entity-Relationship Diagrams" on my shelf.  Also, I think Dobbs had one or more articles about (mis)using databases recently.
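
One common way to use a database poorly is to filter on a column that has no index, which forces a scan of the whole table.  The sketch below (again with sqlite3 as a stand-in, and with made-up names cell_data and idx_cell) asks SQLite how it plans to run the same query before and after an index is created.  The exact wording of the plan varies between SQLite versions, so treat the printed text as illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table: one saturation value per grid cell per time step.
cur.execute("CREATE TABLE cell_data (cell_id INTEGER, step INTEGER, saturation REAL)")
cur.executemany(
    "INSERT INTO cell_data VALUES (?, ?, ?)",
    [(cell, step, 0.5) for cell in range(1000) for step in range(10)],
)
conn.commit()

query = "SELECT step, saturation FROM cell_data WHERE cell_id = ?"


def plan(sql, params):
    """Return SQLite's description of how it would run the query."""
    plan_rows = cur.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(row[-1] for row in plan_rows)


plan_before = plan(query, (42,))  # no index exists yet
cur.execute("CREATE INDEX idx_cell ON cell_data (cell_id)")
plan_after = plan(query, (42,))   # the planner can now use idx_cell

print(plan_before)
print(plan_after)
```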

With an RDBMS you have another language to learn: SQL.

At the risk of being stoned, let me say that SQL is like talking pidgin, which is not to imply you can get away with speaking it imprecisely.
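
A small illustration of how unforgiving SQL can be: the two queries below look almost the same, but only one finds the missing readings.  This is a sketch using sqlite3 as a stand-in, and the readings table is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE readings (well TEXT, psi REAL)")
cur.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("A-1", 3050.0), ("B-7", None), ("C-3", None)],  # None is stored as NULL
)

# Looks reasonable, but a comparison with NULL is never true, so no rows match.
sloppy = cur.execute("SELECT well FROM readings WHERE psi = NULL").fetchall()

# The precise form: IS NULL is the test for a missing value.
precise = cur.execute(
    "SELECT well FROM readings WHERE psi IS NULL ORDER BY well"
).fetchall()

print(len(sloppy), len(precise))
```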

[side note, look up programming language namesake "LUA" here:  http://www.extreme-hawaii.com/pidgin/vocab/ to understand why it may not be more popular with hackers in Hawaii]

A nominal starting point might be to install MySQL (a relational database that is open source, free, fast, and very widely used on the Internet) and use the MySQLdb module in Python to access it.  From there you can decide if you need more or less of something.
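
If you write your code against the common DB-API calls, moving between databases later is mostly a matter of changing the connection and the parameter marker.  The helper below is a hypothetical sketch (save_and_total and the production table are my own names); it is exercised here with sqlite3 so that it runs without a server, and the commented lines show roughly how it would be called with MySQLdb, whose parameter marker is %s.

```python
import sqlite3


def save_and_total(conn, placeholder, rows):
    """Insert (well, barrels) rows and return total barrels per well.

    Works with any DB-API connection; `placeholder` is the driver's
    parameter marker ("%s" for MySQLdb, "?" for sqlite3).
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE production (well VARCHAR(20), barrels INTEGER)")
    insert = "INSERT INTO production (well, barrels) VALUES ({0}, {0})".format(placeholder)
    cur.executemany(insert, rows)
    conn.commit()
    cur.execute("SELECT well, SUM(barrels) FROM production GROUP BY well ORDER BY well")
    return [tuple(r) for r in cur.fetchall()]


# With MySQL installed, the call would look roughly like this (the host, user,
# passwd and db values are placeholders, not real settings):
#   import MySQLdb
#   conn = MySQLdb.connect(host="localhost", user="me", passwd="secret", db="reservoir")
#   totals = save_and_total(conn, "%s", data)

data = [("A-1", 120), ("A-1", 95), ("B-7", 210)]
totals = save_and_total(sqlite3.connect(":memory:"), "?", data)
print(totals)
```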

In my experience MySQL is fast and can handle very large datasets, but there seem to be plenty of other good database options.  Oracle seems to be the dominant commercial RDBMS, and may be a good choice if you are offered the option of using it.  The Oracle universe is very commercial, however - it's big business.

There are database approaches you could take directly within Python, including storing your data (in binary or text form) as files under your OS.  However, with that approach you will lose, or have to re-create, a relational query system, and I doubt the file access will be as well optimized for speed.
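
For comparison, here is a minimal sketch of the files-only approach, using the standard pickle module and a temporary directory; the record layout is invented for the example.  Note that the "query" at the end is something you have to write and tune yourself.

```python
import os
import pickle
import tempfile

# Hypothetical records: one pressure reading per well per day.
records = [
    {"well": "A-1", "day": 1, "psi": 3050.0},
    {"well": "A-1", "day": 2, "psi": 3042.5},
    {"well": "B-7", "day": 1, "psi": 2890.0},
]

# Store the data in binary form as an ordinary file.
path = os.path.join(tempfile.mkdtemp(), "pressure.pkl")
with open(path, "wb") as f:
    pickle.dump(records, f)

# To answer any question, the whole file has to be read back first...
with open(path, "rb") as f:
    loaded = pickle.load(f)

# ...and the "query" is a hand-written loop over every record.
a1_psi = [r["psi"] for r in loaded if r["well"] == "A-1"]
print(a1_psi)
```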

BTW, AFAIK, Python and any RDBMS are going to ride on top of your OS, which means that if you are trying to do very, very large data transactions at high speeds, the OS can be the limiting factor.  If you ever find yourself backed up against that wall, there are commercial operating systems designed to provide greater data access speed (that is, simple OSes built for data access, so that you do not need to run on Unix/Linux/__ix, on Windows, or on another multi-purpose, higher-overhead OS).
