Using Python for processing large datasets (convincing management)

William Park opengeometry at NOSPAM.yahoo.ca
Sun Jul 7 18:47:14 EDT 2002


Thomas Jensen <spam at ob_scure.dk> wrote:
> We already have 2 DB Servers, a master replicating changes to a slave.
> Our analysis shows that most database operations are/will be SELECTs.
> Adding more DB servers is trivial, especially if we migrate to MySQL
> (well, cheaper at least :-)

As I and others have said, deal with the algorithmic issues first, especially
since you already have something that works.

It may be that you are getting killed by overheads.  For example, if your
situation goes something like
    Given table of (a, b, x, y, z), 
	select a=1, b=1; then do something with x, y, z; insert it back.
	select a=1, b=2; then do something with x, y, z; insert it back.
	...
	select a=2, b=1; then do something with x, y, z; insert it back.
	select a=2, b=2; then do something with x, y, z; insert it back.
	...
	(1 million lines)
Then you are paying roughly
    1e6 x 2 x (connect time + search time + load time + disconnect time)
in pure overhead.
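A quick back-of-envelope calculation shows how this adds up. The per-operation latencies below are made-up illustrative numbers, not measurements of any real server:

```python
# Hypothetical per-operation latencies, in seconds.
connect = 0.005     # connection setup
search = 0.001      # index lookup
load = 0.001        # result transfer
disconnect = 0.002  # connection teardown

rows = 1_000_000
# One SELECT plus one INSERT per row, each paying the full overhead:
total_seconds = rows * 2 * (connect + search + load + disconnect)
print(f"{total_seconds / 3600:.1f} hours of pure overhead")  # -> 5.0 hours
```

Even with single-digit-millisecond overheads, a million round trips costs hours before any real work happens.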
    
Can you dump the whole table to a text file in one shot, do whatever you
need with (a, b, x, y, z), and load it all back in one shot?
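The one-shot approach might look something like the sketch below: stream a tab-separated dump of (a, b, x, y, z), transform each row, and write a file ready for a bulk load (e.g. MySQL's LOAD DATA INFILE). The file names and the body of process() are placeholders, not anything from the original post:

```python
import csv

def process(x, y, z):
    # Placeholder for the real per-row computation.
    return x + y + z

def transform_dump(src_path, dst_path):
    """Read a tab-separated dump of (a, b, x, y, z) and write
    transformed rows, ready for a single bulk reload."""
    with open(src_path, newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t")
        for a, b, x, y, z in reader:
            writer.writerow([a, b, process(float(x), float(y), float(z))])
```

You pay the connect/disconnect cost twice total instead of twice per row, and the database gets to do one sequential scan out and one bulk load in.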

-- 
William Park, Open Geometry Consulting, <opengeometry at yahoo.ca>
8-CPU Cluster, Hosting, NAS, Linux, LaTeX, python, vim, mutt, tin



More information about the Python-list mailing list