huge table question

Steve Holden sholden at holdenweb.com
Thu Sep 7 10:52:56 EDT 2000


Bill Tolbert wrote:
> 
> I've got a table in MS Access with 3.5 million records. First, I know we
> should be using something else, but it isn't my call. I need to extract
> newline characters from one of the fields and write it to a tab
> delimited text file.
> 
By which I presume you mean produce tab-delimited lines in a text file
from a newline-delimited database field?

> My problem is getting to the data. I thought I'd fetch a cursor and
> whack away at it. However, just getting the cursor is taking an
> extremely long time. Access will write the data to a 237meg text file in
> about 8 minutes. So far, on the same machine, Python has been at this
> for about 2 hours.
> 
Well, some code might be nice if you want helpful help!  Otherwise it's
difficult to spot the particular inefficiency that's causing your
problem.

Are you using ODBC, mcODBC or some other means of Access access (sorry)?

However, taking a naiive approach using fetchall() might result in the
complete cursor contents being loaded into memory before processing,
which might lead to the situation you describe.  By default fetchall()
will use the default cursor arraysize value of 1, which probably
isn't going to be too good.  Try adjusting this read-write property
upwards, possibly testing with a subset of your data to see if the
efficiency goes up.

Alternatively, split the task up into repeated calls to fetchmany(size=X)
where X is something large.  This overrides the cursor's arraysize.

> Any better ideas? Ways of speeding it up? Is it the jet engine or the
> odbc drivers or just too much data...?
> 
> Thanks,
> 
> Bill
> 
I'm tempted to ask, if you can do this in Access why use Python at all.
Just preparing for the day you move to a more capable databse engine?

> Sent via Deja.com http://www.deja.com/
> Before you buy.

HTH

regards
 Steve
-- 
Helping people meet their information needs with training and technology.
703 967 0887      sholden at bellatlantic.net      http://www.holdenweb.com/



More information about the Python-list mailing list