writing to file very slow

Doug Quale quale1 at charter.net
Thu Mar 27 01:12:12 EST 2003


sjmachin at lexicon.net (John Machin) writes:

> Below I have rewritten the whole function, using list comprehensions
> and avoiding range() as much as possible. Together with some cosmetics
> like spaces around "=" and more than 2 chars of indent, plus caching
> some attribute look-ups, it should now be fast enough and legible
> enough -- even looks like it was written in a HLL like Python :-)

This is very good, but I think it can be just a little bit better.
There's actually no need for any range() or xrange() at all.  We want
to iterate through the lists and tuples, not subscript them.  The
listfields() method returns a list of field names, so fieldname()
isn't needed either.  I changed the x_result variable name to row,
since we might as well call a row a row (and row is easier to type).
The str() is needed if we are to join() the results, since the DB API
maps SQL types to Python types: you can expect numbers and strings at
a minimum, and possibly booleans and dates/times.  This is an example
where map() is shorter than a list comprehension (compare the two
versions after the function below).

import tempfile

def fichier_resultats(results):
    """Write DB rows in results to a |-delimited tempfile and return
    the filename."""
    tfilename = tempfile.mktemp('rec.txt')
    f = open(tfilename, 'w')

    # Cache the bound method so the loop avoids repeated lookups.
    bar_join = '|'.join

    # Write the field names first.
    f.write(bar_join(results.listfields()))
    f.write('\n')

    # results.getresult() is a list of tuples.  Each tuple is a DB row.
    allrows = results.getresult()
    for row in allrows:
        f.write(bar_join(map(str, row)))
        f.write('\n')

    f.close()
    return tfilename
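
For comparison, the list comprehension version of the inner write
would be

    f.write(bar_join([str(field) for field in row]))

which does the same thing but is longer to type, and a map() over a
builtin like str is usually a bit faster here because the loop runs
in C rather than in bytecode.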

If this is still too slow it may be because getresult() stores the
entire result in memory as a list of tuples.  Memory usage depends on
the length of the rows, but as an example 4000 rows * 4000 bytes/row
is about 16 MB.  (16 MB is quite reasonable for most modern computers,
but of course a result set could be much larger.)

Instead of using the low-level _pg module, the original poster could
consider a different Python PostgreSQL interface.  The PyGreSQL
distribution that supplies _pg also includes pgdb, which provides the
Python DB-API 2.0 interface; it is slightly higher level and easier to
use, and it allows fetching the results in smaller chunks if required.
(See http://www.python.org/topics/database/modules.html.)
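
Here is a rough sketch of what that could look like (the function
name, the dsn string and the chunk size are made up for illustration;
adjust the connection parameters for your own database):

    import pgdb
    import tempfile

    def fichier_resultats_pgdb(dsn, query, chunksize=500):
        """Run query and write its rows to a |-delimited tempfile,
        fetching chunksize rows at a time instead of the whole result."""
        tfilename = tempfile.mktemp('rec.txt')
        f = open(tfilename, 'w')

        conn = pgdb.connect(dsn)    # e.g. 'localhost:mydb' (illustrative)
        cursor = conn.cursor()
        cursor.execute(query)

        # cursor.description holds one (name, type, ...) tuple per
        # column, so the field names come from it instead of listfields().
        f.write('|'.join([d[0] for d in cursor.description]))
        f.write('\n')

        while 1:
            rows = cursor.fetchmany(chunksize)
            if not rows:
                break
            for row in rows:
                f.write('|'.join(map(str, row)))
                f.write('\n')

        cursor.close()
        conn.close()
        f.close()
        return tfilename

This keeps at most chunksize rows in memory at once, so memory use no
longer grows with the size of the result set.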



