emulating read and readline methods

Sean Davis seandavi at gmail.com
Wed Sep 10 19:59:48 CEST 2008


I have a large file that I would like to transform and then feed to a
function (psycopg2 copy_from) that expects a file-like object (needs
read and readline methods).

I have a class like so:

import gzip
import re

class GeneInfo():
    def __init__(self):
        #urllib.urlretrieve('ftp://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz',"/tmp/gene_info.gz")
        self.fh = gzip.open("/tmp/gene_info.gz")
        self.fh.readline() #deal with header line

    def _read(self,n=1):
        for line in self.fh:
            if line=='':
                break
            line=line.strip()
            line=re.sub("\t-","\t",line)
            rowvals = line.split("\t")
            yield "\t".join([rowvals[i] for i in
                             [0,1,2,3,6,7,8,9,10,11,12,14]]) + "\n"

    def readline(self,n=1):
        return self._read().next()

    def read(self,n=1):
        return self._read().next()

    def close(self):
        self.fh.close()

and I use it like so:

a=GeneInfo()
cur.copy_from(a,"gene_info")
a.close()

It works well except that the end of file is not caught by copy_from.
I get errors like:

psycopg2.extensions.QueryCanceledError: COPY from stdin failed: error during .read() call
CONTEXT:  COPY gene_info, line 1000: ""

for a 1000-line test file.  Any ideas what is going on?
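
A possible explanation, offered as a sketch and not checked against
psycopg2 itself: once the underlying file is exhausted, the generator
returned by _read() finishes, so .next() raises StopIteration. File-like
objects conventionally signal end of file by returning an empty string
from read() and readline(). The example below creates the generator once
and uses the built-in next() with a default so that '' is returned at the
end. The class name TransformingReader, the keep parameter, and the use of
io.StringIO as a stand-in for the gzip file handle are assumptions made
for illustration only.

```python
import io
import re

# Column indices kept from each row (the same list as in the post).
KEEP = [0, 1, 2, 3, 6, 7, 8, 9, 10, 11, 12, 14]


class TransformingReader(object):
    """Hypothetical file-like wrapper that returns '' at end of file."""

    def __init__(self, fh, keep=KEEP):
        self.fh = fh
        self.keep = keep
        self.fh.readline()              # skip the header line
        self._rows = self._transform()  # create the generator only once

    def _transform(self):
        # Lazily transform one input line at a time.
        for line in self.fh:
            line = line.strip()
            line = re.sub("\t-", "\t", line)
            rowvals = line.split("\t")
            yield "\t".join(rowvals[i] for i in self.keep) + "\n"

    def readline(self, size=-1):
        # The default "" is returned instead of raising StopIteration.
        return next(self._rows, "")

    def read(self, size=-1):
        # Assumption: size is ignored and one row is returned per call,
        # as in the original class.
        return self.readline()

    def close(self):
        self.fh.close()


# Stand-in data; the real input would come from gzip.open(...).
sample = io.StringIO("h1\th2\th3\na\t-\tc\nd\te\tf\n")
reader = TransformingReader(sample, keep=[0, 2])
first = reader.readline()
second = reader.read()
at_eof = reader.readline()
still_eof = reader.read(8192)
reader.close()
```

With a wrapper along these lines the call would presumably stay the same,
cur.copy_from(reader, "gene_info"). Whether copy_from accepts a read()
that ignores the requested size is not verified here.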

Thanks,
Sean


