emulating read and readline methods

Sean Davis seandavi at gmail.com
Thu Sep 11 10:23:00 CEST 2008


On Sep 10, 7:54 pm, John Machin <sjmac... at lexicon.net> wrote:
> On Sep 11, 8:01 am, MRAB <goo... at mrabarnett.plus.com> wrote:
>
>
>
> > On Sep 10, 6:59 pm, Sean Davis <seand... at gmail.com> wrote:
>
> > > I have a large file that I would like to transform and then feed to a
> > > function (psycopg2 copy_from) that expects a file-like object (needs
> > > read and readline methods).
>
> > > I have a class like so:
>
> > > class GeneInfo():
> > >     def __init__(self):
> > >         #urllib.urlretrieve('ftp://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz',
> > >         #                   "/tmp/gene_info.gz")
> > >         self.fh = gzip.open("/tmp/gene_info.gz")
> > >         self.fh.readline() #deal with header line
>
> > >     def _read(self,n=1):
> > >         for line in self.fh:
> > >             if line=='':
> > >                 break
> > >             line=line.strip()
> > >             line=re.sub("\t-","\t",line)
> > >             rowvals = line.split("\t")
> > >             yield "\t".join([rowvals[i] for i in
> > >                              [0,1,2,3,6,7,8,9,10,11,12,14]]) + "\n"
>
> > >     def readline(self,n=1):
> > >         return self._read().next()
>
> > >     def read(self,n=1):
> > >         return self._read().next()
>
> > Each time readline() and read() call self._read() they are creating a
> > new generator. They then get one value from the newly-created
> > generator and then discard that generator. What you should do is
> > create the generator in __init__ and then use it in readline() and
> > read().
>
> > >     def close(self):
> > >         self.fh.close()
>
> > > and I use it like so:
>
> > > a=GeneInfo()
> > > cur.copy_from(a,"gene_info")
> > > a.close()
>
> > > It works well except that the end of file is not caught by copy_from.
> > > I get errors like:
>
> > > psycopg2.extensions.QueryCanceledError: COPY from stdin failed: error
> > > during .read() call
> > > CONTEXT:  COPY gene_info, line 1000: ""
>
> > > for a 1000 line test file.  Any ideas what is going on?
>
> > I wonder whether it's expecting readline() and read() to return an
> > empty string at the end of the file instead of raising StopIteration.
>
> Don't wonder; ReadTheFantasticManual:
>
> read( [size])
>
> ... An empty string is returned when EOF is encountered
> immediately. ...
>
> readline( [size])
>
>  ... An empty string is returned only when EOF is encountered
> immediately.


Thanks.  This was indeed my problem--not reading the manual closely
enough.
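
For anyone finding this later, here is a minimal sketch of the EOF
point, written for Python 3 (an assumption; the code quoted above is
Python 2). The helper name lines() and the sample data are made up for
illustration. It shows that an exhausted generator raises
StopIteration, whereas a file-like object is expected to return an
empty string, and that giving next() a default bridges the two:

```python
import io

def lines(fh):
    """Yield each line of fh; the generator simply ends at end of input."""
    for line in fh:
        yield line

fh = io.StringIO("a\nb\n")   # stand-in for the real (gzip) file handle
gen = lines(fh)

first = next(gen, "")    # "a\n"
second = next(gen, "")   # "b\n"
at_eof = next(gen, "")   # "" : the default is returned instead of raising

# A plain next(gen) with no default would raise StopIteration here.
try:
    next(gen)
    raised = False
except StopIteration:
    raised = True

# A real file object signals EOF the same way the manual describes.
real_eof = io.StringIO("").readline()   # ""
```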

And the points about the generator being re-created on every call were
also right on target.  Interestingly, the code still appeared to work,
because every new generator iterated over the same open file handle, so
read() and readline() each returned the next line anyway.
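
Putting both suggestions together, a corrected version might look
roughly like the sketch below. It is written for Python 3 and is not
tested against psycopg2. The class name TransformedFile, the
skip_header parameter and the sample row are hypothetical; the column
list and the re.sub() call are taken from the code quoted above. The
generator is created once in __init__, and read() and readline()
return "" at end of file:

```python
import io
import re

# Column indexes kept by the original code.
KEEP = [0, 1, 2, 3, 6, 7, 8, 9, 10, 11, 12, 14]

class TransformedFile:
    """File-like wrapper (hypothetical name) that yields transformed lines."""

    def __init__(self, fh, skip_header=True):
        self.fh = fh
        if skip_header:
            self.fh.readline()          # discard the header line
        self._lines = self._read()      # create the generator exactly once

    def _read(self):
        for line in self.fh:
            line = re.sub("\t-", "\t", line.strip())
            rowvals = line.split("\t")
            yield "\t".join(rowvals[i] for i in KEEP) + "\n"

    def readline(self, size=-1):
        # Return "" at EOF instead of letting StopIteration escape.
        return next(self._lines, "")

    def read(self, size=-1):
        # Like the original, this hands back one line per call and
        # ignores size; a stricter version would buffer up to size.
        return next(self._lines, "")

    def close(self):
        self.fh.close()

# Small in-memory check with a header line and one 15-column row.
header = "\t".join("h%d" % i for i in range(15)) + "\n"
row = "\t".join(["x0", "-", "x2"] + ["x%d" % i for i in range(3, 15)]) + "\n"
src = TransformedFile(io.StringIO(header + row))
out = src.readline()   # the transformed row
eof = src.readline()   # "" once the input is exhausted
```

With the real data, the handle would presumably be opened in text mode,
for example gzip.open("/tmp/gene_info.gz", "rt"), and the object passed
to copy_from as before.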

Sean


