emulating read and readline methods

MRAB google at mrabarnett.plus.com
Thu Sep 11 11:36:28 EDT 2008


On Sep 11, 9:23 am, Sean Davis <seand... at gmail.com> wrote:
> On Sep 10, 7:54 pm, John Machin <sjmac... at lexicon.net> wrote:
>
> > On Sep 11, 8:01 am, MRAB <goo... at mrabarnett.plus.com> wrote:
>
> > > On Sep 10, 6:59 pm, Sean Davis <seand... at gmail.com> wrote:
>
> > > > I have a large file that I would like to transform and then feed to a
> > > > function (psycopg2 copy_from) that expects a file-like object (needs
> > > > read and readline methods).
>
> > > > I have a class like so:
>
> > > > class GeneInfo():
> > > >     def __init__(self):
> > > >         #urllib.urlretrieve('ftp://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz',"/tmp/gene_info.gz")
> > > >         self.fh = gzip.open("/tmp/gene_info.gz")
> > > >         self.fh.readline() #deal with header line
>
> > > >     def _read(self,n=1):
> > > >         for line in self.fh:
> > > >             if line=='':
> > > >                 break
> > > >             line=line.strip()
> > > >             line=re.sub("\t-","\t",line)
> > > >             rowvals = line.split("\t")
> > > >             yield "\t".join([rowvals[i] for i in [0,1,2,3,6,7,8,9,10,11,12,14]]) + "\n"
>
> > > >     def readline(self,n=1):
> > > >         return self._read().next()
>
> > > >     def read(self,n=1):
> > > >         return self._read().next()
>
> > > Each time readline() and read() call self._read() they are creating a
> > > new generator. They then get one value from the newly-created
> > > generator and then discard that generator. What you should do is
> > > create the generator in __init__ and then use it in readline() and
> > > read().
>
> > > >     def close(self):
> > > >         self.fh.close()
>
> > > > and I use it like so:
>
> > > > a=GeneInfo()
> > > > cur.copy_from(a,"gene_info")
> > > > a.close()
>
> > > > It works well except that the end of file is not caught by copy_from.
> > > > I get errors like:
>
> > > > psycopg2.extensions.QueryCanceledError: COPY from stdin failed: error
> > > > during .read() call
> > > > CONTEXT:  COPY gene_info, line 1000: ""
>
> > > > for a 1000 line test file.  Any ideas what is going on?
>
> > > I wonder whether it's expecting readline() and read() to return an
> > > empty string at the end of the file instead of raising StopIteration.
>
> > Don't wonder; ReadTheFantasticManual:
>
> > read( [size])
>
> > ... An empty string is returned when EOF is encountered
> > immediately. ...
>
> > readline( [size])
>
> >  ... An empty string is returned only when EOF is encountered
> > immediately.
>
> Thanks.  This was indeed my problem--not reading the manual closely
> enough.
>
> And the points about the iterator being re-instantiated were also
> right on point.  Interestingly, in this case, the code was working
> because read() and readline() were still returning the next line each
> time since the file handle was being read one line at a time.
>
On further thought, do you actually need a generator? read() and
readline() could just call _read(), which would read one line from the
file and return the transformed result, or an empty string at EOF.
Alternatively, the processing could be done in readline(), and read()
could just call readline().
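
A minimal sketch of the last option, with the processing done in
readline() and read() delegating to it. It assumes Python 3 and a
text-mode file object; the names TransformedFile and KEEP are made up
for illustration, and an io.StringIO stands in for the gzip file. Like
the original class, it ignores the size argument.

```python
import io
import re

# Column indices kept by the original post's _read().
KEEP = [0, 1, 2, 3, 6, 7, 8, 9, 10, 11, 12, 14]


class TransformedFile:
    """Hypothetical file-like wrapper exposing read() and readline()."""

    def __init__(self, fh):
        self.fh = fh
        self.fh.readline()  # skip the header line

    def readline(self, size=-1):
        # size is accepted for interface compatibility but ignored here.
        line = self.fh.readline()
        if line == "":
            # The underlying file is at EOF: signal it with an empty
            # string rather than raising StopIteration.
            return ""
        line = re.sub("\t-", "\t", line.strip())
        rowvals = line.split("\t")
        return "\t".join(rowvals[i] for i in KEEP) + "\n"

    def read(self, size=-1):
        # One transformed line per call, as in the original class.
        return self.readline()

    def close(self):
        self.fh.close()


# Stand-in for the real file: a header plus one 15-column row.
sample = io.StringIO("header\n" + "\t".join(str(i) for i in range(15)) + "\n")
f = TransformedFile(sample)
while True:
    row = f.readline()
    if row == "":
        break
    print(repr(row))
f.close()
```

With the real data, the wrapper would presumably be given something
like gzip.open("/tmp/gene_info.gz", "rt") so that it reads text rather
than bytes, and then be passed to copy_from as before.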


