How to change a generator? - resolved
Barak, Ron
Ron.Barak at lsi.com
Thu Dec 25 02:27:23 EST 2008
Hi Gabriel,
Your remarks fixed my problem. My code now looks as shown below and behaves as expected.
Thanks, Gabriel.
Merry Christmas and Happy Hanukkah,
Ron.
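$ cat generator.py
#!/usr/bin/env python
import gzip
# Debug is the author's own helper module (not shown); _line() presumably
# returns the caller's source line number, as printed in the output below.
from Debug import _line as line

class LogStream(object):
    def __init__(self, filename):
        self.filename = filename
        self.input_file = self.open_file(filename)

    def open_file(self, in_file):
        # Try the file as gzip first; a plain file raises IOError on the
        # first read, so close that handle and reopen the file as text.
        f = gzip.GzipFile(in_file, "r")
        try:
            f.readline()
        except IOError:
            f.close()
            f = open(in_file, "r")
        f.seek(0)
        return f

    def line_generator(self):
        print line()+". self.input_file.tell()==", self.input_file.tell()
        # readline(), unlike "for line in file", does not use the hidden
        # read-ahead buffer, so tell() stays at the start of the next line.
        while True:
            line_ = self.input_file.readline()
            print line()+". self.input_file.tell()==", self.input_file.tell()
            if not line_:
                break
            yield line_.strip()

if __name__ == "__main__":
    filename = "sac.log.50lines"
    log_stream = LogStream(filename)
    log_stream.input_file.seek(0)
    line_generator = log_stream.line_generator()
    line_ = line_generator.next()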
$ python generator.py
23. self.input_file.tell()== 0
26. self.input_file.tell()== 247
$ !wc
wc -c sac.log.50lines
6623 sac.log.50lines
$
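For readers on Python 3, here is a minimal sketch of the same readline()-based approach. It is not Ron's code: the Debug helper is dropped, the file is opened in binary mode so that tell() returns a plain byte offset (in Python 3, calling tell() on a text file in the middle of a for loop over it raises OSError), and the log is assumed to be UTF-8. The sample file contents are hypothetical.

```python
import gzip
import os
import tempfile


class LogStream:
    """Python 3 sketch of the resolved LogStream (not the original code)."""

    def __init__(self, filename):
        self.filename = filename
        self.input_file = self.open_file(filename)

    @staticmethod
    def open_file(path):
        # Try gzip first. For a plain file the error only appears on the
        # first read; it is an OSError (gzip.BadGzipFile on Python 3.8+).
        f = gzip.open(path, "rb")
        try:
            f.readline()
        except OSError:
            f.close()
            f = open(path, "rb")  # binary mode: tell() is a plain byte offset
        f.seek(0)
        return f

    def line_generator(self):
        # readline() consumes exactly one line per call, so between yields
        # tell() points at the start of the next line.
        for raw in iter(self.input_file.readline, b""):
            # Assumption: the log is UTF-8 text.
            yield raw.decode("utf-8").strip()


if __name__ == "__main__":
    # Hypothetical two-line log written to a temporary file for the demo.
    with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
        tmp.write(b"first line\nsecond line\n")
    log_stream = LogStream(tmp.name)
    lines = log_stream.line_generator()
    print(next(lines), log_stream.input_file.tell())  # first line 11
    log_stream.input_file.close()
    os.unlink(tmp.name)
```

iter(f.readline, b"") keeps calling readline() until it returns the empty bytes object at end of file, which is the same loop as the while True/break in the script above.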
-----Original Message-----
From: MRAB [mailto:google at mrabarnett.plus.com]
Sent: Wednesday, December 24, 2008 20:00
To: python-list at python.org
Subject: Re: How to change a generator?
Gabriel Genellina wrote:
> On Wed, 24 Dec 2008 15:03:58 -0200, MRAB <google at mrabarnett.plus.com>
> wrote:
>
>>> I have a generator whose aim is to return consecutive lines from a
>>> file (the listing below is a simplified version).
>>> However, as it is written now, the generator method moves the text
>>> file pointer to the end of the file after the first invocation.
>>> Namely, the file pointer changes from 0 to 6623 on line 24.
>>>
>> It might be that iterating over self.input_file reads the file a
>> chunk at a time for efficiency, even though it yields one line at a
>> time.
>
> I think this is the case too.
> I can think of 3 alternatives:
>
> a) open the file unbuffered (bufsize=0). But I think this would
> greatly decrease performance.
>
> b) keep track internally of file position (by adding each line length).
> The file should be opened in binary mode in this case (to avoid any '\n'
> translation).
>
> c) return line numbers only, instead of file positions. Seeking to a
> given line number requires re-reading the file from the start;
> depending on how often this is needed and how big the file is, this
> might be acceptable.
>
readline() appears to work as expected, leaving the file position at the start of the next line.
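Alternative (b) from the quoted reply, keeping track of the position by adding up line lengths, could be sketched as follows in Python 3. The helper name lines_with_offsets is hypothetical. As Gabriel notes, the file has to be opened in binary mode: with newline translation a "\r\n" on disk becomes a single "\n", so summing the lengths of decoded lines would undercount. The "\r\n" line in the sample data shows the binary count staying correct.

```python
import io


def lines_with_offsets(f):
    """Yield (offset, line) pairs from a file object opened in binary mode.

    Sketch of alternative (b): each line's offset is computed by adding up
    the lengths of the lines already read, so it never relies on tell()
    while the loop is running.
    """
    offset = f.tell()  # start wherever the file currently is
    for line in f:
        yield offset, line
        offset += len(line)  # nothing is stripped, so this counts b"\r\n" / b"\n"


if __name__ == "__main__":
    # In-memory stand-in for a log file opened with open(path, "rb").
    data = io.BytesIO(b"first\r\nsecond\nthird\n")
    index = list(lines_with_offsets(data))
    print(index)  # [(0, b'first\r\n'), (7, b'second\n'), (14, b'third\n')]
    data.seek(index[1][0])  # jump back to a remembered line
    print(data.readline())  # b'second\n'
```

Because the offsets are computed rather than read from tell(), this version is indifferent to how the file object buffers its reads; the trade-off is that the caller must not strip lines before measuring them.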