Record separator
Roy Smith
roy at panix.com
Sat Aug 27 13:45:31 EDT 2011
In article <4e592852$0$29965$c3e8da3$5496439d at news.astraweb.com>,
Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:
> open("file.txt") # opens the file
> .read() # reads the contents of the file
> .split("\n\n") # splits the text on double-newlines.
The biggest problem with this code is that read() slurps the entire file
into a string. That's fine for moderately sized files, but will fail
(or at least be grossly inefficient) for very large files.
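For the blank-line-separated case, one way around the slurp is to lean on line iteration and build up each record as you go. This is only a rough sketch: iter_records is a made-up name, and it is not an exact drop-in for split("\n\n"), since it collapses runs of blank lines and treats whitespace-only lines as blank.

```python
import io


def iter_records(lines):
    """Yield blank-line-separated records from an iterable of lines.

    Hypothetical helper, not a stdlib function. Only one record is
    held in memory at a time.
    """
    record = []
    for line in lines:
        if line.strip() == "":
            # A blank line ends the current record, if there is one.
            if record:
                yield "".join(record).rstrip("\n")
                record = []
        else:
            record.append(line)
    # Flush the last record when the input does not end with a blank line.
    if record:
        yield "".join(record).rstrip("\n")


# io.StringIO stands in for an open text file here.
sample = io.StringIO("a\nb\n\nc\n\n\nd\n")
records = list(iter_records(sample))
```

With a real file you would pass the file object itself, inside a with block, so that only the current record is ever held in memory.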
It's always annoyed me a little that while it's easy to iterate over the
lines of a file, it's more complicated to iterate over a file character
by character. You could write your own generator to do that:
    for c in getchar(open("file.txt")):
        whatever

    def getchar(f):
        for line in f:
            for c in line:
                yield c
but that's annoyingly verbose (and probably not hugely efficient).
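A more compact spelling of the same idea, as a sketch: itertools.chain.from_iterable flattens the file's lines into a stream of characters, which removes the nested loops. The name chars is just an illustrative helper, and whether this is actually faster than the hand-written generator is something I have not measured.

```python
import io
from itertools import chain


def chars(f):
    """Iterate over a text file (or any iterable of strings) one
    character at a time. Illustrative helper, not a stdlib function."""
    return chain.from_iterable(f)


# io.StringIO stands in for an open text file here.
sample = io.StringIO("ab\ncd\n")
result = list(chars(sample))
```

The file is still read line by line underneath, so a file with no newlines at all would still be pulled into memory as one giant "line".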
Of course, the next obstacle for the specific problem at hand is that
even with an iterator over the characters of a file, split() only works
on strings. It would be nice to have a version of split which took an
iterable and returned an iterator over the split components. Maybe
there is such a thing and I'm just missing it?
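For what it's worth, here is a minimal sketch of what such a function might look like. The name isplit is hypothetical (I am not claiming it exists in the standard library), and it assumes the input is an iterable of single characters and the separator is a non-empty string. It is meant to give the same pieces as str.split(sep), just lazily.

```python
def isplit(chars, sep):
    """Lazily split an iterable of characters on the string sep.

    Hypothetical helper. Yields the same pieces as
    "".join(chars).split(sep), but holds only the current piece in memory.
    """
    if not sep:
        raise ValueError("empty separator")
    n = len(sep)
    piece = []
    for c in chars:
        piece.append(c)
        # When the tail of the current piece is the separator, emit
        # everything before it and start a new piece.
        if len(piece) >= n and "".join(piece[-n:]) == sep:
            yield "".join(piece[:-n])
            piece = []
    # Whatever is left after the last separator (possibly empty).
    yield "".join(piece)


text = "a\n\nb\n\n\nc"
pieces = list(isplit(text, "\n\n"))
```

Any iterable of characters will do as input, so a plain string works, and so does a character generator such as getchar() above. Memory use is then bounded by the longest record rather than by the size of the file.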