Canonical way of dealing with null-separated lines?
Douglas Alan
nessus at mit.edu
Sat Feb 26 18:07:39 EST 2005
I wrote:
> Okay, here's the definitive version (or so say I). Some good doobie
> please make sure it makes its way into the standard library:
Oops, I just realized that my previously definitive version did not
handle multi-character newlines. So here is a new version. Oog, now my
brain hurts:
def fileLineIter(inputFile, newline='\n', leaveNewline=False, readSize=8192):
    """Like the normal file iter but you can set what string indicates newline.

    The newline string can be arbitrarily long; it need not be restricted to a
    single character. You can also set the read size and control whether or
    not the newline string is left on the end of the iterated lines. Setting
    newline to '\\0' is particularly good for use with an input file created
    with something like "os.popen('find -print0')".
    """
    outputLineEnd = newline if leaveNewline else ""

    # Because read() might unfortunately split across our newline string, a
    # newline can begin up to len(newline) - 1 characters before a new chunk.
    overlap = len(newline) - 1

    # 'partialLine' is a list of the chunks read since the last newline, to be
    # concatenated later; joining only when a newline shows up avoids
    # quadratic string building when one line spans many reads.
    partialLine = []
    # The last 'overlap' characters of the text held in 'partialLine'.
    tail = ""

    while True:
        charsJustRead = inputFile.read(readSize)
        if not charsJustRead:
            break
        partialLine.append(charsJustRead)
        window = tail + charsJustRead
        if newline not in window:
            # Still in the middle of a line; just remember the chunk.
            tail = window[-overlap:] if overlap else ""
            continue
        # At least one newline ends in this chunk, so join the pending text
        # and split it; the last piece is a line that is not finished yet.
        lines = "".join(partialLine).split(newline)
        partialLine[:] = [lines.pop()]
        tail = partialLine[0][-overlap:] if overlap else ""
        for line in lines:
            yield line + outputLineEnd

    # Whatever is left was not followed by a newline; yield it unless empty.
    remainder = "".join(partialLine)
    if remainder:
        yield remainder
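For comparison, a shorter sketch that keeps the leftover text in a single string instead of a list of chunks. The name iter_records and its parameters are hypothetical, not part of any library. Re-splitting the leftover text together with each new chunk finds a multi-character separator even when read() cuts it in half, but it re-copies the leftover text on every read, which becomes quadratic if one line spans many reads; that cost is what a list of chunks avoids.

```python
import io
from typing import Iterator, TextIO


def iter_records(stream: TextIO, sep: str = "\0", keep_sep: bool = False,
                 read_size: int = 8192) -> Iterator[str]:
    """Yield sep-terminated records from stream (hypothetical helper).

    Leftover text is kept in one string and re-split together with each new
    chunk, so a multi-character separator split across reads is still found.
    """
    if not sep:
        raise ValueError("sep must be non-empty")
    suffix = sep if keep_sep else ""
    pending = ""
    while True:
        chunk = stream.read(read_size)
        if not chunk:
            break
        pieces = (pending + chunk).split(sep)
        pending = pieces.pop()  # text after the last separator, maybe ""
        for piece in pieces:
            yield piece + suffix
    if pending:
        yield pending  # final record had no trailing separator


# A 3-character read size forces "\r\n" to straddle two reads.
data = io.StringIO("ab\r\ncd\r\nef")
print(list(iter_records(data, sep="\r\n", read_size=3)))  # ['ab', 'cd', 'ef']
```

With read_size=3 the reads are "ab\r", "\ncd", "\r\ne" and "f", so both separators are cut across read boundaries and are still recognized.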
|>oug
More information about the Python-list mailing list