Canonical way of dealing with null-separated lines?

John Machin sjmachin at lexicon.net
Tue Mar 1 03:02:18 CET 2005


Douglas Alan wrote:
> I wrote:
>
> > Oops, I just realized that my previously definitive version did not
> > handle multi-character newlines.  So here is a new definitive
> > version.  Oog, now my brain hurts:
>
> I dunno what I was thinking.  That version sucked!  Here's a version
> that's actually comprehensible, a fraction of the size, and works in
> all cases.  (I think.)
>
> def fileLineIter(inputFile, newline='\n', leaveNewline=False, readSize=8192):
>    """Like the normal file iter but you can set what string indicates newline.
>
>    The newline string can be arbitrarily long; it need not be restricted to a
>    single character. You can also set the read size and control whether or not
>    the newline string is left on the end of the iterated lines.  Setting
>    newline to '\0' is particularly good for use with an input file created with
>    something like "os.popen('find -print0')".
>    """
>    outputLineEnd = ("", newline)[leaveNewline]
>    partialLine = ''
>    while True:
>        charsJustRead = inputFile.read(readSize)
>        if not charsJustRead: break
>        lines = (partialLine + charsJustRead).split(newline)

The above line prepends a short string to what will typically be a
whole buffer full, so every read copies the entire buffer a second
time. There's gotta be a better way to do it. Perhaps you might like to
refer back to CdV's solution, which prepended the residue to the first
element of the split() result.
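A rough sketch of that idea in modern Python 3 (not CdV's actual code; the name `split_chunks_prepend_residue` is hypothetical). Only the first piece of each split gets the residue glued on, so the full buffer is never copied just to prepend a few characters. Re-splitting that small joined piece catches a multi-character separator cut in two by a read boundary; this assumes the separator cannot overlap itself, which holds for '\0' and '\r\n'. Like leaveNewline=False, it drops the separators.

```python
import io
from typing import IO, Iterator


def split_chunks_prepend_residue(input_file: IO[str], newline: str = "\0",
                                 read_size: int = 8192) -> Iterator[str]:
    """Yield newline-separated records, without the separators.

    Assumes `newline` cannot overlap itself (true for '\\0' and '\\r\\n').
    """
    partial = ""
    while True:
        chunk = input_file.read(read_size)
        if not chunk:
            break
        pieces = chunk.split(newline)
        # Glue the residue onto the (usually short) first piece only, then
        # re-split it: a separator straddling the read boundary is found
        # here, and with a non-self-overlapping separator it splits at
        # most once.
        pieces[0:1] = (partial + pieces[0]).split(newline)
        partial = pieces.pop()  # last piece may be an incomplete record
        yield from pieces
    if partial:
        yield partial


print(list(split_chunks_prepend_residue(io.StringIO("a\0bb\0\0ccc"), read_size=2)))
# prints ['a', 'bb', '', 'ccc']
```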

>        partialLine = lines.pop()
>        for line in lines: yield line + outputLineEnd

When leaveNewline is false, you are concatenating an empty string onto
every line. IMHO, to quote Jon Bentley, one should "do nothing gracefully".
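One way to do nothing gracefully, sketched in Python 3 with a hypothetical snake_case name `file_line_iter`: keep Douglas's structure, but test leave_newline once per buffer and, when it is false, hand the split pieces back untouched instead of appending "" to each. This sketch deliberately leaves the buffer concatenation alone so it isolates this one point.

```python
import io
from typing import IO, Iterator


def file_line_iter(input_file: IO[str], newline: str = "\n",
                   leave_newline: bool = False,
                   read_size: int = 8192) -> Iterator[str]:
    """Like fileLineIter above, minus the per-line empty-string concatenation."""
    partial = ""
    while True:
        chunk = input_file.read(read_size)
        if not chunk:
            break
        lines = (partial + chunk).split(newline)
        partial = lines.pop()  # possibly incomplete last line
        if leave_newline:
            for line in lines:
                yield line + newline
        else:
            # Nothing to append, so yield the split pieces as they are.
            yield from lines
    if partial:
        yield partial


print(list(file_line_iter(io.StringIO("one\0two\0three"), "\0", read_size=4)))
# prints ['one', 'two', 'three']
```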


>    if partialLine: yield partialLine
> 
> |>oug
