Canonical way of dealing with null-separated lines?
sjmachin at lexicon.net
Wed Mar 2 01:10:10 CET 2005
Douglas Alan wrote:
> "John Machin" <sjmachin at lexicon.net> writes:
> >> lines = (partialLine + charsJustRead).split(newline)
> > The above line is prepending a short string to what will typically be
> > a whole buffer full. There's gotta be a better way to do it.
> If there is, I'm all ears. In a previous post I provided code that
> doesn't concatenate any strings together until the last possible
> moment (i.e. when yielding a value). The problem with that code
> was that it was complicated and didn't work right in all cases.
> One way of solving the string concatenation issue would be to write a
> string find routine that will work on lists of strings while ignoring
> the boundaries between list elements. (I.e., it will consider the
> list of strings to be one long string for its purposes.) Unless it is
> written in C, however, I bet it will typically be much slower than the
> code I just provided.
> > Perhaps you might like to refer back to CdV's solution which was
> > prepending the residue to the first element of the split() result.
> The problem with that solution is that it doesn't work in all cases
> when the line-separation string is more than one character.
> >> for line in lines: yield line + outputLineEnd
> > In the case of leaveNewline being false, you are concatenating an empty
> > string. IMHO, to quote Jon Bentley, one should "do nothing gracefully".
> In Python,
> longString + "" is longString
> evaluates to True. I don't know how you can do nothing more
> gracefully than that.
And also "" + longString is longString
The string + operator provides those graceful *external* results by
ugly special-case testing internally.
It is not graceful IMHO to concatenate a variable which you already
know refers to a null string.
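
For anyone who wants to check the identity claim on their own interpreter, here is a small sketch. The function name concat_identity is made up for illustration, and the result is a CPython implementation detail rather than anything the language guarantees, so another implementation may legitimately report False.

```python
def concat_identity(s):
    """Report whether adding an empty string hands back the same object.

    Hypothetical helper, not from the thread. The identity result depends
    on the interpreter; only the equality of the values is guaranteed.
    """
    right_empty = (s + "") is s   # empty string on the right
    left_empty = ("" + s) is s    # empty string on the left
    return right_empty, left_empty


# Built at run time so the compiler cannot fold the expression away.
long_string = "x" * 1000
result = concat_identity(long_string)
```

Either way the values compare equal; the point above is only that the caller already knows one operand is empty and could skip the operation entirely.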
Let's go back to the first point, and indeed further back to the use
cases:
(1) multi-byte separator for lines in text files: never heard of one
apart from '\r\n'; presume this is rare, so test for length of 1 and
use Chris's simplification of my effort in this case.
(2) keep newline: with the standard file reading routines, if one is
going to do anything much with the line other than write it out again,
one does buffer = buffer.rstrip('\n') anyway. In the case of a
non-standard separator, one is likely to want to write the line out
with the standard '\n'. So, specialisation for this is indicated:
! if keepNewline:
!     for line in lines: yield line + newline
! else:
!     for line in lines: yield line
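
Putting points (1) and (2) together, a minimal sketch of the whole generator might look like the following. The names read_records, sep, keep_sep and bufsize are my own, not from the code posted earlier in the thread, and it assumes a text stream whose read(n) returns an empty string at end of file.

```python
import io


def read_records(stream, sep="\0", keep_sep=False, bufsize=8192):
    """Yield records from a text stream, split on sep (illustrative sketch)."""
    if not sep:
        raise ValueError("separator must not be empty")
    residue = ""  # unterminated tail carried over from the previous chunk
    while True:
        chunk = stream.read(bufsize)
        if not chunk:
            break
        if len(sep) == 1:
            # A one-character separator cannot straddle two chunks, so only
            # the first piece needs the residue prepended.
            pieces = chunk.split(sep)
            pieces[0] = residue + pieces[0]
        else:
            # A longer separator may be cut in half by a chunk boundary, so
            # join first and split afterwards.
            pieces = (residue + chunk).split(sep)
        residue = pieces.pop()  # last piece is incomplete (possibly empty)
        if keep_sep:
            for piece in pieces:
                yield piece + sep
        else:
            for piece in pieces:
                yield piece
    if residue:
        # Final record had no terminating separator; yield it as it stands.
        yield residue


demo = list(read_records(io.StringIO("a\0bb\0ccc"), bufsize=2))
```

The multi-character branch still pays for the concatenation complained about above; the sketch just confines that cost to the rare case. When the separator is not kept, nothing is concatenated onto each yielded line at all.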