Canonical way of dealing with null-separated lines?

Douglas Alan nessus at mit.edu
Tue Mar 1 20:25:29 EST 2005


"John Machin" <sjmachin at lexicon.net> writes:

>> In Python,

>>    longString + "" is longString

>> evaluates to True.  I don't know how you can do nothing more
>> gracefully than that.

> And also "" + longString is longString

> The string + operator provides those graceful *external* results by
> ugly special-case testing internally.

I guess I don't know what you are getting at.  If Python performs ugly
special-case testing internally so that I can write simpler, more
elegant code, then more power to it!  Concentrating ugliness in one
small, reusable place is a good thing.
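
For reference, the behaviour under discussion can be checked directly. This
is a small sketch, assuming CPython; the identity result is an implementation
detail of the interpreter, not something the language guarantees, so only the
equality should be relied upon. The variable names are illustrative only.

```python
# Build the string at run time so the compiler cannot pre-compute anything.
long_string = "".join(["spam"] * 1000)

appended = long_string + ""    # empty string on the right
prepended = "" + long_string   # empty string on the left

# Equality is guaranteed by the language.
print(appended == long_string, prepended == long_string)

# Identity depends on the interpreter special-casing the empty operand.
print(appended is long_string, prepended is long_string)
```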


> It is not graceful IMHO to concatenate a variable which you already
> know refers to a null string.

It's better than making my code bigger, uglier, and putting in extra
tests for no particularly good reason.


> Let's go back to the first point, and indeed further back to the use
> cases:

> (1) multi-byte separator for lines in text files: never heard of one
> apart from '\r\n'; presume this is rare, so test for length of 1 and
> use Chris's simplification of my effort in this case.

I want the ability to handle multi-byte separators, and so I coded for
it.  There are plenty of other uses for an iterator that handles
multi-byte separators.  Not all of them would typically be considered
"newline-delimited lines" as opposed to "records delimited by a
separation string", but a rose by any other name....

If one wants to special-case single-byte separators in the name of
efficiency, I provided a version earlier in this thread that never
degrades to N^2, as the ones you and Chris provided do.
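
To make the trade-off concrete, here is a minimal sketch of a record iterator
that accepts a multi-character separator and avoids re-scanning text it has
already searched. It is not the code posted earlier in the thread: the name
iter_records and its parameters (sep, keep_sep, chunk_size) are hypothetical,
and it assumes a Python 3 text-mode file object.

```python
import io

def iter_records(fileobj, sep="\0", keep_sep=False, chunk_size=8192):
    """Yield records from fileobj, split on sep (may be multi-character)."""
    if not sep:
        raise ValueError("separator must be non-empty")
    keep = len(sep) - 1   # characters that could begin a straddling separator
    pieces = []           # separator-free text of the record in progress
    carry = ""            # last `keep` characters read, not yet committed
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        parts = (carry + chunk).split(sep)
        if len(parts) > 1:
            # At least one separator was found: finish the pending record.
            pieces.append(parts[0])
            parts[0] = "".join(pieces)
            pieces = []
            for record in parts[:-1]:
                yield record + sep if keep_sep else record
        data = parts[-1]  # contains no complete separator
        cut = max(len(data) - keep, 0)
        if cut:
            pieces.append(data[:cut])
        carry = data[cut:]
    last = "".join(pieces) + carry
    if last:
        # Final record had no terminating separator.
        yield last

# A separator split across reads is still recognised:
records = list(iter_records(io.StringIO("a\r\nb\r\nc"), sep="\r\n", chunk_size=1))
# records == ['a', 'b', 'c']
```

Each chunk is searched once, plus at most len(sep) - 1 carried-over
characters, and a long record is joined only once when its separator finally
turns up, so a single huge record does not cause repeated copying.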


> (2) keep newline: with the standard file reading routines, if one is
> going to do anything much with the line other than write it out again,
> one does buffer = buffer.rstrip('\n') anyway. In the case of a
> non-standard separator, one is likely to want to write the line out
> with the standard '\n'. So, specialisation for this is indicated:

> ! if keepNewline:
> !     for line in lines: yield line + newline
> ! else:
> !     for line in lines: yield line

I would certainly never want the iterator to tack on a standard "\n"
as a replacement for whatever newline string the input used.  That
seems like completely gratuitous functionality to me.  The standard
(but not the only) reason for wanting the line terminator left on the
yielded strings is so that you can tell whether or not there is a
line separator terminating the very last line of the input.  Usually I
want the line terminator discarded, and it kind of annoys me that the
standard line iterator leaves it on.
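
A short illustration of that point, using the standard line iterator: because
the terminator is kept, the last yielded string reveals whether the input
ended with one. The helper name last_line_terminated is hypothetical, and
io.StringIO merely stands in for a real file.

```python
import io

def last_line_terminated(fileobj, newline="\n"):
    """Return True if the final line of fileobj ends with newline."""
    last = None
    for last in fileobj:   # the standard iterator keeps the terminator
        pass
    return last is not None and last.endswith(newline)

with_terminator = last_line_terminated(io.StringIO("a\nb\n"))
without_terminator = last_line_terminated(io.StringIO("a\nb"))

# Discarding the terminator, the more common wish, is a one-liner:
stripped = [line.rstrip("\n") for line in io.StringIO("a\nb\n")]
```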

|>oug


