Canonical way of dealing with null-separated lines?
nessus at mit.edu
Wed Mar 2 02:25:29 CET 2005
"John Machin" <sjmachin at lexicon.net> writes:
>> In Python,
>> longString + "" is longString
>> evaluates to True. I don't know how you can do nothing more
>> gracefully than that.
> And also "" + longString is longString
> The string + operator provides those graceful *external* results by
> ugly special-case testing internally.
I guess I don't know what you are getting at. If Python performs ugly
special-case testing internally so that I can write more simple,
elegant code, then more power to it! Concentrating ugliness in one,
small, reusable place is a good thing.
> It is not graceful IMHO to concatenate a variable which you already
> know refers to a null string.
It's better than making my code bigger, uglier, and putting in extra
tests for no particularly good reason.
> Let's go back to the first point, and indeed further back to the use
> (1) multi-byte separator for lines in text files: never heard of one
> apart from '\r\n'; presume this is rare, so test for length of 1 and
> use Chris's simplification of my effort in this case.
I want the ability to handle multibyte separators, and so I coded for
it. There are plenty of other uses for an iterator that handles
multi-byte separators. Not all of them would typically be considered
"newline-delimited lines" as opposed to "records delimited by a
separation string", but a rose by any other name....
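For illustration, here is a minimal sketch of such an iterator. It is not the code posted earlier in the thread; the name `split_records`, its parameters, and the chunked `read()` loop are assumptions. It accepts a separator of any length, copes with a separator that straddles two reads, and can optionally leave the separator on each record.

```python
import io
from typing import Iterator, TextIO


def split_records(f: TextIO, sep: str = "\0", keep_sep: bool = False,
                  chunk_size: int = 8192) -> Iterator[str]:
    """Yield records from f that are separated by sep (any length).

    With keep_sep=True each record keeps its terminating separator, so the
    caller can tell whether the final record was terminated.
    """
    if not sep:
        raise ValueError("separator must be non-empty")
    buf = ""  # unconsumed text; never contains a complete separator here
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        # Resume the search just before the new data, in case the separator
        # straddles the boundary between the previous read and this one.
        start = max(0, len(buf) - len(sep) + 1)
        buf += chunk  # note: a record spanning many reads is re-copied here
        pos = 0  # start of the current, not-yet-yielded record
        while True:
            idx = buf.find(sep, start)
            if idx == -1:
                break
            end = idx + len(sep)
            yield buf[pos:end] if keep_sep else buf[pos:idx]
            pos = start = end
        buf = buf[pos:]
    if buf:
        # Final record that had no terminating separator.
        yield buf


# Read one character at a time to exercise separators split across reads.
for rec in split_records(io.StringIO("a\r\nbb\r\nccc"), "\r\n", chunk_size=1):
    print(repr(rec))  # prints 'a', 'bb', 'ccc' on separate lines
```

Searching from `start` rather than from the beginning of the buffer avoids rescanning text that is already known to contain no separator.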
If one wants to special-case single-byte separators in the name of
efficiency, I provided a version back there in the thread that never
degrades to O(N^2), unlike the ones you and Chris provided.
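One way to keep a single-character version linear is sketched below. It is a hypothetical illustration, not the version posted earlier in the thread, and the name `split_on_char` is an assumption. Each chunk is split with `str.split`, and the pieces of a record that spans several reads are collected in a list and joined once when the record ends, so each character is copied a bounded number of times rather than once per read. A one-character separator cannot straddle two reads, which is what makes this simplification safe.

```python
import io
from typing import Iterator, List, TextIO


def split_on_char(f: TextIO, sep: str = "\0", keep_sep: bool = False,
                  chunk_size: int = 8192) -> Iterator[str]:
    """Yield records separated by a single-character sep, in linear time."""
    if len(sep) != 1:
        raise ValueError("this sketch handles single-character separators only")
    pending: List[str] = []  # pieces of the current record, joined once at its end
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        parts = chunk.split(sep)
        # Every part except the last was terminated by sep inside this chunk.
        for part in parts[:-1]:
            pending.append(part)
            record = "".join(pending)
            yield record + sep if keep_sep else record
            pending = []
        pending.append(parts[-1])  # unterminated remainder carries over
    tail = "".join(pending)
    if tail:
        yield tail  # last record had no terminating separator


print(list(split_on_char(io.StringIO("ab\0cd\0\0e"), "\0", chunk_size=3)))
# prints ['ab', 'cd', '', 'e']
```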
> (2) keep newline: with the standard file reading routines, if one is
> going to do anything much with the line other than write it out again,
> one does buffer = buffer.rstrip('\n') anyway. In the case of a
> non-standard separator, one is likely to want to write the line out
> with the standard '\n'. So, specialisation for this is indicated:
> ! if keepNewline:
> !     for line in lines: yield line + newline
> ! else:
> !     for line in lines: yield line
I would certainly never want the iterator to tack on a standard "\n"
as a replacement for whatever newline string the input used. That
seems like completely gratuitous functionality to me. The standard
(but not the only) reason that I want the line terminator left on the
yielded strings is so that you can tell whether or not there is a
line-separator terminating the very last line of the input. Usually I
want the line-terminator discarded, and it kind of annoys me that the
standard line iterator leaves it on.
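Leaving the terminator on and stripping it in the caller could look like the hypothetical helper below (`strip_terminators` is an illustrative name, not an existing API). It uses `endswith` plus slicing rather than `rstrip`, because `rstrip("\r\n")` treats its argument as a set of characters rather than as one separator string. The boolean in each pair records whether that line was terminated, which answers the "last line" question.

```python
import io
from typing import Iterable, Iterator, Tuple


def strip_terminators(lines: Iterable[str],
                      sep: str = "\n") -> Iterator[Tuple[str, bool]]:
    """Yield (line, terminated) pairs with exactly one trailing sep removed."""
    if not sep:
        raise ValueError("separator must be non-empty")
    for line in lines:
        if line.endswith(sep):
            # Remove exactly one separator; rstrip(sep) would also eat any
            # trailing characters that merely appear somewhere in sep.
            yield line[:-len(sep)], True
        else:
            yield line, False


# The standard file iterator keeps "\n", so the last pair reveals whether
# the input ended with a newline.
for text, terminated in strip_terminators(io.StringIO("first\nlast")):
    print(repr(text), terminated)  # prints 'first' True, then 'last' False
```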