[Python-3000] Lines breaking

Thu May 31 09:22:41 CEST 2007

Greg Ewing writes:

 > But an FF or VT is not *just* a line break, it can
 > have other semantics attached to it as well. So
 > treating it just the same as a \n by default would be
 > wrong, I think.

*Python* does the right thing: it leaves the line break character(s)
in place.  It's not Python's problem if programmers go around
stripping characters just because they happen to be at the end of the
line.  If you do care, you're already in trouble if you strip
willy-nilly:

>>> len("a\014\n")
3
>>> len("a\014\n".strip())
1
>>> len("a\014\n".strip() + "\n")
2
>>> "a\r\n"[:-1]
'a\r'

I think the odds are really good that there are already more people
who will expect Python to be Unicode-ly correct than who have
already-defined semantics for FF or VT that just happen to work right
if you strip the terminating LF but not a terminating FF.
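
For anyone who does depend on a trailing FF surviving, the safe move is
to remove exactly one LF and nothing else.  A minimal sketch (chomp_lf
is a name I'm making up here, not anything in the stdlib, and it
deliberately ignores CRLF):

```python
def chomp_lf(line):
    """Remove a single trailing LF, leaving FF, VT and CR untouched.

    Hypothetical helper for illustration only.
    """
    return line[:-1] if line.endswith("\n") else line


with_ff = chomp_lf("a\014\n")   # the FF is still there afterwards
no_lf = chomp_lf("a\014")       # nothing to remove, returned unchanged
```

Compare with .strip(), which eats the FF along with the LF.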

The remaining issue, embedding those characters in the interior of
lines without treating them as line breaks, is one the Unicode
Technical Committee considers a non-issue.  Those characters are
mandatory breaks because the expectation is *very* consistent (they
say).  I gather you think it's reasonable, too, you just worry that
the additional semantics may get lost with current newline-stripping
heuristics.
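
To make "mandatory break" concrete: the sketch below assumes the
behavior of str.splitlines() in present-day Python 3, which counts FF
and VT among its line boundaries.  It says nothing about what file
iteration does, and the sample string is made up.

```python
# Assumption: str.splitlines() as it behaves in current Python 3.
text = "a\x0cb\x0bc\n"          # FF after "a", VT after "b", LF after "c"

# keepends=True leaves each terminator attached to its line, so the
# extra semantics of FF and VT are still available to the caller.
pieces = text.splitlines(keepends=True)
print(pieces)                   # ['a\x0c', 'b\x0b', 'c\n']
```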

As for existing programs that will go postal if you hand them a line
that's terminated with FF or VT, I don't see any conceptual problem
with a codec (universal newline) that on input of "a\014" returns
"a\014\n".  Getting the details right (i.e., respecting POLA) will
require some thought and maybe some fiddly options, but it will work.
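
A rough sketch of the idea, not a real codec: normalize_breaks and its
extra_breaks parameter are hypothetical names, and the input is assumed
to be already split into lines with their terminators kept.

```python
def normalize_breaks(lines, extra_breaks=("\x0c", "\x0b")):
    """Yield each line, appending LF when the line ends in FF or VT.

    Hypothetical helper; a real codec would have to do this while
    decoding a stream, and would need options for the corner cases.
    """
    for line in lines:
        if line.endswith(extra_breaks):   # endswith accepts a tuple
            yield line + "\n"
        else:
            yield line


# Split on every line boundary first, keeping the terminators.
source = "a\x0cb\n".splitlines(keepends=True)
result = list(normalize_breaks(source))
print(result)                   # ['a\x0c\n', 'b\n']
```

The FF stays in the data, so nothing is lost; programs that only
understand LF just see one more ordinary line ending.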

Always-do-right-it-will-gratify-some-people-and-astonish-the-rest-ly y'rs