[docs] [issue12855] linebreak sequences should be better documented

Matthew Boehm report at bugs.python.org
Tue Aug 30 06:45:19 CEST 2011

Matthew Boehm <boehm.matthew at gmail.com> added the comment:

I've attached a patch for Python 2.7 that adds a small note to library/stdtypes.html#str.splitlines explaining which sequences are treated as line breaks:

Note: Python recognizes "\r", "\n", and "\r\n" as line boundaries for strings.

In addition to these, Unicode strings can have line boundaries of u"\x0b", u"\x0c", u"\x85", u"\u2028", and u"\u2029".
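
To illustrate the proposed note, here is a rough sketch of how the difference can be checked. It is written for Python 3 and assumes that bytes stands in for the 2.7 byte string and str for the 2.7 unicode type. The helper names (splits_text, splits_bytes) are made up for this example and are not part of the patch.

```python
# Sketch only. Assumption: run on Python 3, where bytes plays the role of
# the Python 2.7 str type and str plays the role of the 2.7 unicode type.

# Sequences the note lists for all strings.
COMMON_BREAKS = ["\r", "\n", "\r\n"]

# Sequences the note lists as line boundaries for Unicode strings only.
UNICODE_ONLY_BREAKS = ["\x0b", "\x0c", "\x85", "\u2028", "\u2029"]


def splits_text(sep):
    """Return True if a text string is split in two at sep."""
    return ("a" + sep + "b").splitlines() == ["a", "b"]


def splits_bytes(sep):
    """Return True if a byte string is split in two at the encoded sep."""
    data = b"a" + sep.encode("utf-8") + b"b"
    return data.splitlines() == [b"a", b"b"]


if __name__ == "__main__":
    for sep in COMMON_BREAKS + UNICODE_ONLY_BREAKS:
        print(repr(sep), "text:", splits_text(sep), "bytes:", splits_bytes(sep))
```

The list above mirrors the note and is not meant to be exhaustive; it only shows the two categories behaving differently.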

Additional thoughts:

* Would it be better to put this note in a different place?

* It looks like \x0b and \x0c (vertical tab and form feed) were first considered line breaks in Python 2.7, probably related to this note from "What's New in 2.7": "The Unicode database provided by the unicodedata module is now used internally to determine which characters are numeric, whitespace, or represent line breaks." It might be worth putting a "changed in 2.7" note somewhere in the docs.

Please let me know of any thoughts you have and I'll be glad to make any desired changes and submit a new patch.

keywords: +patch
title: open() and codecs.open() treat form-feed differently -> linebreak sequences should be better documented
Added file: http://bugs.python.org/file23069/linebreakdoc.py27.patch

Python tracker <report at bugs.python.org>
