[issue12855] linebreak sequences should be better documented
Matthew Boehm
report at bugs.python.org
Tue Aug 30 06:45:19 CEST 2011
Matthew Boehm <boehm.matthew at gmail.com> added the comment:
I've attached a patch for Python 2.7 that adds a small note to library/stdtypes.html#str.splitlines explaining which sequences are treated as line breaks:
"""
Note: Python recognizes "\r", "\n", and "\r\n" as line boundaries for strings.
In addition to these, Unicode strings can have line boundaries of u"\x0b", u"\x0c", u"\x85", u"\u2028", and u"\u2029"
"""
Additional thoughts:
* Would it be better to put this note in a different place?
* It looks like \x0b and \x0c (vertical tab and form feed) were first considered line breaks in Python 2.7, probably related to this note from "What's New in 2.7": "The Unicode database provided by the unicodedata module is now used internally to determine which characters are numeric, whitespace, or represent line breaks." It might be worth putting a "changed in 2.7" note somewhere in the docs.
Please let me know of any thoughts you have and I'll be glad to make any desired changes and submit a new patch.
----------
keywords: +patch
title: open() and codecs.open() treat form-feed differently -> linebreak sequences should be better documented
Added file: http://bugs.python.org/file23069/linebreakdoc.py27.patch
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12855>
_______________________________________