[Python-bugs-list] [ python-Feature Requests-691291 ] codecs.open(filename, 'U', 'UTF-16') corrupts text

SourceForge.net noreply@sourceforge.net
Wed, 26 Feb 2003 05:44:53 -0800


Feature Requests item #691291, was opened at 2003-02-22 20:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=691291&group_id=5470

>Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jason Orendorff (jorend)
>Assigned to: Nobody/Anonymous (nobody)
Summary: codecs.open(filename, 'U', 'UTF-16') corrupts text

Initial Comment:
Tested in Python 2.3a1.

If I write u'Hello\r\nworld\r\n' to a file, then read
it back in 'U' mode, I should get u'Hello\nworld\n'.

However, if I do this using codecs.open() and the
UTF-16 encoding, I get u'Hello\n\nworld\n\n'.

codecs.open() is not 'U'-mode-aware.  The underlying
file is opened in universal newline mode, so the byte
'\x0d' is erroneously translated to '\x0a' before the
UTF-16 codec has a chance to decode it.
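To make the ordering problem concrete, here is a minimal in-memory sketch written for modern Python 3 (not 2.3). The helper translate_newlines() is hypothetical. It only approximates what universal newline mode does to the raw bytes, and no real file is opened. The sketch compares "translate, then decode", which is the order the report describes, with "decode, then translate".

```python
# Sketch: why byte-level universal-newline translation corrupts UTF-16.
# translate_newlines() is a hypothetical helper approximating 'U' mode on
# raw bytes; it is not part of the codecs module.


def translate_newlines(data: bytes) -> bytes:
    """Map b'\\r\\n' and lone b'\\r' to b'\\n', working on bytes."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def translate_text(text: str) -> str:
    """The same translation, applied to already-decoded text."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


original = "Hello\r\nworld\r\n"
raw = original.encode("utf-16")  # u'\r' becomes b'\x0d\x00' (or b'\x00\x0d')

# Order described in the report: translate the bytes first, decode second.
# The b'\x0d' byte is not followed by b'\x0a', so it becomes a lone newline.
buggy = translate_newlines(raw).decode("utf-16")

# Order the reporter expects: decode first, translate second.
expected = translate_text(raw.decode("utf-16"))

print(repr(buggy))     # 'Hello\n\nworld\n\n'
print(repr(expected))  # 'Hello\nworld\n'
```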

The attached unit test shows exactly the behavior I would
like to work.
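The attached test itself is not included in this message. As a hypothetical stand-in, the sketch below expresses the desired behavior against Python 3's built-in open(), whose text layer decodes first and only then translates line endings. It deliberately avoids codecs.open(), which in Python 3 always opens the underlying file in binary mode. The class and method names are invented for illustration.

```python
import os
import tempfile
import unittest


class UniversalNewlineUTF16Test(unittest.TestCase):
    """Hypothetical reconstruction of the desired behavior, not the original attachment."""

    def test_utf16_crlf_read_as_lf(self):
        fd, path = tempfile.mkstemp()
        os.close(fd)
        try:
            # newline='' writes '\r\n' to disk without any translation.
            with open(path, "w", encoding="utf-16", newline="") as f:
                f.write("Hello\r\nworld\r\n")
            # newline=None enables universal newlines; decoding happens
            # before '\r\n' is translated, so UTF-16 is not corrupted.
            with open(path, "r", encoding="utf-16", newline=None) as f:
                self.assertEqual(f.read(), "Hello\nworld\n")
        finally:
            os.remove(path)


if __name__ == "__main__":
    # exit=False so the runner returns instead of calling sys.exit().
    unittest.main(argv=["ignored"], exit=False)
```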


----------------------------------------------------------------------

>Comment By: M.-A. Lemburg (lemburg)
Date: 2003-02-26 14:44

Message:
Logged In: YES 
user_id=38388

I'm turning this into a feature request. codecs.open()
does not support 'U' as a file mode.

Assigning to Jack since he introduced the 'U' mode option.
Jack, what can we do about this?

----------------------------------------------------------------------

Comment By: Jason Orendorff (jorend)
Date: 2003-02-22 22:17

Message:
Logged In: YES 
user_id=18139

Tested in Python 2.3a2 as well (the bug is still there).

Note that this isn't limited to UTF-16.  It will affect any
encoding in which the byte '\x0d' can occur as anything
other than the complete encoding of u'\r'.  The most common
American/European encodings are safe (ASCII, Latin-1 and
friends, and UTF-8).
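One rough way to test this claim for a given encoding is to compare byte-level newline translation with translation after decoding. The sketch below is illustrative only. translate_newlines() is a hypothetical approximation of 'U' mode, and a single passing sample does not prove an encoding safe in general. The last line shows the broader point: in UTF-16-LE, U+010D is encoded as b'\x0d\x01', a '\x0d' byte unrelated to '\r', and byte-level translation silently turns it into U+010A.

```python
# Hypothetical checker: does translating newlines on the raw bytes give the
# same text as translating after decoding, for a given sample and encoding?


def translate_newlines(data: bytes) -> bytes:
    """Approximate 'U' mode on raw bytes: b'\\r\\n' and lone b'\\r' -> b'\\n'."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def byte_translation_is_safe(text: str, encoding: str) -> bool:
    raw = text.encode(encoding)
    after_decode = text.replace("\r\n", "\n").replace("\r", "\n")
    return translate_newlines(raw).decode(encoding) == after_decode


sample = "Hello\r\nworld\r\n"
for enc in ("ascii", "latin-1", "utf-8", "utf-16", "utf-16-be"):
    print(enc, byte_translation_is_safe(sample, enc))

# U+010D encodes to b'\x0d\x01' in UTF-16-LE; the '\x0d' byte gets rewritten.
corrupted = translate_newlines("\u010d".encode("utf-16-le")).decode("utf-16-le")
print(hex(ord(corrupted)))  # 0x10a
```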


----------------------------------------------------------------------
