[ python-Bugs-1377394 ] read() / readline() blow up if file has even number of char.

SourceForge.net noreply at sourceforge.net
Sat Dec 10 00:17:18 CET 2005


Bugs item #1377394, was opened at 2005-12-09 15:43
Message generated for change (Comment added) made by superwesman
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1377394&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Unicode
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: superwesman (superwesman)
Assigned to: M.-A. Lemburg (lemburg)
Summary: read() / readline() blow up if file has even number of char.

Initial Comment:
Hello, I am having a problem with the read() and
readline() functions.  I'm using codecs.open() to open
a text file, then using either read() or readline() to
get its contents.  In python 2.4.2, if the file has an
even number of characters, I get a UnicodeDecodeError.
 If python 2.4.1 this works regardless of the character
count.  I've pasted below a sample script and the
sample text file I was running.  This is the command I
executed at the Windows 2000 CMD prompt:

python sample.py sample.txt

Again, in 2.4.1, this works fine - in 2.4.2 it breaks
when the file-to-be-read has an odd number of characters.

Thanks.
-w

# start: sample.py

import codecs
import sys

print "open the file"
in_file = codecs.open( sys.argv[1], "r",
"unicode_internal" )
print "read the file"
the_file = in_file.read()
print "close the file"
in_file.close()
print "done"

# end: sample.py

# start: sample.txt
RESULTHOST=vivaldi
RESULTPORT=a
DB_XML=/test/art/jfw/config/DBList.xml
LOGCHECK_IGNORE=art_actions.txt

# end: sample.txt


----------------------------------------------------------------------

>Comment By: superwesman (superwesman)
Date: 2005-12-09 17:17

Message:
Logged In: YES 
user_id=1401447

I didn't realize that 'unicode_internal' was not a
legitimate value to pass into this function.  If
'unicode_internal' is not a valid 3rd parameter to
codecs.open(), shouldn't that function complain?  If it is a
valid option (that should only be used "Python internally" -
not sure what that means) then it should perform
consistently regardless of the number of characters in the
file, should it not?

Seems to me that pilot-error uncovered a bug.  If this is
not a valid choice, then codecs.open() should complain.  If
it is valid, it should perform consistently, IMHO.


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2005-12-09 16:04

Message:
Logged In: YES 
user_id=38388

Why would you want to read a file using the Python internal
Unicode encoding (unicode_internal) ?

This is an encoding that is only used Python internally and
should not be used for anything else.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1377394&group_id=5470


More information about the Python-bugs-list mailing list