[ python-Bugs-1728403 ] reading from malformed big5 document hangs cpython
SourceForge.net
noreply at sourceforge.net
Thu May 31 06:49:18 CEST 2007
Bugs item #1728403, was opened at 2007-05-30 08:36
Message generated for change (Comment added) made by nnorwitz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1728403&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Interpreter Core
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: tsuraan (tsuraan3)
>Assigned to: Hye-Shik Chang (perky)
Summary: reading from malformed big5 document hangs cpython
Initial Comment:
Python enters what appears to be an infinite loop when reading data from a malformed big5-encoded file with the codecs module. The behaviour can be observed on Linux and FreeBSD, using Python 2.4 and 2.5. A minimal example illustrating the bug follows:
Python 2.4.4 (#1, May 15 2007, 13:33:55)
[GCC 4.1.1 (Gentoo 4.1.1-r3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs
>>> fname='out'
>>> outfd=open(fname,'w')
>>> outfd.write(chr(243))
>>> outfd.close()
>>>
>>> infd= codecs.open(fname, encoding='big5')
>>> infd.read(1024)
And then, it hangs forever. If I instead use the following code:
Python 2.5 (r25:51908, Jan 8 2007, 19:09:28)
[GCC 3.4.5 (Gentoo 3.4.5-r1, ssp-3.4.5-1.0, pie-8.7.9)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs, signal
>>> fname='out'
>>> def handler(*args):
... raise Exception("boo!")
...
>>> signal.signal(signal.SIGALRM, handler)
0
>>> outfd=open(fname, 'w')
>>> outfd.write(chr(243))
>>> outfd.close()
>>>
>>> infd=codecs.open(fname, encoding='big5')
>>> signal.alarm(5)
0
>>> infd.read(1024)
The program still hangs forever, and the alarm does not interrupt the read. The program can be made to crash by not installing a signal handler at all, but that's pretty lame. It looks like the entire interpreter is locked up by this read, so I doubt there is a pure-Python workaround, but I thought it would be good to have this bug on record so that a future version of Python can (hopefully) fix it.
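The signal.alarm() attempt likely fails because CPython runs a Python-level signal handler only when the interpreter loop (or C code that calls PyErr_CheckSignals()) checks for pending signals; a C loop that never checks never gives the handler a chance to run. One way to bound the damage from outside is to run the read in a child process with a timeout. This is a minimal sketch assuming Python 3.7 or later: REPRO and run_repro are illustrative names, and the big5 reader is obtained with codecs.getreader() over an in-memory buffer instead of a file on disk. On an interpreter without the hang it should report a UnicodeDecodeError; on an affected one it returns "hang" once the timeout expires.

```python
import subprocess
import sys

# The reproduction from the report, rewritten for Python 3 and run in a
# child interpreter so that a hang cannot freeze the calling process.
REPRO = r"""
import codecs, io
# 0xF3 is a Big5 lead byte; the trail byte that should follow it is missing.
reader = codecs.getreader('big5')(io.BytesIO(b'\xf3'))
try:
    reader.read(1024)
except UnicodeDecodeError as exc:
    print('UnicodeDecodeError:', exc.reason)
"""


def run_repro(timeout=5.0):
    """Return the child's stdout, or 'hang' if it runs longer than `timeout` seconds."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", REPRO],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills and reaps the child before re-raising.
        return "hang"
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_repro())
```

The timeout lives in the parent process, so it does not depend on the child ever returning to the interpreter loop.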
----------------------------------------------------------------------
>Comment By: Neal Norwitz (nnorwitz)
Date: 2007-05-30 21:49
Message:
Logged In: YES
user_id=33168
Originator: NO
Hye-Shik, could you take a look at this? There's an infinite loop in
Modules/cjkcodecs/multibytecodec.c mbstreamreader_iread(): rsize == 1 on
every iteration. I don't know whether other places might have the same
problem.
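To make the failure mode concrete, the loop can be modeled in pure Python. This is a hypothetical simplification, not the code in multibytecodec.c: split_complete() is a toy stand-in for the Big5 decoder that treats every byte >= 0x81 as the start of a two-byte sequence, and read_loop() with its max_iterations cap is an illustrative name. The point it shows is that an undecodable lead byte is kept as pending input and prepended to the next read; once the stream is at EOF each read adds nothing, so the loop sees the same single byte (rsize == 1) on every pass. The check_eof flag models one way such a loop could terminate: treat pending bytes left over at end of file as an incomplete sequence and raise.

```python
import io


def split_complete(data):
    """Toy stand-in for a Big5 decoder (an assumption, not the real codec).

    Treats every byte >= 0x81 as the lead byte of a two-byte sequence and
    returns (complete_bytes, incomplete_tail).
    """
    i = 0
    while i < len(data):
        if data[i] >= 0x81:
            if i + 1 == len(data):
                return data[:i], data[i:]  # lead byte with no trail byte yet
            i += 2
        else:
            i += 1
    return data, b""


def read_loop(stream, sizehint, check_eof, max_iterations=10):
    """Hypothetical model of the stream reader's read loop.

    Returns (complete_bytes, iterations_used). Raises RuntimeError if no
    progress is made within max_iterations, standing in for the real hang.
    """
    pending = b""
    rsize = 0
    for iteration in range(max_iterations):
        chunk = stream.read(sizehint)
        at_eof = chunk == b""
        data = pending + chunk  # leftover bytes are re-fed on every pass
        rsize = len(data)
        complete, pending = split_complete(data)
        if check_eof and at_eof and pending:
            # Modeled fix: a dangling lead byte at EOF is an error, not "more data".
            raise UnicodeDecodeError("big5", pending, 0, len(pending),
                                     "incomplete multibyte sequence")
        if complete or rsize == 0:
            return complete, iteration + 1
        sizehint = 1  # nothing decoded yet: read one more byte and retry
    raise RuntimeError(f"no progress after {max_iterations} iterations "
                       f"(rsize == {rsize} each time)")
```

With check_eof=False, read_loop(io.BytesIO(b'\xf3'), 1024, check_eof=False) gives up with a RuntimeError reporting rsize == 1; with check_eof=True the same input raises UnicodeDecodeError on the second iteration.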