Python appears to enter an infinite loop when it tries to read data from a malformed big5-encoded file through the codecs library. This behaviour can be observed under Linux and FreeBSD, using Python 2.4 and 2.5. A really simple example illustrating the bug follows:<br><br>Python 2.4.4 (#1, May 15 2007, 13:33:55) <br>[GCC 4.1.1 (Gentoo 4.1.1-r3)] on linux2<br>Type "help", "copyright", "credits" or "license" for more information.
<br>>>> import codecs<br>>>> fname='out' <br>>>> outfd=open(fname,'w')<br>>>> outfd.write(chr(243))<br>>>> outfd.close()<br>>>> <br>>>> infd=codecs.open(fname, encoding='big5') <br>>>> infd.read(1024)<br><br>And then, it hangs forever. If I instead use the following code:<br><br>Python 2.5 (r25:51908, Jan 8 2007, 19:09:28) <br>[GCC 3.4.5 (Gentoo
3.4.5-r1, ssp-3.4.5-1.0, pie-8.7.9)] on linux2<br>Type "help", "copyright", "credits" or "license" for more information.<br>>>> import codecs, signal<br>>>> fname='out'
<br>>>> def handler(*args):<br>...     raise Exception("boo!")<br>... <br>>>> signal.signal(signal.SIGALRM, handler)<br>0<br>>>> outfd=open(fname, 'w')<br>>>> outfd.write(chr(243))<br>>>> outfd.close()<br>>>> <br>>>> infd=codecs.open(fname, encoding='big5')<br>>>> signal.alarm(5)<br>0<br>>>> infd.read(1024)<br><br>The program still hangs forever, and the handler never runs. The program can be made to die if I don't install a signal handler at all (the default SIGALRM action kills the process), but that's pretty lame. It looks like the entire interpreter is being locked up by this read, so I don't think there's likely to be a pure-Python workaround, but I thought it would be a good bug to have out there so a future version of Python can (hopefully) fix this.
<br><br>
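Since a signal handler inside the hung interpreter never gets to run, one way to at least contain a hang like this is to do the read in a separate process and kill that process from outside. The following is only a rough sketch of that idea, not a fix for the codec. It assumes a modern Python 3 (subprocess.run with a timeout is not available in the 2.4 and 2.5 interpreters shown above), and the names read_with_timeout and _CHILD are made up for illustration.

```python
import subprocess
import sys

# Child program (hypothetical helper): decode the file named in argv[1]
# with the encoding in argv[2], then write the text to stdout as UTF-8.
_CHILD = (
    "import codecs, sys\n"
    "text = codecs.open(sys.argv[1], encoding=sys.argv[2]).read(1024)\n"
    "sys.stdout.buffer.write(text.encode('utf-8'))\n"
)


def read_with_timeout(path, encoding, timeout=5.0):
    """Decode `path` in a child interpreter.

    Returns (status, text), where status is 'ok', 'error' or 'timeout'.
    The name and return convention are assumptions made for this sketch.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD, path, encoding],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,  # the child is killed if it runs this long
        )
    except subprocess.TimeoutExpired:
        # The child never returned, e.g. it was stuck inside the decoder.
        return "timeout", None
    if proc.returncode != 0:
        # The child raised, e.g. a UnicodeDecodeError on malformed input.
        return "error", None
    return "ok", proc.stdout.decode("utf-8")
```

Calling read_with_timeout('out', 'big5') on the one-byte file from the sessions above should then come back with a 'timeout' or 'error' status instead of locking up the calling interpreter. The cost is an extra interpreter start-up per read, so this only makes sense for untrusted input.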