Re: [Python-Dev] Mixing memory management APIs

Neal Norwitz <neal@metaslash.com> writes:
Might this have something to do with bug

[ #495401 ] Build troubles: --with-pymalloc
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=495401&group_id=5470

?

Is there a reason one of the fixes for this problem hasn't been checked in yet?

Cheers,
M.

--
. <- the point                                your article -> .
|------------------------- a long way ------------------------|
                                  -- Cristophe Rhodes, ucam.chat

Michael Hudson wrote:
It is currently assigned to Martin. Perhaps I should just take the Unicode patch and check it in (the first one, not the second one for the reasons stated in the bug-tracker) ?!

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/

"M.-A. Lemburg" wrote:
I've checked in a patch for the UTF-8 codec problem. Could you try Purify against the CVS version?

Thanks,
--
Marc-Andre Lemburg

Neal Norwitz wrote:
Both if possible -- the leakage showed up with pymalloc AFAIR :-)

Thanks,
--
Marc-Andre Lemburg

"M.-A. Lemburg" wrote:
There is a lot of data and it's very hard to follow, but I'm trying to provide as much info as I can. Let me know how I can make this info easier to use.

Here is a summary:

* I'm using gcc version 2.95.3, on Solaris 8, Purify 2002.
* The new patches don't fix all the problems, but they may reduce them (I'm not sure). I think there were 13k errors on build before, it's 5.5k now.
* test_unicodedata fails:
    *** mismatch between line 3 of expected output and line 3 of actual output:
    - Methods: 6c7a7c02657b69d0fdd7a7d174f573194bba2e18
    + Methods: 3051e6d4d117133c3e36a5c22b3a1ae362474321
* Purify now has 2 UMRs w/o pymalloc, but they are in fwrite() and contain no usable stack trace.
* It's probably best to try using Electric Fence and/or dbmalloc. This may give better results than Purify.
* There is a warning from sre.h that may be significant:
    Modules/sre.h:24: warning: `SRE_CODE' redefined
    Modules/sre.h:19: warning: this is the location of the previous definition

I'll try some more things to see if I can get better info.

Neal
--

bash-2.03$ ./configure --with-pymalloc --enable-unicode=ucs4
bash-2.03$ make PURIFY=purify

---> 5542 errors

Free Memory Read, Array Bounds Read, and Uninit Memory Read errors
at lines unicodeobject.c:2214 & 2875 (both are bogus lines)
    2214 is in: PyUnicode_TranslateCharmap()
    2875 is in: split_char()

bash-2.03$ ./python -E -tt Lib/test/regrtest.py test_unicode.py \
    test_unicode_file.py test_unicodedata.py
test_unicode
test test_unicode crashed -- exceptions.UnicodeError: UTF-8 decoding error: illegal encoding
test_unicode_file
test_unicodedata
test test_unicodedata produced unexpected output:
**********************************************************************
*** mismatch between line 3 of expected output and line 3 of actual output:
- Methods: 6c7a7c02657b69d0fdd7a7d174f573194bba2e18
+ Methods: 3051e6d4d117133c3e36a5c22b3a1ae362474321
**********************************************************************
1 test OK.
2 tests failed: test_unicode test_unicodedata

--------------------------------------------------------------------

Without purify, test_unicode completed successfully, but unicodedata produced the same results. The errors produced in purify for these 3 tests were 99745. The errors were in the same places as for the build step.

--------------------------------------------------------------------

bash-2.03$ make clean
bash-2.03$ ./configure --enable-unicode=ucs4
bash-2.03$ make PURIFY=purify
bash-2.03$ ./python -E -tt Lib/test/regrtest.py test_unicode.py \
    test_unicode_file.py test_unicodedata.py
test test_unicodedata produced unexpected output:
**********************************************************************
*** mismatch between line 3 of expected output and line 3 of actual output:
- Methods: 6c7a7c02657b69d0fdd7a7d174f573194bba2e18
+ Methods: 84b72943b1d4320bc1e64a4888f7cdf62eea219a
**********************************************************************
2 tests OK.
1 test failed: test_unicodedata

--------------------------------------------------------------------

Purify did have 2 UMRs, but both contain almost no information:

UMR: Uninitialized memory read
  This is occurring while in:
    _write [libc.so.1]
    _xflsbuf [libc.so.1]
    _fflush_u [libc.so.1]
    fseek [libc.so.1]
    *unknown func* [pc=0xe417c]
    *unknown func* [pc=0xe4db4]
    *unknown func* [pc=0xe64c4]
    *unknown func* [pc=0xe5cf0]
    *unknown func* [pc=0xe5524]
    *unknown func* [pc=0xe58a0]
    *unknown func* [pc=0x160464]
    *unknown func* [pc=0x159b64]
  Reading 3609 bytes from 0x6a2fcc in the heap (4 bytes at 0x6a3706 uninit).
  Address 0x6a2fcc is 4 bytes into a malloc'd block at 0x6a2fc8 of 8200 bytes.
  This block was allocated from:
    do_mkvalue [modsupport.c:243]
    _findbuf [libc.so.1]
    _wrtchk [libc.so.1]
    _flsbuf [libc.so.1]
    putc [libc.so.1]
    *unknown func* [pc=0xe8b9c]
    *unknown func* [pc=0xed794]
    *unknown func* [pc=0xe4104]
    *unknown func* [pc=0xe4db4]
    *unknown func* [pc=0xe64c4]
    *unknown func* [pc=0xe5cf0]
    *unknown func* [pc=0xe5524]

--------------------------------------------------------------------

UMR: Uninitialized memory read
  This is occurring while in:
    _write [libc.so.1]
    _xflsbuf [libc.so.1]
    _fwrite_unlocked [libc.so.1]
    fwrite [libc.so.1]
    *unknown func* [pc=0xeaa50]
    *unknown func* [pc=0xeadf4]
    *unknown func* [pc=0xeb3c8]
    *unknown func* [pc=0xed7e8]
    *unknown func* [pc=0xe411c]
    *unknown func* [pc=0xe4db4]
    *unknown func* [pc=0xe64c4]
    *unknown func* [pc=0xe5cf0]
  Reading 8192 bytes from 0x79d88c in the heap (4 bytes at 0x79de8d uninit).
  Address 0x79d88c is 4 bytes into a malloc'd block at 0x79d888 of 8200 bytes.
  This block was allocated from:
    do_mkvalue [modsupport.c:243]
    _findbuf [libc.so.1]
    _wrtchk [libc.so.1]
    _flsbuf [libc.so.1]
    putc [libc.so.1]
    *unknown func* [pc=0xe8b9c]
    *unknown func* [pc=0xed794]
    *unknown func* [pc=0xe4104]
    *unknown func* [pc=0xe4db4]
    *unknown func* [pc=0xe64c4]
    *unknown func* [pc=0xe5cf0]
    *unknown func* [pc=0xe5524]
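
For readers unfamiliar with the "mismatch between line 3" message in the report above: the test harness compares the output a test prints against a stored expected output, line by line. The following is only a rough sketch of that kind of comparison, not regrtest's actual code; the function name `first_mismatch` and the two placeholder lines are made up for illustration, and only the two "Methods:" lines come from the report.

```python
from itertools import zip_longest


def first_mismatch(expected, actual):
    """Return (line_number, expected_line, actual_line) for the first
    differing line, or None if the outputs match line for line.
    A missing line on either side is reported as None."""
    pairs = zip_longest(expected.splitlines(), actual.splitlines())
    for number, (exp, act) in enumerate(pairs, start=1):
        if exp != act:
            return number, exp, act
    return None


# Lines 1 and 2 are hypothetical placeholders; the "Methods:" lines are
# the two digests quoted in the report.
expected = (
    "placeholder line 1\n"
    "placeholder line 2\n"
    "Methods: 6c7a7c02657b69d0fdd7a7d174f573194bba2e18\n"
)
actual = (
    "placeholder line 1\n"
    "placeholder line 2\n"
    "Methods: 3051e6d4d117133c3e36a5c22b3a1ae362474321\n"
)

result = first_mismatch(expected, actual)
print("first differing line:", result[0])
```

Because the third line is a digest, the comparison can only say that something changed, not which character or method produced the change.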

Neal Norwitz wrote:
Hmm, I did run test_unicode, but forgot test_unicodedata. Now, looking at test_unicodedata.py, it produces loads of these unpaired Unicode surrogates and then tries to encode them using UTF-8. Since the UTF-8 codec previously produced wrong results for these, I guess I'll have to regenerate the expected test output.
Hmm, I'll have to look at this one...
That's strange, because at least on my machine, test_unicode runs through just fine. Could you run the test by hand, so that the location of the error can be pinned down?
See above.
Thanks,
--
Marc-Andre Lemburg
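
To make the point about unpaired surrogates concrete, here is a small sketch of how a lone surrogate interacts with a UTF-8 encoder. It uses present-day Python 3, whose codec behaviour is an assumption relative to this thread (the 2002 codec under discussion behaved differently), and the helper name `encode_utf8` is hypothetical.

```python
def encode_utf8(text, errors="strict"):
    """Encode text as UTF-8; on failure return a short description
    instead of raising, so the cases can be compared side by side."""
    try:
        return text.encode("utf-8", errors)
    except UnicodeEncodeError as exc:
        return "rejected: " + exc.reason


lone_high = "\ud800"  # a high surrogate with no low surrogate after it

# Strict mode refuses to encode a lone surrogate.
strict_result = encode_utf8(lone_high)

# "surrogatepass" emits the three-byte form anyway.
lenient_result = encode_utf8(lone_high, "surrogatepass")

# A real non-BMP character gets a four-byte sequence.
astral_result = encode_utf8("\U00010000")

print(strict_result)
print(lenient_result)
print(astral_result)
```

The same code point can therefore yield different byte sequences, or none at all, depending on how the codec treats surrogates, so any stored output derived from those bytes goes stale when the codec is changed.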

I've just checked in a set of fixes for the UTF-8 encoder and decoder and also updated the test output of test_unicodedata. You should now no longer get the test failures you were seeing (test_unicode failure was due to the old marshal format using illegal UTF-8 sequences, test_unicodedata was due to the same UTF-8 problem but shows up in a different hash value).

Hope I got it right this time around :-/
--
Marc-Andre Lemburg
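
As an illustration of why a codec change "shows up in a different hash value": if a test folds the encoded bytes of many characters into one digest, a change in how even a single character is encoded alters the whole digest. This is a sketch only, assuming a SHA-1 digest (the 40-hex-digit values in the report are consistent with that) and present-day Python 3; the function `digest_of` and the sample characters are hypothetical, not taken from test_unicodedata.

```python
import hashlib


def digest_of(chars, errors):
    """Hash the UTF-8 encoding of each character in turn, using the
    given error handler for characters the codec will not accept."""
    h = hashlib.sha1()
    for ch in chars:
        h.update(ch.encode("utf-8", errors))
    return h.hexdigest()


# Two ordinary characters and one unpaired surrogate.
sample = ["a", "\u20ac", "\ud800"]

# Same input, two different treatments of the lone surrogate.
digest_pass = digest_of(sample, "surrogatepass")
digest_replace = digest_of(sample, "replace")

print(digest_pass)
print(digest_replace)
print("digests differ:", digest_pass != digest_replace)
```

Only one of the three characters is encoded differently, yet the two digests share nothing recognisable, which is why the expected output had to be regenerated rather than patched.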

Neal Norwitz wrote:
So that bug seems to be fixed now.

Thanks,
--
Marc-Andre Lemburg

participants (3)
- M.-A. Lemburg
- Michael Hudson
- Neal Norwitz