[Python-Dev] Mixing memory management APIs

M.-A. Lemburg mal@lemburg.com
Thu, 07 Feb 2002 09:55:11 +0100


Neal Norwitz wrote:
> 
> "M.-A. Lemburg" wrote:
> >
> > Neal Norwitz wrote:
> >
> > > "M.-A. Lemburg" wrote:
> > >
> > >
> > >>I've checked in a patch for the UTF-8 codec problem. Could you
> > >>try Purify against the CVS version ?
> > >>
> > >
> > > with-pymalloc or without or both?
> >
> > Both if possible -- the leakage showed up with pymalloc AFAIR :-)
> 
> There is a lot of data and it's very hard to follow,
> but I'm trying to provide as much info as I can.
> Let me know how I can make this info easier to use.
> 
> Here is a summary:
> 
>     * I'm using gcc version 2.95.3, on Solaris 8, Purify 2002.
> 
>     * The new patches don't fix all the problems, but they may
>         reduce them (I'm not sure).  I think there were
>         13k errors on build before, it's 5.5k now.
> 
>     * test_unicodedata fails:
>                 *** mismatch between line 3 of expected output and
>                         line 3 of actual output:
>                 - Methods: 6c7a7c02657b69d0fdd7a7d174f573194bba2e18
>                 + Methods: 3051e6d4d117133c3e36a5c22b3a1ae362474321

Hmm, I did run test_unicode, but forgot test_unicodedata. Looking
at test_unicodedata.py, it produces loads of unpaired Unicode
surrogates and then tries to encode them using UTF-8. Since the
UTF-8 codec previously produced wrong results for these, I guess I'll
have to recreate the test output.
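
To illustrate why a codec change forces new expected output, here is a
minimal sketch in present-day Python 3 (not the 2002 CVS interpreter
discussed here, whose codec behaved differently). It assumes the value on
the "Methods:" line is a SHA-1 digest over accumulated results, which the
40 hex digits suggest; the helper name methods_digest is hypothetical and
is not taken from test_unicodedata.py.

```python
import hashlib

LONE_SURROGATE = "\ud800"  # a high surrogate with no low surrogate after it


def methods_digest(chunks):
    """Hypothetical stand-in for the test's checksum: SHA-1 over byte chunks."""
    h = hashlib.sha1()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


# Strict UTF-8 in Python 3 refuses to encode an unpaired surrogate.
try:
    LONE_SURROGATE.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# The 'surrogatepass' handler emits a three-byte sequence instead.
passed = LONE_SURROGATE.encode("utf-8", "surrogatepass")

# The 'replace' handler substitutes '?' for the unencodable character.
replaced = LONE_SURROGATE.encode("utf-8", "replace")

print(strict_ok, passed, replaced)
# Different bytes for the same character give a different digest.
print(methods_digest([passed]) != methods_digest([replaced]))
```

Any difference in the bytes produced for a single character is enough to
change the digest, so a recorded digest from before a codec fix cannot
match afterwards.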
 
>     * Purify now has 2 UMRs w/o pymalloc, but they are in
>         fwrite() and contain no usable stack trace.
> 
>     * It's probably best to try using Electric Fence and/or dbmalloc.
>         This may give better results than Purify.
> 
>     * There is a warning from sre.h that may be significant:
>         Modules/sre.h:24: warning: `SRE_CODE' redefined
>         Modules/sre.h:19: warning: this is the location
>                           of the previous definition
> 
> I'll try some more things to see if I can get better info.
> 
> Neal
> --
> 
> bash-2.03$ ./configure --with-pymalloc --enable-unicode=ucs4
> bash-2.03$ make PURIFY=purify
> 
> --->  5542 errors
>         Free Memory Read, Array Bounds Read, and Uninit Memory Read errors
>                 at lines unicodeobject.c:2214 & 2875
>                 (both are bogus lines)
> 
>         2214 is in:  PyUnicode_TranslateCharmap()
>         2875 is in:  split_char()

Hmm, I'll have to look at this one...
 
> bash-2.03$ ./python -E -tt Lib/test/regrtest.py test_unicode.py \
>                         test_unicode_file.py test_unicodedata.py
> test_unicode
> test test_unicode crashed -- exceptions.UnicodeError: UTF-8 decoding error: illegal encoding

That's strange, because at least on my machine, test_unicode runs
through just fine. Could you run the test by hand, so that the error
location can be pinned down?
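
One way to narrow down where a decoding error occurs, sketched in
present-day Python 3 rather than the 2002 interpreter: the exception raised
by the decoder carries the offset of the offending bytes. The function name
locate_decode_error and the sample bytes are illustrative assumptions and
do not come from test_unicode.

```python
def locate_decode_error(data, encoding="utf-8"):
    """Return (start, end, reason) for the first decoding error, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        # start/end are byte offsets into data; reason is the codec's message.
        return exc.start, exc.end, exc.reason
    return None


# Sample input (an assumption): ASCII with a stray continuation byte inside.
sample = b"abc\x80def"
print(locate_decode_error(sample))
print(locate_decode_error(b"plain ascii"))
```

Feeding the failing test's input through a helper like this, one string at
a time, would show which input and which byte offset triggers the error.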

> test_unicode_file
> test_unicodedata
> test test_unicodedata produced unexpected output:
> **********************************************************************
> *** mismatch between line 3 of expected output and line 3 of actual output:
> - Methods: 6c7a7c02657b69d0fdd7a7d174f573194bba2e18
> + Methods: 3051e6d4d117133c3e36a5c22b3a1ae362474321
> **********************************************************************

See above.

> 1 test OK.
> 2 tests failed:
>     test_unicode test_unicodedata
> 
> --------------------------------------------------------------------
> 
> Without purify, test_unicode completed successfully, but unicodedata
> produced the same results.
> 
> The errors produced in purify for these 3 tests were 99745.
> The errors were in the same places as for the build step.
> 
> --------------------------------------------------------------------
> 
> bash-2.03$ make clean
> bash-2.03$ ./configure --enable-unicode=ucs4
> bash-2.03$ make PURIFY=purify
> 
> bash-2.03$ ./python -E -tt Lib/test/regrtest.py test_unicode.py \
>                         test_unicode_file.py test_unicodedata.py
> test test_unicodedata produced unexpected output:
> **********************************************************************
> *** mismatch between line 3 of expected output and line 3 of actual output:
> - Methods: 6c7a7c02657b69d0fdd7a7d174f573194bba2e18
> + Methods: 84b72943b1d4320bc1e64a4888f7cdf62eea219a
> **********************************************************************
> 2 tests OK.
> 1 test failed:
>     test_unicodedata
> 
> --------------------------------------------------------------------
> 
> Purify did have 2 UMRs, but both contain almost no information:
> 
>       UMR: Uninitialized memory read
>       This is occurring while in:
>             _write         [libc.so.1]
>             _xflsbuf       [libc.so.1]
>             _fflush_u      [libc.so.1]
>             fseek          [libc.so.1]
>             *unknown func* [pc=0xe417c]
>             *unknown func* [pc=0xe4db4]
>             *unknown func* [pc=0xe64c4]
>             *unknown func* [pc=0xe5cf0]
>             *unknown func* [pc=0xe5524]
>             *unknown func* [pc=0xe58a0]
>             *unknown func* [pc=0x160464]
>             *unknown func* [pc=0x159b64]
>       Reading 3609 bytes from 0x6a2fcc in the heap (4 bytes at 0x6a3706 uninit).
>       Address 0x6a2fcc is 4 bytes into a malloc'd block at 0x6a2fc8 of 8200 bytes.
>       This block was allocated from:
>             do_mkvalue     [modsupport.c:243]
>             _findbuf       [libc.so.1]
>             _wrtchk        [libc.so.1]
>             _flsbuf        [libc.so.1]
>             putc           [libc.so.1]
>             *unknown func* [pc=0xe8b9c]
>             *unknown func* [pc=0xed794]
>             *unknown func* [pc=0xe4104]
>             *unknown func* [pc=0xe4db4]
>             *unknown func* [pc=0xe64c4]
>             *unknown func* [pc=0xe5cf0]
>             *unknown func* [pc=0xe5524]
> 
> --------------------------------------------------------------------
> 
>       UMR: Uninitialized memory read
>       This is occurring while in:
>             _write         [libc.so.1]
>             _xflsbuf       [libc.so.1]
>             _fwrite_unlocked [libc.so.1]
>             fwrite         [libc.so.1]
>             *unknown func* [pc=0xeaa50]
>             *unknown func* [pc=0xeadf4]
>             *unknown func* [pc=0xeb3c8]
>             *unknown func* [pc=0xed7e8]
>             *unknown func* [pc=0xe411c]
>             *unknown func* [pc=0xe4db4]
>             *unknown func* [pc=0xe64c4]
>             *unknown func* [pc=0xe5cf0]
>       Reading 8192 bytes from 0x79d88c in the heap (4 bytes at 0x79de8d uninit).
>       Address 0x79d88c is 4 bytes into a malloc'd block at 0x79d888 of 8200 bytes.
>       This block was allocated from:
>             do_mkvalue     [modsupport.c:243]
>             _findbuf       [libc.so.1]
>             _wrtchk        [libc.so.1]
>             _flsbuf        [libc.so.1]
>             putc           [libc.so.1]
>             *unknown func* [pc=0xe8b9c]
>             *unknown func* [pc=0xed794]
>             *unknown func* [pc=0xe4104]
>             *unknown func* [pc=0xe4db4]
>             *unknown func* [pc=0xe64c4]
>             *unknown func* [pc=0xe5cf0]
>             *unknown func* [pc=0xe5524]

Thanks,
-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/