[Python-bugs-list] [ python-Bugs-495401 ] Build troubles: --with-pymalloc

Mon, 31 Dec 2001 11:36:04 -0800

Bugs item #495401, was opened at 2001-12-20 05:24
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=495401&group_id=5470

Category: Build
Group: Platform-specific
Status: Open
Resolution: None
Priority: 5
Submitted By: Walter Dörwald (doerwalter)
Assigned to: Martin v. Löwis (loewis)
Summary: Build troubles: --with-pymalloc

Initial Comment:
The build process segfaults with the current CVS 
version when using --with-pymalloc

System is SuSE Linux 7.0

> uname -a
Linux amazonas 2.2.16-SMP #1 SMP Wed Aug 2 20:01:21 
GMT 2000 i686 unknown
> gcc -v
Reading specs from /usr/lib/gcc-lib/i486-suse-
linux/2.95.2/specs
gcc version 2.95.2 19991024 (release)

Attached is the complete build log.

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2001-12-31 11:36

Message:
Logged In: YES 
user_id=31435

Martin, I like your second patch fine, but would like it a 
lot better with

assert(p - PyString_AS_STRING(v) == cbAllocated);

at the end.  Else a piddly change in either loop can cause 
more miserably hard-to-track-down problems.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-12-31 05:46

Message:
Logged In: YES 
user_id=21627

MAL,
I'm 100% positive that the crash on my system was caused by
the UTF-8 encoding; I've seen it in the debugger overwrite
memory that it doesn't own.

As for unicode.diff: Tim has proposed that this should not
be done, but that 4*size should be allocated in advance.
What do you think?

On unicode2.diff: If pymalloc was changed to shrink the
memory, it would have to copy the original string, since it
would likely be in a different size class. This is less
efficient than the approach taken in unicode2.diff. What
specifically is it that you dislike about first counting the
memory requirements? 

It actually simplifies the code. Notice that the current
code is still buggy with regard to surrogates. If there is a
high surrogate, but not a low one, it will write bogus UTF-8
 (with no lead byte). This is fixed in unicode2.diff as well.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-12-31 04:48

Message:
Logged In: YES 
user_id=38388

I like the unicode.diff, but not the unicode2.diff. Instead of fixing the UTF-8 codec here we should fix the 
pymalloc implementation, since overallocation is common thing to do and not only used in codecs. (Note 
that all Python extensions will start to use pymalloc too once we enable it per default.)

I don't know whether it's relevant, but I found that core dumps can easily be triggered by mixing the 
various 
memory allocation APIs in Python and the C lib. The bug may not only be related to the UTF-8 codec but 
may also linger in some other extension modules.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-12-30 02:26

Message:
Logged In: YES 
user_id=21627

I disabled threading, which fortunately gave me memory
watchpoints back. Then I noticed that the final *p=0
corrupted a non-NULL freeblock pointer, slightly decreasing
it. Then following the freeblock pointer, freeblock was
changed to a bogus block, which had its next pointer as
garbage. I had to trace this from the back, of course.

As for overallocation, I wonder whether the UTF-8 encoding
should overallocate at all. unicode2.diff is a modification
where it runs over the string twice, counting the number of
needed bytes the first time. This is likely slower (atleast
if no reallocations occur), but doesn't waste that much
memory (I notice that pymalloc will never copy objects when
they shrink, to return the extra space).

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-12-29 19:54

Message:
Logged In: YES 
user_id=31435

Good eye, Martin!  It's clearly possible for the unpatched 
code to write beyond the memory allocated.  The one thing 
that doesn't jibe is that you said bp is 0x2, which means 3 
of its 4 bytes are 0x00, but UTF-8 doesn't produce 0 bytes 
except for one per \u0000 input character.  Right?  So, if 
this routine is the cause, where are the 0 bytes coming 
from?  (It could be test_unicode sets up a UTF-8 encoding 
case with several \u0000 characters, but if so I didn't 
stumble into it.)  Plausible:  when a new pymalloc "page" 
is allocated, the 40-byte chunks in it are *not* linked 
together at the start.  Instead a NULL pointer is stored at 
just the start of "the first" 40-byte chunk, and pymalloc-- 
on subsequent mallocs --finds that NULL and incrementally 
carves out additional 40-byte chunks.  So as a startup-- 
but not a steady-state --condition, the "next free block" 
pointers will very often be NULLs, and then if this is a 
little-endian machine, writing a single 2 byte at the start 
of a free block would lead to a bogus pointer value of 0x2.

About a fix, I'm in favor of junking all the cleverness 
here, by allocating size*4 bytes from the start.  It's 
overallocating in all normal cases already, so we're going 
to incur the expense of cutting the result string back 
anyway; how *much* we overallocate doesn't matter to speed, 
except that if we don't have to keep checking inside the 
loop, the code gets simpler and quicker and more robust.  
The loop should instead merely assert that cbWritten <= 
cbAllocated at the end of each trip.  Had this been done 
from the start, a debug build would have assert-failed a 
few nanoseconds after the wild store.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-12-29 17:44

Message:
Logged In: YES 
user_id=21627

I've found one source of troubles, see the attached
unicode.diff. Guido's intuition was right; it was an UCS-4
problem: EncodeUTF8 would over-allocate 3*size bytes, but
can actually write 4*size in the worst case, which occurs in
test_unicode.

I'll leave the patch for review and experiments; it fixes
the problem for me.

The existing adjustment for surrogates is pointless, IMO:
for the surrogate pair, it will allocate 6 bytes UTF-8 in
advance, which is more than actually needed.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-12-29 15:13

Message:
Logged In: YES 
user_id=21627

It's a Heisenbug. I found that even slightest modifications
to the Python source make it come and go, or appear at
different places. On my system, the crashes normally occur
in the first run (.pyc). So I don't think the order of make
invocations is the source of the problem. It's likely as Tim
says: somebody overwrites memory somewhere.

Unfortunately, even though I can reproduce crashes for the
same pool, for some reason, my gdb memory watches don't
trigger. Tim's approach of checking whether a the value came
from following the free list did not help, either: the bug
disappeared under the change.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2001-12-29 14:01

Message:
Logged In: YES 
user_id=6656

Hmm.  I now think that the stuff about extension modules is 
almost certainly a read herring.  What I said about "make && 
make altinstall" vs "make altinstall" still seems to be true, 
though.

If you compile with --with-pydebug, you crash right at the 
end of the second (-O) run of compileall.py -- I suspect this 
is something else, but it might not be.

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2001-12-29 13:29

Message:
Logged In: YES 
user_id=6656

I don't know if these are helpful observations or not, but 
anyway:

(1) it doesn't core without the --enable-unicode=ucs4 option
(2) if you just run "make altinstall"  the library files are 
installed *and compiled* before the dynamically linked 
modules are built.  Then we don't crash.
(3) if you run "make altinstall" again, we crash.
If you initially ran "make && make install", we crash.
(4) when we crash, it's not long after the unicode tests are 
compiled.

Are these real clues or just red herrings?  I'm afraid I 
can't tell :(

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2001-12-29 10:43

Message:
Logged In: YES 
user_id=31435

Ouch.  Boosted priority back to 5, since Martin can 
reproduce it.  Alas, where pymalloc got called *from* is 
almost certainly irrelevant -- we're seeing the end result 
of earlier corruption.

Note that pymalloc is unusually sensitive to off-by-1 
stores, since the chunks it hands out are contiguous 
(there's no hidden bookkeeping padding between them).  
Plausible:  an earlier bogus store went beyond the end of 
its allocated chunk, overwriting the "next free block" 
pointer at the start of a previously free()'ed chunk of the 
same size (rounded up to a multiple of 8; 40 bytes in this 
case).

At the time this blows up, bp is supposed to point to a 
previously free()'ed chunk of size 40 bytes (if there were 
none free()'ed and available, the earlier "pool != pool-
>nextpool" guard should have failed).  The first 4 bytes 
(let's simplify by assuming this is a 32-bit box) of the 
free chunks link the free chunks together, most recently 
free()'ed at the start of the (singly linked) list.  So the 
code at this point is intent on returning bp, and "pool-
>freeblock = *(block **)bp" is setting the 40-byte-chunk 
list header's idea of the *next* available 40-byte chunk.

But bp is bogus.  The value of bp is gotten out of the free 
list headers, the static array usedpools.  This mechanism 
is horridly obscure, an array of pointer pairs that, in 
effect, capture just the first two members of the 
pool_header struct, once for each chunk size.  It's 
possible that someone is overwriting usedpools[4 + 4]-
>freeblock directly with 2, but that seems unlikely.

More likely is that a free() operation linked a 40-byte 
chunk into the list headed at usedpools[4+4]->freeblock 
correctly, and a later bad store overwrote the first 4 
bytes of the free()'ed block with 2.  Then the "pool-
>freeblock = *(block **)bp)" near the start of an 
unexceptional pymalloc would copy the 2 into the list 
header's freeblock without complaint.  The error wouldn't 
show up until a subsequent malloc tried to use it.

So that's one idea to get closer to the cause:  add code to 
dereference pool->freeblock, before the "return (void *)
bp".  If that blows up earlier, then the first four bytes 
of bp were corrupted, and that gives you a useful data 
breakpoint address for the next run.  If it doesn't blow up 
earlier, the corruption will be harder to find, but let's 
count on being lucky at first <wink>.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-12-29 08:00

Message:
Logged In: YES 
user_id=21627

Ok, I can reproduce it now; I did not 'make install' before.
Here is a gdb back trace

#0  _PyCore_ObjectMalloc (nbytes=33) at Objects/obmalloc.c:417
#1  0x805885c in PyString_FromString (str=0x816c6e8
"checkJoin") at Objects/stringobject.c:136
#2  0x805d772 in PyString_InternFromString (cp=0x816c6e8
"checkJoin") at Objects/stringobject.c:3640
#3  0x807c9c6 in com_addop_varname (c=0xbfffe87c, kind=0,
name=0x816c6e8 "checkJoin")
    at Python/compile.c:939
#4  0x807de24 in com_atom (c=0xbfffe87c, n=0x816c258) at
Python/compile.c:1478
#5  0x807f01c in com_power (c=0xbfffe87c, n=0x816c8b8) at
Python/compile.c:1846
#6  0x807f545 in com_factor (c=0xbfffe87c, n=0x816c898) at
Python/compile.c:1975
#7  0x807f56c in com_term (c=0xbfffe87c, n=0x816c878) at
Python/compile.c:1985
#8  0x807f6bc in com_arith_expr (c=0xbfffe87c, n=0x816c858)
at Python/compile.c:2020
#9  0x807f7dc in com_shift_expr (c=0xbfffe87c, n=0x816c838)
at Python/compile.c:2046
#10 0x807f8fc in com_and_expr (c=0xbfffe87c, n=0x816c818) at
Python/compile.c:2072
#11 0x807fa0c in com_xor_expr (c=0xbfffe87c, n=0x816c7f8) at
Python/compile.c:2094
...

The access that crashes is *(block **)bp, since bp is 0x2.

Not sure how that happens; I'll investigate (but would
appreciate a clue). It seems that the pool chain got corrupted.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-12-29 06:52

Message:
Logged In: YES 
user_id=6380

Aha!  The --enable-unicode=ucs4 is more suspicious than the
--with-pymalloc. I had missed that info when this was first
reported.

Not that I'm any closer to solving it... :-(

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-12-29 02:53

Message:
Logged In: YES 
user_id=89016

OK, I did a "make distclean" which removed .o files and
the build directory and redid a "./configure --enable-
unicode=ucs4 --with-pymalloc && make && make altinstall".

The build process still crashes in the same spot:
Compiling /usr/local/lib/python2.2/test/test_urlparse.py ...
make: *** [libinstall] Segmentation fault

I also retried with a fresh untarred Python-2.2.tgz. This 
shows the same behaviour.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-12-29 01:23

Message:
Logged In: YES 
user_id=21627

Atleast I cannot reproduce it, on SuSE 7.3. Can you retry
this, building from a clean source tree (no .o files, no
build directory)?

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2001-12-28 14:30

Message:
Logged In: YES 
user_id=6380

My prediction: this is irreproducible. Lowering the priority
accordingly.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2001-12-23 04:24

Message:
Logged In: YES 
user_id=89016

Unfortunately no core file was generated. Can I somehow 
force core file generation?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-12-22 06:42

Message:
Logged In: YES 
user_id=21627

Did that produce a core file? If so, can you attach a gdb
backtrace as well?

----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=495401&group_id=5470