[ python-Bugs-1526585 ] Concatenation on a long string breaks

SourceForge.net noreply at sourceforge.net
Wed Aug 9 17:40:03 CEST 2006


Bugs item #1526585, was opened at 2006-07-21 17:18
Message generated for change (Settings changed) made by arigo
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1526585&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Interpreter Core
Group: Python 2.5
>Status: Closed
>Resolution: Fixed
Priority: 8
Submitted By: Jp Calderone (kuran)
Assigned to: Armin Rigo (arigo)
Summary: Concatenation on a long string breaks

Initial Comment:
Consider this transcript:

exarkun at charm:~/Projects/python/trunk$ ./python
Python 2.5b2 (trunk:50698, Jul 18 2006, 10:08:36)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for
more information.
>>> x = 'x' * (2 ** 31 - 1)
>>> x = x + 'x'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: Objects/stringobject.c:4103: bad argument
to internal function
>>> len(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> 


I would expect some exception other than SystemError
and for the locals namespace to not become corrupted.


----------------------------------------------------------------------

>Comment By: Armin Rigo (arigo)
Date: 2006-08-09 15:39

Message:
Logged In: YES 
user_id=4771

Committed in rev 51178.

Closing this report; the repetition problem is in
another tracker and is mentioned in PEP 356
(the 2.5 release schedule).

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2006-08-08 09:24

Message:
Logged In: YES 
user_id=4771

I was away.  I will try to get around to it before
release candidate one.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-08-04 05:27

Message:
Logged In: YES 
user_id=33168

Armin, yes that sounds reasonable.  Please checkin as soon
as possible now that the trunk is not frozen.

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2006-07-27 08:48

Message:
Logged In: YES 
user_id=4771

Almost missed kuran's note.  Kuran: I suppose you meant to
use 2**31 instead of 2**32, but you've found another
important bug:

>>> s = 'x' * (2**32-2)
>>> N = len(s)
>>> N
2147483647
>>> 2**32
4294967296L

Argh!  Another check is missing somewhere.

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2006-07-27 08:38

Message:
Logged In: YES 
user_id=4771

We could reuse the --memlimit option of regrtest in the
following way:

At the moment it makes no sense to specify a --memlimit
larger than Py_ssize_t, like 3GB on 32-bit systems.  At
least test_bigmem fails completely in this case.  From this
it seems that the --memlimit actually tells, more precisely,
how much of its *address space* the Python test process is
allowed to consume.  So the value should be clamped to a
maximum of MAX(Py_ssize_t).  This would solve the current
test_bigmem issue.

If we do so, then the condition "--memlimit >=
MAX(Py_ssize_t)" is precisely what should be checked to know
if we can run the test for the bug in the present tracker,
and other tests of the same kind, which check what occurs
when the *address space* is exhausted.

In this way, specifying --memlimit=3G would enable either
test_bigmem (on 64-bit systems) or some new
test_filladdressspace (on 32-bit systems), as appropriate.

Sounds reasonable?

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-07-26 15:50

Message:
Logged In: YES 
user_id=33168

You're correct that bigmem is primarily for testing
int/Py_ssize_t.  But it doesn't have to be.  It has support
for machines with largish amounts of memory (and limiting
test runs).  I didn't know where else to put such a test.  I
agree that this bug would only occur on 32-bit platforms. 
Most machines can't run it, so about the only other option I
can think of would be to put it in it's own file and add a
-u option.  That seemed like even more work.

I'm not tied to bigmem at all, but it would be nice to have
a test for this somewhere.  I'm sure there are a bunch of
other places we have this sort of overflow and it would be
good to test them somewhere.  Do whatever you think is best.

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2006-07-26 09:01

Message:
Logged In: YES 
user_id=4771

I'm unsure about how the bigmem tests should be used.
I think that I am not supposed to set a >2G limit on
a 32-bit machine, right?  When I do, I get 9 failures:
8 OverflowErrors and a stranger AssertionError in
test_hash.  I think that these tests are meant to
test the int/Py_ssize_t difference on 64-bit 
machines instead.  The bug this tracker is about
only shows up on 32-bit machines.

My concern is that if we add a test for the current
bug in test_bigmem, then with a memory limit < 2GB
the test is essentially skipped, and with a memory
limit > 2GB then 9 other tests fail anyway.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-07-26 04:19

Message:
Logged In: YES 
user_id=33168

Armin, can you check in your patch and backport?  Also a
news entry and a test in test_bigmem would be great. 
Thanks.  If not let me know (assign to me) and I'll finish
this up.

This fix should be backported as well.

----------------------------------------------------------------------

Comment By: Jp Calderone (kuran)
Date: 2006-07-25 20:06

Message:
Logged In: YES 
user_id=366566

Tested Armin's patch, looks like it addresses the issue. 
One thing I noticed though, ('x' * (2 ** 32 - 2) + 'x')
fails the same way ('x' * (2 ** 32 - 1) + 'x').  Don't
really understand why, just thought I'd mention it.


----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2006-07-25 18:25

Message:
Logged In: YES 
user_id=4771

The check should be done before the variable is removed
from the namespace, so that 'x' doesn't just disappear.
Attached a patch that does this.  Also, let's produce
an exception consistent with what PyString_Concat()
raises in the same case, which is an OverflowError
instead of a MemoryError.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2006-07-25 03:57

Message:
Logged In: YES 
user_id=31435

[Neal]
>  Tim since we know both lens are >= 0, is this test
> sufficient for verifying overflow:
>
>     Py_ssize_t new_len = v_len + w_len;
>     if (new_len < 0) {

Right!  That's all the new checking needed in
string_concatenate().

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-07-25 03:46

Message:
Logged In: YES 
user_id=33168

Attached a patch against 2.5.  The patch should work against
2.4 too, but you will need to change all Py_ssize_t to int.
 Tim since we know both lens are >= 0, is this test
sufficient for verifying overflow:

      Py_ssize_t new_len = v_len + w_len;
      if (new_len < 0) {

Jp, can you test the patch?

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2006-07-22 14:11

Message:
Logged In: YES 
user_id=31435

Part of the problem appears to be that ceval.c's
string_concatenate() doesn't check for overflow in the
second argument here:

if (_PyString_Resize(&v, v_len + w_len) != 0)

The Windows malloc on my box returns NULL (and so Python
raises MemoryError) on the initial:

x = 'x' * (2 ** 31 - 1)

attempt, so I never get that far.  I'm afraid this is
muddier in strange ways on Linux because the Linux malloc()
is pathologically optimistic (can return a non-NULL value
pointing at "memory" that can't actually be used for anything).

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2006-07-22 09:00

Message:
Logged In: YES 
user_id=6656

Confirmed with 2.4.  Ouch.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1526585&group_id=5470


More information about the Python-bugs-list mailing list