[Python-bugs-list] [ python-Bugs-680789 ] repr() of large array objects takes quadratic time

Wed, 23 Apr 2003 10:33:01 -0700

Bugs item #680789, was opened at 2003-02-05 06:00
Message generated for change (Comment added) made by rhettinger
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=680789&group_id=5470

Category: Python Library
Group: Python 2.2
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Jurjen N.E. Bos (jneb)
>Assigned to: Raymond Hettinger (rhettinger)
Summary: repr() of large array objects takes quadratic time

Initial Comment:
This is a bug and a partial patch.

If I debug a program that contains a ridiculously large array (8M entries in my case), the debugger takes forever.
It happens in Mac OS X, Python 2.2, but I found the bug in is the repr module, so it is probably universal.
The thing is, that after the fix below, it still doesn't work!
Did I miss something trivial (like repr is builtin, or something like that?). Would someone with Mac OS X experience help out here, please (Jack?).

Here's the diff to make repr.repr work with large arrays:
13a14
>       self.maxarray = 5
50a52,62
>     def repr_array(self, x, level):
>         n = len(x)
>       header = "array('"+x.typecode+"', ["
>         if n == 0: return header+"])"
>         if level <= 0: return header+"...])"
>         s = ''
>         for i in range(min(n, self.maxarray)):
>             if s: s = s + ', '
>             s = s + self.repr1(x[i], level-1)
>         if n > self.maxarray: s = s + ', ...'
>         return header + s + "])"

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2003-04-23 12:33

Message:
Logged In: YES 
user_id=80475

Fixed this up by converting the array to a list and then 
using the list object's efficient repr().

See Modules/arraymodule.c 2.87.

Since I categorize this as a performance issue and not a 
bug, I've applied the fix to Py2.3 but am not 
recommending for backport.

----------------------------------------------------------------------

Comment By: logistix (logistix)
Date: 2003-02-11 20:43

Message:
Logged In: YES 
user_id=699438

arraymodule's repr used "string += ',' + el" for each element in 
the array.  Lists and dicts did a string.join to build the repr.

Attached patch builds a tuple of array elements and then 
joins them.  (actually for some reason I can't attach now, I'll 
post the patch in patch manager)

This fixes the time issue, but I don't know enough about how 
you guys manage memory in each case to tell what impact 
that'll have on really, really big arrays (although I imagine it 
takes more memory).

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-02-06 20:40

Message:
Logged In: YES 
user_id=31435

I can't make time for this, so unassigned it.  It would make 
a good, brief project for someone -- the list and dict 
tp_reprs are linear-time, and tp_repr for array.array 
objects shouldn't be any harder than they were.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2003-02-06 16:37

Message:
Logged In: YES 
user_id=45365

Okay, so the real bug is that tp_repr of array objects takes quadratic time.

I'm changing the summary of this report then, and assigning back to you (Tim), on the basis that you did more checkins on arraymodule than I did. Feel free to pass the potato on:-)

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-02-05 18:08

Message:
Logged In: YES 
user_id=31435

pdb does import repr.py, but probably doesn't use it in 
whatever way Jurjen is using to display his big array.

WRT that, note that Jurjen is using array.array objects, not 
lists.  The internal array.array tp_repr slot is quadratic-time in 
the size of the array, while list's tp_repr is linear time.

----------------------------------------------------------------------

Comment By: Jack Jansen (jackjansen)
Date: 2003-02-05 17:40

Message:
Logged In: YES 
user_id=45365

The fix is fine (it works for me the same way as for Tim), but I think we're shooting past the problem here.

First, pdb doesn't use repr.repr(), it uses the normal builtin repr().

Second, I don't see any sluggishness in pdb with large arrays. I tried
debugging
def foo():
    a = range(8000000)
and there was no problem. Allocating the object took a bit of time, yes, and if you actually try to print it you'll stare at about 800K lines filled with digits scrolling over your screen, but that is to be expected.

Could it be your sluggishness is coming from something else? For example, MacOSX starts behaving *very* badly if your root disk is full, because then it can't allocate swap space, and due to its optimistic behaviour it comes to a grinding halt.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-02-05 13:36

Message:
Logged In: YES 
user_id=31435

Nice to see you, Jurgen!  I checked this into current CVS, 
and it works fine for me in isolation:

>>> len(a)
11055060
>>> repr.repr(a)
"array('i', [0, 1, 2, 3, 4, ...])"
>>>

That goes in an eyeblink.  So more detail is needed about 
what "it still doesn't work!" means.  Assigned to Jack, and he 
can use current CVS to try it.

Lib/repr.py; new revision: 1.15
Lib/test/test_repr.py; new revision: 1.16
Misc/NEWS; new revision: 1.642

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=680789&group_id=5470