[Patches] [ python-Patches-465298 ] Fix for xmlrpclib's recursive dump

noreply@sourceforge.net
Mon, 01 Oct 2001 08:24:07 -0700


Patches item #465298, was opened at 2001-09-26 10:09
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=465298&group_id=5470

Category: Library (Lib)
Group: None
Status: Closed
Resolution: Accepted
Priority: 5
Submitted By: Mihai Ibanescu (misa)
>Assigned to: Nobody/Anonymous (nobody)
Summary: Fix for xmlrpclib's recursive dump

Initial Comment:
A perfectly valid case:

import xmlrpclib

p = []
a = [p, p]

print xmlrpclib.dumps((a,))

xmlrpclib is fooled when the same container object appears
twice, even though the data structure is not recursive.
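
For contrast, a structure that really is recursive, and that
dumps should still reject:

r = []
r.append(r)
# r contains itself: a genuine cycle, unlike a above, where
# the same list object merely appears twice.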

----------------------------------------------------------------------

>Comment By: Mihai Ibanescu (misa)
Date: 2001-10-01 08:24

Message:
Logged In: YES 
user_id=205865

Just to follow up on your tim.py:
On a Linux box, the list implementation is about 2.2 times
slower.
And although I had not tested it before, it turns out my gut
was right up to a point: if you run it for 5-item
collections, lists are faster.
I still think lists should be used in this particular case,
since the depth will seldom exceed 5 (think about it: how
often do you have a list inside a list inside a dictionary
inside a dictionary inside a list? :-)
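
For illustration, even a contrived structure matching that
description stays shallow:

# A list inside a list, inside a dictionary, inside a
# dictionary, inside a list: nesting depth 5, so the memo
# never holds more than 5 container ids at any one time.
deep = [{"outer": {"inner": [[1, 2, 3]]}}]
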
Anyway, I'm really happy we have a fix for it. Thanks.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-09-30 13:17

Message:
Logged In: YES 
user_id=21627

Thanks for the patch and the discussion. The modified patch
has been committed as xmlrpclib 1.8.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-09-30 10:28

Message:
Logged In: YES 
user_id=21627

Please have a look at the attached tim.py; it shows that for
a 20-element collection, finding an element that is not in
the collection is significantly slower when lists are used
instead of dictionaries. Using 2.2a4 on a Solaris machine,
lists turn out to be three times slower than dictionaries.

I'll revise this patch to use dictionaries.
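
tim.py itself is not reproduced in this thread; a rough
reconstruction of the kind of measurement it makes (the
names, sizes, and loop counts here are guesses) could look
like this:

import time

def bench(n, probes=100000):
    # Time a membership test for an element that is NOT in
    # the collection, for both a list and a dictionary.
    as_list = range(n)
    as_dict = {}
    for i in as_list:
        as_dict[i] = None
    missing = n + 1

    t0 = time.clock()
    for i in xrange(probes):
        missing in as_list        # linear scan
    t1 = time.clock()
    for i in xrange(probes):
        as_dict.has_key(missing)  # near-constant hash lookup
    t2 = time.clock()
    print "n=%2d  list: %.3fs  dict: %.3fs" \
          % (n, t1 - t0, t2 - t1)

bench(20)   # the 20-element case discussed here
bench(5)    # the small case Mihai expects in practice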

----------------------------------------------------------------------

Comment By: Mihai Ibanescu (misa)
Date: 2001-09-27 12:08

Message:
Logged In: YES 
user_id=205865

I think you're right: I only have to pop the last element.
See the last patch. I was too lazy to count; that's why a
second opinion is always welcome.

Now, about the list vs. dictionary performance: it is true
that for large collections a dictionary performs better than
a list. But look at how this list is used: it tracks the
nesting depth. How deep do you expect the data structure to
be? 10? 20? I doubt that lists vs. dictionaries makes any
difference whatsoever for fewer than 100 elements.
Yes, the complexity argument works against the patch, but as
you know, complexity matters asymptotically. For small sizes
(and I really doubt the depth exceeds 5 in the first place),
lists should work just as well.

Anyway, check the last patch; then it's your call. You got
the idea, you know what I wanted from the patch, and you can
easily change the list back to a dictionary if you think
performance would suffer. I personally doubt it would.
Good work; you helped me a lot in cleaning up the patch.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-09-27 11:48

Message:
Logged In: YES 
user_id=21627

The patch looks good, except that it still has a performance
bottleneck: the recursion test now has linear complexity
(i in self.memo), whereas using a dict would give you
nearly-constant complexity (self.memo.has_key(i)).

I see that you use the list's "ordered" property for popping
an arbitrary number of elements. However, I cannot see why
this is necessary: isn't it an invariant that you always pop
exactly one element, and that this element is always "self"?


----------------------------------------------------------------------

Comment By: Mihai Ibanescu (misa)
Date: 2001-09-27 11:21

Message:
Logged In: YES 
user_id=205865

Okay, the first patch was broken: I was working on the wrong
tree and generated the patch against the wrong version. The
second patch fixes this, but it still uses the sloppy
list-creation stuff.

Please have a look at the third patch. Now I am really using
a stack, and performance-wise it should be much better. It
also makes the change more local.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2001-09-26 13:46

Message:
Logged In: YES 
user_id=21627

I have a number of comments on this patch:
- Please reread your patch carefully. E.g., why did you
change 'use the "dumps" method' into 'use "dumps" method'?
- It appears that all dump methods now require the
additional memo argument. For this implementation strategy,
that argument should not be optional.
- Please analyse the performance properties of this
implementation strategy. It appears that you create a new
list for each container. Instead, it seems better to use a
dictionary as a set, adding the ids every time you descend
into a container and removing them when you are done. That
would also be a much smaller change (a sketch of this idea
follows below).
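
A minimal sketch of the dictionary-as-set idea (the class
shape and method names are assumptions, not the code that
was eventually committed):

import sys

class Marshaller:
    def __init__(self):
        # ids of the containers on the current descent path
        self.memo = {}

    def dump(self, value, write):
        if type(value) is type([]):
            self.dump_array(value, write)
        else:
            write("<value><string>%s</string></value>\n"
                  % value)

    def dump_array(self, value, write):
        i = id(value)
        if self.memo.has_key(i):
            # the id is still on the descent path, so the
            # container contains itself: a true cycle
            raise TypeError, "cannot marshal recursive sequences"
        self.memo[i] = None    # remember the id going down
        write("<value><array><data>\n")
        for v in value:
            self.dump(v, write)
        write("</data></array></value>\n")
        del self.memo[i]       # forget it coming back up

p = []
Marshaller().dump([p, p], sys.stdout.write)   # aliased: fine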


----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=465298&group_id=5470