[Patches] [ python-Patches-465298 ] Fix for xmlrpclib's recursive dump
noreply@sourceforge.net
Mon, 01 Oct 2001 08:24:07 -0700
Patches item #465298, was opened at 2001-09-26 10:09
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=465298&group_id=5470
Category: Library (Lib)
Group: None
Status: Closed
Resolution: Accepted
Priority: 5
Submitted By: Mihai Ibanescu (misa)
>Assigned to: Nobody/Anonymous (nobody)
Summary: Fix for xmlrpclib's recursive dump
Initial Comment:
A perfectly valid case:
p = []
a = [p, p]
print xmlrpclib.dumps((a,))
xmlrpclib is fooled by having the same container object
twice, even if it's not a recursive data structure.
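For reference, the reported case can be demonstrated against the modern xmlrpc.client module (the Python 3 successor to xmlrpclib, where this fix has long been in place, an assumption about the current module name): shared references marshal fine, while a genuine cycle is still rejected.

```python
import xmlrpc.client

# The shared, non-recursive structure from the report: the same list twice.
p = []
a = [p, p]
xml = xmlrpc.client.dumps((a,))   # marshals fine; no false recursion error
print(xml)

# A genuinely recursive structure must still be rejected.
r = []
r.append(r)
try:
    xmlrpc.client.dumps((r,))
except TypeError as exc:
    print("rejected:", exc)
```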
----------------------------------------------------------------------
>Comment By: Mihai Ibanescu (misa)
Date: 2001-10-01 08:24
Message:
Logged In: YES
user_id=205865
Just to follow up on your tim.py:
On a Linux box, the list implementation is about 2.2 times
slower.
Although I have not tested this myself, it turns out my gut
was right on one point: if you run it with 5-item
collections, lists are faster.
I still think lists should be used in this particular case,
since the depth will seldom exceed 5 (think about
it: how often do you have a list inside a list inside a
dictionary inside a dictionary inside a list? :-)
Anyway, I'm really happy we have a fix for it. Thanks.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2001-09-30 13:17
Message:
Logged In: YES
user_id=21627
Thanks for the patch and the discussion. The modified patch
has been committed as xmlrpclib 1.8.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2001-09-30 10:28
Message:
Logged In: YES
user_id=21627
Please have a look at the attached tim.py; it shows that for
a 20-element collection, finding an element that is not in
it is significantly slower when lists are used than when
dictionaries are. Using 2.2a4, on a Solaris machine, lists
turn out to be three times slower than dictionaries.
I'll revise this patch to use dictionaries.
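The attached tim.py is not reproduced in this archive, but the benchmark it describes can be sketched roughly as follows (an assumed reconstruction, not the original script): time a membership test for a missing element in a 20-element collection, list versus dictionary.

```python
import timeit

# Rough reconstruction of the benchmark (an assumption; the attached
# tim.py is not shown here): probe for an element that is NOT in a
# 20-element collection, comparing a list scan against a dict lookup.
keys = list(range(20))
as_list = keys
as_dict = dict.fromkeys(keys)

t_list = timeit.timeit("999 in c", globals={"c": as_list}, number=100_000)
t_dict = timeit.timeit("999 in c", globals={"c": as_dict}, number=100_000)
print(f"list miss: {t_list:.4f}s   dict miss: {t_dict:.4f}s")
```

The list probe compares against all 20 elements before failing, while the dict probe is a single hash lookup, which is the asymptotic gap under discussion.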
----------------------------------------------------------------------
Comment By: Mihai Ibanescu (misa)
Date: 2001-09-27 12:08
Message:
Logged In: YES
user_id=205865
I think you're right, I only have to pop the last element.
See the last patch. I was too lazy to count. That's why a
second opinion is always welcome.
Now, about the list vs. dictionary performance: it is true
that for large collections a dictionary performs better
than a list. But look at how this list is used: it tracks
the nesting depth. How deep do you think the data structure
would get? 10? 20? I doubt that lists vs. dictionaries makes
any difference whatsoever below 100 elements.
Yes, the complexity works against the patch, but as you know,
complexity only matters asymptotically. For small values (and
I really doubt the depth ever exceeds 5 in the first place),
lists should work just as well.
Anyway, check the last patch and then it's your call. You
got the idea, you know what I wanted from the patch and you
can easily change the list back to a dictionary if you think
performance would be affected. I personally doubt that.
Good work, you helped me a lot to clean the patch.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2001-09-27 11:48
Message:
Logged In: YES
user_id=21627
The patch looks good, except that it still has a
performance bottleneck: The recursion test now has linear
complexity (i in self.memo), whereas using a dict would
give you nearly-constant complexity (self.memo.has_key(i)).
I see that you use the list's "ordered" property for
popping an arbitrary number of elements. However, I cannot
see why this is necessary: Isn't it an invariant that you
always pop exactly one element, and that this element is
always "self"?
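The invariant in question can be sketched as follows (a hypothetical helper, not xmlrpclib's actual code): each container pushes exactly one element, its own id, before descending, and pops exactly that element afterwards, so a plain list suffices as a stack.

```python
# Hypothetical sketch of the stack invariant: push exactly one id
# ("self") per container on the way in, pop exactly one on the way out.
def check_no_cycles(value, memo=None):
    if memo is None:
        memo = []
    if isinstance(value, (list, tuple)):
        i = id(value)
        if i in memo:              # linear scan: the bottleneck noted above
            raise TypeError("cannot marshal recursive sequences")
        memo.append(i)             # push "self"
        for item in value:
            check_no_cycles(item, memo)
        assert memo.pop() == i     # always pops exactly one element: "self"

p = []
check_no_cycles([p, p])            # shared reference: accepted

r = []
r.append(r)
try:
    check_no_cycles(r)
except TypeError:
    print("cycle detected")
```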
----------------------------------------------------------------------
Comment By: Mihai Ibanescu (misa)
Date: 2001-09-27 11:21
Message:
Logged In: YES
user_id=205865
Okay, the patch was broken. I was working on the wrong tree
and generated the patch against the wrong version. The
second patch fixes this but it still uses the sloppy
list-creation stuff.
Please have a look at the third patch. Now I am really using
a stack and performance-wise it should be much better. It
also makes the change more local.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2001-09-26 13:46
Message:
Logged In: YES
user_id=21627
I have a number of comments on this patch:
- please reread your patch carefully. E.g. why did you
change 'use the "dumps" method' into 'use "dumps" method'?
- It appears that all dump methods now require the
additional memo argument. For this implementation
strategy, it appears that argument should not be optional.
- Please analyse the performance properties of this
implementation strategy. It appears that you create a new
list for each container. Instead, it seems better to use a
dictionary as a set, adding the ids every time you descend
into the container, and removing it when you are done.
That would be a much smaller change also.
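The dictionary-as-a-set strategy suggested here can be sketched like this (a hypothetical stand-in, not the committed xmlrpclib code): record id(value) on the way into each container and delete it on the way out, so the recursion check becomes a near-constant-time hash lookup.

```python
# Hypothetical sketch of the dict-as-set strategy: enter each container
# by recording its id, leave it by deleting the id, so shared but
# acyclic references stay legal while true cycles are caught.
def check_no_cycles(value, memo=None):
    if memo is None:
        memo = {}
    if isinstance(value, (list, tuple)):
        i = id(value)
        if i in memo:
            raise TypeError("cannot marshal recursive sequences")
        memo[i] = None             # enter the container
        for item in value:
            check_no_cycles(item, memo)
        del memo[i]                # leave it again
```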
----------------------------------------------------------------------