“Note: this happens only if there is a tuple inside the tuples of the data list.”
This is rather odd.
Protocol 3 adds support for object instancing: non-trivial objects are looked up in the memo dictionary if their reference count is larger than 1.
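For what it's worth, the instancing behaviour can be observed directly. A small sketch (assuming CPython 3.4+, where marshal version 3 is available):

```python
import marshal

# A tuple referenced twice: its refcount is > 1, so version 3 memoizes it.
shared = ("a", "b")
data = [shared, shared]

v2 = marshal.dumps(data, 2)  # no instancing: the tuple is serialized twice
v3 = marshal.dumps(data, 3)  # instancing: second occurrence becomes a back-reference

# Version 3 restores the shared identity on load.
loaded = marshal.loads(v3)
print(loaded[0] is loaded[1])
```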
I suspect that the internal tuple has this property, for some reason.
However, my little test in 2.7 does not bear out this hypothesis:
def genData(amount=500000):
    for i in range(amount):
        yield (i, i+2, i*2, (i+1, i+4, i, 4), "my string template %s" % i, 1.01*i, True)
l = list(genData())
import sys
print sys.getrefcount(l[1000])
print sys.getrefcount(l[1000][0])
print sys.getrefcount(l[1000][3])
C:\Program Files\Perforce>python d:\pyscript\data.py
2
3
2
K
From: Python-Dev [mailto:python-dev-bounces+kristjan=ccpgames.com@python.org]
On Behalf Of Wolfgang
Sent: Monday, January 27, 2014 22:41
To: Python-Dev
Subject: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)
Hi,
I tested the latest beta from 3.4 (b3) and noticed there is a new marshal protocol version 3.
The documentation says little about the new features and does not go into detail.
I've run a performance test with the new protocol version and noticed that it is about twice as slow at serialization as version 2. I tested it with a simple value tuple in a list (500000 elements).
Nothing special. (This happens only if the tuple itself contains a tuple.)
Copy of the test code:
from time import time
from marshal import dumps
def genData(amount=500000):
    for i in range(amount):
        yield (i, i+2, i*2, (i+1, i+4, i, 4), "my string template %s" % i, 1.01*i, True)
data = list(genData())
print(len(data))
t0 = time()
result = dumps(data, 2)
t1 = time()
print("duration p2: %f" % (t1-t0))
t0 = time()
result = dumps(data, 3)
t1 = time()
print("duration p3: %f" % (t1-t0))
Is the overhead of the recursion detection really so high?
Note: this happens only if there is a tuple inside the tuples of the data list.
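A hypothetical variant of the test above, which toggles the inner tuple on and off to isolate it as the trigger (assuming Python 3.4+):

```python
from time import time
from marshal import dumps

def genData(amount=100000, nested=True):
    # Same record as above, with the inner tuple optionally replaced by a plain int.
    for i in range(amount):
        inner = (i+1, i+4, i, 4) if nested else i
        yield (i, i+2, i*2, inner, "my string template %s" % i, 1.01*i, True)

for nested in (True, False):
    data = list(genData(nested=nested))
    for version in (2, 3):
        t0 = time()
        dumps(data, version)
        print("nested=%s, version %d: %f" % (nested, version, time() - t0))
```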
Regards,
Wolfgang