“Note this happens only if there is a tuple nested inside the tuples of the data list.”

This is rather odd.

Protocol 3 adds support for object instancing. Non-trivial objects are looked up in the memo dictionary if they have a reference count greater than 1.
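
(To illustrate what instancing does, here is a minimal sketch, runnable on 3.4; the names and data are mine, not from any test in this thread:)

import marshal

shared = ("a", "b", "c")
data = [shared] * 1000              # one tuple object, referenced 1000 times
print(len(marshal.dumps(data, 2)))  # version 2 writes the tuple out 1000 times
print(len(marshal.dumps(data, 3)))  # version 3 writes it once, then emits back-references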

I suspect that the internal tuple has this property, for some reason.

However, my little test in 2.7 does not bear out this hypothesis:

def genData(amount=500000):
  for i in range(amount):
    yield (i, i+2, i*2, (i+1,i+4,i,4), "my string template %s" % i, 1.01*i, True)

l = list(genData())
import sys
print sys.getrefcount(l[1000])
print sys.getrefcount(l[1000][0])
print sys.getrefcount(l[1000][3])

C:\Program Files\Perforce>python d:\pyscript\data.py
2
3
2
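
A follow-up I have not tried (it would need to run on 3.4, not 2.7): compare the size of the two marshal outputs for this data. If the version 3 stream is not noticeably smaller than the version 2 one, little or no instancing is actually happening, and the extra time is pure bookkeeping.

from marshal import dumps
print(len(dumps(l, 2)))   # size of the version 2 stream
print(len(dumps(l, 3)))   # size of the version 3 stream (3.4 only)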

 

K

 

From: Python-Dev [mailto:python-dev-bounces+kristjan=ccpgames.com@python.org] On Behalf Of Wolfgang
Sent: Monday, January 27, 2014 22:41
To: Python-Dev
Subject: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)

 

Hi,

I tested the latest 3.4 beta (b3) and noticed there is a new marshal protocol, version 3.

The documentation says little about the new features and does not go into detail.

I've run a performance test with the new protocol version and noticed that the new version is two times slower at serialization than version 2. I tested it with a list of 500000 simple value tuples.

Nothing special. (This happens only if the tuple also contains a nested tuple.)

Copy of the test code:


from time import time
from marshal import dumps

def genData(amount=500000):
  for i in range(amount):
    yield (i, i+2, i*2, (i+1,i+4,i,4), "my string template %s" % i, 1.01*i, True)

data = list(genData())
print(len(data))
t0 = time()
result = dumps(data, 2)
t1 = time()
print("duration p2: %f" % (t1-t0))
t0 = time()
result = dumps(data, 3)
t1 = time()
print("duration p3: %f" % (t1-t0))



Is the overhead for the recursion detection so high?


Note this happens only if there is a tuple nested inside the tuples of the data list.
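
For comparison, here is a flat variant of the generator without the inner tuple (a sketch along those lines, not the exact code I ran); with data shaped like this, the version 3 slowdown does not appear:

def genFlatData(amount=500000):
  for i in range(amount):
    # same record, but without the nested (i+1, i+4, i, 4) tuple
    yield (i, i+2, i*2, "my string template %s" % i, 1.01*i, True)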

 

 

Regards,

Wolfgang