[Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)

"Martin v. Löwis" martin at v.loewis.de
Tue Jan 28 10:00:29 CET 2014


I've debugged this a little bit. At first I couldn't see where the
problem was, since I expected that the code dealing with shared
references would never trigger - none of the tuples in the example
are actually shared (i.e. they all have a ref-count of 1, except for
the outer list, which is both a parameter and bound to a variable).

Debugging reveals that it is actually the many integer objects which
trigger the sharing code. So a much simplified example of Victor's
benchmarking code can use

data = [0]*10000000

The difference between version 2 and version 3 here is that v2 marshals
a lot of "0" integers, whereas version 3 marshals a single one, and then
a lot of references to this integer.
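
One way to see this difference is to look at the serialized bytes
directly. The following is only a minimal sketch, assuming CPython 3.4
or later; the record codes mentioned in the comments ("i" for a 32-bit
integer, "r" for a back-reference) are CPython implementation details,
and the helper name peek is made up for illustration.

```python
import marshal

def peek(data, version):
    """Return the marshal stream for `data` at the given protocol version."""
    return marshal.dumps(data, version)

data = [0] * 5  # small stand-in for the 10,000,000-element list

v2 = peek(data, 2)
v3 = peek(data, 3)

# Under version 2 every element is written out in full as an int record.
# Under version 3 repeated objects are expected to show up as
# back-reference records instead (CPython implementation detail).
print("v2:", v2)
print("v3:", v3)

# Whatever the encoding, both streams must load back to an equal list.
assert marshal.loads(v2) == data
assert marshal.loads(v3) == data
```

The streams are printed rather than described, since the exact bytes
depend on the interpreter version.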

Since "0" is a small integer, and thus a singleton anyway, this doesn't
affect the unmarshal result. If the integers were larger, and actually
shared, the unmarshal result under v3 would be "more correct", since
only v3 records that the list elements are one and the same object.
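
The effect on object identity can be checked with a small round trip.
This is a sketch under assumptions, not part of the original benchmark:
it assumes CPython, the value is built with int("1000000") so that it
lies outside the small-integer cache, and the helper name
identity_preserved is hypothetical.

```python
import marshal

big = int("1000000")  # created at run time, outside the small-int cache
data = [big] * 3      # three references to one and the same int object

def identity_preserved(data, version):
    """Round-trip `data` and report whether all elements are one object."""
    loaded = marshal.loads(marshal.dumps(data, version))
    assert loaded == data  # values must survive in every version
    return all(item is loaded[0] for item in loaded)

for version in (2, 3):
    print("version", version, "identity preserved:",
          identity_preserved(data, version))
```

In both versions the loaded list compares equal to the original; the
versions can only differ in whether the elements are still the same
object afterwards.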

If the integers are not shared, v2 and v3 have about the same runtime,
as can be seen when using

data = [1000*1000 for i in range(10000000)]
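
To compare the two cases side by side, a timing sketch along the
following lines could be used. It makes no claim about the numbers you
will see, which depend on the interpreter version and machine. The
helper time_dumps and the reduced size N are my own choices, and the
unshared list is built with int("1000000") rather than 1000*1000 to
rule out the compiler folding the product into a single shared
constant.

```python
import marshal
import time

def time_dumps(data, version, repeat=3):
    """Best-of-`repeat` wall-clock time for marshal.dumps(data, version)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        marshal.dumps(data, version)
        best = min(best, time.perf_counter() - start)
    return best

N = 1000000  # smaller than the 10,000,000 used above, to keep the run short

shared = [0] * N                               # one object, N references
unshared = [int("1000000") for _ in range(N)]  # N distinct int objects

for name, data in (("shared", shared), ("unshared", unshared)):
    for version in (2, 3):
        print(name, "v%d" % version, "%.4f s" % time_dumps(data, version))
```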

Regards,
Martin


