Python 3.4, marshal dumps slower (version 3 protocol)
Hi,

I tested the latest beta from 3.4 (b3) and noticed there is a new marshal protocol version 3. The documentation is a little silent about the new features, not going into detail.

I've run a performance test with the new protocol version and noticed the new version is two times slower in serialization than version 2. I tested it with a simple value tuple in a list (500000 elements). Nothing special. (happens only if the tuple contains also a tuple)

Copy of the test code:

    from time import time
    from marshal import dumps

    def genData(amount=500000):
        for i in range(amount):
            yield (i, i+2, i*2, (i+1,i+4,i,4), "my string template %s" % i, 1.01*i, True)

    data = list(genData())
    print(len(data))
    t0 = time()
    result = dumps(data, 2)
    t1 = time()
    print("duration p2: %f" % (t1-t0))
    t0 = time()
    result = dumps(data, 3)
    t1 = time()
    print("duration p3: %f" % (t1-t0))

Is the overhead for the recursion detection so high?

Note this happens only if there is a tuple in the tuple of the datalist.

Regards,
Wolfgang
Hi,

I'm surprised: marshal.dumps() doesn't raise an error if you pass an invalid version. In fact, Python 3.3 only supports versions 0, 1 and 2. If you pass 3, it will use version 2. (The same applies to version 99.)

Python 3.4 has two new versions: 3 and 4. Version 3 "shares common object references"; version 4 adds short tuples and short strings (which produce smaller files).

It would be nice to document the differences between marshal versions.

And what do you think of raising an error if the version is unknown in marshal.dumps()?

I modified your benchmark to also test loads() and ran the benchmark 10 times. Results:

---
Python 3.3.3+ (3.3:50aa9e3ab9a4, Jan 27 2014, 16:11:26)
[GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux

version   dumps      data size     loads
v0        391.9 ms   45582.9 kB    616.2 ms
v1        384.3 ms   45582.9 kB    594.0 ms
v2        153.1 ms   41395.4 kB    549.6 ms
v3        152.1 ms   41395.4 kB    535.9 ms
v4        152.3 ms   41395.4 kB    549.7 ms
---

And:

---
Python 3.4.0b3+ (default:dbad4564cd12, Jan 27 2014, 16:09:40)
[GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux

version   dumps      data size     loads
v0        389.4 ms   45582.9 kB    564.8 ms
v1        390.2 ms   45582.9 kB    545.6 ms
v2        165.5 ms   41395.4 kB    470.9 ms
v3        425.6 ms   41395.4 kB    528.2 ms
v4        369.2 ms   37000.9 kB    550.2 ms
---

Version 2 is the fastest in Python 3.3 and 3.4, but version 4 with Python 3.4 produces the smallest file.

Victor
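The modified benchmark itself is not included in the thread. The following is only a sketch of how such a measurement could look, assuming the same tuple shape as the original test. The names `make_data` and `bench` are made up for illustration, and the timings it prints will differ from the numbers above.

```python
import marshal
import timeit


def make_data(amount=50_000):
    """Build a list of tuples shaped like the ones in the original test."""
    return [
        (i, i + 2, i * 2, (i + 1, i + 4, i, 4),
         "my string template %s" % i, 1.01 * i, True)
        for i in range(amount)
    ]


def bench(data, versions=None, repeat=10):
    """Return {version: (best_dumps_seconds, size_in_bytes, best_loads_seconds)}."""
    if versions is None:
        # Only the versions this interpreter actually knows about.
        versions = range(marshal.version + 1)
    results = {}
    for version in versions:
        blob = marshal.dumps(data, version)
        dump_s = min(timeit.repeat(lambda: marshal.dumps(data, version),
                                   number=1, repeat=repeat))
        load_s = min(timeit.repeat(lambda: marshal.loads(blob),
                                   number=1, repeat=repeat))
        results[version] = (dump_s, len(blob), load_s)
    return results


if __name__ == "__main__":
    for version, (dump_s, size, load_s) in bench(make_data()).items():
        print("v%d: dumps %.1f ms, size %.1f kB, loads %.1f ms"
              % (version, dump_s * 1e3, size / 1e3, load_s * 1e3))
```

Taking the minimum over several runs is one common way to reduce noise; the thread does not say how the 10 runs were aggregated.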
On 27 January 2014 15:35, Victor Stinner <victor.stinner@gmail.com> wrote:
Version 2 is the fastest in Python 3.3 and 3.4, but version 4 with Python 3.4 produces the smallest file.
Which version is used when creating pyc files? This benchmark might suggest that version 2 is the best...

Paul
On Mon, Jan 27, 2014 at 10:42 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 27 January 2014 15:35, Victor Stinner <victor.stinner@gmail.com> wrote:
Version 2 is the fastest in Python 3.3 and 3.4, but version 4 with Python 3.4 produces the smallest file.
Which version is used when creating pyc files? This benchmark might suggest that version 2 is the best...
Importlib just uses the default: http://hg.python.org/cpython/file/dbad4564cd12/Lib/importlib/_bootstrap.py#l...
Thanks Victor for improving this.

I also have to note that version 3 is slower only in the tuple-in-tuple case. If you use a flat tuple, it is faster than version 2. So I asked about this corner case and thought the recursion detection or something else has a huge cost.

For pyc files, I think the highest available version is the default that is used.

I didn't know about version 4; it is not mentioned anywhere in the docs. I also figured out that every integer is accepted as a protocol version. But that was useful for tests against 3.3 and 2.7. :-)

--
bye by Wolfgang
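Since the thread reports that marshal.dumps() accepts any integer as the version, a caller who wants a hard failure can check the value first. This is a minimal sketch, not a proposed patch; `checked_dumps` is a hypothetical helper name, and it relies only on `marshal.version`, the highest format the running interpreter supports.

```python
import marshal


def checked_dumps(value, version=marshal.version):
    """Like marshal.dumps(), but reject versions this interpreter does not know."""
    if not isinstance(version, int) or not 0 <= version <= marshal.version:
        raise ValueError(
            "unsupported marshal version %r (supported: 0..%d)"
            % (version, marshal.version)
        )
    return marshal.dumps(value, version)
```

With this wrapper, a test written against a newer Python fails loudly on an older one instead of silently falling back to an older format.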
27.01.14 17:35, Victor Stinner wrote:
Python 3.4 has two new versions: 3 and 4. The version 3 "shares common object references", the version 4 adds short tuples and short strings (produce smaller files).
Why do we need two new versions added in one Python release?
Hi there.

I think you should modify your program to marshal (and load) a compiled module. This is where the optimizations in versions 3 and 4 become important.

K
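One way to try this suggestion is to compile some source to a code object and compare the marshalled size per version. The sketch below uses a synthetic module (`SOURCE` and `marshal_sizes` are made up for illustration, not taken from the thread), so the numbers only hint at what a real module would show.

```python
import marshal

# Hypothetical stand-in for a real module: many small functions that
# reuse the same names and constants, as real code tends to do.
SOURCE = "\n".join(
    "def func_%d(value):\n"
    "    total = value + %d\n"
    "    return str(total) + 'suffix'\n" % (i, i)
    for i in range(200)
)


def marshal_sizes(source, versions=(2, 3, 4)):
    """Compile source and return {version: size of the marshalled code object}."""
    code = compile(source, "<synthetic module>", "exec")
    sizes = {}
    for version in versions:
        blob = marshal.dumps(code, version)
        # Round-trip to confirm the blob loads back as a code object.
        assert isinstance(marshal.loads(blob), type(code))
        sizes[version] = len(blob)
    return sizes


if __name__ == "__main__":
    for version, size in marshal_sizes(SOURCE).items():
        print("v%d: %d bytes" % (version, size))
```

The same function can be pointed at the text of a real module file to get a more representative comparison.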
Hi,

yes I know the main usage is to generate pyc files. But marshal is also used for other stuff and is the fastest built-in serialization method. For some use cases it makes sense to use it instead of pickle or others. And people use it not only to generate pyc files.

I only found one case with a performance regression in the newer protocol versions for 3.4. We should take care of it and improve it. Now it is possible to handle this in the beta phase and fix it for the upcoming release. Or even document all this. I think it is also useful for others to know about the new versions, their usage and their behavior.

I also noticed the new versions can be faster in some use cases.

I like the work done for this and think it was also useful to reduce the size of the resulting serialization. I'm not against it, nor do I want to criticize it. I only want to improve all this further.

Regards,
Wolfgang

On 28.01.2014 06:14, Kristján Valur Jónsson wrote:
Hi there. I think you should modify your program to marshal (and load) a compiled module. This is where the optimizations in versions 3 and 4 become important. K
On Jan 28, 2014, at 09:17 AM, tds333@gmail.com wrote:
yes I know the main usage is to generate pyc files. But marshal is also used for other stuff and is the fastest built in serialization method. For some use cases it makes sense to use it instead of pickle or others. And people use it not only to generate pyc files.
marshal is not guaranteed to be backward compatible between Python versions, so it's generally not a good idea to use it for serialization.

-Barry
On 28.01.2014 10:23, Barry Warsaw wrote:
On Jan 28, 2014, at 09:17 AM, tds333@gmail.com wrote:
yes I know the main usage is to generate pyc files. But marshal is also used for other stuff and is the fastest built in serialization method. For some use cases it makes sense to use it instead of pickle or others. And people use it not only to generate pyc files.

marshal is not guaranteed to be backward compatible between Python versions, so it's generally not a good idea to use it for serialization.

Yes I know. And because of that I use it only if nothing persists and the exchange is between the same Python version (even the same architecture and interpreter type). But there are use cases for inter-process communication with no persistence and no need to serialize custom classes and so on. And if speed matters and security is not the problem, you use the marshal module to serialize data.

Assume something like multiprocessing for Windows (no fork available) and only a pipe to exchange a lot of simple data, and pickle is too slow. (Sometimes distributed to other computers.)

Another use case can be a persistent cache with ultra-fast serialization (dump/load) needs, but not with critical data normally stored in a database. It can be regenerated easily from the main data if the Python version changes. (I think pyc files are such a use case.)

I have tested a lot of modules for some needs (JSON, Thrift, MessagePack, Pickle, ProtoBuffers, ...). All are very useful and have their usage scenarios. The same applies to marshal if all the limitations are no problem for you. (I've read the manual and have some knowledge about the limitations.) But all these serialization modules are not as fast as marshal (for my use case).

I hear you and registered the warning about this. And I will not complain if something becomes incompatible. :-)

If someone knows something faster to serialize basic Python types, I'm glad to use it.

Regards,
Wolfgang
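The pipe use case described above could look roughly like the sketch below. It assumes both ends run the same Python version and trust each other, since marshal data from an untrusted source should not be loaded. `send_obj` and `recv_obj` are hypothetical helper names; `Pipe`, `send_bytes` and `recv_bytes` are the standard multiprocessing calls.

```python
import marshal
from multiprocessing import Pipe


def send_obj(conn, obj):
    """Serialize obj with marshal and send it as one message."""
    conn.send_bytes(marshal.dumps(obj))


def recv_obj(conn):
    """Receive one message and rebuild the object with marshal."""
    return marshal.loads(conn.recv_bytes())


if __name__ == "__main__":
    # Both ends live in one process here, just to show the round trip;
    # normally one end would be handed to a child process.
    sender, receiver = Pipe()
    send_obj(sender, [(1, 2.5, "text", (3, 4), True)])
    print(recv_obj(receiver))
    sender.close()
    receiver.close()
```

Only basic built-in types survive this path; anything marshal cannot handle raises ValueError at the sending side.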
How often I hear this argument :) For many people, serialized data is not persisted. But used e.g. for sending information over the wire, or between processes. Marshal is very good for that. Additionally, it doesn't have any side effects since it just stores primitive types and is thus "safe". EVE Online uses its own extended version of the marshal system, and has for years, because it is fast and it can be tuned to an application domain by adding custom opcodes.
-----Original Message-----
From: Python-Dev [mailto:python-dev-bounces+kristjan=ccpgames.com@python.org] On Behalf Of Barry Warsaw
Sent: Tuesday, January 28, 2014 17:23
To: python-dev@python.org
Subject: Re: [Python-Dev] Python 3.4, marshal dumps slower (version 3 protocol)

marshal is not guaranteed to be backward compatible between Python versions, so it's generally not a good idea to use it for serialization.
On 1/28/2014 10:02 PM, Kristján Valur Jónsson wrote:
marshal is not guaranteed to be backward compatible between Python versions, so it's generally not a good idea to use it for serialization.
How often I hear this argument :) For many people, serialized data is not persisted. But used e.g. for sending information over the wire, or between processes. Marshal is very good for that. Additionally, it doesn't have any side effects since it just stores primitive types and is thus "safe". EVE Online uses its own extended version of the marshal system, and has for years, because it is fast and it can be tuned to an application domain by adding custom opcodes.
I think the proper message is this: "Marshal is designed for caching compiled message objects and has the function needed for that goal. When the need changes, marshal changes (with a change in magic number). Other uses should take into account the limitations of function and stability."

It appears you did just that by making a custom version with the function and stability you need.

--
Terry Jan Reedy
I've debugged this a little bit. I couldn't originally see where the problem is, since I expected that the code dealing with shared references shouldn't ever trigger - none of the tuples in the example are actually shared (i.e. they all have a ref-count of 1, except for the outer list, which is both a parameter and bound in a variable).

Debugging reveals that it is actually the many integer objects which trigger the sharing code. So a much simplified example of Victor's benchmarking code can use

    data = [0]*10000000

The difference between version 2 and version 3 here is that v2 marshals a lot of "0" integers, whereas version 3 marshals a single one, and then a lot of references to this integer. Since "0" is a small integer, and thus a singleton anyway, this doesn't affect the unmarshal result. If the integers were larger, and actually shared, the unmarshal result under v2 would be "more correct".

If the integers are not shared, v2 and v3 have about the same runtime, e.g. seen when using

    data = [1000*1000 for i in range(10000000)]

Regards,
Martin
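The comparison described here can be scripted as below. This is a sketch under assumptions: `time_dumps` and `compare` are hypothetical names, and the non-shared list uses `1000 * 1000 + i` rather than the expression from the message, to make sure every element is a separate int object. Results depend heavily on the Python version, so no expected timings are given.

```python
import marshal
import timeit


def time_dumps(data, version, repeat=5):
    """Best-of-N wall time in seconds for marshal.dumps(data, version)."""
    return min(timeit.repeat(lambda: marshal.dumps(data, version),
                             number=1, repeat=repeat))


def compare(n=1_000_000):
    """Time v2 and v3 dumps for a shared-int list and a distinct-int list."""
    datasets = {
        # One int object referenced n times.
        "shared": [0] * n,
        # n separate int objects (assumption: "+ i" added so nothing is shared).
        "distinct": [1000 * 1000 + i for i in range(n)],
    }
    return {
        name: {version: time_dumps(data, version) for version in (2, 3)}
        for name, data in datasets.items()
    }


if __name__ == "__main__":
    for name, timings in compare().items():
        for version, seconds in timings.items():
            print("%-8s v%d: %.1f ms" % (name, version, seconds * 1e3))
```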
2014-01-28 "Martin v. Löwis" <martin@v.loewis.de>:
Debugging reveals that it is actually the many integer objects which trigger the sharing code. So a much simplified example of Victor's benchmarking code can use
data = [0]*10000000
The difference between version 2 and version 3 here is that v2 marshals a lot of "0" integers, whereas version 3 marshals a single one, and then a lot of references to this integer.
Since the output size looks to be the same, it may be interesting to special-case small integers, or even integers and floats in general. Handling references to these numbers takes probably more CPU, whereas the gain on the file size is probably minor.

I wrote a short patch: http://bugs.python.org/issue20416

"dumps v3 is 60% faster, loads v3 is also 14% *faster*."
"dumps v4 is 66% faster, loads v4 is 16% faster."
"file size (on version 3 and 4) is unchanged with my patch."
"So with the patch, the Python 3.4 default version (4) is *faster* (dump 20% faster, load 16% faster) and produces *smaller files* (10% smaller)."

It looks like a win-win patch :-)

The drawback is that files storing many duplicated huge numbers will not be smaller with marshal version >= 3.

Victor
On Tue, 28 Jan 2014 11:22:40 +0100 Victor Stinner <victor.stinner@gmail.com> wrote:
Since the output size looks to be the same, it may be interesting to special-case small integers, or even integers and floats in general. Handling references to these numbers takes probably more CPU, whereas the gain on the file size is probably minor.
Please remember file size is only one factor. Another factor is runtime size after unmarshalling.

For the typical case of pyc files, dump times are not very important. Load times are.

Regards

Antoine.
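The point about runtime size can be illustrated with a shared object that is not a cached singleton. In this sketch (`roundtrip_shares` is a made-up name), a tuple is referenced many times from one list. With version 2 every reference is written out and loaded as its own copy, while with version 3 the loaded list points at a single object again.

```python
import marshal


def roundtrip_shares(version, copies=1000):
    """Return (blob size, whether loaded elements are one shared object)."""
    # Built at run time so it is an ordinary object referenced many times.
    shared = tuple(["payload", 1.5, 10 ** 30])
    data = [shared] * copies
    blob = marshal.dumps(data, version)
    loaded = marshal.loads(blob)
    return len(blob), loaded[0] is loaded[1]


if __name__ == "__main__":
    for version in (2, 3):
        size, same_object = roundtrip_shares(version)
        print("v%d: %d bytes, elements shared after loads: %s"
              % (version, size, same_object))
```

This only covers the sharing aspect; it does not measure load time, which the message above singles out as the more important figure for pyc files.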
“Note this happens only if there is a tuple in the tuple of the datalist.”

This is rather odd. Protocol 3 adds support for object instancing. Non-trivial objects are looked up in the memo dictionary if they have a reference count larger than 1. I suspect that the internal tuple has this property, for some reason. However, my little test in 2.7 does not bear out this hypothesis:

    def genData(amount=500000):
        for i in range(amount):
            yield (i, i+2, i*2, (i+1,i+4,i,4), "my string template %s" % i, 1.01*i, True)

    l = list(genData())
    import sys
    print sys.getrefcount(l[1000])
    print sys.getrefcount(l[1000][0])
    print sys.getrefcount(l[1000][3])

    C:\Program Files\Perforce>python d:\pyscript\data.py
    2
    3
    2

K
participants (11)
- "Martin v. Löwis"
- Antoine Pitrou
- Barry Warsaw
- Brett Cannon
- Kristján Valur Jónsson
- Paul Moore
- Serhiy Storchaka
- tds333@gmail.com
- Terry Reedy
- Victor Stinner
- Wolfgang