[New-bugs-announce] [issue12778] JSON-serializing a large container takes too much memory

Antoine Pitrou report at bugs.python.org
Thu Aug 18 17:23:16 CEST 2011


New submission from Antoine Pitrou <pitrou at free.fr>:

On an 8GB RAM box (more than 6GB free), serializing many small objects can eat all available memory, even though the end result would only take around 600MB on a UCS2 build (roughly 3 characters per element at 2 bytes each):

$ LANG=C time opt/python -c "import json; l = [1] * (100*1024*1024); encoded = json.dumps(l)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/antoine/cpython/opt/Lib/json/__init__.py", line 224, in dumps
    return _default_encoder.encode(obj)
  File "/home/antoine/cpython/opt/Lib/json/encoder.py", line 188, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/home/antoine/cpython/opt/Lib/json/encoder.py", line 246, in iterencode
    return _iterencode(o, 0)
MemoryError
Command exited with non-zero status 1
11.25user 2.43system 0:13.72elapsed 99%CPU (0avgtext+0avgdata 27820320maxresident)k
2920inputs+0outputs (12major+1261388minor)pagefaults 0swaps


I suppose the encoder internally builds a large list of very small unicode objects and only joins them at the end. We could probably join them in chunks as we go, so as to avoid this behaviour.
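As a rough sketch of what I mean (not the actual encoder code; _join_in_chunks and the chunk size of 1000 are made up for illustration), given an iterator of string fragments such as the one iterencode() produces:

def _join_in_chunks(pieces, chunk_size=1000):
    # Instead of keeping every tiny fragment alive until one final
    # ''.join(), merge the buffer into a single larger string every
    # chunk_size fragments.  Peak overhead is then proportional to the
    # chunk size, not to the total number of fragments, since the
    # per-object cost of millions of tiny str objects dominates here.
    buf = []
    parts = []
    for piece in pieces:
        buf.append(piece)
        if len(buf) >= chunk_size:
            parts.append(''.join(buf))
            buf.clear()
    if buf:
        parts.append(''.join(buf))
    return ''.join(parts)

e.g. _join_in_chunks(json.JSONEncoder().iterencode(l)) instead of accumulating everything and joining once at the end.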

----------
messages: 142338
nosy: ezio.melotti, pitrou, rhettinger
priority: normal
severity: normal
status: open
title: JSON-serializing a large container takes too much memory
type: resource usage
versions: Python 3.3

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12778>
_______________________________________

