[Baypiggies] json using huge memory footprint and not releasing
Bob Ippolito
bob at redivi.com
Fri Jun 15 23:22:50 CEST 2012
On Fri, Jun 15, 2012 at 4:15 PM, David Lawrence <david at bitcasa.com> wrote:
> When I load the file into json, Python's memory usage spikes to about 1.8GB
> and I can't seem to get that memory to be released. I put together a test
> case that's very simple:
>
> with open("test_file.json", 'r') as f:
>     j = json.load(f)
>
> I'm sorry that I can't provide a sample json file, my test file has a lot
> of sensitive information, but for context, I'm dealing with a file in the
> order of 240MB. After running the above 2 lines I have the
> previously mentioned 1.8GB of memory in use. If I then do "del j" memory
> usage doesn't drop at all. If I follow that with a "gc.collect()" it still
> doesn't drop. I even tried unloading the json module and running another
> gc.collect().
>
> I'm trying to run some memory profiling but heapy has been churning 100%
> CPU for about an hour now and has yet to produce any output.
>
> Does anyone have any ideas? I've also tried the above using cjson rather
> than the packaged json module. cjson used about 30% less memory but
> otherwise displayed exactly the same issues.
>
> I'm running Python 2.7.2 on Ubuntu server 11.10.
>
> I'm happy to load up any memory profiler and see if it does better than
> heapy and provide any diagnostics you might think are necessary. I'm
> hunting around for a large test json file that I can provide for anyone
> else to give it a go.
>
It may just be the way that the allocator works. What happens if you load
the JSON, del the object, then do it again? Does it take up 3.6GB or stay
at 1.8GB? You may not be able to "release" that memory to the OS in such a
way that RSS gets smaller... but at the same time it's not really a leak
either.

The cycle collector shouldn't really come into play for a decoded JSON
structure, since it's guaranteed to be acyclic… ref counting alone should be
sufficient to reclaim that space as soon as the last reference goes away.
I'm not at all surprised that gc.collect() doesn't change anything for
CPython in this case.
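
If you want to check the ref counting claim directly, something like the
following should do it. This is only a sketch, written for Python 3 and the
stdlib json module rather than the 2.7 setup in this thread. TrackedDict and
freed_without_gc are names made up for the example, and the result relies on
CPython's reference counting, so other implementations may behave differently.

```python
import gc
import json
import weakref


class TrackedDict(dict):
    """dict subclass: unlike a plain dict, instances can be weakly referenced."""


def freed_without_gc(text):
    """Return True if the decoded top-level object is freed by `del` alone.

    `text` must be a JSON object at the top level so that the result is a
    TrackedDict (hypothetical helper type, only used as a weakref target).
    """
    was_enabled = gc.isenabled()
    gc.disable()  # take the cycle collector out of the picture
    try:
        obj = json.loads(text, object_hook=TrackedDict)
        probe = weakref.ref(obj)
        del obj  # drops the only strong reference
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()
```

If the weak reference is already dead right after the del, the objects were
reclaimed without the collector ever running, whatever RSS says.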

$ python
Python 2.7.2 (default, Jan 23 2012, 14:26:16)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, subprocess, simplejson
>>> def rss():
...     cmd = ['ps', '-o', 'rss', '-p', str(os.getpid())]
...     out = subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()[0]
...     return out.splitlines()[1].strip()
...
>>> rss()
'7284'
>>> l = simplejson.loads(simplejson.dumps([x for x in xrange(1000000)]))
>>> rss()
'49032'
>>> del l
>>> rss()
'42232'
>>> l = simplejson.loads(simplejson.dumps([x for x in xrange(1000000)]))
>>> rss()
'49032'
>>> del l
>>> rss()
'42232'

$ python
Python 2.7.2 (default, Jan 23 2012, 14:26:16)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, subprocess, simplejson
>>> def rss():
...     cmd = ['ps', '-o', 'rss', '-p', str(os.getpid())]
...     out = subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()[0]
...     return out.splitlines()[1].strip()
...
>>> l = simplejson.loads(simplejson.dumps(dict((str(x), x) for x in
...     xrange(1000000))))
>>> rss()
'288116'
>>> del l
>>> rss()
'84384'
>>> l = simplejson.loads(simplejson.dumps(dict((str(x), x) for x in
...     xrange(1000000))))
>>> rss()
'288116'
>>> del l
>>> rss()
'84384'
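
The same load/del/load experiment can be wrapped up as a script so it is easy
to rerun against other inputs. This is a rough sketch in Python 3 syntax.
rss_kb and load_and_release are hypothetical helper names, and it assumes a
ps that accepts "-o rss=" to print the value without a header line.

```python
import os
import subprocess


def rss_kb():
    """Resident set size of this process in KB, as reported by ps."""
    out = subprocess.check_output(["ps", "-o", "rss=", "-p", str(os.getpid())])
    return int(out.decode().strip())


def load_and_release(make_object, measure=rss_kb, rounds=2):
    """Build an object, drop it, and take a reading after each step.

    Returns one (after_load, after_del) pair per round.
    """
    readings = []
    for _ in range(rounds):
        obj = make_object()
        after_load = measure()
        del obj  # last reference gone; ref counting frees it in CPython
        after_del = measure()
        readings.append((after_load, after_del))
    return readings
```

For example, after "import json", calling
load_and_release(lambda: json.loads(json.dumps(list(range(1000000)))))
repeats the list case above. If the after_load reading is about the same in
every round, the memory is being reused rather than leaked.
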
-bob