[Baypiggies] json using huge memory footprint and not releasing

David Lawrence david at bitcasa.com
Fri Jun 15 23:32:24 CEST 2012


On Fri, Jun 15, 2012 at 2:22 PM, Bob Ippolito <bob at redivi.com> wrote:

> On Fri, Jun 15, 2012 at 4:15 PM, David Lawrence <david at bitcasa.com> wrote:
>
>> When I load the file with json, Python's memory usage spikes to about
>> 1.8GB and I can't seem to get that memory released.  I put together a
>> test case that's very simple:
>>
>> with open("test_file.json", 'r') as f:
>>     j = json.load(f)
>>
>> I'm sorry that I can't provide a sample JSON file; my test file has a lot
>> of sensitive information.  For context, I'm dealing with a file on the
>> order of 240MB.  After running the above two lines I have the previously
>> mentioned 1.8GB of memory in use.  If I then do "del j", memory usage
>> doesn't drop at all.  If I follow that with a gc.collect() it still
>> doesn't drop.  I even tried unloading the json module and running another
>> gc.collect().
>>
>> I'm trying to run some memory profiling, but heapy has been churning at
>> 100% CPU for about an hour now and has yet to produce any output.
>>
>> Does anyone have any ideas?  I've also tried the above using cjson rather
>> than the packaged json module.  cjson used about 30% less memory but
>> otherwise displayed exactly the same issues.
>>
>> I'm running Python 2.7.2 on Ubuntu server 11.10.
>>
>> I'm happy to load up any memory profiler and see if it does better than
>> heapy, and to provide any diagnostics you might think are necessary.  I'm
>> hunting around for a large test JSON file that I can provide for anyone
>> else to give it a go.
>>
>
> It may just be the way that the allocator works. What happens if you load
> the JSON, del the object, then do it again? Does it take up 3.6GB or stay
> at 1.8GB? You may not be able to "release" that memory to the OS in such a
> way that RSS gets smaller... but at the same time it's not really a leak
> either.
>
> GC shouldn't really come into play for a JSON structure, since it's
> guaranteed to be acyclic; ref counting alone should be sufficient to
> reclaim that space immediately. I'm not at all surprised that gc.collect()
> doesn't change anything for CPython in this case.
>
> $ python
> Python 2.7.2 (default, Jan 23 2012, 14:26:16)
> [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on
> darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import os, subprocess, simplejson
> >>> def rss(): return subprocess.Popen(['ps', '-o', 'rss', '-p',
> str(os.getpid())],
> stdout=subprocess.PIPE).communicate()[0].splitlines()[1].strip()
> ...
> >>> rss()
> '7284'
> >>> l = simplejson.loads(simplejson.dumps([x for x in xrange(1000000)]))
> >>> rss()
> '49032'
> >>> del l
> >>> rss()
> '42232'
> >>> l = simplejson.loads(simplejson.dumps([x for x in xrange(1000000)]))
> >>> rss()
> '49032'
> >>> del l
> >>> rss()
> '42232'
>
> $ python
> Python 2.7.2 (default, Jan 23 2012, 14:26:16)
> [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on
> darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import os, subprocess, simplejson
> >>> def rss(): return subprocess.Popen(['ps', '-o', 'rss', '-p',
> str(os.getpid())],
> stdout=subprocess.PIPE).communicate()[0].splitlines()[1].strip()
> ...
> >>> l = simplejson.loads(simplejson.dumps(dict((str(x), x) for x in
> xrange(1000000))))
> >>> rss()
> '288116'
> >>> del l
> >>> rss()
> '84384'
> >>> l = simplejson.loads(simplejson.dumps(dict((str(x), x) for x in
> xrange(1000000))))
> >>> rss()
> '288116'
> >>> del l
> >>> rss()
> '84384'
>
> -bob
>
>
It does appear that if I delete the object and run the example again, memory
usage stays static at about 1.8GB.  Could you provide a little more detail
on what your examples are meant to demonstrate?  One shows a static memory
footprint and the other shows the footprint fluctuating up and down.  I
would expect the static footprint in the first example just from my
understanding of Python's free lists for integers.
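
Here is a rough sketch of that distinction as I understand it (so an
assumption to verify, not an established fact): in CPython 2.x, int objects
come from blocks kept on a free list that is never handed back for the life
of the process, while str and dict storage goes through pymalloc arenas that
can be returned to the OS.  That would explain why the list-of-ints example
holds steady after the del while the dict-of-strings example drops.  The
rss_kb() helper just shells out to ps, following the earlier example, so it
assumes Linux or OS X:

    # Rough sketch contrasting int objects (free-listed, not returned to the
    # OS in CPython 2.x) with str objects (pymalloc arenas, which can be
    # returned).  Expected behaviour is noted in the comments; numbers will
    # vary by platform.
    import os, subprocess

    def rss_kb():
        # Resident set size of this process in KB, as reported by ps.
        out = subprocess.Popen(['ps', '-o', 'rss', '-p', str(os.getpid())],
                               stdout=subprocess.PIPE).communicate()[0]
        return int(out.splitlines()[1])

    print 'baseline       ', rss_kb()

    ints = [x for x in xrange(1000000)]      # a million distinct int objects
    del ints
    print 'after del ints ', rss_kb()        # expected to stay high: the int blocks sit on a free list

    strs = [str(x) * 8 for x in xrange(1000000)]
    del strs
    print 'after del strs ', rss_kb()        # expected to fall back much closer to the earlier reading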
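
For anyone who wants to reproduce the load/del/reload check without a
sensitive file, here is a minimal self-contained sketch.  The payload is a
synthetic stand-in for the real 240MB file (so the absolute numbers won't
match), and the cyclic collector is disabled for the whole run to confirm
that refcounting alone does the reclaiming:

    # Minimal sketch of the load/del/reload check with gc disabled, using a
    # synthetic payload in place of the real (sensitive) JSON file.
    import gc, json, os, subprocess

    def rss_kb():
        # Resident set size of this process in KB, as reported by ps.
        out = subprocess.Popen(['ps', '-o', 'rss', '-p', str(os.getpid())],
                               stdout=subprocess.PIPE).communicate()[0]
        return int(out.splitlines()[1])

    gc.disable()  # no cyclic collection; only refcounting frees objects below

    # Stand-in payload: ~200k keys mapping to small lists.
    payload = json.dumps(dict((str(x), range(10)) for x in xrange(200000)))

    print 'baseline       ', rss_kb()
    j = json.loads(payload)
    print 'after 1st load ', rss_kb()
    del j
    print 'after 1st del  ', rss_kb()   # some memory returns to the OS, some stays with the allocator
    j = json.loads(payload)
    print 'after 2nd load ', rss_kb()   # should be close to the 1st load, not roughly double it
    del j
    print 'after 2nd del  ', rss_kb()

If the second load stays close to the first, the memory is being reused by
the process rather than leaked, which is what the earlier transcripts show.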

