Program uses twice as much memory in Python 3.6 than in Python 3.5

Jan Gosmann jan at
Tue Mar 28 11:29:17 EDT 2017

On 28 Mar 2017, at 6:11, INADA Naoki wrote:

> I managed to install pyopencl and run the script.  It takes more than
> 2 hours, and uses only 7GB RAM.
> Maybe, some faster backend for OpenCL is required?
> I used Microsoft Azure Compute, Standard_A4m_v2 (4 cores, 32 GB
> memory) instance.

I suppose the computing power of the Azure instance might not be 
sufficient, so it takes much longer to get to the phase where the memory 
requirements increase? Do you have access to the output that was produced?

By the way, this has nothing to do with OpenCL. OpenCL isn't used by the 
script at all; it is listed in the dependencies because some other 
things use it.

> More easy way to reproduce is needed...

Yes, I agree, but it's not super easy: none of the smaller existing 
examples exhibit the problem so far. I'll see what I can do.

>> My best idea about what's going on at the moment is that memory
>> fragmentation is worse in Python 3.6 for some reason. The virtual
>> memory size indicates that a large address space is acquired, but the
>> resident memory size is smaller, indicating that not all of that
>> address space is actually used. In fact, the code might be especially
>> prone to fragmentation because it takes a lot of small NumPy arrays
>> and concatenates them into larger arrays. But I'm still surprised
>> that this is only a problem with Python 3.6 (if this hypothesis is
>> correct).
>> Jan
> Generally speaking, a gap between VMM and RSS doesn't mean
> fragmentation.
> If the ratio of RSS to total allocated memory is bigger than 1.5, it
> may be fragmentation.
> And a large VMM won't cause swapping.  Only RSS is meaningful.

I suppose you are right that one cannot deduce fragmentation from the 
VMM and RSS numbers alone. But I think RSS might not be meaningful in 
this case either. My understanding from [the Wikipedia description] is 
that it doesn't account for parts of the memory that have been written 
to swap; in other words, RSS will never exceed the size of the physical 
RAM. VSS is also only partially useful because it just gives the size 
of the address space, not all of which might actually be used.
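As a concrete illustration of the difference between these numbers: on 
Linux, the kernel exposes all three for a process in /proc/<pid>/status, 
as VmSize (the VSS, i.e. the whole address space), VmRSS (the pages 
currently resident in RAM), and VmSwap (the pages written out to swap). 
A minimal sketch, assuming a Linux system:

```python
def memory_status(pid="self"):
    """Return VmSize, VmRSS and VmSwap in kB for a process (Linux only)."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key in ("VmSize", "VmRSS", "VmSwap"):
                fields[key] = int(rest.split()[0])  # value is in kB
    return fields

if __name__ == "__main__":
    # RSS + swap together approximate how much of the address space
    # is actually backed by data, as opposed to merely reserved.
    print(memory_status())
```

Reading these for the running script at the point where memory blows up 
would show directly how much of the large VSS is resident versus swapped.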

Anyway, I'm getting about 30 GB of swap usage with Python 3.6, and 
zsh's time reports 2339977 page faults from disk versus 107 for Python 3.5.
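The page-fault counts that zsh's time prints come from the same 
counters that getrusage(2) exposes, so they can also be read from 
within the process itself. A small sketch using only the standard 
library (Unix only):

```python
import resource

# ru_majflt counts major page faults (the page had to be fetched from
# disk, e.g. swapped back in); ru_minflt counts minor faults that were
# served without disk I/O.
usage = resource.getrusage(resource.RUSAGE_SELF)
print("major page faults:", usage.ru_majflt)
print("minor page faults:", usage.ru_minflt)
```

Sampling these counters around the concatenation phase of the script 
would show whether the extra faults under Python 3.6 happen there.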

I have some code to measure the unique set size (USS) and will see what 
numbers I get with that.
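For reference, the USS (the memory unique to one process, not shared 
with any other) can be computed on Linux by summing the Private_Clean 
and Private_Dirty fields of /proc/<pid>/smaps; psutil's 
Process.memory_full_info() reports the same value as uss. A hedged 
sketch of the manual approach:

```python
def uss_kb(pid="self"):
    """Unique set size in kB: the private pages of a process (Linux only)."""
    total = 0
    with open(f"/proc/{pid}/smaps") as f:
        for line in f:
            # Count private pages whether clean or dirty; shared pages
            # (e.g. library mappings) are excluded from USS.
            if line.startswith(("Private_Clean:", "Private_Dirty:")):
                total += int(line.split()[1])  # value is in kB
    return total

if __name__ == "__main__":
    print(f"USS: {uss_kb()} kB")
```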

