Memory usage per top 10x usage per heapy
Dave Angel
d at davea.name
Tue Sep 25 07:06:29 EDT 2012
On 09/25/2012 12:21 AM, Junkshops wrote:
>> Just curious; which is it, two million lines, or half a million bytes?
<snip>
>
> Sorry, that should've been a 500MB, 2M line file.
>
>> which machine is 2gb, the Windows machine, or the VM?
> VM. Winders is 4gb.
>
>> ...but I would point out that just because
>> you free up the memory from the Python doesn't mean it gets released
>> back to the system. The C runtime manages its own heap, and is pretty
>> persistent about hanging onto memory once obtained. It's not normally a
>> problem, since most small blocks are reused. But it can get
>> fragmented. And I have no idea how well VirtualBox maps the Linux
>> memory map into the Windows one.
> Right, I understand that - but what's confusing me is that, given the
> memory use is (I assume) monotonically increasing, the code should never
> use more than what's reported by heapy once all the data is loaded into
> memory, given that memory released by the code to the Python runtime is
> reused. To the best of my ability to tell I'm not storing anything I
> shouldn't, so the only thing I can think of is that all the object
> creation and destruction, for some reason, is preventing reuse of
> memory. I'm at a bit of a loss regarding what to try next.
I'm not familiar with heapy, but perhaps it's missing something there.
I'm a bit surprised you aren't beyond the 2GB limit just with the
structures you describe for the file. You do realize that each object
has quite a few bytes of overhead, so it's not surprising to use several
times the size of a file to store the file in an organized way. I also
wonder whether heapy has been written to take into account the larger
size of pointers in a 64-bit build.
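
As a rough illustration of that per-object overhead, here is a minimal sketch using sys.getsizeof. The record layout (a hex digest string plus a line number in a tuple) is an assumption made for the example, not the structure from the original program, and the exact numbers depend on the Python version and on whether the build is 32-bit or 64-bit.

```python
import sys

# Hypothetical record: a 32-character hex digest and a line number.
digest_hex = "d41d8cd98f00b204e9800998ecf8427e"
line_number = 12345
record = (digest_hex, line_number)

# Payload as it would appear in a text file: 32 characters.
raw_bytes = len(digest_hex)

# sys.getsizeof reports only the object itself, not the objects it
# refers to, so the container and its contents are summed by hand.
in_memory = (sys.getsizeof(record)
             + sys.getsizeof(digest_hex)
             + sys.getsizeof(line_number))

print("payload bytes:  ", raw_bytes)
print("in-memory bytes:", in_memory)
```

Multiplying the difference by a couple of million records gives a feel for how a 500MB file can need well over 500MB once loaded.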
Perhaps one way to save space would be to use a long to store those md5
values. You'd have to measure it, but I suspect it'd help (at the cost
of lots of extra hexlify-type calls). Another thing is to make sure
that the md5 object used in your two maps is the same object, and not
just one with the same value.
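
Both suggestions can be sketched as follows. This assumes the digests are currently kept as 32-character hex strings, which the thread does not confirm; the input data, the canonical() helper, and the two maps are hypothetical names for illustration. It is written for Python 3, where int plays the role of Python 2's long, and the sizes should be measured rather than taken on trust.

```python
import hashlib
import sys

data = b"example line\n"  # hypothetical input, not from the original post

hex_digest = hashlib.md5(data).hexdigest()  # 32-character hex string
int_digest = int(hex_digest, 16)            # the same 128-bit value as an integer

# Compare the per-object footprint of the two representations.
print("hex string:", sys.getsizeof(hex_digest), "bytes")
print("integer:   ", sys.getsizeof(int_digest), "bytes")

# Converting back costs an extra call whenever the hex form is needed.
round_trip = format(int_digest, "032x")

_pool = {}  # hypothetical helper state: value -> first object seen with it


def canonical(value):
    """Return one shared object for each distinct value."""
    return _pool.setdefault(value, value)


# Two digests computed separately are equal but may be distinct objects;
# passing both through canonical() makes the two maps share one key object.
first = canonical(int(hashlib.md5(data).hexdigest(), 16))
second = canonical(int(hashlib.md5(data).hexdigest(), 16))

map_a = {first: "entry in the first map"}
map_b = {second: "entry in the second map"}
```

The pool dictionary has its own cost, so in a real program it would make more sense to reuse one of the existing maps to look up the already-stored key object.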
--
DaveA