<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 9/25/2012 11:17 AM, Oscar Benjamin wrote:
<blockquote
cite="mid:CAHVvXxQfAMiPaHp0SDtfLknmK2nQUqaZBc83_LuE9nTHcHyerg@mail.gmail.com"
type="cite">
<div class="gmail_quote">On 25 September 2012 19:08, Junkshops <span
dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:junkshops@gmail.com" target="_blank">junkshops@gmail.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><br>
In [38]: mpef._ustore._store<br>
Out[38]: defaultdict(&lt;type 'dict'&gt;, {'Measurement':
{'8991c2dc67a49b909918477ee4efd767':
&lt;micropheno.exchangeformat.Exceptions.FileContext object
at 0x2f0fe90&gt;, '7b38b429230f00fe4731e60419e92346':
&lt;micropheno.exchangeformat.Exceptions.FileContext object
at 0x2f0fad0&gt;, 'b53531471b261c44d52f651add647544':
&lt;micropheno.exchangeformat.Exceptions.FileContext object
at 0x2f0f4d0&gt;, '44ea6d949f7c8c8ac3bb4c0bf4943f82':
&lt;micropheno.exchangeformat.Exceptions.FileContext object
at 0x2f0f910&gt;, '0de96f928dc471b297f8a305e71ae3e1':
&lt;micropheno.exchangeformat.Exceptions.FileContext object
at 0x2f0f550&gt;}})<br>
</div>
</blockquote>
<div><br>
</div>
<div>Have these exceptions been raised from somewhere before
being stored? I wonder if you're inadvertently keeping
execution frames alive. There are some problems in CPython
with this that are related to storing exceptions.</div>
</div>
</blockquote>
FileContext objects aren't exceptions. They store information about
where the stored object originally came from, so if there's an MD5
or ID clash with a later line in the file the code can report both
the current line and the older clashing line to the user. I have an
Exception subclass that takes a FileContext as an argument, but no
exceptions were raised while processing the file that produced the
heapy results earlier in the thread.<br>
<br>
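To make that arrangement concrete, here is a minimal sketch. Only the name FileContext comes from the real code; its attributes, the ClashError class, and the store_record helper are hypothetical names made up for illustration, and the real classes may look quite different.<br>
<pre>
```python
class FileContext(object):
    """Hypothetical: remembers which file and line a record came from."""

    def __init__(self, filename, line_number):
        self.filename = filename
        self.line_number = line_number

    def __str__(self):
        return "{0}, line {1}".format(self.filename, self.line_number)


class ClashError(Exception):
    """Hypothetical Exception subclass that takes FileContext arguments."""

    def __init__(self, key, current, previous):
        message = "{0} at {1} clashes with {2}".format(key, current, previous)
        super(ClashError, self).__init__(message)
        self.current = current
        self.previous = previous


def store_record(store, md5, context):
    """Keep one FileContext per MD5; report both lines on a clash."""
    if md5 in store:
        raise ClashError(md5, context, store[md5])
    store[md5] = context


store = {}
md5 = "8991c2dc67a49b909918477ee4efd767"
store_record(store, md5, FileContext("data.txt", 1))
try:
    # Same MD5 on a later line: the error names both locations.
    store_record(store, md5, FileContext("data.txt", 7))
except ClashError as err:
    clash_message = str(err)
    print(clash_message)
```
</pre>
The point is only that a FileContext is a small record kept as a dictionary value, and the exception merely carries it; nothing is raised unless a clash occurs.<br>
<br>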
<blockquote
cite="mid:CAHVvXxQfAMiPaHp0SDtfLknmK2nQUqaZBc83_LuE9nTHcHyerg@mail.gmail.com"
type="cite">
<div class="gmail_quote">
<blockquote type="cite">In [43]:
mpef._ustore._idstore['Measurement']._SIDstore<br>
Out[43]: defaultdict(&lt;function &lt;lambda&gt; at
0x2ece7d0&gt;, {'emailRemoved': defaultdict(&lt;function
&lt;lambda&gt; at 0x2c4caa0&gt;, {'microPhenoShew2011':
defaultdict(&lt;type 'dict'&gt;, {0: {'MLR_124572462':
'8991c2dc67a49b909918477ee4efd767', 'MLR_124572161':
'7b38b429230f00fe4731e60419e92346', 'SMMLR_12551352':
'b53531471b261c44d52f651add647544', 'SMMLR_12551051':
'0de96f928dc471b297f8a305e71ae3e1', 'SMMLR_12550750':
'44ea6d949f7c8c8ac3bb4c0bf4943f82'}})})})</blockquote>
Also I think lambda functions might be able to keep the frame
alive. Are they by any chance being created in a function that
is called in a loop?
<div><br>
</div>
</div>
</blockquote>
Here's the context for the lambdas: <br>
<br>
def __init__(self):<br>
&nbsp;&nbsp;&nbsp; self._SIDstore = defaultdict(lambda: defaultdict(lambda:
defaultdict(dict)))<br>
<br>
So the lambdas (and the innermost dict factory) are only called when
a new key is added to one of the top three levels of the data
structure, which, in the test case I've been discussing, happens only
once per level.<br>
<br>
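A small sketch of that behaviour, assuming Python's standard collections.defaultdict. The lambdas are replaced here by named functions (outer, middle, inner, all hypothetical names) purely so that each factory call can be recorded; the keys are the ones from the output quoted above.<br>
<pre>
```python
from collections import defaultdict

calls = []  # one entry per factory invocation


def inner():
    calls.append("inner")
    return {}


def middle():
    calls.append("middle")
    return defaultdict(inner)


def outer():
    calls.append("outer")
    return defaultdict(middle)


sid_store = defaultdict(outer)

# First ID: every level is new, so each factory runs exactly once.
level = sid_store["emailRemoved"]["microPhenoShew2011"][0]
level["MLR_124572462"] = "8991c2dc67a49b909918477ee4efd767"
calls_after_first = list(calls)

# Second ID under the same keys: no factory runs again.
level = sid_store["emailRemoved"]["microPhenoShew2011"][0]
level["MLR_124572161"] = "7b38b429230f00fe4731e60419e92346"

print(calls_after_first)  # ['outer', 'middle', 'inner']
print(calls)              # unchanged: ['outer', 'middle', 'inner']
```
</pre>
Each factory is invoked only on a missing key, so the number of closures and nested dictionaries grows with the number of distinct keys per level, not with the number of IDs stored.<br>
<br>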
The suggestion to change the hex strings to ints is a good one and
I'll do it. What I'm really trying to understand, though, is why
there's such a large difference between the memory use reported by
top (along with the fact that the code appears to thrash swap) and
the memory use reported by heapy, which agrees with my own
calculations of how much the code should be using.<br>
<br>
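For the hex-string-to-int change, a rough sketch of the per-object saving, using one of the MD5s from the output above. The exact byte counts depend on the Python version and platform, and sys.getsizeof reports only the object itself, so treat the numbers as indicative rather than as a full accounting.<br>
<pre>
```python
import sys

md5_hex = "0de96f928dc471b297f8a305e71ae3e1"

# Store the digest as an integer instead of a 32-character string.
md5_int = int(md5_hex, 16)

# Zero-pad to 32 hex digits when converting back, because this
# digest starts with "0" and the padding would otherwise be lost.
round_trip = format(md5_int, "032x")

hex_size = sys.getsizeof(md5_hex)
int_size = sys.getsizeof(md5_int)
print(hex_size, int_size)
```
</pre>
On CPython the integer form comes out smaller than the string form, but the saving is a few tens of bytes per key, so on its own it is unlikely to explain a large gap between top and heapy.<br>
<br>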
Cheers, MrsEntity<br>
</body>
</html>