<div class="gmail_quote">On 25 September 2012 00:58, Junkshops <span dir="ltr">&lt;<a href="mailto:junkshops@gmail.com" target="_blank">junkshops@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi Tim, thanks for the response.<div class="im"><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

- check how you're reading the data:  are you iterating over<br>

   the lines a row at a time, or are you using<br>

   .read()/.readlines() to pull in the whole file and then<br>

   operate on that?<br>

</blockquote></div>

I'm using enumerate() on an iterable input (which in this case is the filehandle).<div class="im"><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

- check how you're storing them:  are you holding onto more<br>

   than you think you are?<br>

</blockquote></div>

I've used ipython to look through my data structures (without going into ungainly detail, 2 dicts with X key/value pairs each, where X = number of lines in the file), and everything seems to be working correctly. As I said, the heapy output looks reasonable; I don't see anything surprising there. The first dict maps an id string (the first token in each line of the file) to (again, without going into massive detail) the md5 of the contents of the line. The second dict maps that md5 to an object with __slots__ set, which stores the line number in the file and the type of object that line represents.</blockquote>
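<div><br></div><div>For reference, here is a minimal sketch of the layout as I read the description above. It is only a guess at the shape, not the actual code: the names LineRecord, index_lines and the sample input are hypothetical, and I am assuming that the md5 is taken over the whole line, that the second token names the object type, and that line numbers are 1-based.</div>
<pre>
```python
import hashlib
import io
import itertools


class LineRecord:
    """Hypothetical per-line record; __slots__ avoids a per-instance __dict__."""
    __slots__ = ("lineno", "kind")

    def __init__(self, lineno, kind):
        self.lineno = lineno
        self.kind = kind

    def __repr__(self):
        return "LineRecord(lineno=%d, kind=%r)" % (self.lineno, self.kind)


def index_lines(lines):
    """Build both dicts from any iterable of text lines, one line at a time."""
    id_to_md5 = {}       # first token of the line: md5 hex digest of the line
    md5_to_record = {}   # md5 hex digest: LineRecord(lineno, kind)
    for lineno, line in enumerate(lines, start=1):  # assumption: 1-based numbering
        line = line.rstrip("\n")
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        # Assumption: the digest covers the whole line.
        digest = hashlib.md5(line.encode("utf-8")).hexdigest()
        # Assumption: the second token names the type of object on the line.
        kind = tokens[1] if tokens[1:] else None
        id_to_md5[tokens[0]] = digest
        md5_to_record[digest] = LineRecord(lineno, kind)
    return id_to_md5, md5_to_record


# Made-up sample input standing in for the real file handle.
sample = io.StringIO(
    "id1 gene chr1 100 200\n"
    "id2 exon chr1 100 150\n"
    "id3 exon chr1 160 200\n"
    "id4 gene chr2 300 900\n"
    "id5 exon chr2 300 450\n"
    "id6 exon chr2 500 900\n"
)

# Look only at the first 5 lines, without reading the rest of the input.
id_to_md5, md5_to_record = index_lines(itertools.islice(sample, 5))

for key, digest in id_to_md5.items():
    print(key, digest, md5_to_record[digest])
```
</pre>
<div>With that layout, each line costs two dict entries, one 32-character digest string and one small slotted object, so printing both dicts for a handful of lines should make any unexpected extra references easy to spot.</div>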

<div><br></div><div>Can you give an example of how these data structures look after reading only the first 5 lines?</div><div><br></div><div>Oscar</div></div>