Not able to store data to dictionary because of memory limitation
Rama Rao Polneni
ramp99 at gmail.com
Sun Jul 31 10:38:06 CEST 2011
Thanks for your idea.
I resolved the issue by using integers instead of strings.
Earlier I had many duplicate strings in the different rows retrieved from
the database. I created a list of the unique strings, and their indexes
are used in the actual computations. A lot of space is saved by storing
integers instead of strings.
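A minimal sketch of that mapping, with invented rows:

    unique = []     # index -> string
    index_of = {}   # string -> index

    def intern_id(s):
        # Assign the next free index to a string seen for the first time.
        if s not in index_of:
            index_of[s] = len(unique)
            unique.append(s)
        return index_of[s]

    rows = [("alice", "NY"), ("bob", "NY"), ("alice", "LA")]
    compact = [tuple(intern_id(v) for v in row) for row in rows]
    assert compact == [(0, 1), (2, 1), (0, 3)]
    assert unique[compact[0][0]] == "alice"  # recover the string when needed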
I am trying a different approach too: fetching ids from the database
instead of the values themselves. The values will then be retrieved via
the unique ids of the different tables. I hope to improve performance
with this approach.
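A rough sketch of that second approach, assuming sqlite3 and hypothetical
orders/customers tables:

    import sqlite3

    conn = sqlite3.connect("data.db")   # hypothetical database file
    cur = conn.cursor()

    # Keep only small integer ids in memory for the bulk computation...
    cur.execute("SELECT customer_id, product_id FROM orders")
    id_pairs = cur.fetchall()

    # ...and look the actual values up per table only when they are needed.
    def customer_name(cid):
        cur.execute("SELECT name FROM customers WHERE id = ?", (cid,))
        return cur.fetchone()[0]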
On 7/6/11, Ulrich Eckhardt <ulrich.eckhardt at dominolaser.com> wrote:
> Rama Rao Polneni wrote:
>> After storing 1.99 GB of data into the dictionary, python stopped to
>> store the remaining data into the dictionary.
> Question here:
> - Which Python?
> - "stopped to store" (you mean "stopped storing", btw), how does it behave?
> Hang? Throw exceptions? Crash right away?
>> Memory utilization is 26 GB/34 GB. That means a lot of memory is still
>> left unused.
> 2GiB is typically the process limit for memory allocations on 32-bit
> systems. So, if you are running a 32-bit system or running a 32-bit process
> on a 64-bit system, you are probably hitting hard limits. With luck, you
> could extend this to 3GiB on a 32-bit system.
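A quick way to check which kind of build is running:

    import struct, sys

    # 8 bytes per pointer means a 64-bit build; 4 bytes means 32-bit,
    # where the roughly 2 GiB per-process allocation limit applies.
    print(struct.calcsize("P") * 8, "bit Python")
    print("64-bit" if sys.maxsize > 2**32 else "32-bit", "address space")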
>> Is this problem because of large memory utilization?
> I guess yes.
>> Is there any alternate solution to resolve this issue. Like splitting
>> the dictionaries or writing the data to hard disk instead of writing
>> to memory.
> If you have lots of equal strings, interning them might help, both in size
> and speed. Doing in-memory compression would be a good choice, too: e.g.
> if you have string fields in the DB that can only contain very few
> possible values, convert them to an integer/enumeration.
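For illustration, a minimal sketch of the interning idea (sys.intern in
Python 3; the builtin intern() in Python 2), with invented rows:

    import sys

    rows = [["open", "closed", "open"], ["open", "open", "closed"]]
    # Interned equal strings share a single object, so containers hold
    # cheap references instead of independent copies.
    interned = [[sys.intern(s) for s in row] for row in rows]
    assert interned[0][0] is interned[1][0]  # same object, not merely equal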
> Otherwise, and this is a more general approach, prefer making a single sweep
> over the data. This means that you read a chunk of data, perform whatever
> operation you need on it, possibly write the results and then discard the
> chunk. This keeps memory requirements low. At first, it doesn't look as
> clean as reading all the data in one step, doing the calculations in a
> second and writing the results in a third, but with amounts of data like
> yours, it is the only viable approach.
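A sketch of such a single sweep, assuming a DB-API cursor and an invented query:

    def iter_chunks(cursor, chunk_size=10000):
        # Yield rows in chunks so at most chunk_size rows live in memory.
        while True:
            chunk = cursor.fetchmany(chunk_size)
            if not chunk:
                break
            yield chunk

    # cursor.execute("SELECT value FROM measurements")  # hypothetical query
    # for chunk in iter_chunks(cursor):
    #     ...  # compute on the chunk, write results, then discard it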
> Good luck, and I'd like to hear how you solved the issue!