<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
<title>Optimizing size of very large dictionaries</title>
</head>
<body dir="ltr">
<p>Are there any techniques I can use to strip a dictionary data structure down to the smallest possible memory overhead?</p>
<p>I'm working on a project where my available RAM is limited to 2 GB, and I would like to use very large dictionaries rather than a traditional database.</p>
<p>Background: I'm trying to identify duplicate records in very large text-based transaction logs. I detect duplicates by computing a SHA1 checksum of each record and using that checksum as a dictionary key. This works well, except that for several of the larger files the associated checksum dictionaries no longer fit in my workstation's 2 GB of RAM; a simplified sketch of the approach follows below.</p>
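<p>For reference, here is a minimal sketch of what I'm doing now. It assumes, purely for illustration, that each record is one line of the log, and it keeps the first line number seen as the dictionary value so duplicates can be reported:</p>
<pre>
import hashlib

def find_duplicates(path):
    """Report duplicate records in a transaction log (one record per line)."""
    seen = {}                                   # SHA1 digest -> first line seen
    with open(path, 'rb') as log:
        for lineno, record in enumerate(log, 1):
            # digest() yields 20 raw bytes; hexdigest() would double each key
            digest = hashlib.sha1(record).digest()
            if digest in seen:
                print('line %d duplicates line %d' % (lineno, seen[digest]))
            else:
                seen[digest] = lineno
</pre>
<p>Keying on the raw digest() rather than hexdigest() already halves the key size, but the dictionary's own per-entry overhead still pushes the largest files past the 2 GB limit.</p>
<p>Thank you,</p>
<p>Malcolm</p>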
</body>
</html>