MemoryError on reading mbox file

Christoph Krammer redtiger84 at googlemail.com
Wed Sep 12 05:27:39 EDT 2007


Hello everybody,

I have to import a huge mbox file (~1.5G) into a MySQL database.

I tried with the following simple code:

import mailbox, md5
import MySQLdb

# fileName and dbcurs are set up earlier in the script
for m in mailbox.mbox(fileName):

  # Serialize the message, including its leading "From " line
  msg  = m.as_string(True)
  hash = md5.new(msg).hexdigest()

  try:
    dbcurs.execute("INSERT INTO archive (hash, msg) VALUES (%s, %s)",
                   (hash, msg))
  except MySQLdb.OperationalError, err:
    print "%s  Error (%d): %s" % (fileName, err[0], err[1])
  else:
    print "%s: Message successfully added to database" % hash

The problem seems to be the size of the file. Every time I run the
script, the following error occurs after about 20000 messages:

Traceback (most recent call last):
  File "email_to_mysql_mbox.py", line 21, in <module>
    for m in mailbox.mbox(fileName):
  File "/usr/lib/python2.5/mailbox.py", line 98, in itervalues
    value = self[key]
  File "/usr/lib/python2.5/mailbox.py", line 70, in __getitem__
    return self.get_message(key)
  File "/usr/lib/python2.5/mailbox.py", line 633, in get_message
    string = self._file.read(stop - self._file.tell())
MemoryError

My system has 512M of RAM and 768M of swap, which seem to run out
early in the run. Is there a way to free the memory used by messages
that have already been processed?
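
One possible direction, given only as an untested sketch and not as a
confirmed fix for the error above: read the mbox file line by line and
split it at lines starting with "From ", so that only the current
message is held in memory. The function names iter_raw_messages and
iter_hashed_messages are hypothetical, and the sketch assumes
Python 2.6 or later (for the b"" literals, hashlib and the with
statement) and an mbox in which "From " lines inside message bodies
are escaped as ">From ".

```python
import hashlib


def iter_raw_messages(path):
    """Yield each message of an mbox file as one byte string.

    Only the lines of the message currently being read are kept
    in memory. Hypothetical helper, not part of the mailbox module.
    """
    lines = []
    with open(path, "rb") as fh:
        for line in fh:
            # A line starting with "From " opens the next message,
            # so hand out what has been collected so far.
            if line.startswith(b"From ") and lines:
                yield b"".join(lines)
                lines = []
            lines.append(line)
    # The last message has no following separator line.
    if lines:
        yield b"".join(lines)


def iter_hashed_messages(path):
    """Yield (md5 hex digest, raw message) pairs, one at a time."""
    for raw in iter_raw_messages(path):
        yield hashlib.md5(raw).hexdigest(), raw
```

Each pair from iter_hashed_messages(fileName) could then be passed to
the same INSERT statement as in the loop above. Unlike
m.as_string(True), this yields the bytes exactly as stored in the
file, so the resulting hashes may differ from those of the original
script.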

Thanks and regards,
 Christoph



