[Mailman-Developers] unicode / archive problem revisited

Ron Brogden rb at islandnet.com
Tue Dec 3 02:15:10 2002

Howdy.  I am currently running Mailman 2.1b5 and am trying to sort out an 
archiving issue that has cropped up.

The problem has been mentioned previously, from what I can tell, but no 
resolution seems to have been posted.

The problem is that the list archives (for reasons I won't bore you with) 
have a number of spam messages in them with all sorts of random encoding types 
and other mangled garbage.  When the archiver gets to the point of writing 
the archive, the encoding type test raises an error and the whole archiving 
process comes to a crashing halt.  These are busy lists and the mbox archive 
takes a very long time to parse.  There just is not enough time in the day to 
search for the offending message, chop it out, and wait another 45 minutes or 
more for the archives to be regenerated, only to hit the next garbled header, 
and so on.  This will also continue to be a problem if any future spam 
messages sneak in via forged headers, etc.

The issue appears to be with: 


Traceback (most recent call last):
  File "./bin/arch", line 173, in ?
  File "./bin/arch", line 163, in main
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 303, in close
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 517, in 
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 1058, in 
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 423, in 
    self._update_simple_index(hdr, archive, arcdir)
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 444, in 
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 980, in 
    subject = self.get_header("subject", article)
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 1007, in 
    return unicode(result, article.charset)
TypeError: unicode() argument 2 must be string, not None

What I want is for the archiver to fall back to a default (English) charset 
if it cannot figure out the encoding, so that at least the archiver will not 
die.

So two questions:

What is a valid encoding type to pass as default to the unicode call?  
Secondly, is there any danger in changing the fallback option to always use a 
specific charset?  I'd rather have gibberish than a process that dies.

Basically, around line 1007 in 
"/usr/local/mailman/Mailman/Archiver/HyperArch.py" I want to change:

        if isinstance(result, types.UnicodeType):
            return result
        return unicode(result, article.charset)

to:

        if isinstance(result, types.UnicodeType):
            return result
        return unicode(result, "some string") # never fail!
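
To make the idea concrete, here is a rough sketch of the kind of fallback I 
have in mind.  It is not Mailman code: the helper name to_unicode and the 
"iso-8859-1" default are my own assumptions, and it uses the decode() method 
on byte strings rather than the unicode() call in HyperArch.py.  I picked 
iso-8859-1 only because it maps every byte to some character, so decoding 
with it cannot fail; whether it is the right default is exactly my question.

```python
def to_unicode(raw, charset=None, fallback="iso-8859-1"):
    """Decode a byte string without ever raising on a bad charset.

    Hypothetical helper for illustration only; `fallback` is an assumed
    default, not something Mailman defines.
    """
    for candidate in (charset, fallback):
        if not candidate:
            # charset was None or empty, which is the case in the traceback
            continue
        try:
            # "replace" swaps undecodable bytes for U+FFFD instead of raising
            return raw.decode(candidate, "replace")
        except LookupError:
            # unknown codec name, e.g. from a forged or mangled header
            continue
    # last resort if even the fallback name is not a known codec
    return raw.decode("ascii", "replace")
```

With something like this, the last line of the method would become 
"return to_unicode(result, article.charset)".  The result may be gibberish 
for messages that lied about their charset, but the archiver keeps going.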

Thanks for any suggestions.

