Decode email subjects into unicode

Laszlo Nagy gandalf at
Tue Mar 18 11:09:32 CET 2008

Sorry, meanwhile I found that "email.header.decode_header" can be used 
to convert the subject into unicode:

> def decode_header(self, headervalue):
>     val, encoding = decode_header(headervalue)[0]
>     if encoding:
>         return val.decode(encoding)
>     else:
>         return val
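
Note that the function above only looks at the first (value, charset) pair, 
while decode_header returns one pair per encoded or unencoded chunk of the 
header. A minimal sketch that joins all chunks might look like the following. 
It is written for Python 3 (an assumption; the quoted snippet is Python 2), 
and the name decode_header_value and the "latin-1" fallback are hypothetical 
choices made for illustration only:

```python
from email.header import decode_header


def decode_header_value(headervalue, fallback="latin-1"):
    """Decode every chunk of a header and join the results into one str."""
    parts = []
    for val, charset in decode_header(headervalue):
        # In Python 3 a header without encoded words comes back as str,
        # otherwise each chunk is bytes plus an optional charset name.
        if isinstance(val, bytes):
            try:
                val = val.decode(charset or "ascii")
            except (LookupError, UnicodeDecodeError):
                # Unknown charset name or bytes that do not match it:
                # fall back to a codec that accepts every byte value.
                val = val.decode(fallback, "replace")
        parts.append(val)
    return "".join(parts)
```

Called as decode_header_value(msg["Subject"]), this returns text even when 
the declared charset is unknown or wrong, at the cost of possibly wrong 
characters in the fallback case.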

However, there are malformed emails and I have to put them into the 
database. What should I do with this:

Return-Path: <imitate at>
X-Original-To: info at
Delivered-To: dapinfo at
Received: from (unknown [])
        by (Postfix) with SMTP id F1C071DD438;
        Tue, 18 Mar 2008 05:43:27 -0400 (EDT)
Date: Tue, 18 Mar 2008 12:43:45 +0200
Message-ID: <60285728.00719565 at>
From: "Euro Dice Casino" <imitate at>
To: thomas at
Subject: With 2’500 Euro of Welcome Bonus you can’t miss the chance!
MIME-Version: 1.0
Content-Type: text/html; charset=iso-8859-1
Content-Transfer-Encoding: 7bit

No encoding is declared for the subject, yet it contains the byte 0x92. 
When I try to insert it into the database, I get:

ProgrammingError: invalid byte sequence for encoding "UTF8": 0x92

All right, this was probably a spam email and I should simply discard 
it. The spammer probably used this special character to prevent mail 
filters from detecting "can't" and "2500". But I guess there will be 
other important (ham) emails with bad encodings. How should I handle this?
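
One possible approach, sketched under assumptions rather than offered as a 
definitive answer: treat the raw header as bytes and try a short list of 
codecs in order. In Windows-1252 the byte 0x92 is the right single quotation 
mark, which matches the subject above, so cp1252 is a plausible second guess 
after UTF-8. Latin-1 comes last because it maps every byte to some character 
and therefore never fails. The function name to_text and the codec order are 
hypothetical, and the code is Python 3:

```python
def to_text(raw):
    """Decode raw header bytes whose charset is missing or unreliable."""
    # Strict UTF-8 first: it rejects most text that is really in a legacy
    # 8-bit encoding, so a successful decode is a strong hint.
    # Then Windows-1252, where 0x92 is the right single quotation mark.
    for codec in ("utf-8", "cp1252"):
        try:
            return raw.decode(codec)
        except UnicodeDecodeError:
            continue
    # Latin-1 accepts every byte, so this line cannot raise.
    return raw.decode("latin-1")
```

The result is always a str that can be encoded as UTF-8 for the database, 
although the characters are only a guess when the first two codecs fail.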


