RFC 2047 parser
François Pinard
pinard at iro.umontreal.ca
Thu Jul 6 19:53:38 EDT 2000
[Jason Abate]
> I was wondering if anyone has put together code for parsing mail headers
> encoded according to RFC 2047 (Message Header Extensions for
> Non-ASCII Text).
I needed this soon after learning Python (so this is among my first lines
of Python, and I would probably write something simpler today :-) and quickly
wrote what appears below. However, I found out after the fact that the
Python library already had something for this. See mimify.mime_decode_header.
# Handling of RFC 2047 (previously RFC 1522) headers.
# Only Q-encoded ISO-8859-1 words are decoded; anything else is left as-is.

import re

def to_latin1(text):
    # Replace each =?ISO-8859-1?Q?...?= encoded word by its decoded text.
    return _sub_f(r'=\?ISO-8859-1\?Q\?([^?]*)\?=', re.I, _replace1, text)

def _replace1(match):
    # Inside an encoded word, '_' stands for a space and =XX is a hex byte.
    return _sub_f('=([0-9A-F][0-9A-F])', re.I, _replace2,
                  re.sub('_', ' ', match.group(1)))

def _replace2(match):
    # Turn the two hex digits into the corresponding Latin-1 character.
    return chr(int(match.group(1), 16))

def _sub_f(pattern, flags, function, text):
    # Like re.sub with a function as replacement, plus compile flags.
    matcher = re.compile(pattern, flags).search
    position = 0
    results = []
    while 1:
        match = matcher(text, position)
        if not match:
            results.append(text[position:])
            return ''.join(results)
        results.append(text[position:match.start(0)])
        position = match.end(0)
        results.append(function(match))
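
For readers on Python 3, where the mimify module no longer exists, here is a
minimal sketch of the same idea built on email.header.decode_header from the
standard library. The helper name decode_rfc2047 is made up for this
illustration and is not part of any library; unlike the code above, this
version should also cope with B-encoded words and with other charsets.

```python
from email.header import decode_header

def decode_rfc2047(header):
    # decode_rfc2047 is a hypothetical helper name, used only for this sketch.
    parts = []
    for data, charset in decode_header(header):
        if isinstance(data, bytes):
            # Encoded words come back as bytes along with their charset;
            # unencoded fragments may come back as bytes with charset None.
            parts.append(data.decode(charset or 'ascii', errors='replace'))
        else:
            # A header without any encoded word comes back as a plain str.
            parts.append(data)
    return ''.join(parts)

print(decode_rfc2047('=?ISO-8859-1?Q?Fran=E7ois_Pinard?='))
```

A charset name that Python does not know makes the decode call raise
LookupError, so real code would probably want to catch that.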
--
François Pinard http://www.iro.umontreal.ca/~pinard