[Mailman-Developers] 8bit and NNTP gateway

J C Lawrence claw at kanga.nu
Tue Sep 21 15:38:41 CEST 2004


On Tue, 21 Sep 2004 10:14:10 +0200
Brad Knowles <brad at stop.mail-abuse.org> wrote:
> At 11:24 PM -0400 2004-09-20, J C Lawrence wrote:

>> Are there any extant tools to detect 8bit mail (identified by a
>> Content-Transfer-Encoding: 8bit or not) and to suitably transform the
>> affected MIME parts (or base message if no MIME) to quoted-printable?
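A minimal sketch of the detection half of that question. It assumes Python 3's `email` package rather than the Python 2 `email.Utils` API used by the injection script in this post. A leaf part is flagged when it declares an 8bit or binary Content-Transfer-Encoding, or when its body holds bytes above 0x7F under a 7bit (or missing) declaration. The names `has_8bit_data` and `parts_needing_downgrade` are hypothetical helpers. They are not part of Mailman or the standard library.

```python
import email
from email.message import Message


def has_8bit_data(part: Message) -> bool:
    """True if a non-multipart part would need a 7-bit downgrade.

    Hypothetical helper: flags a declared 8bit/binary encoding, or raw
    high-bit bytes in a body that claims (or defaults to) 7bit.
    """
    cte = (part.get('Content-Transfer-Encoding') or '7bit').strip().lower()
    if cte in ('8bit', 'binary'):
        return True
    if cte in ('quoted-printable', 'base64'):
        return False  # already 7-bit clean on the wire
    # For 7bit/unknown encodings this returns the raw body bytes.
    body = part.get_payload(decode=True) or b''
    return any(byte > 0x7F for byte in body)


def parts_needing_downgrade(msg: Message) -> list:
    """Leaf parts of msg (or msg itself, if it is not MIME) carrying 8bit data."""
    return [p for p in msg.walk()
            if not p.is_multipart() and has_8bit_data(p)]
```

Parsing with `email.message_from_bytes` matters here. It keeps the original high-bit bytes recoverable, so a message with no Content-Transfer-Encoding header is still caught by the byte scan.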

> The easy way to handle this is to configure your MTA so that it thinks
> that it is 7-bit only, and it should do the conversion for you.

Exim, alas, does not fall into that camp. I need to use Exim rather than
Postfix because of the fine-grained control it offers over the UID/GID
that filters execute under.  Sendmail and qmail are not reasonable options
in this case.

> Sendmail certainly does this, and I'm pretty sure postfix does too.
> Not sure about any of the others.

The closest I've found for the Exim world is:

  http://www.exim.org/pipermail/exim-users/Week-of-Mon-20020916/043866.html

<sigh>

> The only other alternative I know of would be to hack the gateway code
> in Mailman so that it does the conversion.

The Mailman code for news injection is rather generous in what it
accepts on the inbound side, and does remarkably little to fix errors or
enforce correctness.  My current hack for handling NNTP injection
properly looks something like this:

--<cut>--
#!/usr/bin/python
import os, email, email.Utils, re, sys, string, time, nntplib, optparse
from StringIO import StringIO

_error = """
Exception: %s
Error: %s
-=-=-=-=-=-=-=-=-=-=-=
%s
-=-=-=-=-=-=-=-=-=-=-=
"""

parser = optparse.OptionParser ()
parser.add_option ("-s", "--server", dest = "nntp_server",
                   help = "NNTP server to post to", metavar = "SERVER")
# Default to the standard NNTP port rather than passing None to nntplib.
parser.add_option ("-p", "--port", type = 'int', dest = "nntp_port",
                   default = nntplib.NNTP_PORT,
                   help = "Port on NNTP server to post to [default: %default]",
                   metavar = "PORT")
parser.add_option ("-U", "--user", dest = "nntp_user",
                   help = "Authentication user on NNTP server",
                   metavar = "USER")
parser.add_option ("-P", "--password", dest = "nntp_password",
                   help = "Authentication password on NNTP server",
                   metavar = "PASSWORD")
parser.add_option ("-m", "--mode", type = 'int', default = 0,
                   dest = "nntp_mode", help = "Set reader mode on NNTP server?",
                   metavar = "MODE")
parser.add_option ("-a", "--approved", dest = "approved",
                   help = "Value of Approved: header to add",
                   metavar = "APPROVED")
parser.add_option ("-g", "--group", dest = "newsgroup",
                   help = "Newsgroup[s] to post to", metavar = "NEWSGROUP")
parser.add_option ("-d", "--debug",
                   action = "store_true", dest = "debug", default = False,
                   help = "Print debug messages and the rewritten message "
                          "instead of posting it")
#parser.add_option("-v", "--verbose",
#                  action = "store_false", dest = "verbose", default = False,
#                  help = "Verbose operation")
(options, args) = parser.parse_args ()

def debug (opt, msg):
  """
  Output if in debug mode.
  """
  if opt.debug:
    print msg

re_continuation = re.compile (r'^\s+(\S.*)$')
re_empty = re.compile (r'^\s*$')
headers = []
body = []
inbody = False
# Read the message into a list of unfolded header lines and a list of
# body lines.  The blank separator line itself is not kept: it is
# written back out when the message is re-assembled.
for line in sys.stdin.readlines ():
  if inbody:
    body.append (line)
    continue
  if re_empty.search (line): # End of headers
    inbody = True
    continue
  rc = re_continuation.search (line) # Header continuation: unfold it
  if rc and headers:
    headers[-1] += ' ' + rc.group (1).strip ()
    continue
  if line.startswith ('From '): # mbox envelope line
    headers.append ('X-Envelope-From: ' + line[5:].strip ())
    continue
  headers.append (line.strip ())
# Fix up headers to be suitable for NNTP.  Header names are matched
# case-insensitively and rewritten to one canonical spelling, so every
# list below uses that spelling.
cased_headers = {
  'message-id'        : 'Message-ID',
  'relay-version'     : 'Relay-Version',
  'posting-version'   : 'Posting-Version',
  'reply-to'          : 'Reply-To',
  'in-reply-to'       : 'In-Reply-To',
  'followup-to'       : 'Followup-To',
  'date-received'     : 'Date-Received',
  'mime-version'      : 'MIME-Version',
  'content-type'      : 'Content-Type',
  'content-transfer-encoding' : 'Content-Transfer-Encoding',
  'nntp-posting-host' : 'NNTP-Posting-Host',
  'nntp-posting-date' : 'NNTP-Posting-Date',
  }
# Illegal headers that only the news server should set.
bad_headers = ['NNTP-Posting-Host', 'NNTP-Posting-Date']
# Headers passed through as-is; anything else gets an X- prefix.
protected_headers = ['From', 'Date', 'Subject', 'Path', 'Relay-Version',
                     'Posting-Version', 'Newsgroups', 'Message-ID',
                     'Reply-To', 'In-Reply-To', 'Sender', 'Followup-To',
                     'Date-Received', 'Expires', 'References',
                     'Distribution', 'Organization', 'Approved',
                     'MIME-Version', 'Content-Type',
                     'Content-Transfer-Encoding']
# Values remembered for the fix-ups after the loop.  The
# Content-Transfer-Encoding value is noted for a future 7/8bit fix-up but
# is otherwise passed through unchanged.
special_headers = ['From', 'Message-ID', 'Path', 'References',
                   'In-Reply-To', 'Date', 'Newsgroups', 'Subject',
                   'Content-Transfer-Encoding']
# Held back from headers_out because they are fixed up and re-emitted
# after the loop.
suppressed_headers = ['Message-ID', 'Newsgroups']
headers_out = []
headers_noted = {}
for hdr in headers:
  # Extract name/value; can't use split as the value may contain colons
  ndx = hdr.find (':')
  if ndx <= 0:
    continue # Not a header line; drop it
  n = hdr[:ndx].strip ()
  v = hdr[ndx + 1:].strip ()
  # Some header names require special casing per the RFC.
  n = cased_headers.get (n.lower (), n.capitalize ())
  if n in bad_headers:
    continue
  if n not in protected_headers and not n.startswith ('X-'):
    n = 'X-' + n
  if n in special_headers:
    headers_noted[n] = v
  if n in suppressed_headers:
    continue # Already stashed in headers_noted
  # Leave everything else untouched
  headers_out.append ('%s: %s\n' % (n, v))
# Missing From?
if 'From' not in headers_noted:
  v = 'unknown@somewhere'
  headers_noted['From'] = v
  headers_out.append ('From: ' + v + '\n')
# Make a (new) Path of the form host!user from the From address as needed
if 'Path' not in headers_noted:
  addr = email.Utils.parseaddr (headers_noted['From'])[1]
  if '@' in addr:
    user, host = addr.rsplit ('@', 1)
  else:
    user, host = addr or 'unknown', 'somewhere'
  headers_out.append ('Path: %s!%s\n' % (host, user))
# Fix References header from In-Reply-To
if 'References' not in headers_noted and 'In-Reply-To' in headers_noted:
  v = email.Utils.parseaddr (headers_noted['In-Reply-To'])[1]
  if v:
    headers_out.append ('References: <' + v + '>\n')
# Missing Date?
if 'Date' not in headers_noted:
  debug (options, 'Missing date fixed.')
  headers_out.append ('Date: ' + email.Utils.formatdate (time.time (), 1)
                      + '\n')
# Missing Subject?
if 'Subject' not in headers_noted:
  debug (options, 'Missing subject fixed.')
  headers_out.append ('Subject: (subject missing)\n')
# Fix Message-ID format (adding one if needed).  It was held back in the
# loop above, so it is always re-emitted here.
if 'Message-ID' not in headers_noted:
  headers_noted['Message-ID'] = email.Utils.make_msgid ()
v = headers_noted['Message-ID']
if not 0 <= v.find ('<') < v.find ('>'): # Want Message-ID: <id_string>
  headers_out.append ('X-orig-Message-ID: ' + v + '\n')
  v = '<' + email.Utils.parseaddr (v)[1] + '>'
headers_out.append ('Message-ID: ' + v + '\n')
# Emit a single Newsgroups header that includes the group from the CLI.
# Groups are joined with bare commas, the conservative form.
groups = [g.strip () for g in headers_noted.get ('Newsgroups', '').split (',')
          if g.strip ()]
if options.newsgroup not in groups:
  groups.append (options.newsgroup)
headers_out.append ('Newsgroups: ' + ','.join (groups) + '\n')
# Add an Approved header if requested (optparse leaves it None otherwise)
if options.approved:
  headers_out.append ('Approved: ' + options.approved + '\n')
# Re-assemble message
out_f = StringIO ()
for line in headers_out:
  out_f.write (line)
out_f.write ('\n')
for line in body:
  out_f.write (line)
out_f.seek (0)
if options.debug:
  debug (options, "Not sending.")
  debug (options, out_f.getvalue ())
  sys.exit (0)
try:
  server = nntplib.NNTP (options.nntp_server, options.nntp_port,
                         options.nntp_user, options.nntp_password,
                         options.nntp_mode)
  server.post (out_f)
  server.quit ()
except nntplib.NNTPError, message:
  # debug() is silent here (debug mode exits before posting), so report
  # the failure and the rejected article on stderr unconditionally.
  sys.stderr.write (_error % (sys.exc_info ()[0], message,
                              out_f.getvalue ()))
  sys.exit (2)
sys.exit (0)
--<cut>--

This was originally written to inject old archives into a news spool,
for which it works rather well, modulo a few caveats discussed below.
About the only other change needed for archive injection was commenting
out the date check in your copy of inews, so that old mail being freshly
injected wasn't rejected for having a stale date.

Strengths of the script are that it enforces and fixes/generates header
correctness in several areas (case, presence, etc.), and it preserves all
prior headers (especially Received:) for debugging and tracing
downstream, renaming unprotected ones with an X- prefix.  It unfolds
continued header lines for older or more fragile news servers
downstream, and it generally tries to be very generous in what it will
accept and rather pedantically picky in what it will emit.  Known
weaknesses are that it's fragile on incorrect or missing CLI arguments,
that email.Utils.parseaddr() has known problems with addresses that
contain parentheses in the GECOS field, and that it doesn't begin to
handle the 7/8bit problem yet.
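The missing 7/8bit step could look roughly like the following. This is a minimal Python 3 sketch and not part of the Python 2 script above. It re-encodes one leaf MIME part as quoted-printable and leaves the declared charset untouched. That rests on the assumption that the 8bit bytes already match the charset. The helper name `to_quoted_printable` is hypothetical. The caller is assumed to have already decided which parts need converting.

```python
import email
import quopri
from email.message import Message


def to_quoted_printable(part: Message) -> None:
    """Re-encode one non-multipart part's body as quoted-printable, in place.

    Hypothetical helper: the charset parameter is left as-is, assuming the
    original 8bit bytes are already in that charset.
    """
    if part.is_multipart():
        raise ValueError('only leaf parts can be re-encoded')
    # Raw bytes for 7bit/8bit bodies; decoded bytes for base64/QP bodies.
    raw = part.get_payload(decode=True) or b''
    part.set_payload(quopri.encodestring(raw).decode('ascii'))
    if 'Content-Transfer-Encoding' in part:
        part.replace_header('Content-Transfer-Encoding', 'quoted-printable')
    else:
        part['Content-Transfer-Encoding'] = 'quoted-printable'
```

To apply it, call it for each leaf part in `msg.walk()` that is flagged as carrying 8bit data. The decoded content should stay the same, while the article on the wire becomes 7-bit clean.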

--
J C Lawrence
---------(*)                Satan, oscillate my metallic sonatas.
claw at kanga.nu               He lived as a devil, eh?
http://www.kanga.nu/~claw/  Evil is a name of a foeman, as I live.

