[spambayes-dev] RE: [Spambayes] Re: Training empty messages problem
sethg at GoodmanAssociates.com
Thu Dec 16 22:05:44 CET 2004
> From: Kenny Pitt
> Sent: Thursday, December 16, 2004 9:56 AM
> The correct format, I believe, would be:
> To: kennypitt at hotpop.com; "Kenny Pitt" <KennyPitt at invalid>
> Should be a simple matter of splitting the addresses on the ";" character.
> I'm going to go take a shot at this and see what I get.
This is acceptable, and typical for Outlook, but it does involve some legacy
constructs which have been deprecated in RFC2822. RFC2822 updates and
replaces RFC822 for all practical purposes and is a better reference to use.
It does list which "obsolete" address formats must be accepted. In
particular, the use of the first address without angle brackets is
deprecated, though recognized as a legacy format that must be accepted.
Current practice is to include all addresses in angle brackets and, unless
that causes problems in Outlook, that would be preferable. The second
problem is the use of a semi-colon to separate addresses. This is now
supposed to be a comma, though the obsolete semi-colon delimiter of RFC822
is explicitly supported. My copy of Outlook2000 contains a check box to
accept commas as address delimiters, which is the default setting, but it
still produces semicolons for display. I think it would be prudent to accept
either delimiter, in case MS ever gets a gram of clue and drops the
deprecated format. This will also position you to more easily integrate
with non-MS MUAs, hopefully open-source ones as they become popular enough.
Microsoft never did give a rat's posterior about IETF standards and often
uses them as a marketing tool to "differentiate" their products
(translation: intentionally create interoperability problems).
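Accepting either delimiter could look roughly like the sketch below. This is only an illustration, not Spambayes code: `split_addresses` is a hypothetical helper name, the addresses are made up, and it assumes the field contains no RFC2822 group syntax (where ";" closes a group instead of separating addresses). It leaves alone any "," or ";" found inside a quoted string, a comment or angle brackets.

```python
def split_addresses(header_value):
    """Split an address field on "," or ";" (hypothetical helper).

    Delimiters inside quoted strings, parenthesised comments or
    angle brackets are kept as part of the address. Group syntax is
    NOT handled; that is an assumption of this sketch.
    """
    parts = []
    current = []
    in_quotes = False
    in_angle = False
    comment_depth = 0
    escaped = False

    def flush():
        piece = "".join(current).strip()
        if piece:  # skip empty entries such as "a@x.example;;b@x.example"
            parts.append(piece)
        del current[:]

    for ch in header_value:
        if escaped:
            # Character following a backslash is taken literally.
            current.append(ch)
            escaped = False
            continue
        if ch == "\\" and (in_quotes or comment_depth):
            current.append(ch)
            escaped = True
            continue
        if in_quotes:
            if ch == '"':
                in_quotes = False
        elif comment_depth:
            if ch == "(":
                comment_depth += 1
            elif ch == ")":
                comment_depth -= 1
        elif ch == '"':
            in_quotes = True
        elif ch == "(":
            comment_depth = 1
        elif ch == "<":
            in_angle = True
        elif ch == ">":
            in_angle = False
        elif ch in ",;" and not in_angle:
            flush()
            continue
        current.append(ch)
    flush()
    return parts


outlook_style = 'kennypitt@example.com; "Kenny Pitt" <KennyPitt@example.invalid>'
comma_style = '"Pitt, Kenny" <kp@example.com>, other@example.com'
print(split_addresses(outlook_style))
print(split_addresses(comma_style))
```

Each piece that comes back can then be handed to whatever single-address parsing is already in place.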
Another general question on standards compliance: does Spambayes support
the Resent-*: series of headers? These are neither generated nor displayed
by Outlook, since Microsoft apparently never considered RFC2822 relevant.
However, many other MUAs use the remailing syntax of that standard, which
uses those headers. Though they are defined as trace headers and in that
sense are optional, they are required in order to use the remailing
semantics of RFC2822 section 3.6.6. An example of this is Pine's bounce
function. The fact that MS completely ignores those headers in their MUAs
has created a huge problem for those of us who are involved in message
authentication standards efforts. When used, those headers do contain
important information, and as authentication becomes more common, they will
become more important. My suggestion is that, of that whole series of
headers, the ones that would be of interest to Spambayes are:
Below is the relevant text from RFC2822. Some tokens are only defined in
other sections and there are two that are worth describing here. "Phrase"
is a quoted string, an atom or an obsolete format consisting of a
combination of words including "." and CFWS. CFWS is "commented folding
white space" that encompasses folding white space and comments, where
comments are parenthesis-delimited strings. This is relevant to the way you
described Outlook presenting some addresses and is the most serious
difference from the standards.
Even in RFC822, comments were permitted but expressly ignored in address
strings, so Microsoft's practice is completely broken. RFC2822 specifically
says that comments SHOULD NOT be included in address fields, as legacy
implementations sometimes interpret the comments. Apparently, we are now a
legacy application because MS has forced us to interpret the content of
comments in order to get the correct address-list from their broken MUA.
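As a point of comparison, Python's standard `email.utils.parseaddr` already behaves like such a "legacy" parser: when an address has no display name, it reports the comment text as the name. A minimal sketch, with an invented address and a hypothetical wrapper name (`name_and_address`); I have not checked how it copes with the exact strings Outlook produces:

```python
from email.utils import parseaddr


def name_and_address(single_address):
    """Return (name, addr_spec) for one address string.

    Hypothetical wrapper around email.utils.parseaddr. For the legacy
    "addr-spec (comment)" form, parseaddr uses the comment as the name.
    """
    name, addr_spec = parseaddr(single_address)
    return name, addr_spec


# Legacy form: bare addr-spec followed by a comment.
print(name_and_address('kp@example.com (Kenny Pitt)'))
# Preferred name-addr form.
print(name_and_address('"Kenny Pitt" <kp@example.com>'))
```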
3.4. Address Specification
Addresses occur in several message header fields to indicate senders
and recipients of messages. An address may either be an individual
mailbox, or a group of mailboxes.
   address         =   mailbox / group
   mailbox         =   name-addr / addr-spec
   name-addr       =   [display-name] angle-addr
   angle-addr      =   [CFWS] "<" addr-spec ">" [CFWS] / obs-angle-addr
   group           =   display-name ":" [mailbox-list / CFWS] ";"
   display-name    =   phrase
   mailbox-list    =   (mailbox *("," mailbox)) / obs-mbox-list
   address-list    =   (address *("," address)) / obs-addr-list
A mailbox receives mail. It is a conceptual entity which does not
necessarily pertain to file storage. For example, some sites may
choose to print mail on a printer and deliver the output to the
addressee's desk. Normally, a mailbox is comprised of two parts: (1)
an optional display name that indicates the name of the recipient
(which could be a person or a system) that could be displayed to the
user of a mail application, and (2) an addr-spec address enclosed in
angle brackets ("<" and ">"). There is also an alternate simple form
of a mailbox where the addr-spec address appears alone, without the
recipient's name or the angle brackets. The Internet addr-spec
address is described in section 3.4.1.
Note: Some legacy implementations used the simple form where the
addr-spec appears without the angle brackets, but included the name
of the recipient in parentheses as a comment following the addr-spec.
Since the meaning of the information in a comment is unspecified,
implementations SHOULD use the full name-addr form of the mailbox,
instead of the legacy form, to specify the display name associated
with a mailbox. Also, because some legacy implementations interpret
the comment, comments generally SHOULD NOT be used in address fields
to avoid confusing such implementations.
When it is desirable to treat several mailboxes as a single unit
(i.e., in a distribution list), the group construct can be used. The
group construct allows the sender to indicate a named group of
recipients. This is done by giving a display name for the group,
followed by a colon, followed by a comma separated list of any number
of mailboxes (including zero and one), and ending with a semicolon.
Because the list of mailboxes can be empty, using the group construct
is also a simple way to communicate to recipients that the message
was sent to one or more named sets of recipients, without actually
providing the individual mailbox address for each of those recipients.