[Mailman-Developers] Please Allow Me To Introduce Myself...

Les Niles les@2pi.org
Wed, 6 Mar 2002 11:22:17 -0800


On Wed, 06 Mar 2002 11:01:51 -0800 "James J. Besemer" <jb@cascade-sys.com> wrote:
>
>mac@wooz.org wrote:
>> You'd think!  I've had a couple of patches contributed that filter out
>> HTML, but I've not been able to whip them into shape for inclusion.
>> I've basically given up hope for MM2.1, but will look at it again for
>> the next release.  The problem is that the naive approach isn't
>> difficult, but for it to be robust is much more difficult.
>
>When you find more time I'd appreciate some more background on this.
>
>Wanting to filter out HTML (nb. from AOL accounts) is the #1 gripe from my
>users.
>
>The Python library has an HTML parser that I've used before and it works
>pretty well.  I used it to translate HTML to HTML, inserting data in
>various named fields.  But removal of the HTML is the default action of
>the code.  Of course you don't really want simply to remove it.  E.g.,
>you'd want to include HREF's somehow, substitute the description for
>images, etc.
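
For concreteness, here is a rough sketch of the conversion described above: drop the tags, but keep each link's HREF and substitute an image's ALT text. It assumes the `html.parser` module from a current Python 3 standard library, not the parser modules that shipped in 2002. The names `TextWithLinks` and `html_to_text` are made up for illustration. It makes no attempt to skip script or style content, so treat it as a starting point only.

```python
from html.parser import HTMLParser


class TextWithLinks(HTMLParser):
    """Collect the text of an HTML document, keeping link targets and image ALT text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []   # pieces of output text, joined at the end
        self._href = None  # target of the <a> element we are currently inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._href = attrs.get("href")
        elif tag == "img":
            # Substitute the description for the image itself.
            self.chunks.append("[image: %s]" % (attrs.get("alt") or "no description"))
        elif tag in ("br", "p"):
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        # Emit the link target right after the link text.
        if tag == "a" and self._href:
            self.chunks.append(" <%s>" % self._href)
            self._href = None

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Return a plain-text rendering of an HTML string."""
    parser = TextWithLinks()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()
```

Calling html_to_text() on a fragment containing a link yields the link text followed by the URL in angle brackets.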

Most of the time you really can just strip out the HTML.  AOL,
Outhouse, and most of the other clients that like to generate HTML
put out multipart/alternative messages that include a text/plain
section, so picking out the latter and dropping the other
alternatives works pretty well.  Almost all of the pure-HTML
traffic I see is spam.  I've been using one of the patches Barry
referred to on some medium-sized lists for the past 1.5 years with
no complaints and very few instances of message bodies disappearing
entirely.  (It was the release of AOL 6.0, which doesn't allow
turning off HTML, that prompted me.)
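
To illustrate the idea (this is not the patch itself), a minimal sketch of "keep the text/plain part, drop the rest" could look like the following. It assumes the `email` package from a current Python 3 standard library. The function name `plain_text_only` is hypothetical. It returns None when there is no text/plain part at all, which is the "message body disappears entirely" case for pure-HTML mail. It also does not distinguish a text/plain attachment from the body.

```python
import email
from typing import Optional


def plain_text_only(raw: str) -> Optional[str]:
    """Return the first text/plain part of a raw message, or None if there is none."""
    msg = email.message_from_string(raw)
    # walk() visits the message itself and then every nested MIME part.
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
            charset = part.get_content_charset() or "us-ascii"
            try:
                return payload.decode(charset, errors="replace")
            except LookupError:
                # Unknown charset label: fall back to something that cannot fail.
                return payload.decode("latin-1")
    return None
```

A list filter built on this would still have to decide what to do when it returns None: hold the message, bounce it, or pass the HTML through.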

  -les