HTMLParser: getting one extra space around >/</^M tags
Les Schaffer
godzilla at netmeg.net
Fri Jun 2 11:39:33 EDT 2000
I am helping someone convert HTML archives from a mailing list back to
mbox format for forwarding to mail-archive.com.
I am using HTMLParser to pull out the body of each message as plain
text, together with AbstractFormatter/DumbWriter ...
The message bodies are all inside <PRE> tags, so nofill is set to 1,
which means self.formatter.add_literal_data(data) is used to add data
to the output. This is what I want for preformatted messages.
There are lots of &gt; and &lt; tags in the body of messages, as
mhonarc/hypermail translate the '>' quote reply and the '<>' of
embedded email addresses using these HTML tags. (The bodies are
relatively HTML free except for these.)
In addition, the end-of-line characters __in the body of the mail
message__ use '^M'.
The problem is this: the formatter is adding one extra space around
these three characters ( > < and ^M).
So a back-in-text-mode mail message that looks like this:
=====
Hello, why are you even reading this email.
I have nothing important to say.
Stop it.
=====
comes out with an extra space on the left-hand side of each line, which
I assume comes from the translation of ^M.
Or this:
====
and so then you said:
> i dont like you
and i say, so what, and you said:
> thats what!
====
Again, it comes out with an extra single space at the beginning of each
line and on either side of the '>'.
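For comparison, here is a minimal sketch of the output being aimed at,
with no formatter/writer layer involved. It is an assumption-heavy
stand-in, not the htmllib/formatter setup described above: it uses the
html.parser module from the Python 3 standard library, the names
PreTextExtractor and extract_pre_text are made up for illustration, and
it assumes '^M' means a literal carriage return in the archived body.

```python
from html.parser import HTMLParser


class PreTextExtractor(HTMLParser):
    """Collect the text found inside <PRE> ... </PRE> (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True decodes &gt; and &lt; as part of the
        # surrounding text instead of reporting them as separate events.
        super().__init__(convert_charrefs=True)
        self._pre_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, so <PRE> is seen as "pre".
        if tag == "pre":
            self._pre_depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._pre_depth:
            self._pre_depth -= 1

    def handle_data(self, data):
        # Keep literal data only while inside a <PRE> block.
        if self._pre_depth:
            self._chunks.append(data)

    def text(self):
        body = "".join(self._chunks)
        # Assumption: ^M is a carriage return; normalise to plain newlines.
        return body.replace("\r\n", "\n").replace("\r", "\n")


def extract_pre_text(html):
    parser = PreTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = (
        "<html><body><PRE>and so then you said:\r\n"
        "&gt; i dont like you\r\n"
        "mail me at &lt;someone@example.com&gt;\r\n"
        "</PRE></body></html>"
    )
    print(extract_pre_text(sample), end="")
```

Because the text is joined exactly as the parser delivers it, any space
that shows up around '>' or '<' in this sketch would have to be present
in the archived HTML itself rather than added during formatting.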
I cannot for the life of me (after 2 hours) find where in htmllib or
formatter these extra spaces are being inserted.
Does anyone know?
Many, many thanks.
les schaffer