[Email-SIG] Ensuring 7 bit encoding

R. David Murray rdmurray at bitdance.com
Fri Aug 28 03:45:39 CEST 2009


On Thu, 27 Aug 2009 at 17:42, Mark Sapiro wrote:
> Nicholas Cole wrote:
>>
>> What do I need to do to ensure that emails are generated only in 7,
>> not 8-bit encodings?  I assume that I need to use
>> email.charset.add_charset , but can't quite work out what incantation
>> to give it.  Does anyone have any pointers?
>
>
> I'm not sure what it is you're asking. Does this answer your question?
>
>>>> import email.message
>>>> m = email.message.Message()
>>>> m.set_payload("""A few lines
> ... of 7-bit text
> ...
> ... No high bit characters.
> ... """, 'us-ascii')
>>>> print m.as_string()
> MIME-Version: 1.0
> Content-Type: text/plain; charset="us-ascii"
> Content-Transfer-Encoding: 7bit
>
> A few lines
> of 7-bit text
>
> No high bit characters.
>
>>>>

It probably doesn't, since if that message contains high-bit
characters the result is a Content-Transfer-Encoding of 8bit:

>>> import email.message
>>> m = email.message.Message()
>>> m.set_payload("""A few lines
... of 8-bit text
...
... One high bit character: ².
... """, 'us-ascii')
>>> print m.as_string()
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 8bit

A few lines
of 8-bit text

One high bit character: ².

>>>

Since 8bit isn't technically us-ascii, I wonder if this is a bug.

With a little experimentation and a look at the code, it appears that
you will get 7bit clean output as long as you always supply a charset
for the input other than us-ascii, and that charset is one the charset
module has been told to encode using QP or BASE64 (which is true for
all of the already registered charsets).

E.g., this results in 7bit clean output:

>>> import email.message
>>> m = email.message.Message()
>>> m.set_payload("""A few lines
... of 8-bit text
...
... One high bit character: ².
... """, 'latin-1')
>>> print m.as_string()
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

A few lines
of 8-bit text

One high bit character: =C2=B2.

>>>
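
The rule described above (pick a charset whose registered body encoding
is QP or BASE64) can be checked before the message is built. The
following is only a sketch under stated assumptions: it is written for
Python 3, where set_payload takes a str and may behave differently from
the Python 2 sessions shown in this thread, and the helper names
build_7bit_message and is_7bit_clean are hypothetical, not part of the
email package. It is also deliberately conservative: it rejects any
charset with no QP or BASE64 body encoding registered, even one that
happens to be 7bit by nature.

```python
import email.charset
import email.message


def is_7bit_clean(text):
    """Return True if every character in text is plain ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def build_7bit_message(body, charset_name):
    """Hypothetical helper: build a message whose body is QP or BASE64 encoded."""
    charset = email.charset.Charset(charset_name)
    # Assumption: a charset with no registered QP/BASE64 body encoding
    # (us-ascii, for example) is refused rather than risk 8bit output.
    if charset.body_encoding not in (email.charset.QP, email.charset.BASE64):
        raise ValueError(
            "charset %r has no QP or BASE64 body encoding registered"
            % charset_name
        )
    msg = email.message.Message()
    # set_payload encodes the body and adds the Content-Type and
    # Content-Transfer-Encoding headers based on the charset.
    msg.set_payload(body, charset)
    return msg
```

Calling build_7bit_message(text, 'latin-1') and then passing
msg.as_string() to is_7bit_clean gives a quick check on the result. A
charset that is not yet registered can be given a body encoding with
email.charset.add_charset before using the helper.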

I suspect this is not a complete answer to the question...

--David

