[issue20747] Charset.header_encode in email.charset doesn't take a maxlinelen argument and has inconsistent behavior with different encodings
Rik
report at bugs.python.org
Sun Feb 23 18:59:37 CET 2014
New submission from Rik:
The `header_encode` method of the `Charset` class in `email.charset` encodes a header with either base64 or quoted-printable (QP), depending on the `header_encoding` set on the `Charset` instance:
http://hg.python.org/cpython/file/3a1db0d2747e/Lib/email/charset.py#l351
However, the QP branch always passes `maxlinelen=None`, while the base64 branch falls back to the default `maxlinelen`. This results in the following behaviour:
- With base64 encoding, a header longer than the default `maxlinelen` is split over multiple lines.
- With QP encoding, the same header is not split and stays on a single line.
You can easily test it with this snippet:
from email.charset import Charset, BASE64, QP
header = (
    'tejkstj tlkjes takldjf aseio neaoiflk asnfoieas nflkdan foeias '
    'naskln ioeasn kldan flkansoie naslk dnaslk fndaslk fneoisaf '
    'neklasn dfklasnf oiasenf lkadsn lkfanldk fas dfknaioe nas'
)
charset = Charset('utf-8')
charset.header_encoding = BASE64
print 'BASE64:'
print charset.header_encode(header)
charset.header_encoding = QP
print 'QP:'
print charset.header_encode(header)
Which will output:
BASE64:
=?utf-8?b?dGVqa3N0aiB0bGtqZXMgdGFrbGRqZiBhc2VpbyBuZWFvaWZsayBhc25mb2llYXMg?=
=?utf-8?b?bmZsa2RhbiBmb2VpYXMgbmFza2xuIGlvZWFzbiBrbGRhbiBmbGthbnNvaWUgbmFz?=
=?utf-8?b?bGsgZG5hc2xrIGZuZGFzbGsgZm5lb2lzYWYgbmVrbGFzbiBkZmtsYXNuZiBvaWFz?=
=?utf-8?b?ZW5mIGxrYWRzbiBsa2ZhbmxkayBmYXMgZGZrbmFpb2UgbmFz?=
QP:
=?utf-8?q?tejkstj_tlkjes_takldjf_aseio_neaoiflk_asnfoieas_nflkdan_foeias_naskln_ioeasn_kldan_flkansoie_naslk_dnaslk_fndaslk_fneoisaf_neklasn_dfklasnf_oiasenf_lkadsn_lkfanldk_fas_dfknaioe_nas?=
This is inconsistent behavior.
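To make the difference measurable, here is a minimal sketch that compares the longest line of each output. The helper name `longest_line` is hypothetical and not part of the `email` package. The base64 sample is copied from the first two lines of the output above, and the QP sample is rebuilt from the test header (spaces become underscores) rather than produced by calling the library.

```python
def longest_line(encoded):
    """Return the length of the longest physical line in an encoded header."""
    return max(len(line) for line in encoded.splitlines())


header = (
    'tejkstj tlkjes takldjf aseio neaoiflk asnfoieas nflkdan foeias '
    'naskln ioeasn kldan flkansoie naslk dnaslk fndaslk fneoisaf '
    'neklasn dfklasnf oiasenf lkadsn lkfanldk fas dfknaioe nas'
)

# First two lines of the BASE64 output shown above, copied verbatim.
base64_output = (
    '=?utf-8?b?dGVqa3N0aiB0bGtqZXMgdGFrbGRqZiBhc2VpbyBuZWFvaWZsayBhc25mb2llYXMg?=\n'
    '=?utf-8?b?bmZsa2RhbiBmb2VpYXMgbmFza2xuIGlvZWFzbiBrbGRhbiBmbGthbnNvaWUgbmFz?='
)

# The QP output shown above is one encoded-word: the header with every
# space replaced by an underscore, wrapped in =?utf-8?q?...?=
qp_output = '=?utf-8?q?' + header.replace(' ', '_') + '?='

print(longest_line(base64_output))  # stays within the 76-character default
print(longest_line(qp_output))      # far longer than 76 characters
```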
Aside from that, I think the `header_encode` method should accept a `maxlinelen` argument that defaults to an appropriate value (probably 76) and that callers can override at will.
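As a rough illustration of what a `maxlinelen`-aware encoder could look like, here is a standalone sketch for the base64 case. It is not the `email.charset` implementation: the function name `b64_encoded_words` and its signature are hypothetical. It assumes a stateless, BOM-free charset such as utf-8, is written against Python 3 string semantics, and encodes character by character so that no multi-byte character is split across two encoded-words.

```python
import base64


def b64_encoded_words(text, charset='utf-8', maxlinelen=76):
    """Split text into RFC 2047 'B' encoded-words of at most maxlinelen chars.

    Hypothetical helper: maxlinelen=None means "do not split at all".
    """
    prefix = '=?%s?b?' % charset
    suffix = '?='

    def make_word(data):
        return prefix + base64.b64encode(data).decode('ascii') + suffix

    if maxlinelen is None:
        return [make_word(text.encode(charset))]

    # Every 3 input bytes become 4 base64 characters, so work out how many
    # raw bytes fit in the space left after the prefix and suffix.
    payload_chars = maxlinelen - len(prefix) - len(suffix)
    max_bytes = payload_chars // 4 * 3
    if max_bytes < 4:
        raise ValueError('maxlinelen is too small for an encoded-word')

    words = []
    chunk = b''
    for char in text:
        encoded = char.encode(charset)
        # Start a new encoded-word before this character would overflow it.
        if chunk and len(chunk) + len(encoded) > max_bytes:
            words.append(make_word(chunk))
            chunk = b''
        chunk += encoded
    if chunk:
        words.append(make_word(chunk))
    return words


for word in b64_encoded_words(u'asd \u6211\u6211\u6211 jo ' * 5, maxlinelen=40):
    print(word)
```

With utf-8 and the default of 76, the prefix and suffix take 12 characters, which leaves 64 base64 characters, or 48 raw bytes, per encoded-word. That matches the length of the base64 lines quoted above.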
This is (I think) also necessary because the `Header` class in `email.header` has a `maxlinelen` attribute that serves the same purpose. Normally this works fine, but when you specify a charset for your header, `Header` delegates to the `Charset` class and the `maxlinelen` is lost. This happens here:
http://hg.python.org/cpython/file/3a1db0d2747e/Lib/email/header.py#l368
The `_encode_chunks` method takes the `maxlinelen` argument but doesn't pass it on to the `header_encode` method of `charset` (which is a `Charset` instance).
As such, you can see this issue in action with the following snippet:
from email.header import Header
maxlinelen = 9999999
print 'No charset:'
print Header(
    u'asdfjk lasjdf sajdfl ajsdfaj sdlkfjas kfladjs flkajsdflk jsadklf jadslkfj adslkfj asdlkjf lksadjfkldas jfkldasj fkadsj fladsjf kladsjfk asdjfkldasasd kfaj kfladsj fkadsjf asdf ',
    maxlinelen=maxlinelen
).encode()
print 'Charset with special characters:'
print Header(
    u'attachment; filename="ajdsklfj klasdjfkl asdjfkl jadsfja sdflkads fad fads adsf dasjfkl jadslkfj dlasf asd \u6211\u6211\u6211 jo \u6211\u6211 jo \u6211\u6211"',
    charset='utf-8',
    maxlinelen=maxlinelen
).encode()
Which will output:
No charset:
asdfjk lasjdf sajdfl ajsdfaj sdlkfjas kfladjs flkajsdflk jsadklf jadslkfj adslkfj asdlkjf lksadjfkldas jfkldasj fkadsj fladsjf kladsjfk asdjfkldasasd kfaj kfladsj fkadsjf asdf
Charset with special characters:
=?utf-8?b?YXR0YWNobWVudDsgZmlsZW5hbWU9ImFqZHNrbGZqIGtsYXNkamZrbCBhc2RqZmts?=
=?utf-8?b?IGphZHNmamEgc2RmbGthZHMgZmFkIGZhZHMgYWRzZiBkYXNqZmtsIGphZHNsa2Zq?=
=?utf-8?b?IGRsYXNmIGFzZCDmiJHmiJHmiJEgam8g5oiR5oiRIGpvIOaIkeaIkSI=?=
We are currently running into this in Django; see the ticket in our issue tracker:
https://code.djangoproject.com/ticket/20889#comment:4
----------
components: Library (Lib), email
messages: 212011
nosy: barry, r.david.murray, rednaw
priority: normal
severity: normal
status: open
title: Charset.header_encode in email.charset doesn't take a maxlinelen argument and has inconsistent behavior with different encodings
type: behavior
versions: Python 2.7
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue20747>
_______________________________________