two spaces in subject lines
Since I upgraded to have iso_xxx compliant subjects, I notice that most emails go through with TWO spaces after the usual subject_prefix, on all lists. I don't really mind, but just wanted to mention it.
-- Fil
Precisely, here's how it happens:
"Subject: =?iso-8859-1?q?=5Bspip-dev=5D_?="
" petite mise a jour =?iso-8859-1?Q?s=E9curi?="
" =?iso-8859-1?Q?t=E9?= inc_auth_cookie"
(I've enclosed the lines with "" so there are no surprises)
-- Fil
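Decoding each encoded-word on its own shows where the extra space comes from. What follows is a minimal sketch in Python 3's standard library. `decode_q_word` is a hypothetical helper (not part of `email.Header`), and it handles only the Q encoding with charsets that Python knows. The `_` at the end of the first word is a Q-encoded space, so the list prefix already carries a trailing blank before the literal space that separates it from "petite".

```python
import binascii
import re

# RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([qQbB])\?([^?]*)\?=")


def decode_q_word(word):
    """Decode one Q-encoded word (sketch: Q encoding only, charset known to Python)."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None or match.group(2).lower() != "q":
        raise ValueError(f"not a Q-encoded word: {word!r}")
    charset, _, payload = match.groups()
    # header=True decodes "_" as a space, as the Q encoding requires.
    raw = binascii.a2b_qp(payload.encode("ascii"), header=True)
    return raw.decode(charset)


# The three encoded-words from Fil's Subject: header.
for word in ("=?iso-8859-1?q?=5Bspip-dev=5D_?=",
             "=?iso-8859-1?Q?s=E9curi?=",
             "=?iso-8859-1?Q?t=E9?="):
    print(repr(decode_q_word(word)))
```

The first word decodes to `'[spip-dev] '`, with the trailing space included.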
On Saturday, April 13, 2002, at 06:13 , Fil wrote:
> Since I upgraded to have iso_xxx compliant subjects, I notice that most emails go through with TWO spaces after the usual subject_prefix, on all lists. I don't really mind, but just wanted to mention it.
> Precisely, here's how it happens:
> "Subject: =?iso-8859-1?q?=5Bspip-dev=5D_?="
> " petite mise a jour =?iso-8859-1?Q?s=E9curi?="
> " =?iso-8859-1?Q?t=E9?= inc_auth_cookie"
This is an interesting edge case. RFC 2047 says that whitespace between adjacent encoded-words is to be ignored; here, however, we have encoded-words with US-ASCII text in between them.
I think the email.Header package I wrote is doing the wrong thing here.
Either we need to represent the whole thing as one or more encoded-words, or we need to be super anal about whitespace between encoded-words and non-encoded-words.
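The RFC 2047 rule Ben refers to can be sketched as a small decoder. This is a minimal illustration under stated assumptions, not the `email.Header` implementation: `decode_header_value` and `decode_word` are hypothetical helpers, only Q and B encodings are handled, and the charset must be one Python knows. Whitespace that only separates two encoded-words is dropped. Whitespace next to ordinary text is kept.

```python
import binascii
import re

ENCODED_WORD = re.compile(r"=\?([^?]+)\?([qQbB])\?([^?]*)\?=")


def decode_word(match):
    """Decode the bytes of one encoded-word and convert them to text."""
    charset, encoding, payload = match.groups()
    data = payload.encode("ascii")
    if encoding.lower() == "q":
        raw = binascii.a2b_qp(data, header=True)  # "_" means space in Q
    else:
        raw = binascii.a2b_base64(data)
    return raw.decode(charset)


def decode_header_value(value):
    """Decode a header value, applying the RFC 2047 whitespace rule (sketch)."""
    # Unfold: a line break followed by whitespace is replaced by that whitespace.
    value = re.sub(r"\r?\n(?=[ \t])", "", value)
    pieces = []
    pos = 0
    after_encoded_word = False
    for match in ENCODED_WORD.finditer(value):
        gap = value[pos:match.start()]
        # Whitespace that only separates two encoded-words is dropped;
        # whitespace next to ordinary text is kept.
        if not (after_encoded_word and gap.isspace()):
            pieces.append(gap)
        pieces.append(decode_word(match))
        pos = match.end()
        after_encoded_word = True
    pieces.append(value[pos:])
    return "".join(pieces)


fil_subject = ("=?iso-8859-1?q?=5Bspip-dev=5D_?=\r\n"
               " petite mise a jour =?iso-8859-1?Q?s=E9curi?=\r\n"
               " =?iso-8859-1?Q?t=E9?= inc_auth_cookie")
print(repr(decode_header_value(fil_subject)))
```

The decoded value keeps two spaces after "[spip-dev]". One comes from the encoded `_` inside the first word. The other is the folding whitespace between that word and the plain text "petite", which the rule preserves. The space between the two adjacent encoded-words is dropped, so "sécuri" and "té" rejoin as "sécurité".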
I am currently moving from Tokyo to California, but when I get back and settled, I will take a long hard look at this issue. I agree that it's pretty important, and that email.Header is doing the wrong thing with respect to whitespace between encoded-words and non-encoded-words:
>>> from email.Header import Header, decode_header
>>> from email.Charset import Charset
>>> f = Charset("iso-8859-1")
>>> z = Header("Zout alours!", f)
>>> z
<email.Header.Header instance at 0x811b754>
>>> print z
=?iso-8859-1?q?Zout_alours!?=
>>> z.append(" Hello?")
>>> print z
=?iso-8859-1?q?Zout_alours!?=  Hello?
>>> decode_header(z)
[('Zout alours!', 'iso-8859-1'), ('Hello?', None)]
Here, the whitespace should *not* be disappearing in decode_header, and in fact there should only be one space between the encoded-word and "Hello?" in the printed-out header.
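On the encoding side, the "only one space" behavior can be sketched under a simplified model. Each chunk is a word paired with its charset, where `None` means plain ASCII, and chunks are always separated by exactly one space. `q_encode` and `encode_chunks` are hypothetical helpers, not the `email.Header` API, and the sketch ignores the 75-character encoded-word limit and line folding. Boundary spaces are stripped from every chunk and emitted once. Next to plain text, the space is written as a literal space. Between two encoded-words, the space is written as an encoded `_`, because decoders ignore literal whitespace there.

```python
import string

# Literal characters allowed in a Q-encoded word even in the most restrictive
# context (RFC 2047, section 5); "=" and "_" are special and always encoded here.
Q_SAFE = frozenset((string.ascii_letters + string.digits + "!*+-/").encode("ascii"))


def q_encode(text, charset):
    """Q-encode text as a single encoded-word (sketch: ignores the 75-char limit)."""
    out = []
    for byte in text.encode(charset):
        if byte == 0x20:
            out.append("_")
        elif byte in Q_SAFE:
            out.append(chr(byte))
        else:
            out.append(f"={byte:02X}")
    return f"=?{charset}?q?{''.join(out)}?="


def encode_chunks(chunks):
    """Join (text, charset) chunks so that each boundary yields exactly one space.

    charset=None marks plain ASCII text.  Boundary spaces are stripped from
    every chunk and emitted once: as a literal space next to plain text, or
    as an encoded "_" between two encoded-words.
    """
    stripped = [(text.strip(" "), charset) for text, charset in chunks]
    words = []
    for i, (text, charset) in enumerate(stripped):
        if charset is None:
            words.append(text)
            continue
        next_is_encoded = i + 1 < len(stripped) and stripped[i + 1][1] is not None
        if next_is_encoded:
            text += " "  # carry the separator inside the encoded text
        words.append(q_encode(text, charset))
    return " ".join(words)


print(encode_chunks([("Zout alours!", "iso-8859-1"), (" Hello?", None)]))
```

Whether a boundary space belongs inside or outside the encoded-word depends on what sits on the other side. That is the "ahead of time" difficulty Ben describes later in the thread. This sketch sidesteps it by looking at the whole chunk list before emitting anything.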
It's certainly a thinko in email.Header. I will work on this in a week or so.
Ben
"Ben" == Ben Gertzfield <che@debian.org> writes:
Ben> I think the email.Header package I wrote is doing the wrong
Ben> thing here.
Yup.
Ben> Either we need to represent the whole thing as one or more
Ben> encoded-words, or we need to be super anal about whitespace
Ben> between encoded-words and non-encoded-words.
The latter. What are you going to do with encodings you know nothing about, e.g., if I send a message with
Subject: =?sjt-1?q?=49=74=27=73=20=6A=75=73=74=20=41=53=43=49=49=21?=
in it?
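Stephen's question can be made concrete. The Q payload decodes to bytes without knowing anything about `sjt-1`; here the bytes spell the ASCII text "It's just ASCII!". Turning those bytes into text, however, needs the charset. The minimal sketch below illustrates one conservative policy, which is an assumption here rather than the behavior of any particular library: if Python's codec registry does not recognize the charset, the encoded-word is passed through untouched. `decode_or_keep` is a hypothetical helper.

```python
import binascii
import codecs
import re

ENCODED_WORD = re.compile(r"=\?([^?]+)\?([qQ])\?([^?]*)\?=")


def decode_or_keep(word):
    """Decode a Q-encoded word, or return it untouched if its charset is unknown (sketch)."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        return word
    charset, _, payload = match.groups()
    try:
        codecs.lookup(charset)
    except LookupError:
        # We cannot turn the bytes into text, so pass the encoded-word through as-is.
        return word
    return binascii.a2b_qp(payload.encode("ascii"), header=True).decode(charset)


stephen = "=?sjt-1?q?=49=74=27=73=20=6A=75=73=74=20=41=53=43=49=49=21?="
# The Q step alone yields bytes; only the bytes-to-text step needs the charset.
print(binascii.a2b_qp(b"=49=74=27=73=20=6A=75=73=74=20=41=53=43=49=49=21", header=True))
print(decode_or_keep(stephen))
```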
Ben> It's certainly a thinko in email.Header.
RFC-2822-parsing is a dirty job.
Dewa, matane. (Well then, see you later.)
-- 
Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Don't ask how you can "do" free software business; ask what your business can "do for" free software.
On Saturday, April 13, 2002, at 02:59 , Stephen J. Turnbull wrote:
> Ben> Either we need to represent the whole thing as one or more
> Ben> encoded-words, or we need to be super anal about whitespace
> Ben> between encoded-words and non-encoded-words.
> The latter. What are you going to do with encodings you know nothing about, e.g., if I send a message with
> Subject: =?sjt-1?q?=49=74=27=73=20=6A=75=73=74=20=41=53=43=49=49=21?=
> in it?
Of course, we wouldn't do anything at all with that as far as whitespace is concerned. I'm talking about when encoded-words and non-encoded-words are mixed together.
But I totally agree that we need to be anal; it's just hard to know ahead of time whether to encode the next space as part of an encoded-word or as the space between an encoded-word and a non-encoded-word. But we certainly must not do both!
Ben
"Ben" == Ben Gertzfield <che@debian.org> writes:
Ben> On Saturday, April 13, 2002, at 02:59 , Stephen J. Turnbull
Ben> wrote:
Ben> Either we need to represent the whole thing as one or more
Ben> encoded-words, or we need to be super anal about whitespace
Ben> between encoded-words and non-encoded-words.
>> The latter. What are you going to do with encodings you know
>> nothing about, e.g., if I send a message with
>>
>> Subject:
>> =?sjt-1?q?=49=74=27=73=20=6A=75=73=74=20=41=53=43=49=49=21?=
>>
>> in it?
Ben> Of course, we wouldn't do anything at all with that,
Ben> regarding whitespace. I'm talking about when encoded-words
Ben> and non-encoded-words are mixed together.
On this list it should end up looking like this:
Subject: [Mailman-Developers] =?sjt-1?q?=49=74=27=73=20=6A=75=73=74=20=41=53=43=49=49=21?=
or so, no? Urk, folded at the whitespace and all....
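Stephen's expected result can be sketched in three steps. First, prepend the list prefix as plain ASCII. Second, leave the unknown-charset encoded-word untouched. Third, fold the line at whitespace when it exceeds the 78 characters that RFC 2822 recommends. The space that gets folded sits between plain text and an encoded-word, so an RFC 2047 decoder keeps it after unfolding. `add_subject_prefix` and `fold_at_whitespace` are hypothetical helpers, and the fold routine is deliberately naive because it assumes no single word exceeds the limit.

```python
def add_subject_prefix(prefix, raw_subject):
    """Prepend a plain-ASCII list prefix without decoding the existing subject."""
    return f"Subject: {prefix} {raw_subject}"


def fold_at_whitespace(line, limit=78):
    """Fold a header line at spaces so each physical line is at most `limit` chars.

    Naive sketch: assumes no single word is longer than `limit`.  Each
    continuation line starts with the space it was folded at, so unfolding
    (deleting the CRLFs) restores the original line.
    """
    words = line.split(" ")
    lines = [words[0]]
    for word in words[1:]:
        if len(lines[-1]) + 1 + len(word) <= limit:
            lines[-1] += " " + word
        else:
            lines.append(" " + word)
    return "\r\n".join(lines)


stephen_subject = "=?sjt-1?q?=49=74=27=73=20=6A=75=73=74=20=41=53=43=49=49=21?="
folded = fold_at_whitespace(add_subject_prefix("[Mailman-Developers]", stephen_subject))
print(folded)
```

Here the full line would be 90 characters long, so the sketch folds it after "[Mailman-Developers]" and puts the 60-character encoded-word on its own continuation line.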
On Tuesday, April 16, 2002, at 12:53 , Stephen J. Turnbull wrote:
> On this list it should end up looking like this:
> Subject: [Mailman-Developers] =?sjt-1?q?=49=74=27=73=20=6A=75=73=74=20=41=53=43=49=49=21?=
> or so, no? Urk, folded at the whitespace and all....
Ah, I see what you're getting at. Yes, thanks. Another edge case.
Ben
participants (3)
- Ben Gertzfield
- Fil
- Stephen J. Turnbull