[Email-SIG] Problem Report for email.Utils.decode_rfc2231
Barry Warsaw
barry at python.org
Wed Jul 19 06:16:03 CEST 2006
On Jul 17, 2006, at 8:35 PM, Mark Sapiro wrote:
> I just looked at the fix in SVN, and I think there is still a problem.
> I don't think the RFC 2231 encodings that produce the error are
> 'buggy'. There are two independent things going on in RFC 2231 - the
> charset and language encoding and the splitting of the parameter into
> multiple pieces, e.g. filename*0=, filename*1=, etc.
>
> The problem with email.utils.decode_params() is it doesn't distinguish
> between these cases. The charset/language information is only present
> if there is a * immediately preceding the = as in
>
> filename*=charset'language'value
>
> or
>
> filename*0*=charset'language'value
> ...
>
> in these cases, a compliant value must not contain '
>
> However, if the parameter is
>
> filename*0=value_part_0
> filename*1=value_part_1
> ...
>
> these value_parts may contain any number of ' characters and they
> don't delimit charset and language information.
>
> See my suggested patch attached to
> <http://mail.python.org/pipermail/email-sig/2006-July/000293.html>.
Mark, I think you're right in your diagnosis. I've gone back and
re-read RFC 2231 and I agree that we need to distinguish between the two
segment types, which I'll call encoded (name ends in *) and
non-encoded (no * at end of name).
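
To make the two segment types concrete, here is a minimal sketch of classifying a parameter name. The function name classify_param and the regular expression are illustrative assumptions, not the code in the patch or in the email package.

```python
import re

# Hypothetical pattern: base name, optional "*N" segment index,
# optional trailing "*" marking an encoded segment.
_SEGMENT_RE = re.compile(r"^(?P<name>[^*]+)(?:\*(?P<num>\d+))?(?P<enc>\*)?$")


def classify_param(raw_name):
    """Return (name, index, is_encoded) for a raw parameter name.

    index is None when the parameter is not split into segments.
    """
    mo = _SEGMENT_RE.match(raw_name)
    if mo is None:
        raise ValueError("unrecognized parameter name: %r" % raw_name)
    num = mo.group("num")
    index = int(num) if num is not None else None
    return mo.group("name"), index, mo.group("enc") is not None
```

With this sketch, "filename*0*" is an encoded segment 0, while "filename*1" is a non-encoded segment 1.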
The way I read the RFC however, I don't think the patch is quite
right. Specifically, you can mix encoded and non-encoded segments in
an extended parameter, like so:

    filename*0*="This is%20encoded"
    filename*1="This is%20not encoded"

I believe this should end up with a 'filename' parameter with a value:

    This is encodedThis is%20not encoded

Further, if any segment name ends in a * then the charset and language
information must appear at the front of the value, but it is extracted
only after the encoded segments are %-decoded and all the segments are
concatenated together. (The RFC appears to be a bit ambiguous here,
but this is the only interpretation that makes sense to me.)
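
The interpretation described above can be sketched roughly as follows. This is not the attached patch: collapse_segments and its (index, value, is_encoded) input shape are assumptions made for illustration, and urllib.parse.unquote assumes Python 3.

```python
from urllib.parse import unquote


def collapse_segments(segments):
    """Collapse (index, value, is_encoded) segments of one parameter.

    Returns a plain string when no segment is encoded, otherwise a
    (charset, language, value) 3-tuple.
    """
    parts = []
    any_encoded = False
    for _, value, is_encoded in sorted(segments, key=lambda seg: seg[0]):
        if is_encoded:
            # Only encoded segments are %-decoded.
            value = unquote(value, encoding="latin-1")
            any_encoded = True
        parts.append(value)
    joined = "".join(parts)
    if not any_encoded:
        # Any ' characters here are ordinary data, not delimiters.
        return joined
    # Charset and language are split off only after concatenation.
    pieces = joined.split("'", 2)
    if len(pieces) != 3:
        # Failsafe: no charset'language' prefix was supplied.
        return (None, None, joined)
    return tuple(pieces)
```

For the mixed example above, this yields (None, None, "This is encodedThis is%20not encoded"): the first segment is %-decoded, the second is left alone, and no charset or language is found.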
Both of these changes caused many failures in the test suite, but I
believe that's because many of the tests were incorrect. Some broke
because they were using all non-encoded segments yet were expecting
Message.get_param() to return a 3-tuple. That interface, while
yucky, seems clear that when all non-encoded segments are used, the
return value should be a simple string.
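
From the caller's side, the two return shapes look roughly like this. The sketch assumes a current Python 3 email package (where the module is spelled email.utils); the header values are made up for illustration.

```python
import email
from email.utils import collapse_rfc2231_value

# All segments non-encoded: expect a simple string.
plain = email.message_from_string(
    'Content-Disposition: attachment; filename*0="foo"; filename*1="bar"\n\n'
)
# Encoded parameter: expect a (charset, language, value) 3-tuple.
encoded = email.message_from_string(
    "Content-Disposition: attachment; filename*=us-ascii'en'hello%20world\n\n"
)

plain_value = plain.get_param("filename", header="content-disposition")
encoded_value = encoded.get_param("filename", header="content-disposition")


def as_text(value):
    """Accept either return shape and produce a single string."""
    return collapse_rfc2231_value(value)
```

A caller that does not care about the charset and language can pass either result through collapse_rfc2231_value, as the as_text helper does.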
The other breakage was that non-encoded segments should not be
%-decoded, but there were many cases where they were still being decoded.
I believe the attached patch fixes all these cases, and yet retains
the failsafe checks in decode_rfc2231() -- be liberal in what you
accept, blah, blah, blah. The patch also updates all the affected
tests. This patch is against the Python trunk. Please let me know
what you think! If it looks good, I'll commit it and back port the
whole schmere to the earlier email package versions.
-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: email.diff
Type: application/octet-stream
Size: 10097 bytes
Desc: not available
Url : http://mail.python.org/pipermail/email-sig/attachments/20060719/42621e24/attachment.obj