[Web-SIG] parsing of urlencoded data and Unicode
deron.meranda at gmail.com
Tue Jul 29 20:58:26 CEST 2008
On Tue, Jul 29, 2008 at 12:39 PM, Manlio Perillo
<manlio_perillo at libero.it> wrote:
> Bill Janssen wrote:
>> Actually, it's defined for all fields, isn't it? From RFC 2388:
>> ``As with all multipart MIME types, each part has an optional
>> "Content-Type", which defaults to text/plain.''
>> So the type is "text/plain" unless it says something else. And,
>> according to RFC 2046, the default charset for "text/plain" is US-ASCII.
> OK, I agree with the theory.
> But in practice:
> <form action="" method="post" accept-charset="utf-8">
> In theory I should assume ASCII-encoded data for the body field, and since
> this data cannot be decoded as ASCII, I should treat it as a byte string.
> However, the body field is encoded in UTF-8, and if I add a hidden _charset_
> field, FF and IE include this field in the submitted data, with its value
> set to the charset used for the encoding.
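A minimal sketch of how a server might honor the hidden _charset_ field described above when decoding an application/x-www-form-urlencoded body. It assumes the page included something like <input type="hidden" name="_charset_"> and that the browser filled it in; the helper name decode_urlencoded and its default_charset parameter are hypothetical, and only the stdlib urllib.parse.parse_qsl is used. The body is parsed twice: once just to find _charset_, then again with the reported charset.

```python
from typing import Dict
from urllib.parse import parse_qsl


def decode_urlencoded(body: bytes, default_charset: str = "us-ascii") -> Dict[str, str]:
    """Decode a urlencoded form body, honoring a _charset_ field if present.

    Hypothetical helper: _charset_ only appears if the page contained a
    hidden <input name="_charset_"> that the browser filled in.
    """
    # On the wire the body should be ASCII (non-ASCII bytes arrive as %XX
    # escapes); latin-1 maps every byte to one code point, so this cannot fail.
    text = body.decode("latin-1")

    # First pass: unquote with latin-1 only to locate the _charset_ value,
    # which is itself plain ASCII (e.g. "UTF-8").
    first_pass = dict(parse_qsl(text, keep_blank_values=True, encoding="latin-1"))
    charset = first_pass.get("_charset_") or default_charset

    # Second pass: unquote every field with the charset the browser reported.
    # An unknown charset name raises LookupError and a wrong one raises
    # UnicodeDecodeError, instead of silently producing garbled text.
    return dict(parse_qsl(text, keep_blank_values=True,
                          encoding=charset, errors="strict"))
```

For example, decode_urlencoded(b"_charset_=UTF-8&body=caf%C3%A9") yields {"_charset_": "UTF-8", "body": "café"}, while the same body without _charset_ fails under the US-ASCII default. Note that dict() keeps only the last value of a repeated field name; real code would likely collect lists, and should treat the client-supplied charset name as untrusted input.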
From what I've seen, most user agents fail to send a Content-Type,
much less a charset parameter. Many will also ignore the accept-charset
attribute.
However, most browsers will send the text fields of a POST request
in the same character set in which the page containing the
<form> element was originally sent to the browser. So if you
output HTML pages in UTF-8, the text portions of POST submissions will
be returned in UTF-8.
It's not following any standard, but it's the way things seem to work.
I think it would be most useful if the decoding framework strictly
followed the RFC and assumed "text/plain; charset=US-ASCII", but
also gave the caller some means of specifying a different default.
Obviously, if a user agent does provide a complete Content-Type,
it should be used.
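A minimal sketch of the fallback order proposed above, applied to one text part of a multipart/form-data body: an explicit charset from the user agent wins, then a caller-supplied default (for instance the charset the form page was served in), then the RFC 2046 default of US-ASCII. The names part_charset, decode_text_part and RFC_DEFAULT_CHARSET are hypothetical; header parsing reuses the stdlib email.message.Message.get_content_charset rather than splitting on ";" by hand.

```python
from email.message import Message
from typing import Optional

# RFC 2046: text/plain without a charset parameter means US-ASCII.
RFC_DEFAULT_CHARSET = "us-ascii"


def part_charset(content_type: Optional[str],
                 caller_default: Optional[str] = None) -> str:
    """Pick the charset for one form field (hypothetical helper).

    1. A charset parameter sent by the user agent always wins.
    2. Otherwise use the caller's default, e.g. the charset the page
       containing the <form> was served in.
    3. Otherwise fall back to the RFC default.
    """
    if content_type:
        # Let the stdlib MIME parser handle quoting and parameters.
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lowercased, or None if absent
        if charset:
            return charset
    return caller_default or RFC_DEFAULT_CHARSET


def decode_text_part(payload: bytes,
                     content_type: Optional[str] = None,
                     caller_default: Optional[str] = None) -> str:
    # errors="strict" surfaces a wrong guess instead of hiding it.
    return payload.decode(part_charset(content_type, caller_default), "strict")
```

A framework serving its pages in UTF-8 would call decode_text_part(payload, part_content_type, caller_default="utf-8"), so a missing Content-Type falls back to the page encoding while a complete Content-Type from the user agent is still respected.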