[Web-SIG] parsing of urlencoded data and Unicode
deron.meranda at gmail.com
Tue Jul 29 22:12:10 CEST 2008
On Tue, Jul 29, 2008 at 3:50 PM, Manlio Perillo
<manlio_perillo at libero.it> wrote:
> Deron Meranda wrote:
>>> But, at this point, can one consider the content of a form post to
>>> be a "text" string?
>>> Or should it be considered an encoded "byte" string?
>> I'd say follow the RFC, but perhaps allow a caller to provide
>> an override default. So yes, you should assume an encoded
>> string if the subpart has a text/* Content-Type, or if it has no
>> content type at all (which must then be assumed to be text/plain
>> US-ASCII). That is the intent of the MIME text/* media type
>> after all; that it should be interpreted as a character string
>> and not a byte string.
>> In other cases, I would say returning a byte string is the
>> correct thing to do.
> I'm not sure I understand.
> If you want non text data in the POST request body, you can use the file
I don't think we're disagreeing.
In HTML, an input element with type=file will result in non-text; i.e.,
it should result in a byte stream (ignoring the possibility of
uploading text files, which are permitted but not required to have a
text/* content type). On the other hand, an input with type=text or
type=password should definitely result in a character string,
not a byte string. The same goes for a textarea element.
It's less clear what input type=checkbox or type=radio should give,
but I think it's safe to assume a character string.
Either way, the parser of the multipart/form-data has no idea
what the original HTML looked like; it only has the posted MIME
structure and headers to go by.
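
To illustrate what the parser actually has to work with, here is a
minimal sketch using Python's standard email package. The boundary,
field names, and file name are made up for the example; the point is
only that each subpart exposes its headers and nothing about the
HTML form that may have produced it.

```python
from email.parser import BytesParser
from email.policy import HTTP

# A hand-written multipart/form-data body (hypothetical field names).
raw = (
    b"Content-Type: multipart/form-data; boundary=XyZ\r\n"
    b"\r\n"
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="comment"\r\n'
    b"\r\n"
    b"hello\r\n"
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="upload"; filename="a.bin"\r\n'
    b"Content-Type: application/octet-stream\r\n"
    b"\r\n"
    b"\x00\x01\x02\r\n"
    b"--XyZ--\r\n"
)

message = BytesParser(policy=HTTP).parsebytes(raw)

seen = []
for part in message.iter_parts():
    name = part.get_param("name", header="content-disposition")
    has_header = "Content-Type" in part      # was a type declared at all?
    effective = part.get_content_type()      # falls back to text/plain
    seen.append((name, has_header, effective))
```

The first part declares no Content-Type, so the only thing a consumer
can do is fall back on the text/plain default; the second part says
explicitly that it is not text.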
In my suggestion, you would return a byte string only if the subpart
has a Content-Type header and that type is not text/*. Everything
else should result in a character string.
But you can't pick just one return type; sometimes you have
bytes and other times you have characters.
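
The rule above could be sketched roughly as follows. This is only an
illustration, not an existing API: the function name field_value and
the default_charset parameter are my own inventions, with
default_charset standing in for the caller-supplied override.

```python
from email.message import Message


def field_value(body, content_type=None, default_charset="us-ascii"):
    """Return str for text subparts and bytes for everything else.

    body            -- raw bytes of the subpart
    content_type    -- value of the subpart's Content-Type header, or
                       None if the header is absent
    default_charset -- hypothetical caller override, used when the
                       subpart does not name a charset itself
    """
    if content_type is None:
        # No header at all: treat the part as text/plain.
        return body.decode(default_charset)

    # Let the stdlib parse the header value for us.
    header = Message()
    header["Content-Type"] = content_type

    if header.get_content_maintype() != "text":
        # Explicitly declared non-text: hand back the bytes untouched.
        return body

    charset = header.get_content_charset() or default_charset
    return body.decode(charset)
```

A caller who knows the form was submitted as UTF-8 without saying so
in the part headers could pass default_charset="utf-8"; a strict
caller would leave the US-ASCII default and get a UnicodeDecodeError
on anything else.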
> I can't really see use cases of normal input fields having byte strings.
In HTML, no. Only input with type=file should ever result in
a content type other than text.
However, don't forget that not all POSTs with multipart/form-data
have to be the result of an HTML page. So a generic consumer
of multipart/form-data can't make such assumptions. That is why it
should just follow the RFC, with possible caller-specified overrides
to compensate for the real world not matching the RFC.