[Web-SIG] HTTP header canonicalization?
Phillip J. Eby
pje at telecommunity.com
Sun Aug 22 20:16:29 CEST 2004
While reviewing the HTTP/1.1 spec (RFC 2616) for information on header
folding, I noticed an interesting bit under section "4.2 Message Headers":
    Multiple message-header fields with the same field-name MAY be
    present in a message if and only if the entire field-value for that
    header field is defined as a comma-separated list [i.e., #(values)].
    It MUST be possible to combine the multiple header fields into one
    "field-name: field-value" pair, without changing the semantics of the
    message, by appending each subsequent field-value to the first, each
    separated by a comma. The order in which header fields with the same
    field-name are received is therefore significant to the
    interpretation of the combined field value, and thus a proxy MUST NOT
    change the order of these field values when a message is forwarded.
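To make the combining rule in that passage concrete, here is a minimal Python sketch. The function name combine_repeated and the sample headers are hypothetical, chosen only for illustration. It assumes field names are matched case-insensitively (as HTTP field names are) and joins values with ", " in the order received:

```python
def combine_repeated(pairs):
    """Merge fields that share a name into one (name, value) pair each.

    Names are compared case-insensitively; the first-seen spelling is kept.
    Values are appended in the order received, separated by a comma.
    """
    merged = {}  # lower-cased name -> [first-seen spelling, combined value]
    for name, value in pairs:
        key = name.lower()
        if key in merged:
            merged[key][1] += ", " + value
        else:
            merged[key] = [name, value]
    # dicts keep insertion order (Python 3.7+), so first-seen order survives
    return [(name, value) for name, value in merged.values()]


# Hypothetical sample: "Accept" appears twice with different capitalization.
sample = [
    ("Accept", "text/html"),
    ("Content-Type", "text/plain"),
    ("accept", "application/xml"),
]
combined = combine_repeated(sample)
```

This only applies to headers whose value is defined as a comma-separated list; the sketch does not check that.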
So, although I've defined the headers sent by the application as a list of
name/value pairs, it seems that we *could* use a dictionary instead, if we
required that multiple headers not be used, and that some canonical form
(e.g. all lower-case) be used for the names.
Does anybody see any issues with this? The upside is that it makes it easy
for servers/gateways to add missing headers (using
'headerdict.setdefault()'), and it should also be easier for
application/framework developers to build up their headers incrementally in
the same way.
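As a rough sketch of what the dictionary approach might look like (not a proposed API): the helper add_header, the lower-case canonical form, and the "ExampleServer/0.1" value below are all assumptions made for illustration.

```python
def add_header(headers, name, value):
    """Add a value under the canonical (lower-case) name.

    If the header is already present, append to it as a comma-separated
    list instead of creating a second entry.
    """
    key = name.lower()
    if key in headers:
        headers[key] += ", " + value
    else:
        headers[key] = value


# Application side: build the headers up incrementally.
headers = {}
add_header(headers, "Content-Type", "text/html")
add_header(headers, "Vary", "Accept")
add_header(headers, "vary", "Cookie")  # same header, different capitalization

# Server/gateway side: fill in anything the application left out.
headers.setdefault("content-type", "text/plain")    # already set, left alone
headers.setdefault("server", "ExampleServer/0.1")   # missing, so it is added
```

With a plain list of name/value pairs, the server would instead have to scan the list, comparing names case-insensitively, before deciding whether to append a default.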
The only downsides I see that could possibly come up are:
* There's some reason to have headers with different names in a specific
  order, even though the spec is adamant that such an ordering is
  insignificant and not to be relied upon.
* There's some reason to split multi-value headers into separate header
  lines, even though the spec is adamant that the forms are equivalent, and
  that HTTP has no limitations on line length.
Does anybody know whether any HTTP clients in practice are affected by
these matters?