[Web-SIG] HTTP header canonicalization?

Sun Aug 22 20:16:29 CEST 2004

While reviewing the HTTP/1.1 spec (RFC 2616) for information on header 
folding, I noticed an interesting bit under section "4.2 Message Headers":

    Multiple message-header fields with the same field-name MAY be
    present in a message if and only if the entire field-value for that
    header field is defined as a comma-separated list [i.e., #(values)].
    It MUST be possible to combine the multiple header fields into one
    "field-name: field-value" pair, without changing the semantics of the
    message, by appending each subsequent field-value to the first, each
    separated by a comma. The order in which header fields with the same
    field-name are received is therefore significant to the
    interpretation of the combined field value, and thus a proxy MUST NOT
    change the order of these field values when a message is forwarded.

So, although I've defined the headers sent by the application as a list of
name/value pairs, it seems that we *could* use a dictionary instead, if we
required that each header name appear only once (with multiple values
combined into a single comma-separated value), and that some canonical form
(e.g. all lower-case) be used for the names.
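
As a rough illustration of the idea, here is a minimal sketch of folding a
list of name/value pairs into such a dictionary. The function name
'canonicalize_headers' and the sample headers are made up for this example
and are not part of any proposed interface; it assumes every repeated header
really is defined as a comma-separated list, as the spec requires.

```python
def canonicalize_headers(header_list):
    """Fold (name, value) pairs into a dict keyed by lower-case name.

    Hypothetical helper, for illustration only.
    """
    headers = {}
    for name, value in header_list:
        key = name.strip().lower()   # canonical form: all lower-case
        value = value.strip()
        if key in headers:
            # RFC 2616 sec. 4.2: append each later value to the first,
            # separated by a comma, preserving the original order.
            headers[key] = headers[key] + ", " + value
        else:
            headers[key] = value
    return headers


pairs = [
    ("Content-Type", "text/plain"),
    ("Vary", "Accept-Encoding"),
    ("vary", "Cookie"),
]
headers = canonicalize_headers(pairs)
print(headers)
```

Here the two "Vary" lines, which differ only in case, end up as a single
'vary' entry whose value keeps the original order of the field values.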

Does anybody see any issues with this?  The upside is that it makes it easy 
for servers/gateways to add missing headers (using 
'headerdict.setdefault()'), and it should also be easier for 
application/framework developers to build up their headers incrementally in 
the same way.
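
For instance, a server or gateway could fill in missing headers along these
lines. This is only a sketch: the function name 'add_default_headers', the
choice of default headers, and the "ExampleGateway/0.1" string are
assumptions made for the example, and it presumes the application has
already used lower-case names.

```python
from email.utils import formatdate


def add_default_headers(headerdict):
    """Add headers the application did not supply (hypothetical helper)."""
    # setdefault() leaves any value the application already set untouched.
    headerdict.setdefault("server", "ExampleGateway/0.1")
    headerdict.setdefault("date", formatdate(usegmt=True))
    return headerdict


# Headers as an application might build them up incrementally.
app_headers = {}
app_headers["content-type"] = "text/html"
app_headers["server"] = "MyApp/1.0"

add_default_headers(app_headers)
print(app_headers)
```

The application's own 'server' value survives, and only the missing 'date'
header is added. Note that this only works if both sides agree on the
canonical form of the names; "Server" and "server" would be distinct keys.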

The only downsides I see that could possibly come up are:

  * There's some reason to have headers with different names in a specific
    order, even though the spec is adamant that such an ordering is
    insignificant and not to be relied upon.

  * There's some reason to split multi-value headers into separate header
    lines, even though the spec is adamant that the forms are equivalent,
    and that HTTP has no limitations on line length.

Does anybody know whether any HTTP clients in practice are affected by 
these matters?