zlib, gzip and HTTP compression.

Alan Kennedy alanmk at hotmail.com
Sat Jan 12 08:49:24 EST 2002


jj <janez.jere at void.si> wrote in message news:<3C3F6D45.D4232F5D at void.si>...
> Content-Length must be the length of original, before compression.
> So try
> 
> print "Content-Length: %d" % os.path.getsize("test.html")

Thanks again for the suggestion JJ.

However, my reading of RFC 2616 tells me otherwise, though I could be
wrong, since the HTTP specs leave much room for interpretation.

http://www.ietf.org/rfc/rfc2616.txt

From Section 7.2.2, Entity Length:

   The entity-length of a message is the length of the message-body
   before any transfer-codings have been applied. Section 4.4 defines
   how the transfer-length of a message-body is determined.

AK> So the "entity-length" is the uncompressed length of the file

From Section 4.4, Message Length:

   The transfer-length of a message is the length of the message-body
   as it appears in the message; that is, after any transfer-codings
   have been applied. When a message-body is included with a message,
   the transfer-length of that body is determined by one of the
   following (in order of precedence):

AK> And the "transfer-length" is the compressed length of the file,
AK> i.e. the length as it appears in the message

   1.Any response message which "MUST NOT" include a message-body
    (such as the 1xx, 204, and 304 responses
    [elided, not relevant]

   2.If a Transfer-Encoding header field (section 14.41) is present
     [elided, not relevant] 

   3.If a Content-Length header field (section 14.13) is present,
     its decimal value in OCTETs represents both the entity-length
     and the transfer-length. The Content-Length header field MUST NOT
     be sent if these two lengths are different (i.e., if a
     Transfer-Encoding header field is present). If a message is
     received with both a Transfer-Encoding header field and a
     Content-Length header field, the latter MUST be ignored.

AK> So according to this, if the compression counts as a
AK> transfer-coding, the "entity-length" and the "transfer-length"
AK> are different, and I shouldn't be sending a "Content-length"
AK> at all! (I tried that, and it didn't work.)

   4.If the message uses the media type "multipart/byteranges",
     [elided, not relevant]

   5.By the server closing the connection. (Closing the conn...
     [elided, not relevant]

   For compatibility with HTTP/1.0 applications, HTTP/1.1 requests
   containing a message-body MUST include a valid Content-Length
   [elided, relevant to requests only, not responses]

   All HTTP/1.1 applications that receive entities MUST accept the
   "chunked" transfer-coding (section 3.6), thus allowing this
   mechanism to be used for messages when the message length cannot
   be determined in advance.

   Messages MUST NOT include both a Content-Length header field
   and a non-identity transfer-coding. If the message does include
   a non-identity transfer-coding, the Content-Length MUST be ignored.

AK> This appears to agree with point 3 above about not sending a 
AK> Content-length when the "entity-length" and the "transfer-length"
AK> are different. 
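
For reference, the "chunked" transfer-coding mentioned above is how a
message can be delimited without a Content-Length. A minimal sketch of
the framing from RFC 2616 section 3.6.1, assuming Python 3 (the function
name chunk_body and the default chunk size are made up for illustration,
and chunk extensions and trailers are not handled):

```python
def chunk_body(body: bytes, chunk_size: int = 1024) -> bytes:
    """Frame body with the "chunked" transfer-coding.

    Hypothetical helper for illustration; not part of any library.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    parts = []
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        # Each chunk is: size in hexadecimal, CRLF, the data, CRLF.
        size_line = format(len(chunk), "x").encode("ascii") + b"\r\n"
        parts.append(size_line + chunk + b"\r\n")
    # A zero-sized chunk followed by an empty trailer ends the body.
    parts.append(b"0\r\n\r\n")
    return b"".join(parts)
```

A response framed this way would carry "Transfer-Encoding: chunked" and
no Content-Length header.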

   When a Content-Length is given in a message where a message-body
   is allowed, its field value MUST exactly match the number of OCTETs
   in the message-body. HTTP/1.1 user agents MUST notify the user when
   an invalid length is received and detected.

AK> This seems to me the conclusive statement. If "Content-length" 
AK> is present, it MUST represent the length of the message body, i.e.
AK> the compressed length of the file.
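
As a minimal sketch of that reading, assuming Python 3 and the standard
gzip module (the helper name build_gzip_response and the sample body are
made up for illustration), the Content-Length is computed from the
compressed bytes that actually go into the message-body:

```python
import gzip


def build_gzip_response(body: bytes) -> bytes:
    """Return a complete HTTP/1.1 response with a gzip-compressed body.

    Hypothetical helper for illustration; not part of any library.
    """
    compressed = gzip.compress(body)
    headers = [
        b"HTTP/1.1 200 OK",
        b"Content-Type: text/html",
        b"Content-Encoding: gzip",
        # Number of octets in the message-body, i.e. after compression.
        b"Content-Length: " + str(len(compressed)).encode("ascii"),
    ]
    return b"\r\n".join(headers) + b"\r\n\r\n" + compressed


if __name__ == "__main__":
    original = b"<html><body>" + b"hello " * 200 + b"</body></html>"
    response = build_gzip_response(original)
    head, _, payload = response.partition(b"\r\n\r\n")
    print(head.decode("ascii"))
    print("original length:", len(original))
    print("payload length: ", len(payload))
```

Using os.path.getsize() on the uncompressed file, as suggested earlier,
would announce more octets than the body contains.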

I think that there is confusion around the interpretation of
"Content-length" because of lack of clarity on the difference
between "Content-encoding" and "Transfer-encoding".

"Content-encoding" is supposed to represent the inherent encoding
of the entity (i.e. file) being transferred. Most likely it was
intended to communicate that the file being sent is compressed in
some way, and permanently exists in that compressed format.

"Transfer-encoding" is supposed to be a transient thing, lasting
only for the duration of the tx/rx of the HTTP message. That is,
it is a mechanism for temporarily encoding (compressing) a file
purely so that it can be transmitted safely or using less
bandwidth.

The difficulty comes in deciding when to use Transfer-encoding
or Content-encoding. For example, if I dynamically generate an HTML
"file", and want to send it compressed, is the compression
inherent to the nature of the "file", or is it merely a transient
thing which is purely to save bandwidth in transmission of the file?

Of course, the answer to this question is decided by the actual
interpretation that people have made in writing their software.
The general consensus seems to be that "Content-encoding" is the
way to go. "Transfer-encoding" seems not to be used.

Or I could be wrong :-)

But of course, none of this helps me with my current problem, since
I have tried all possible combinations of Content-encoding,
Transfer-encoding, Content-length, no Content-length, etc,
etc, etc, etc.

I'm not quite sure what to try next.

I think I am either

1. Sending the wrong compressed data
2. Sending correct compressed data in a way that is resulting in
corruption
3. Missing out on some further processing that must be conducted on
the message.
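
One way to test the first two possibilities is to inspect the exact bytes
being sent. "Content-Encoding: gzip" means the gzip container format
(RFC 1952), whereas zlib.compress() produces the zlib format (RFC 1950),
which starts with a different header. Whether that is what is going wrong
here is only a guess. A minimal sketch, assuming Python 3 (the function
names describe_compressed and gunzip_or_none are made up for
illustration):

```python
import gzip
import zlib


def describe_compressed(data: bytes) -> str:
    """Guess the container format of data from its first two bytes."""
    if data[:2] == b"\x1f\x8b":
        return "gzip"  # RFC 1952 magic number
    if (len(data) >= 2
            and data[0] & 0x0F == 8                     # CM = 8 (deflate)
            and ((data[0] << 8) | data[1]) % 31 == 0):  # FCHECK is valid
        return "zlib"  # RFC 1950 header
    return "unknown"


def gunzip_or_none(data: bytes):
    """Return the decompressed bytes if data is a valid gzip stream, else None."""
    try:
        return gzip.decompress(data)
    except (OSError, EOFError, zlib.error):
        return None


if __name__ == "__main__":
    page = b"<html><body>test</body></html>"
    for label, blob in (("gzip.compress", gzip.compress(page)),
                        ("zlib.compress", zlib.compress(page))):
        ok = gunzip_or_none(blob) == page
        print(label, "->", describe_compressed(blob), "| round-trips as gzip:", ok)
```

Running the same two checks on the bytes captured from the wire should
also show whether they were corrupted on the way out, for example by
text-mode newline translation on stdout.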

I will get to the bottom of this.

And I will document it so no-one will have to go through this hassle
again.

Regards,

Alan.


