Urllib2: Only a partial page retrieved
hpsMouse
hpsmouse at gmail.com
Sun May 23 05:19:40 EDT 2010
On May 22, 5:43 pm, Dragon Lord <dragonlord... at gmail.com> wrote:
> The cutoff is always at the same location: just after the label
> "Meeting date" and before the date itself. Could it be that something
> is interpreted as an EOF command or something like that?
>
> example of the cutoff point with a bad page:
> <br/><b>Meeting Date: </b>
>
> example of the cutoff point with a good page:
> <br/><b>Meeting Date: </b>
I checked the TCP packets and found that the remote HTTP server sends a
data packet with the PSH (push) flag set, which causes the client to
close the connection. That happens exactly where "Meeting Date: </b>"
appears.
This does not seem to be a bug in Python: Qt and telnet both failed in
my test, and so did wget.
Most browsers use HTTP keep-alive, so the connection is not closed.
I think that is why a browser shows the page correctly.
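
One way to confirm from the Python side that a page was cut short is to
compare the number of bytes received with the Content-Length header.
The sketch below is only an illustration, not something from my test:
it assumes the server sends a Content-Length header at all, it uses
Python 3's urllib.request (the successor of urllib2), and the helper
names body_is_complete and fetch_and_check are made up for this example.

```python
import http.client
import urllib.request


def body_is_complete(declared_length, body):
    """Compare the received body with the declared Content-Length.

    declared_length is the raw header value (a string), or None when the
    server sent no Content-Length header. In that case nothing can be
    concluded, so None is returned instead of True or False.
    """
    if declared_length is None:
        return None
    return len(body) == int(declared_length)


def fetch_and_check(url):
    """Fetch url and report whether the whole body seems to have arrived.

    Hypothetical helper: returns (body, complete), where complete is
    True, False, or None (unknown, no Content-Length header).
    """
    with urllib.request.urlopen(url) as response:
        declared = response.headers.get("Content-Length")
        try:
            body = response.read()
        except http.client.IncompleteRead as exc:
            # Python 3 raises this when the connection closes early;
            # keep the bytes that did arrive so they can be inspected.
            body = exc.partial
    return body, body_is_complete(declared, body)
```

Calling fetch_and_check on the failing URL and getting False as the
second value would support the idea that the connection was closed
before the full page was sent.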