PyPI download issues from Rackspace Cloud
I recently spun up a Windows VM on Rackspace Cloud. I'm seeing a very weird problem: downloads from PyPI fail with checksum errors, nondeterministically.

Sometimes it's an md5 hash mismatch error from pip[1]. Sometimes the error is a CRC error deep in the gzip module. It's not only pip -- I've seen this error from virtualenv[2], I've seen it from get-pip.py.

[1] https://jenkins.gedmin.as/job/restview-on-windows/1/console
    https://jenkins.gedmin.as/job/restview-on-windows/2/console
    https://jenkins.gedmin.as/job/restview-on-windows/3/console
    https://jenkins.gedmin.as/job/check-manifest-on-windows/11/console

[2] https://jenkins.gedmin.as/job/check-manifest-on-windows/7/console
    https://jenkins.gedmin.as/job/check-manifest-on-windows/6/console

The error is sporadic and tends to go away (and come back) on retries (look at the restview-on-windows jobs: first pip install mock fails, then succeeds, then fails again on the next retry).

I haven't encountered any download troubles with Firefox. In fact that's how I got 'pip install tox' to work in the end -- PyPI downloads of the virtualenv/py libraries kept failing, so I downloaded the tarballs with Firefox and pip installed them from local files.

I've run

  curl https://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.10.1.tar.g... | md5sum -

about 10 times and got the same (correct) md5 hash each time. Meanwhile

  unset PIP_DOWNLOAD_CACHE
  rm -f virtualenv-1.10.1.tar.gz && pip install -d . virtualenv

fails with a bad md5 hash error 9 times out of 10. Curiously, it's the same bad hash every time: 20cf17baaf2126f99d6f652831f82d75.

I used http://www.python.org/ftp/python/2.7.6/python-2.7.6.msi to install Python on this Windows 2012 server (64-bit OS, 32-bit Python).

Any ideas?

Marius Gedminas
--
IBM motto: "We found five vowels hiding in a corner, and we used them _all_ for the 'eieio' instruction so that we wouldn't have to use them anywhere else" -- Linus Torvalds
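The md5 comparison described above can be repeated by hand on a tarball fetched by any client (Firefox, curl, or pip). What follows is only a minimal sketch, not pip's own code: the helper names `md5_of_file` and `matches_expected` are hypothetical, and the expected digest is assumed to be copied from the package's PyPI page.

```python
import hashlib


def md5_of_file(path, chunk_size=8192):
    """Return the hex md5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """True if the file's md5 equals the expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

A truncated download hashes differently from the complete file, so a call such as `matches_expected("virtualenv-1.10.1.tar.gz", expected)` would return False for the failing case reported here.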
Can you try with 1.5rc1? We switched to requests in that version and perhaps it sidesteps the issue?

On Nov 23, 2013, at 5:05 AM, Marius Gedminas <marius@pov.lt> wrote:
> I recently spun up a Windows VM on Rackspace Cloud. I'm seeing a very weird problem: downloads from PyPI fail with checksum errors, nondeterministically.
>
> Sometimes it's an md5 hash mismatch error from pip[1]. Sometimes the error is a CRC error deep in the gzip module. It's not only pip -- I've seen this error from virtualenv[2], I've seen it from get-pip.py.
>
> [1] https://jenkins.gedmin.as/job/restview-on-windows/1/console
>     https://jenkins.gedmin.as/job/restview-on-windows/2/console
>     https://jenkins.gedmin.as/job/restview-on-windows/3/console
>     https://jenkins.gedmin.as/job/check-manifest-on-windows/11/console
>
> [2] https://jenkins.gedmin.as/job/check-manifest-on-windows/7/console
>     https://jenkins.gedmin.as/job/check-manifest-on-windows/6/console
>
> The error is sporadic and tends to go away (and come back) on retries (look at the restview-on-windows jobs: first pip install mock fails, then succeeds, then fails again on the next retry).
>
> I haven't encountered any download troubles with Firefox. In fact that's how I got 'pip install tox' to work in the end -- PyPI downloads of the virtualenv/py libraries kept failing, so I downloaded the tarballs with Firefox and pip installed them from local files.
>
> I've run
>
>   curl https://pypi.python.org/packages/source/v/virtualenv/virtualenv-1.10.1.tar.g... | md5sum -
>
> about 10 times and got the same (correct) md5 hash each time. Meanwhile
>
>   unset PIP_DOWNLOAD_CACHE
>   rm -f virtualenv-1.10.1.tar.gz && pip install -d . virtualenv
>
> fails with a bad md5 hash error 9 times out of 10. Curiously, it's the same bad hash every time: 20cf17baaf2126f99d6f652831f82d75.
>
> I used http://www.python.org/ftp/python/2.7.6/python-2.7.6.msi to install Python on this Windows 2012 server (64-bit OS, 32-bit Python).
>
> Any ideas?
>
> Marius Gedminas
> --
> IBM motto: "We found five vowels hiding in a corner, and we used them _all_ for the 'eieio' instruction so that we wouldn't have to use them anywhere else" -- Linus Torvalds
> _______________________________________________
> Distutils-SIG maillist - Distutils-SIG@python.org
> https://mail.python.org/mailman/listinfo/distutils-sig
-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Sat, Nov 23, 2013 at 08:59:35AM -0500, Donald Stufft wrote:
> Can you try with 1.5rc1?

This was trickier than I thought, because pip appears to be incapable of upgrading itself on Windows:

  $ git clone https://github.com/pypa/pip
  $ virtualenv env
  $ env/scripts/pip install -U ./pip
  WindowsError: [Error 5] Access is denied: 'c:\\users\\mg\\...\\scripts\\pip.exe'

but the advice in https://github.com/pypa/pip/issues/1299 helped:

  $ env/scripts/python -m pip install -U ./pip

> We switched to requests in that version and perhaps it side steps the issue?

Looks like it. 10 out of 10 successful downloads with pip git master using:

  $ unset PIP_DOWNLOAD_CACHE
  $ rm -f virtualenv-1.10.1.tar.gz && env/scripts/pip install -d . virtualenv

As a control, I tested again with pip 1.4.1, and 2 out of 4 downloads failed with a bad hash error.

Interesting.

Marius Gedminas
--
Those parts of the system that you can hit with a hammer (not advised) are called hardware; those program instructions that you can only curse at are called software. -- Levitating Trains and Kamikaze Genes: Technological Literacy for the 1990's.
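Repeating a download by hand and counting failures, as done above, can be automated. The following is a rough sketch under stated assumptions: it targets Python 3.5+ (for `subprocess.run`, which the Python 2.7 used in this thread does not have), and the function name `count_failures` and its `before_each` hook are hypothetical.

```python
import subprocess


def count_failures(command, attempts=10, before_each=None):
    """Run `command` `attempts` times and return how many runs exited non-zero.

    `command` is an argument list such as ["pip", "install", "-d", ".", "virtualenv"].
    `before_each`, if given, is called before every attempt, for example to
    delete a previously downloaded tarball so each run downloads afresh.
    """
    failures = 0
    for _ in range(attempts):
        if before_each is not None:
            before_each()
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures
```

Running it once per pip version with the same command list gives comparable failure counts, which is the kind of control experiment described in this message.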
Digging deeper:

- SSLSocket.read() returns a premature EOF, truncating the downloaded file, which changes its md5 hash.

- If I fish out the SSLSocket object and set suppress_ragged_eofs = False, then I get

    ssl.SSLError: [Errno 8] _ssl.c:1426: EOF occurred in violation of protocol

- SSLSocket.read(8192) returns chunks of 8000 bytes, except the very first one (7669 bytes) and, in cases when the download does _not_ fail, the last one (5634 bytes). When the download fails, it's missing the last one or two chunks (it varies).

- SSLSocket.read(16384) does the same; SSLSocket.read(4096) returns pairs of chunks of 4096 and 3904 bytes, hinting that the underlying SSL communication works in multiples of 8000.

My experimental code: https://gist.github.com/mgedmin/7637559

I now have two Wireshark pcap files of two SSL conversations: one failed, one successful. It's a bit beyond my skills to analyze them. I think the server did send everything (I see a TLSv1 Application Data record of size 5654, which looks like the final chunk), but Wireshark marks it as [TCP Out-Of-Order], and then there are some desperate-looking [TCP Retransmission] packets at the end.

(Incidentally, the captured download was truncated at 1303669 bytes -- i.e. it was missing the last three TLS application data chunks of 8000, 8000 and 5634 bytes.)

Given that Firefox/curl seem to be able to download PyPI packages over HTTPS without problems, I'd be inclined to blame it on a bug in CPython's ssl module, maybe. But doesn't Requests use it too?

Confused,
Marius Gedminas
--
MCSE == Marginal Computer Software Enthusiast
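The chunk-size observations above can be captured with a small helper that records how many bytes each read returns and checks the total against the expected length. This is only an illustrative sketch, not the code from the gist: `read_chunks` and `is_truncated` are hypothetical names, and the full size of 1325303 bytes is not stated in this thread but derived from the truncated size plus the three missing chunks (1303669 + 8000 + 8000 + 5634).

```python
def read_chunks(read, chunk_size=8192):
    """Call read(chunk_size) until it returns no data; return the chunk lengths.

    `read` is any callable with a file-like read signature, for example the
    read method of a socket wrapper or of an open file.
    """
    sizes = []
    while True:
        data = read(chunk_size)
        if not data:
            break
        sizes.append(len(data))
    return sizes


def is_truncated(sizes, expected_length):
    """True if fewer bytes arrived than expected (e.g. from Content-Length)."""
    return sum(sizes) < expected_length


# Chunk pattern reported in the message: a first chunk of 7669 bytes, then
# 8000-byte chunks, then a final 5634-byte chunk when the download succeeds.
# The count of 164 middle chunks is derived, assuming all of them are 8000 bytes.
EXPECTED_LENGTH = 1303669 + 8000 + 8000 + 5634
complete = [7669] + [8000] * 164 + [5634]
truncated = [7669] + [8000] * 162
```

With these numbers, `sum(truncated)` equals the 1303669 bytes seen in the captured failure, and `is_truncated(truncated, EXPECTED_LENGTH)` flags it while the complete pattern passes.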
On Mon, Nov 25, 2013 at 10:38:09AM +0200, Marius Gedminas wrote:
> My experimental code: https://gist.github.com/mgedmin/7637559
>
> I now have two Wireshark pcap files of two SSL conversations: one failed, one successful. It's a bit beyond my skills to analyze them. I think the server did send everything (I see a TLSv1 Application Data record of size 5654, which looks like the final chunk), but Wireshark marks it as [TCP Out-Of-Order], and then there are some desperate-looking [TCP Retransmission] packets at the end.
>
> (Incidentally, the captured download was truncated at 1303669 bytes -- i.e. it was missing the last three TLS application data chunks of 8000, 8000 and 5634 bytes.)
>
> Given that Firefox/curl seem to be able to download PyPI packages over HTTPS without problems, I'd be inclined to blame it on a bug in the CPython's ssl module, maybe. But doesn't Requests use it too?

I never got to the bottom of this. I upgraded pip to 1.5 (painfully -- had to download the .tar.gz with curl, then use python -m pip install -U pip*.tar.gz to upgrade) and these SSL download errors disappeared.

Marius Gedminas
--
"Actually, the Singularity seems rather useful in the entire work avoidance field. "I _could_ write up that report now but if I put it off, I may well become a weakly godlike entity, at which point not only will I be able to type faster but my comments will be more on-target." - James Nicoll
On Mon, Nov 25, 2013 at 08:48 +0200, Marius Gedminas wrote:
> On Sat, Nov 23, 2013 at 08:59:35AM -0500, Donald Stufft wrote:
> > Can you try with 1.5rc1?
>
> This was trickier than I thought, because pip appears to be incapable of upgrading itself on Windows:
>
>   $ git clone https://github.com/pypa/pip
>   $ virtualenv env
>   $ env/scripts/pip install -U ./pip
>   WindowsError: [Error 5] Access is denied: 'c:\\users\\mg\\...\\scripts\\pip.exe'
>
> but the advice in https://github.com/pypa/pip/issues/1299 helped:
>
>   $ env/scripts/python -m pip install -U ./pip
>
> > We switched to requests in that version and perhaps it side steps the issue?
>
> Looks like it. 10 out of 10 successful downloads with pip git master using:
>
>   $ unset PIP_DOWNLOAD_CACHE
>   $ rm -f virtualenv-1.10.1.tar.gz && env/scripts/pip install -d . virtualenv
>
> As a control, I tested again with pip 1.4.1, and 2 out of 4 downloads failed with a bad hash error.
>
> Interesting.
Thanks for the notes, Marius. I also got md5 checksum failures with pip 1.4.1 on a Windows machine in the Rackspace cloud. Alas, I cannot easily force a different pip version because pip is installed by an automation script that comes from somewhere else :(

best,
holger
participants (3)
- Donald Stufft
- holger krekel
- Marius Gedminas