Sourceforge mirrors, again

Well, it looks like Sourceforge has found yet another way to mess with easy_install's ability to download from their mirrors. :( Specifically, they are not keeping the dl.sourceforge.net "A" list up to date, so easy_install's attempts to just use simple round-robin DNS aren't always working. Several IPs in the round-robin "A" list are not responding, and some new mirrors haven't been added to it. At this rate, the current approach will become unusable in a relatively short timeframe. :(

It seems as though there is no way to auto-discover the mirrors themselves; I had hoped that perhaps a zone transfer on the dl.sourceforge.net zone might work to obtain a list of the actual mirrors, but I haven't been able to successfully obtain one.

What I'm wondering at this point is if perhaps the only sane thing to do is to publish our own mirror list via DNS, so that at least when there's a problem it can still be fixed. The idea would be to replace easy_install's current DNS lookup of 'dl.sourceforge.net' IPs with something like 'dl.sfmirrors.telecommunity.com' (for example).

Anybody have any thoughts on this?
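
For readers who want to see the failure mode concretely, here is a minimal sketch of round-robin "A" record selection with a fallback past dead IPs. It is not the code in setuptools; the function names, the `resolver` and `connect` parameters, the port, and the timeout are all assumptions made for illustration.

```python
import random
import socket


def candidate_ips(hostname, resolver=socket.gethostbyname_ex):
    """Return every "A" record IP for hostname, in random order.

    The resolver is injectable (an assumption of this sketch) so the
    logic can be exercised without touching the network.
    """
    _name, _aliases, ips = resolver(hostname)
    ips = list(ips)
    random.shuffle(ips)  # simple client-side round robin
    return ips


def first_responding(ips, port=80, timeout=5.0,
                     connect=socket.create_connection):
    """Return the first IP that accepts a TCP connection, else None."""
    for ip in ips:
        try:
            conn = connect((ip, port), timeout)
        except OSError:
            continue  # stale entry in the "A" list: try the next one
        conn.close()
        return ip
    return None
```

Switching to a self-published mirror list would, under this sketch, only change the hostname passed to `candidate_ips`. Skipping dead IPs helps with stale entries, but it cannot find new mirrors that were never added to the list.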

On Sep 19, 2006, at 11:51 AM, Phillip J. Eby wrote:
> Well, it looks like Sourceforge has found yet another way to mess with easy_install's ability to download from their mirrors. :( Specifically, they are not keeping the dl.sourceforge.net "A" list up-to-date, so easy_install's attempts to just use simple round-robin DNS aren't always working. Several IPs in the round robin "A" list are not responding, and some new mirrors haven't been added to it. At this rate, the current approach will become unusable in a relatively short timeframe. :(
>
> It seems as though there is no way to auto-discover the mirrors themselves; I had hoped that perhaps a zone transfer on the dl.sourceforge.net zone might work to obtain a list of the actual mirrors, but I haven't been able to successfully obtain one.
>
> What I'm wondering at this point is if perhaps the only sane thing to do is to publish our own mirror list via DNS, so that at least when there's a problem it can still be fixed. The idea would be to replace easy_install's current DNS lookup of 'dl.sourceforge.net' IPs with something like 'dl.sfmirrors.telecommunity.com' (for example).

That seems like a good idea. One other possibility: if we want to future-proof against changes to SF.net's download pages, there could be a web service that figures out where to send the user for a given file. That way, only that service needs to change if they change their download page format. That is, of course, a much heavier solution.

Doing the DNS change seems like a good idea.

Kevin

At 04:49 PM 9/19/2006 -0400, Kevin Dangoor wrote:
> That seems like a good idea. One other possibility: if we want to future-proof against changes to SF.net's download pages, there could be a web service that figures out where to send the user for a given file. That way, only that service needs to change if they change their download page format. That is, of course, a much heavier solution.
>
> Doing the DNS change seems like a good idea.

I've implemented a proof of concept as 'sf-mirrors.telecommunity.com', with a cron job that scrapes the mirror names via HTTP and then updates the zone file. For the moment, it's set up to halt automatically if there's any change in the mirror names or the number of mirrors, so I can make sure the change isn't due to SF changing their UI again. If there are no changes and the script succeeds in pulling the current IPs for the named mirrors, it updates the zone file.

Anybody want to give it a try? Just replace all references to 'dl.sourceforge.net' in setuptools/package_index.py with 'sf-mirrors.telecommunity.com'.

I'm not sure what I think about it, exactly. One issue is that it makes it look like the software is "phoning home" to me, or that downloads are coming from my servers, even though they are unrelated. It's also possible that some mirrors might freak when they receive a 'Host:' header that points to telecommunity.com! So, I'm not 100% sure this can work reliably yet. Maybe it would be better to just encourage SF to fix their broken DNS records. :(
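
To make the halt-on-change behaviour concrete, here is a rough sketch of the guard and the zone-record step. It is not the actual cron script: the function names, the `resolve` callable, the mirror names in the usage note, the 300-second TTL, and the relative 'sf-mirrors' label are all assumptions, and the HTTP scraping itself is left out.

```python
def check_unchanged(known_names, scraped_names):
    """Halt (raise) if the scraped mirror names or their count changed."""
    if (len(known_names) != len(scraped_names)
            or set(known_names) != set(scraped_names)):
        raise RuntimeError(
            "mirror list changed: expected %r, scraped %r"
            % (sorted(known_names), sorted(scraped_names)))


def render_a_records(label, ips, ttl=300):
    """Render one zone-file "A" record line per distinct IP."""
    lines = ["%s %d IN A %s" % (label, ttl, ip) for ip in sorted(set(ips))]
    return "\n".join(lines) + "\n"


def build_zone_records(known_names, scraped_names, resolve,
                       label="sf-mirrors"):
    """Return new zone-file text, or raise without producing any.

    resolve(name) is assumed to return a list of IP strings for one
    mirror and to raise on failure, so any lookup error also halts
    the update and leaves the existing zone file alone.
    """
    check_unchanged(known_names, scraped_names)
    ips = []
    for name in sorted(scraped_names):
        ips.extend(resolve(name))
    if not ips:
        raise RuntimeError("no mirror IPs resolved; keeping old zone file")
    return render_a_records(label, ips)
```

A caller would write the returned text into the zone file only when no exception was raised, for example `build_zone_records(["mirror-a", "mirror-b"], scraped, resolve)` with made-up mirror names. Because the names are compared before any lookup, an SF page redesign that garbles the scrape stops the job before the published records are touched.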
participants (3)
- Eric S. Johansson
- Kevin Dangoor
- Phillip J. Eby