On Fri, Jun 18, 2010 at 6:27 PM, Tarek Ziadé <span dir="ltr">&lt;<a href="mailto:ziade.tarek@gmail.com">ziade.tarek@gmail.com</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div class="im">On Fri, Jun 18, 2010 at 6:44 PM, Ian Bicking &lt;<a href="mailto:ianb@colorstudy.com">ianb@colorstudy.com</a>&gt; wrote:<br>
> With all the reliability discussion, I thought I'd offer a kind of<br>
> counterproposal, that we rewrite PyPI to use App Engine.<br>
><br>
> Of course, this means writing code, etc., but I believe this is a reasonable<br>
> goal. I think if "we" (Catalog-SIG? PyPI maintainers?) committed to using<br>
> such an implementation (assuming it was of good quality) that we could find<br>
> people (probably not on this list) to write and maintain the code. People<br>
> have already rewritten PyPI a couple times, but no one knows what exactly to<br>
> *do* with the rewrite so they haven't gone anywhere. And PyPI is not a<br>
> particularly complicated application. I think we can set the bar high on<br>
> the implementation quality and that people will meet it, so long as they<br>
> know their effort won't be in vain.<br>
<br>
</div>Out of curiosity: have you ever worked with the current implementation?<br>

I have a hard time understanding why some people say it's hard to work with;<br>
I don't think that's a valid argument.<br></blockquote><div><br>I haven't looked at it in years, but I've poked around it some. I found it difficult, yes.<br> </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div class="im">> Why App Engine? The primary reason I'm proposing it is because it will be<br>
> much easier to manage. If it runs out of memory it won't bring down a<br>
> machine. If new people maintain the system it's easy to describe how to do<br>
> deployments, for instance. It's easy for people to install their own PyPI<br>
> instances for development and to generate patches. Hosted services can have<br>
> downtime, of course, but unlike today there are other people (the App<br>
> Engine maintainers) who will resolve those problems. There's still a class<br>
> of bugs like badly indexed tables or weird locking issues that could bring<br>
> PyPI down and "we" would have to fix it, and with a rewrite there's more of<br>
> a risk of that, but... it'll just take some testing to make sure things are<br>
> okay.<br>
><br>
> In terms of cost, I expect we can get free hosting, and packages can be<br>
> stored directly in the data store. That doesn't preclude using a CDN like<br>
> CloudFront, but that can be handled separately. Also since the index just<br>
> links to packages, packages can be incrementally uploaded to a CDN.<br>
<br>
</div>Even if I don't think it's a priority among our concerns (community<br>
mirrors come first), I wouldn't mind having the main PyPI UI on GAE.<br></blockquote><div><br>The priorities that motivate me are:<br><br>1. Make installation more reliable with respect to PyPI<br>2. Decrease overall maintenance burden<br>
3. Decrease code liability<br><br>Community mirrors only address 1, while App Engine addresses 2 and a rewrite addresses 3. And I think App Engine would be significantly more reliable than PyPI with mirrors: it has fewer moving parts, and it's built on highly automated infrastructure. Also, because it requires less maintenance, nothing needs active tending if someone drops out of communication for a while or goes on vacation. <br>
<br>There are a significant number of failure conditions that a mirror network doesn't protect you from. Connection refused, connection timed out, and 500 errors are the only really obvious errors that will make a tool try the next mirror. And because of potential synchronization problems, a mirror network could introduce a lot of new problems of its own.<br>
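To make that concrete, here is a minimal Python sketch of a client that walks a mirror list and fails over only on those obvious errors. The function names and the mirror hosts in the usage note are hypothetical, not part of any existing tool, and it assumes Python 3.10+, where <code>socket.timeout</code> is an alias of <code>TimeoutError</code>. Note what it does not catch: a mirror that is reachable but out of sync answers normally (a stale index, or a 404 for a release it hasn't copied yet), so the client either returns old data or gives up without ever trying the next mirror.<br>

```python
import urllib.error
import urllib.request


def is_failover_error(exc):
    """True only for the 'obvious' failures: refused, timed out, or HTTP 5xx."""
    if isinstance(exc, urllib.error.HTTPError):
        return exc.code >= 500
    if isinstance(exc, urllib.error.URLError):
        exc = exc.reason  # urlopen wraps socket-level errors in URLError
    return isinstance(exc, (ConnectionRefusedError, TimeoutError))


def fetch_with_failover(mirrors, path, fetch=None, timeout=10.0):
    """Try each mirror in order; return (url, body) from the first that answers.

    Any other error (e.g. a 404 from a mirror that has not synced yet) is
    re-raised immediately, and a stale-but-successful response is returned
    as-is: neither one makes this client look at the next mirror.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()

    last_error = None
    for base in mirrors:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return url, fetch(url)
        except Exception as exc:
            if not is_failover_error(exc):
                raise  # not an "obvious" outage, so no failover
            last_error = exc
    raise RuntimeError("all mirrors failed") from last_error
```

For example, <code>fetch_with_failover(["http://mirror-a.example", "http://mirror-b.example"], "/simple/foo/")</code> would only reach <code>mirror-b</code> if <code>mirror-a</code> refused the connection, timed out, or returned a 5xx.<br>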
<br> </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
That said, if PyPI were to be ported to GAE, couldn't we reuse the<br>
existing code instead of rewriting it from scratch? We would just have<br>
to rewrite the DB layer.<br></blockquote><div><br>I don't think it's worth reusing that code.<br><br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div class="im">
> Besides a commitment to using the code (which I think is really important to<br>
> motivate people), a scrubbed dump of the database would be really helpful<br>
> for development. I know we've passed around complete dumps to people, but<br>
> they contain private information, so we can't put them up publicly, which creates<br>
> a speed bump for developers.<br>
<br>
</div>Private information could easily be removed from those dumps.<br>

But I don't think it's that helpful, since you have all the .sql scripts to create<br>
your own DB. We could, however, add a script that creates some sample data on<br>
top of those scripts.<br></blockquote><div><br>It's useful to have a representative data set to test with, especially for things like performance testing.<br>
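A scrubbed dump keeps the real volume and shape of the data, which is what matters for performance testing. Here is a minimal Python sketch of the scrubbing step. It assumes a hypothetical two-table schema (<code>users</code>, <code>packages</code>) with <code>email</code> and <code>password</code> as the private fields, and uses SQLite only so it runs on its own; the real PyPI schema and database aren't shown in this thread. The scrub runs on a copy, so the source database is never modified.<br>

```python
import sqlite3

# Hypothetical schema for illustration only; not the real PyPI tables.
SCHEMA = """
CREATE TABLE users (name TEXT PRIMARY KEY, email TEXT, password TEXT);
CREATE TABLE packages (name TEXT PRIMARY KEY, owner TEXT, summary TEXT);
"""


def scrub(conn):
    """Replace private user fields; public package metadata is left as-is."""
    conn.execute(
        "UPDATE users SET email = 'user' || rowid || '@example.invalid', "
        "password = NULL"
    )
    conn.commit()


def scrubbed_dump(src):
    """Return an SQL dump of a scrubbed copy of src (src itself is untouched)."""
    copy = sqlite3.connect(":memory:")
    src.backup(copy)  # Connection.backup is available since Python 3.7
    scrub(copy)
    dump = "\n".join(copy.iterdump())
    copy.close()
    return dump


def make_demo_db():
    """Build a tiny in-memory database with one private and one public row."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO users VALUES ('alice', 'alice@example.com', 'hash123')")
    db.execute("INSERT INTO packages VALUES ('foo', 'alice', 'A sample package')")
    db.commit()
    return db


if __name__ == "__main__":
    print(scrubbed_dump(make_demo_db()))
```

The resulting dump can be published and loaded by anyone, while the package rows (the part that matters for realistic testing) come through unchanged.<br>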
</div></div><br>-- <br>Ian Bicking | <a href="http://blog.ianbicking.org">http://blog.ianbicking.org</a><br>