[Distutils] Fixing PyPI download stats with real-time log analysis (Was: PyPI Download Counts)
anatoly techtonik
techtonik at gmail.com
Wed Jun 26 13:41:32 CEST 2013
On Sun, Jun 23, 2013 at 6:11 PM, Donald Stufft <donald at stufft.io> wrote:
>
> On Jun 23, 2013, at 9:36 AM, Alex Clark <aclark at aclark.net> wrote:
>
> Hi Noah,
>
> Noah Kantrowitz <noah <at> coderanger.net> writes:
>
> On Jun 22, 2013, at 10:33 PM, anatoly techtonik wrote:
>
> On Fri, Jun 14, 2013 at 5:05 PM, anatoly techtonik <techtonik <at> gmail.com> wrote:
>
> Could you, please, share the log format+example, so that we can
> experiment with it?
>
> ping
>
> Additional assistance is not required for this project. Thank you for your
> interest, and stay tuned for future updates.
>
> WAT.
>
> Can you explain why "additional assistance is not required for this
> project"? AFAICT:
>
> - Shit is broken (i.e. download counts)
> - At first, there were no plans to fix it (which I accepted as a cost of
> enabling the CDN)
> - Now there are plans to fix it, but no additional assistance is needed? If
> that were the case I'd expect download counts to be fixed. Or at the very
> least, I'd expect a "no thanks we don't need help, but here is where we are
> at [ explanation of situation ]". Not "we don't need help, we'll be in
> touch". I think the latter is a disservice to the community, which (I
> assume) you especially don't want to be responsible for given the
> tremendously positive service you've done by enabling the CDN in the first
> place.
>
> We get it: download counts were not as important as CDN. But they *are*
> important, so let's keep talking about how to fix them.
>
> https://en.wikipedia.org/wiki/Brooks%27_law
>
> To expand further: right now the biggest issue blocking download counts
> behind the CDN is a less than ideal configuration on the VM hosting PyPI.
> I've been working on (and have almost completed) a Chef cookbook that will
> allow us to easily deploy PyPI to a second VM (and gracefully do the switch
> over) with minimal downtime.
>
> However, there is an issue with launching VMs at the moment that OSUOSL is
> looking into, which means that even with a completed cookbook we couldn't
> launch a VM right now anyway. If we wanted to correct the configuration in
> place, it would require downtime.
>
I understand the pressure you're working under, but I don't believe Brooks's
law applies here: you're working on a server configuration problem, while I
am proposing to help with the actual download counting problem. It would, of
course, be useful to know the details of what you are trying to achieve, so
we can see where people could plug in. As I understand it, the problem you're
trying to solve right now is (1): how to store the log, or how to receive it.
If you're actually solving some problem (0) and don't have time for (1) yet,
tell us what the input data looks like, so that somebody could prepare some
variants. Problem (2) is piping the log to the analyzer; I guess this will
transform the "something" you got in (1) into a stream of lines. Problem (3)
is count processing: consume the line stream, process it, organize the counts
in batches and write them to the PyPI store. Working on (3) doesn't require
any of steps (0)-(2) to be complete.
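To make (3) concrete, here is a minimal Python sketch of what a count processor could look like. Since the log format was never shared, the regular expression assumes a hypothetical Apache-style access log line; `parse_downloads`, `count_in_batches` and `print_flush` are illustrative names of my own, not existing PyPI code, and `print_flush` merely stands in for "write to PyPI store", whose real API I don't know.

```python
import re
from collections import Counter
from typing import Callable, Iterable, Iterator

# ASSUMPTION: the real CDN log format is unknown; this regex targets a
# hypothetical Apache-style access log line such as:
# 1.2.3.4 - - [26/Jun/2013:13:41:32 +0200] "GET /packages/source/f/foo/foo-1.0.tar.gz HTTP/1.1" 200 1234
LINE_RE = re.compile(
    r'"GET (?P<path>/packages/\S+) HTTP/[\d.]+" (?P<status>\d{3})\b'
)


def parse_downloads(lines: Iterable[str]) -> Iterator[str]:
    """Yield the file name of every successful package download."""
    for line in lines:
        match = LINE_RE.search(line)
        # ASSUMPTION: only 200 responses count as downloads; 404s,
        # 304s and non-/packages/ paths are ignored.
        if match and match.group("status") == "200":
            # The last path component is the distribution file name.
            yield match.group("path").rsplit("/", 1)[-1]


def count_in_batches(
    filenames: Iterable[str],
    flush: Callable[[Counter], None],
    batch_size: int = 1000,
) -> None:
    """Aggregate hits and hand them to `flush` every `batch_size` hits."""
    batch: Counter = Counter()
    seen = 0
    for name in filenames:
        batch[name] += 1
        seen += 1
        if seen >= batch_size:
            flush(batch)
            batch = Counter()  # start a fresh batch; the flushed one is untouched
            seen = 0
    if batch:
        flush(batch)  # write whatever is left when the stream ends


def print_flush(batch: Counter) -> None:
    # HYPOTHETICAL stand-in for writing to the PyPI store.
    for name, hits in sorted(batch.items()):
        print(f"{name}\t{hits}")
```

Because it only consumes an iterable of lines, the same code would work whether (2) ends up being a file, a pipe or a socket, e.g. `count_in_batches(parse_downloads(sys.stdin), print_flush)` in a script fed by `tail -f`. Batching means the store is hit once per `batch_size` downloads instead of once per request.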
I am pushing this because I doubt that Chef configuration is an easy and
fast process. At least, my experience with Chef a year ago, under a tight
time constraint, was negative (which should not come as a surprise, because I
mostly see only the negative things). I started from zero after Noah's talk
at PyCon, but within 2 months I failed to deliver automatic deployment for
just a couple of servers whose web and DB configurations differed slightly
from what the official cookbooks support. This post at
http://lumberjaph.net/devops/2012/11/27/ansible-and-chef.html summarizes my
experience. I am not sure you will be allowed to use Ansible for this, but
I'd give it a try, at least for a prototype configuration.