[Distutils] Fixing PyPI download stats with real-time log analysis (Was: PyPI Download Counts)

anatoly techtonik techtonik at gmail.com
Fri Jun 14 16:05:41 CEST 2013


On Sun, Jun 9, 2013 at 9:33 PM, Donald Stufft <donald at stufft.io> wrote:

>
> Fastly has been a dream to work with. They've been fast at fixing and
> diagnosing issues, have helped tune the config to get a higher hit rate,
> and when they heard that people were upset that download counts
> had to be turned off they offered the logging support to be turned on
> for our account.
>
> The infrastructure is set up to receive the logs (and in fact was receiving
> logs for a day or so) but upgrades to the VM that runs PyPI need to
> occur before we can continue receiving them. The disk drive that PyPI
> has is too small to handle the volume of request data that is coming in
> from Fastly and it quickly filled up in under 24 hours. Upgrading that
> space requires powering off the VM so we (The Infra team) are working
> on doing that, ideally without downtime on PyPI.
>

Disk drives? That is an unneeded bottleneck. What is needed is a resident
log cruncher that processes log entries as they arrive, stores intermediate
results in memcache, and updates the DB counts in batches. This could even
be done on AppEngine. Something makes me believe that we can get some extra
free quota for this purpose. =)
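
To make the idea concrete, here is a rough sketch of such a cruncher. It
is only an illustration under assumptions: the log line layout in LINE_RE
is made up (the real Fastly format has not been posted yet), a plain
in-process Counter stands in for memcache, and sqlite3 stands in for the
PyPI database. The LogCruncher class and the downloads table are
hypothetical names, not anything that exists in PyPI.

```python
import re
import sqlite3
from collections import Counter

# ASSUMPTION: hypothetical log layout, since the real Fastly format is unknown.
# Example: '203.0.113.7 "GET /packages/source/r/requests/requests-1.2.3.tar.gz" 200'
LINE_RE = re.compile(r'"GET /packages/\S+/(?P<filename>[^/\s"]+)" 200')


class LogCruncher:
    """Count downloads in memory and write them to the DB in batches."""

    def __init__(self, db, batch_size=1000):
        self.db = db
        self.batch_size = batch_size
        self.pending = Counter()  # stand-in for memcache
        self.seen = 0             # counted lines since the last flush
        db.execute(
            "CREATE TABLE IF NOT EXISTS downloads "
            "(filename TEXT PRIMARY KEY, count INTEGER NOT NULL)"
        )

    def feed(self, line):
        """Process one log line; lines that are not successful downloads are skipped."""
        match = LINE_RE.search(line)
        if match is None:
            return
        self.pending[match.group("filename")] += 1
        self.seen += 1
        if self.seen >= self.batch_size:
            self.flush()

    def flush(self):
        """Apply all pending counts in a single transaction, then reset."""
        with self.db:  # commits on success, rolls back on error
            for filename, n in self.pending.items():
                self.db.execute(
                    "INSERT OR IGNORE INTO downloads VALUES (?, 0)", (filename,)
                )
                self.db.execute(
                    "UPDATE downloads SET count = count + ? WHERE filename = ?",
                    (n, filename),
                )
        self.pending.clear()
        self.seen = 0
```

Nothing is written to disk per request here: raw log lines are dropped
after counting, and the DB sees one small transaction per batch. The
trade-off is that counts still pending in memory are lost if the process
dies before a flush.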

Could you please share the log format and an example line, so that we can
experiment with it?
-- 
anatoly t.