On Jul 20, 2016, at 11:01 AM, Adi Roiban <adi@roiban.ro> wrote:



On 20 July 2016 at 17:51, Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:

On Jul 20, 2016, at 6:31 AM, Adi Roiban <adi@roiban.ro> wrote:



On 18 July 2016 at 19:04, James Broadhead <jamesbroadhead@gmail.com> wrote:
On 17 July 2016 at 07:21, Amber Brown <hawkowl@atleastfornow.net> wrote:

It's OOMing  (...)


Have you considered something like monit[1] to detect & restart in cases like this?  


This might help, but it will not help us understand what we are doing wrong :)

After disabling the GitHub webhooks, the buildbot looks stable... so we might have a clue about what goes wrong.

I don't have time to look into this issue right now, so the GitHub hooks stay disabled from the GitHub UI for the time being.

Can someone who's had a direct look at the OOMing process (adi? amber?) report this upstream?  It's a real pity that we won't get github statuses for buildbot builds any more; that was a huge step in the right direction.


I don't know how to approach this.
By the time I was observing the issue, the buildbot process was already dead.

Yeah, these types of issues are tricky to debug.  Thanks for looking into it nonetheless; I was hoping you knew more, but if you don't, nothing to be done.

I have recently discovered the Rackspace monitoring capabilities for VMs... and set up a memory notification... not sure who will receive the alerts.

I'll make sure that the relevant people are on the monitoring list.

I have re-enabled the GitHub hooks and will start taking a closer look at the buildmaster process... but maybe 2GB is just not enough for a buildmaster.

Thanks.

I have triggered the creation of an image for the current buildbot machine and will consider upgrading the buildbot to 4GB of memory to see if we still hit the ceiling.

For my project I have a buildmaster with a similar number of builders and slaves (without GitHub hooks and without linter factories), and after 2 weeks of uptime its virtual memory usage is 1.5GB... so maybe 2GB is just not enough for buildbot.

Bummer.  It does seem like that's quite likely.

-glyph