"Buildbot" category on the tracker

Hello,

What do you think of creating a "buildbot" category in the tracker? There are often problems on specific buildbots which would be nice to track, but there's nowhere to do so.

Regards,
Antoine.

What do you think of creating a "buildbot" category in the tracker? There are often problems on specific buildbots which would be nice to track, but there's nowhere to do so.
Do you have any specific reports that you would want to classify with this category?

Regards,
Martin

Martin v. Löwis <martin@v.loewis.de> writes:
What do you think of creating a "buildbot" category in the tracker? There are often problems on specific buildbots which would be nice to track, but there's nowhere to do so.
Do you have any specific reports that you would want to classify with this category?
I was thinking of http://bugs.python.org/issue4970

Regards,
Antoine.

On 02:30 pm, solipsis@pitrou.net wrote:
Hello,
What do you think of creating a "buildbot" category in the tracker? There are often problems on specific buildbots which would be nice to track, but there's nowhere to do so.
Is your idea that this would be for tracking issues with the *bots* themselves? That is, not just for tracking cases where some test method fails on a particular bot, but for tracking cases where, say, a bot's host has run out of disk space and cannot run the tests at all?

For the case where a test is failing because of some platform or environment issue, it seems more sensible to track the ticket as relating to that platform or environment, or track it in relation to the feature it affects. Of course, tickets could move between these classifications as investigation reveals new information about the problem.

Jean-Paul

On Thu, Oct 29, 2009 at 7:04 PM, <exarkun@twistedmatrix.com> wrote:
On 02:30 pm, solipsis@pitrou.net wrote:
Hello,
What do you think of creating a "buildbot" category in the tracker? There are often problems on specific buildbots which would be nice to track, but there's nowhere to do so.
Is your idea that this would be for tracking issues with the *bots* themselves? That is, not just for tracking cases where some test method fails on a particular bot, but for tracking cases where, say, a bot's host has run out of disk space and cannot run the tests at all?
For the case where a test is failing because of some platform or environment issue, it seems more sensible to track the ticket as relating to that platform or environment, or track it in relation to the feature it affects.
Of course, tickets could move between these classifications as investigation reveals new information about the problem.
Then again, I know for a fact certain tests fail ONLY on certain buildbots because of the way they're configured. For example, certain multiprocessing tests will fail if /dev/shm isn't accessible on Linux, and several of the buildbots are in tight chroot jails and don't have that exposed. Is it a bug in that buildbot, a platform-specific bug, etc?

jesse

On Thu, 29 Oct 2009 at 19:41, Jesse Noller wrote:
Then again, I know for a fact certain tests fail ONLY on certain buildbots because of the way they're configured. For example, certain multiprocessing tests will fail if /dev/shm isn't accessible on Linux, and several of the buildbots are in tight chroot jails and don't have that exposed.
Is it a bug in that buildbot, a platform-specific bug, etc?
I'd say that particular one is a bug in the tests. If /dev/shm is not available and is required, then the tests should be skipped with an appropriate message. It would also secondarily be an issue with the buildbot fleet, since multiprocessing would then not be getting thoroughly tested by those buildbots.

IMO a buildbot category might be useful for bugs that show up in a buildbot but no one can (currently) reproduce, or problems with the buildbots themselves. I don't think we currently have any bugs filed that fall in the second category, but multiprocessing not getting completely tested because of lack of /dev/shm would fall into that category. Issue 4970 was in the first category until recently.

But the real reason for having a buildbot category (or at least a keyword) would be to be able to tag all bugs that are currently making buildbots fail that are _not_ the result of a recent checkin. This would make it easier to find the bugs that need to be cleaned up to stabilize the buildbot fleet. I'm currently aware of issues 4970, 3892, and 6462 in this category, and there are a few more that we can/will file if we continue to pay attention to the failure reports now arriving on the IRC channel.

--David (RDM)

On Thu, Oct 29, 2009 at 8:31 PM, R. David Murray <rdmurray@bitdance.com> wrote:
I'd say that particular one is a bug in the tests. If /dev/shm is not available and is required, then the tests should be skipped with an appropriate message. It would also secondarily be an issue with the buildbot fleet, since multiprocessing would then not be getting thoroughly tested by those buildbots.
FWIW: the tests skip on those platforms, but multiprocessing can't function properly on platforms like that.

Jesse Noller wrote:
On Thu, Oct 29, 2009 at 8:31 PM, R. David Murray <rdmurray@bitdance.com> wrote:
I'd say that particular one is a bug in the tests. If /dev/shm is not available and is required, then the tests should be skipped with an appropriate message. It would also secondarily be an issue with the buildbot fleet, since multiprocessing would then not be getting thoroughly tested by those buildbots.
FWIW: the tests skip on those platforms, but multiprocessing can't function properly on platforms like that.
I'm confused: first you said they fail, now you say they get skipped. Which one is it?

I agree with R. David's analysis: if they fail, it's a multiprocessing bug; if they get skipped, it's a flaw in the build slave configuration (but perhaps only slightly so, because it is good if both cases are tested - and we also have machines that provide /dev/shm).

Regards,
Martin

On Fri, Oct 30, 2009 at 4:53 AM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
I'm confused: first you said they fail, now you say they get skipped. Which one is it? I agree with R. David's analysis: if they fail, it's a multiprocessing bug; if they get skipped, it's a flaw in the build slave configuration (but perhaps only slightly so, because it is good if both cases are tested - and we also have machines that provide /dev/shm).
They failed until we had the tests skip those platforms - at the time, I felt that it was more of a bug with the build slave configuration than a multiprocessing issue. I don't like skipping tests unless the platform fundamentally isn't supported (e.g. broken semaphores for some actions on OS X) - Linux platforms support this functionality just fine, except when in locked-down chroot jails.

The only reason I brought it up was to point out that a buildbot configuration on a given host can make tests fail even if those tests would normally pass on that operating system.

jesse

On 12:55 pm, jnoller@gmail.com wrote:
On Fri, Oct 30, 2009 at 4:53 AM, "Martin v. L�wis" <martin@v.loewis.de> wrote:
I'm confused: first you said they fail, now you say they get skipped. Which one is it? I agree with R. David's analysis: if they fail, it's a multiprocessing bug, if they get skipped, it's a flaw in the build slave configuration (but perhaps only slightly so, because it is good if both cases are tested - and we do have machines also that provide /dev/shm).
They failed until we had the tests skip those platforms - at the time, I felt that it was more of a bug with the build slave configuration than a multiprocessing issue, I don't like skipping tests unless the platform fundamentally isn't supported (e.g. broken semaphores for some actions on OS/X) - linux platforms support this functionality just fine - except when in locked-down chroot jails.
The only reason I brought it up was to point out the a buildbot configuration on a given host can make tests fail even if those tests would normally pass on that operating system.
Just as a build slave can be run in a chroot, so can any other Python program. This is a real shortcoming of the multiprocessing module. It's entirely possible that people will want to run Python software in chroots sometimes. So it's proper to acknowledge that this is an unsupported environment.

The fact that the kernel in use is the same as the kernel in use on another supported platform is sort of irrelevant. The kernel is just one piece of the system; there are many other important pieces.

Jean-Paul

On Fri, Oct 30, 2009 at 10:15 AM, <exarkun@twistedmatrix.com> wrote:
On 12:55 pm, jnoller@gmail.com wrote:
On Fri, Oct 30, 2009 at 4:53 AM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
I'm confused: first you said they fail, now you say they get skipped. Which one is it? I agree with R. David's analysis: if they fail, it's a multiprocessing bug; if they get skipped, it's a flaw in the build slave configuration (but perhaps only slightly so, because it is good if both cases are tested - and we also have machines that provide /dev/shm).
They failed until we had the tests skip those platforms - at the time, I felt that it was more of a bug with the build slave configuration than a multiprocessing issue. I don't like skipping tests unless the platform fundamentally isn't supported (e.g. broken semaphores for some actions on OS X) - Linux platforms support this functionality just fine, except when in locked-down chroot jails.
The only reason I brought it up was to point out that a buildbot configuration on a given host can make tests fail even if those tests would normally pass on that operating system.
Just as a build slave can be run in a chroot, so can any other Python program. This is a real shortcoming of the multiprocessing module. It's entirely possible that people will want to run Python software in chroots sometimes. So it's proper to acknowledge that this is an unsupported environment. The fact that the kernel in use is the same as the kernel in use on another supported platform is sort of irrelevant. The kernel is just one piece of the system; there are many other important pieces.
Jean-Paul
I'm well aware of that.

On Fri, 30 Oct 2009 at 08:55, Jesse Noller wrote:
On Fri, Oct 30, 2009 at 4:53 AM, "Martin v. L�wis" <martin@v.loewis.de> wrote:
I'm confused: first you said they fail, now you say they get skipped. Which one is it? I agree with R. David's analysis: if they fail, it's a multiprocessing bug, if they get skipped, it's a flaw in the build slave configuration (but perhaps only slightly so, because it is good if both cases are tested - and we do have machines also that provide /dev/shm).
They failed until we had the tests skip those platforms - at the time, I felt that it was more of a bug with the build slave configuration than a multiprocessing issue, I don't like skipping tests unless the platform fundamentally isn't supported (e.g. broken semaphores for some actions on OS/X) - linux platforms support this functionality just fine - except when in locked-down chroot jails.
As Martin pointed out, Python supports both configurations (chroot and non-chroot), and needs to be tested in both. Somewhere we should probably have a list of what tests are getting skipped on what buildslaves so we can inspect the buildbot fleet for complete coverage, but I'm not sure who is going to volunteer to create and maintain that list :) (a sketch of one way to pull such a list out of a build log follows this message)
The only reason I brought it up was to point out that a buildbot configuration on a given host can make tests fail even if those tests would normally pass on that operating system.
Yes, and that's a kind of ticket that should end up getting tagged with the new 'buildbot' keyword (thanks, Martin), IMO.

--David

But the real reason for having a buildbot category (or at least a keyword) would be to be able to tag all bugs that are currently making buildbots fail that are _not_ the result of a recent checkin. This would make it easier to find the bugs that need to be cleaned up to stabilize the buildbot fleet. I'm currently aware of issues 4970, 3892, and 6462 in this category, and there are a few more that we can/will file if we continue to pay attention to the failure reports now arriving on the IRC channel.
That's convincing; I've created a "buildbot" keyword. I gave it the description "indicates that tests fail on a buildslave for uncertain reasons". If that is indeed the intended purpose of this classification, please keep it in mind when assigning the tag. If I misunderstood the purpose of the keyword, please provide a better description.

Regards,
Martin

On Fri, 30 Oct 2009 at 09:57, "Martin v. Löwis" wrote:
But the real reason for having a buildbot category (or at least a keyword) would be to be able to tag all bugs that are currently making buildbots fail that are _not_ the result of a recent checkin. This would make it easier to find the bugs that need to be cleaned up to stabilize the buildbot fleet. I'm currently aware of issues 4970, 3892, and 6462 in this category, and there are a few more that we can/will file if we continue to pay attention to the failure reports now arriving on the IRC channel.
That's convincing; I've created a "buildbot" keyword. I gave it the description
"indicates that tests fail on a buildslave for uncertain reasons"
If that is indeed the intended purpose of this classification, please keep it in mind when assigning the tag. If I misunderstood the purpose of the keyword, please provide a better description.
How about: "indicates that related test failures are causing buildbot instability" My thought is that sometimes we more-or-less know the reasons for the failure, but for one reason or another we can't fix it immediately, and I'd like to keep such a bug visible when looking at buildbot related issues. IMO it would be no bad thing for this tag to be applied to any issue that is created as a result of an observed test failure on a buildbot. Such an issue should only get created if the person who did the checkin that caused it can't reproduce the problem themselves (ie: they ran the test suite and on their platform it was clean). Now, we know that in practice some bugs show up on buildbots because a committer forgot to run the test suite prior to check in (we all make mistakes), but if such a bug gets tagged 'buildbot' I think that's fine, since it will still be affecting the stability of the buildbots. --David

On 29 Oct, 11:41 pm, jnoller@gmail.com wrote:
On Thu, Oct 29, 2009 at 7:04 PM, <exarkun@twistedmatrix.com> wrote:
On 02:30 pm, solipsis@pitrou.net wrote:
Hello,
What do you think of creating a "buildbot" category in the tracker? There are often problems on specific buildbots which would be nice to track, but there's nowhere to do so.
Is your idea that this would be for tracking issues with the *bots* themselves? That is, not just for tracking cases where some test method fails on a particular bot, but for tracking cases where, say, a bot's host has run out of disk space and cannot run the tests at all?
For the case where a test is failing because of some platform or environment issue, it seems more sensible to track the ticket as relating to that platform or environment, or track it in relation to the feature it affects.
Of course, tickets could move between these classifications as investigation reveals new information about the problem.
Then again, I know for a fact certain tests fail ONLY on certain buildbots because of the way they're configured. For example, certain multiprocessing tests will fail if /dev/shm isn't accessible on Linux, and several of the buildbots are in tight chroot jails and don't have that exposed.
Is it a bug in that buildbot, a platform-specific bug, etc?
It's a platform configuration that can exist. If you're rejecting that configuration and saying that CPython will not support it, then it's silly to have a buildbot set up that way, and presumably that should be changed.

The point is that this isn't about buildbot. It's about CPython and what platforms it supports. Categorizing it by "buildbot" is not useful, because no one is going to be cruising along looking for multiprocessing issues to fix by browsing tickets by the "buildbot" category.

If, on the other hand, (sticking with this example) /dev/shm-less systems are not a platform that CPython wants to support, then having a buildbot running on one is a bit silly. It will probably always fail, and all it does is contribute another column of red. Who does that help?

Jean-Paul

<exarkun@twistedmatrix.com> writes:
Is your idea that this would be for tracking issues with the *bots* themselves? That is, not just for tracking cases where some test method fails on a particular bot, but for tracking cases where, say, a bot's host has run out of disk space and cannot run the tests at all?
Well, the general situation would be slightly easier to appreciate if there were a public medium where buildbot info was exchanged, announcements made, and problems tracked: some kind of tracker, tracker keyword, mailing list, or anything else.

Regards,
Antoine.

Well, the general situation would be slightly easier to appreciate if there were a public medium where buildbot info was exchanged, announcements made, and problems tracked: some kind of tracker, tracker keyword, mailing list, or anything else.
As for the tracker keyword - I created one (in case you didn't notice). As for exchanging info: if you are talking about the specific Python buildbot installation (rather than buildbot, the software), then python-dev is the proper place to exchange info about it.

Regards,
Martin
participants (5):
- "Martin v. Löwis"
- Antoine Pitrou
- exarkun@twistedmatrix.com
- Jesse Noller
- R. David Murray