Strange umask(?)/st_mode issue

Hi I'm trying to track down the cause of the failed Linux tests on this build (it's also been happening on random PR builds for the last day or two, but this is the first I've seen it happen on already merged code): https://dev.azure.com/Python/cpython/_build/results?buildId=40189 One of the failed tests is this one from test_tarfile.py (and I think this is representative of the others): def test_file_mode(self): # Test for issue #8464: Create files with correct # permissions. if os.path.exists(tmpname): support.unlink(tmpname) original_umask = os.umask(0o022) try: tar = tarfile.open(tmpname, self.mode) tar.close() mode = os.stat(tmpname).st_mode & 0o777 self.assertEqual(mode, 0o644, "wrong file permissions") finally: os.umask(original_umask) It's failing because the mode of the new TAR file is 0o666 rather than 0o644. I'd guess that this means the umask() call is not having effect? But honestly this is beyond me - I don't have any idea what would cause Python to ignore that, and I don't see anything in the commit history that could be at fault. I'm hoping someone can tell me that it's definitely a platform configuration issue or it's definitely interference between two of our tests (which run in random order, which would explain the intermittent nature of this, but I haven't spotted any tests that seem likely either). To see the other failed PR builds, the full list is at https://dev.azure.com/Python/cpython/_build?definitionId=9 and most of the ones from today have failed because of whatever is causing it. Any help? Thanks, Steve

That does sound strange. How easily can you reproduce it? That majorly effects how I would try to debug something like this... If you're able to get an strace of a failed run then that would probably tell us a *lot*. On Fri, Mar 29, 2019, 16:28 Steve Dower <steve.dower@python.org> wrote:
Hi
I'm trying to track down the cause of the failed Linux tests on this build (it's also been happening on random PR builds for the last day or two, but this is the first I've seen it happen on already merged code):
https://dev.azure.com/Python/cpython/_build/results?buildId=40189
One of the failed tests is this one from test_tarfile.py (and I think this is representative of the others):
def test_file_mode(self): # Test for issue #8464: Create files with correct # permissions. if os.path.exists(tmpname): support.unlink(tmpname)
original_umask = os.umask(0o022) try: tar = tarfile.open(tmpname, self.mode) tar.close() mode = os.stat(tmpname).st_mode & 0o777 self.assertEqual(mode, 0o644, "wrong file permissions") finally: os.umask(original_umask)
It's failing because the mode of the new TAR file is 0o666 rather than 0o644. I'd guess that this means the umask() call is not having effect? But honestly this is beyond me - I don't have any idea what would cause Python to ignore that, and I don't see anything in the commit history that could be at fault.
I'm hoping someone can tell me that it's definitely a platform configuration issue or it's definitely interference between two of our tests (which run in random order, which would explain the intermittent nature of this, but I haven't spotted any tests that seem likely either).
To see the other failed PR builds, the full list is at https://dev.azure.com/Python/cpython/_build?definitionId=9 and most of the ones from today have failed because of whatever is causing it.
Any help?
Thanks, Steve _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/njs%40pobox.com

On 29Mar.2019 1731, Nathaniel Smith wrote:
That does sound strange.
Oh good, it's not just me :)
How easily can you reproduce it? That majorly effects how I would try to debug something like this...
No idea. Looking at the builds on Pipelines again, it seems that all the builds since about 10am PDT have been failing, so perhaps they rolled out something that changed how umask works? I don't have any idea whether that's possible though.
If you're able to get an strace of a failed run then that would probably tell us a *lot*.
I'll need help with that :) But if it's just commands in the .azure-pipelines/posix-steps.yml file then sending a PR with them added should do it. Though given there seems to be a clear start time and no relevant change, I'm inclined to report it to the Pipelines team (which I already have) and see what they say. Normally my internal CPython builds catch issues in the hosted VMs before they reach public accounts, but this may have been a hotfix for some other issue. Cheers, Steve

On 29Mar2019 19:30, Steve Dower <steve.dower@python.org> wrote:
On 29Mar.2019 1731, Nathaniel Smith wrote:
That does sound strange.
Oh good, it's not just me :)
How easily can you reproduce it? That majorly effects how I would try to debug something like this...
No idea. Looking at the builds on Pipelines again, it seems that all the builds since about 10am PDT have been failing, so perhaps they rolled out something that changed how umask works? I don't have any idea whether that's possible though.
If you're able to get an strace of a failed run then that would probably tell us a *lot*.
I'll need help with that :) But if it's just commands in the .azure-pipelines/posix-steps.yml file then sending a PR with them added should do it.
Can you get a branch into your pipeline? Then you could just hack the tarfile test with something quick and dirty like: pid = os.getpid() system("strace -p %d 2>/path/to/strace.out &" % pid) time.sleep(2) # get strace heaps of time to start just before the tarfile open. A ghastly hack but it would get you debugging info. You could even decide to remove the strace.out file if the umask issue doesn't show, if it is erratic (can't see why it would be though). Cheers, Cameron Simpson <cs@cskk.id.au>

On 29Mar.2019 1939, Cameron Simpson wrote:
Can you get a branch into your pipeline? Then you could just hack the tarfile test with something quick and dirty like:
pid = os.getpid() system("strace -p %d 2>/path/to/strace.out &" % pid) time.sleep(2) # get strace heaps of time to start
just before the tarfile open. A ghastly hack but it would get you debugging info. You could even decide to remove the strace.out file if the umask issue doesn't show, if it is erratic (can't see why it would be though).
Perfect, I'll give this a go. Thanks!

On 29Mar.2019 1944, Steve Dower wrote:
On 29Mar.2019 1939, Cameron Simpson wrote:
Can you get a branch into your pipeline? Then you could just hack the tarfile test with something quick and dirty like:
pid = os.getpid() system("strace -p %d 2>/path/to/strace.out &" % pid) time.sleep(2) # get strace heaps of time to start
just before the tarfile open. A ghastly hack but it would get you debugging info. You could even decide to remove the strace.out file if the umask issue doesn't show, if it is erratic (can't see why it would be though).
Perfect, I'll give this a go. Thanks!
I set up a PR to collect this trace and the results are at: https://github.com/python/cpython/pull/12625 However, I suspect it's a non-result: umask(022) = 022 open("/home/vsts/work/1/s/build/test_python_5154/@test_5154_tmp-tardir/tmp.tar", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 3 write(3, "BZh91AY&SYY{\270\344\0\0\24P\0\300\0\4\0\0\10 \0000\314\5)\246"..., 46) = 46 close(3) = 0 stat("/home/vsts/work/1/s/build/test_python_5154/@test_5154_tmp-tardir/tmp.tar", {st_mode=S_IFREG|0666, st_size=46, ...}) = 0 Happy to take more suggestions if anyone has them. Thanks, Steve

Steve Dower <steve.dower@python.org> writes:
On 29Mar.2019 1944, Steve Dower wrote:
On 29Mar.2019 1939, Cameron Simpson wrote:
Can you get a branch into your pipeline? Then you could just hack the tarfile test with something quick and dirty like:
pid = os.getpid() system("strace -p %d 2>/path/to/strace.out &" % pid) time.sleep(2) # get strace heaps of time to start
just before the tarfile open. A ghastly hack but it would get you debugging info. You could even decide to remove the strace.out file if the umask issue doesn't show, if it is erratic (can't see why it would be though).
Perfect, I'll give this a go. Thanks!
I set up a PR to collect this trace and the results are at: https://github.com/python/cpython/pull/12625
However, I suspect it's a non-result:
umask(022) = 022 open("/home/vsts/work/1/s/build/test_python_5154/@test_5154_tmp-tardir/tmp.tar", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 3 write(3, "BZh91AY&SYY{\270\344\0\0\24P\0\300\0\4\0\0\10 \0000\314\5)\246"..., 46) = 46 close(3) = 0 stat("/home/vsts/work/1/s/build/test_python_5154/@test_5154_tmp-tardir/tmp.tar", {st_mode=S_IFREG|0666, st_size=46, ...}) = 0
Happy to take more suggestions if anyone has them.
The file must exist already. Try moving the strace capturing to before the unlink. -- regards, kushal

On 30.03.2019 19:00, Steve Dower wrote:
On 29Mar.2019 1944, Steve Dower wrote:
On 29Mar.2019 1939, Cameron Simpson wrote:
Can you get a branch into your pipeline? Then you could just hack the tarfile test with something quick and dirty like:
pid = os.getpid() system("strace -p %d 2>/path/to/strace.out &" % pid) time.sleep(2) # get strace heaps of time to start
just before the tarfile open. A ghastly hack but it would get you debugging info. You could even decide to remove the strace.out file if the umask issue doesn't show, if it is erratic (can't see why it would be though). Perfect, I'll give this a go. Thanks! I set up a PR to collect this trace and the results are at: https://github.com/python/cpython/pull/12625
However, I suspect it's a non-result:
umask(022) = 022 open("/home/vsts/work/1/s/build/test_python_5154/@test_5154_tmp-tardir/tmp.tar", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 3 write(3, "BZh91AY&SYY{\270\344\0\0\24P\0\300\0\4\0\0\10 \0000\314\5)\246"..., 46) = 46 close(3) = 0 stat("/home/vsts/work/1/s/build/test_python_5154/@test_5154_tmp-tardir/tmp.tar", {st_mode=S_IFREG|0666, st_size=46, ...}) = 0
Happy to take more suggestions if anyone has them.
According to https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/hosted?view=azure-devops&tabs=yaml#use-a-microsoft-hosted-agent , MS uses Ubuntu 16.04 http://manpages.ubuntu.com/manpages/xenial/man2/umask.2.html suggests that umask is ignored if the parent directory has a default ACL (a newly-created dir inherits those from its parent). As per https://linuxconfig.org/how-to-manage-acls-on-linux , the following commands should show if acls are enabled on the current FS and if any are active on the dir: DEVICE=$(df <location> | tail -n +2 | awk '{print $1}') sudo tune2fs -l $DEVICE | grep -w "Default mount options" mount | grep -w $DEVICE getfacl <location> In `getfacl' output for a directory, entries that start with "default:" list the default ACL . `setfacl -b <location>' removes all ACLs from location.
Thanks, Steve _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/vano%40mail.mipt.ru
-- Regards, Ivan

I wouldn't expect it to be the case in a CI environment but I believe a umask can be overridden if the filesystem is mounted and configured with acls set? (oh, hah, Ivan just said the same thing) -gps On Sat, Mar 30, 2019 at 9:05 AM Steve Dower <steve.dower@python.org> wrote:
On 29Mar.2019 1944, Steve Dower wrote:
On 29Mar.2019 1939, Cameron Simpson wrote:
Can you get a branch into your pipeline? Then you could just hack the tarfile test with something quick and dirty like:
pid = os.getpid() system("strace -p %d 2>/path/to/strace.out &" % pid) time.sleep(2) # get strace heaps of time to start
just before the tarfile open. A ghastly hack but it would get you debugging info. You could even decide to remove the strace.out file if the umask issue doesn't show, if it is erratic (can't see why it would be though).
Perfect, I'll give this a go. Thanks!
I set up a PR to collect this trace and the results are at: https://github.com/python/cpython/pull/12625
However, I suspect it's a non-result:
umask(022) = 022
open("/home/vsts/work/1/s/build/test_python_5154/@test_5154_tmp-tardir/tmp.tar", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 3 write(3, "BZh91AY&SYY{\270\344\0\0\24P\0\300\0\4\0\0\10 \0000\314\5)\246"..., 46) = 46 close(3) = 0
stat("/home/vsts/work/1/s/build/test_python_5154/@test_5154_tmp-tardir/tmp.tar", {st_mode=S_IFREG|0666, st_size=46, ...}) = 0
Happy to take more suggestions if anyone has them.
Thanks, Steve _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/greg%40krypto.org

On 30Mar2019 1130, Gregory P. Smith wrote:
I wouldn't expect it to be the case in a CI environment but I believe a umask can be overridden if the filesystem is mounted and configured with acls set? (oh, hah, Ivan just said the same thing)
Yep, it appears this is the case. The Pipelines team got back to me and it seems to be a known issue - the workaround they gave me was to run "sudo setfacl -Rb /home/vsts" at the start, so I've merged that in for now (to master and 3.7). Cheers, Steve

On 01Apr2019 09:12, Steve Dower <steve.dower@python.org> wrote:
On 30Mar2019 1130, Gregory P. Smith wrote:
I wouldn't expect it to be the case in a CI environment but I believe a umask can be overridden if the filesystem is mounted and configured with acls set? (oh, hah, Ivan just said the same thing)
Yep, it appears this is the case. The Pipelines team got back to me and it seems to be a known issue - the workaround they gave me was to run "sudo setfacl -Rb /home/vsts" at the start, so I've merged that in for now (to master and 3.7).
Could that be done _without_ sudo to just the local directory containing the test tar file? If that works then you don't need any nasty privileged sudo use (which will just break on platforms without sudo anyway). Cheers, Cameron Simpson <cs@cskk.id.au>

On 01Apr2019 1535, Cameron Simpson wrote:
On 01Apr2019 09:12, Steve Dower <steve.dower@python.org> wrote:
On 30Mar2019 1130, Gregory P. Smith wrote:
I wouldn't expect it to be the case in a CI environment but I believe a umask can be overridden if the filesystem is mounted and configured with acls set? (oh, hah, Ivan just said the same thing)
Yep, it appears this is the case. The Pipelines team got back to me and it seems to be a known issue - the workaround they gave me was to run "sudo setfacl -Rb /home/vsts" at the start, so I've merged that in for now (to master and 3.7).
Could that be done _without_ sudo to just the local directory containing the test tar file? If that works then you don't need any nasty privileged sudo use (which will just break on platforms without sudo anyway).
I tried something similar to that and it didn't work. My guess is it's to do with the actual mount point? (I also tried without sudo at first, and when I didn't work, I tried it with sudo. I hear that's how to decide whether you need it or not ;) ) In any case, it only applies to the Azure Pipelines build definition, so there aren't any other platforms where it'll be used. Cheers, Steve

On 01Apr2019 15:44, Steve Dower <steve.dower@python.org> wrote:
On 01Apr2019 1535, Cameron Simpson wrote:
On 01Apr2019 09:12, Steve Dower <steve.dower@python.org> wrote:
On 30Mar2019 1130, Gregory P. Smith wrote:
I wouldn't expect it to be the case in a CI environment but I believe a umask can be overridden if the filesystem is mounted and configured with acls set? (oh, hah, Ivan just said the same thing)
Yep, it appears this is the case. The Pipelines team got back to me and it seems to be a known issue - the workaround they gave me was to run "sudo setfacl -Rb /home/vsts" at the start, so I've merged that in for now (to master and 3.7).
Could that be done _without_ sudo to just the local directory containing the test tar file? If that works then you don't need any nasty privileged sudo use (which will just break on platforms without sudo anyway).
I tried something similar to that and it didn't work. My guess is it's to do with the actual mount point? (I also tried without sudo at first, and when I didn't work, I tried it with sudo. I hear that's how to decide whether you need it or not ;) )
In any case, it only applies to the Azure Pipelines build definition, so there aren't any other platforms where it'll be used.
Ok then. Curious: is the sudo now in the build setup? I'm just thinking that this isn't a tarfile specific fix but a "get correct POSIX umask semantics" fix, so it should apply to the entire environment. Or am I naive? Cheers, Cameron Simpson <cs@cskk.id.au>

On Mon, Apr 1, 2019 at 4:49 PM Cameron Simpson <cs@cskk.id.au> wrote:
On 01Apr2019 15:44, Steve Dower <steve.dower@python.org> wrote:
On 01Apr2019 1535, Cameron Simpson wrote:
On 01Apr2019 09:12, Steve Dower <steve.dower@python.org> wrote:
On 30Mar2019 1130, Gregory P. Smith wrote:
I wouldn't expect it to be the case in a CI environment but I believe a umask can be overridden if the filesystem is mounted and configured with acls set? (oh, hah, Ivan just said the same thing)
Yep, it appears this is the case. The Pipelines team got back to me and it seems to be a known issue - the workaround they gave me was to run "sudo setfacl -Rb /home/vsts" at the start, so I've merged that in for now (to master and 3.7).
Could that be done _without_ sudo to just the local directory containing the test tar file? If that works then you don't need any nasty privileged sudo use (which will just break on platforms without sudo anyway).
I tried something similar to that and it didn't work. My guess is it's to do with the actual mount point? (I also tried without sudo at first, and when I didn't work, I tried it with sudo. I hear that's how to decide whether you need it or not ;) )
In any case, it only applies to the Azure Pipelines build definition, so there aren't any other platforms where it'll be used.
Ok then.
Curious: is the sudo now in the build setup? I'm just thinking that this isn't a tarfile specific fix but a "get correct POSIX umask semantics" fix, so it should apply to the entire environment.
Or am I naive?
I'm reading between the lines and assuming we're not the only user of their CI complaining about this environment change. ;)

On 02.04.2019 1:44, Steve Dower wrote:
On 01Apr2019 1535, Cameron Simpson wrote:
On 01Apr2019 09:12, Steve Dower <steve.dower@python.org> wrote:
On 30Mar2019 1130, Gregory P. Smith wrote:
I wouldn't expect it to be the case in a CI environment but I believe a umask can be overridden if the filesystem is mounted and configured with acls set? (oh, hah, Ivan just said the same thing)
Yep, it appears this is the case. The Pipelines team got back to me and it seems to be a known issue - the workaround they gave me was to run "sudo setfacl -Rb /home/vsts" at the start, so I've merged that in for now (to master and 3.7).
Could that be done _without_ sudo to just the local directory containing the test tar file? If that works then you don't need any nasty privileged sudo use (which will just break on platforms without sudo anyway).
I tried something similar to that and it didn't work. My guess is it's to do with the actual mount point? (I also tried without sudo at first, and when I didn't work, I tried it with sudo. I hear that's how to decide whether you need it or not ;) )
In any case, it only applies to the Azure Pipelines build definition, so there aren't any other platforms where it'll be used.
https://github.com/python/cpython/pull/12655
Cheers, Steve _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/vano%40mail.mipt.ru
-- Regards, Ivan
participants (6)
-
Cameron Simpson
-
Gregory P. Smith
-
Ivan Pozdeev
-
Kushal Kumaran
-
Nathaniel Smith
-
Steve Dower