Non-stable pyc results on python 3.6

hello,

we're seeing strange problems when trying to do reproducible builds of some python 3.6 modules. Namely, from one build to another, there will be something like the following difference in the compiled object:

 00004e40 da 07 5f 5f 61 6c 6c 5f 5f da 0a 5f 5f 61 75 74 |..__all__..__aut|
-00004e50 68 6f 72 5f 5f da 07 64 65 63 69 6d 61 6c 72 0c |hor__..decimalr.|
+00004e50 68 6f 72 5f 5f 5a 07 64 65 63 69 6d 61 6c 72 0c |hor__Z.decimalr.|
 00004e60 00 00 00 72 43 00 00 00 72 08 00 00 00 72 41 00 |...rC...r....rA.|

This specific one is in the top-level co_names segment and the 0x5a vs 0xda byte is TYPE_SHORT_ASCII_INTERNED, with FLAG_REF set or unset.

I'm also seeing off-by-one differences in reference ids, i.e., the number appearing after TYPE_REF. Not in all cases, but it seems that when a "part" is affected, all references in that "part" are changed (for some value of "part"; all the knowledge of pycs I have was gained from about an hour of reading marshal.c). So that seems to imply that there's a reference that is sometimes included and sometimes not?

This is most often found in __init__.py. Often this affects optimized pycs, but we can see it in un-optimized as well. The issue is rare -- 99% of all pycs are stable -- but when it occurs, it's easy to replicate it in the same place. This also happens on different machines, so that seems to rule out hardware memory errors :)

The pycs in question are generated by normal "setup.py build" -> "setup.py install".

It happens on Python 3.6 but not on Python 2.7. I'm not sure about Python 3.5 because we don't currently use it.

It doesn't seem to depend on hash seed - the instability is observed even with PYTHONHASHSEED set to zero. What seems to fix it, however, is running the build on disorderfs, which ensures that the filesystem entries are in the same order.

Any ideas why something like this would happen and why would it be correlated with filesystem ordering?

thanks
m.
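To make the byte difference above concrete, here is a minimal Python sketch of how a marshal type byte splits into a type code and the FLAG_REF bit. The constant values are the ones defined in CPython's Python/marshal.c; the helper name `split_type_byte` is hypothetical and only serves this illustration.

```python
# Constants as defined in CPython's Python/marshal.c.
FLAG_REF = 0x80                       # high bit: object is added to the reference table
TYPE_SHORT_ASCII_INTERNED = ord("Z")  # 0x5a
TYPE_REF = ord("r")                   # followed by a 4-byte little-endian reference index


def split_type_byte(byte):
    """Hypothetical helper: return (type_code, has_flag_ref) for one marshal type byte."""
    return chr(byte & ~FLAG_REF), bool(byte & FLAG_REF)


# The two variants of the differing byte at offset 0x4e55 in the dump above.
for raw in (0xDA, 0x5A):
    code, flagged = split_type_byte(raw)
    print(f"0x{raw:02x}: type {code!r}, FLAG_REF {'set' if flagged else 'unset'}")
```

Every flagged object takes the next slot in the reference table, so one object gaining or losing the flag shifts the index written after each later TYPE_REF. That would be consistent with the off-by-one reference ids described above.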

On 27 July 2017 at 23:48, jan matejek <jmatejek@suse.cz> wrote:
The marshal implementation received some significant optimisations in 3.5 [1] and a new marshal format in 3.4 [2], so if you're able to check the behaviour in 3.4 and 3.5 that would be helpful: if the problem occurs in 3.5, but *not* in 3.4, the hashtable based optimisations would be the place to start looking, while if 3.4 misbehaves as well, then there may be some general inconsistency to resolve in how the module decides which instances to mark with `FLAG_REF`.

The fact that disorderfs makes a difference does make me a little suspicious, as there's an early exit from the FLAG_REF setting code related to objects having exactly one live reference.

Courtesy of string interning and other immutable object caches, order of code compilation can affect how many live references there are to strings and other constants, and hence there may be cases where marshal *won't* flag an immutable object if *only* that particular code object has been compiled, but *will* flag it if some other code object has been created first.

That check has been there since version 3 of the marshal format was defined, so if it *is* the culprit, then you'll see this misbehaviour with 3.4 as well.

Cheers,
Nick.

[1] https://docs.python.org/3/whatsnew/3.5.html#optimizations
[2] https://github.com/python/cpython/commit/d7009c69136a3809282804f460902ab42e9...

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 28 Jul 2017 00:54, "Nick Coghlan" <ncoghlan@gmail.com> wrote:

> The fact that disorderfs makes a difference does make me a little suspicious, as there's an early exit from the FLAG_REF setting code related to objects having exactly one live reference.
>
> Courtesy of string interning and other immutable object caches, order of code compilation can affect how many live references there are to strings and other constants, and hence there may be cases where marshal *won't* flag an immutable object if *only* that particular code object has been compiled, but *will* flag it if some other code object has been created first.
>
> That check has been there since version 3 of the marshal format was defined, so if it *is* the culprit, then you'll see this misbehaviour with 3.4 as well.

It occurs to me that it would likely be easier for you to just test that theory directly: the check that now seems suspicious to me is the one that calls "Py_REFCNT" (and is the only reference to that API in the file), so if you comment that out, and the problem goes away, it is most likely the cause of the currently variable behaviour.

Cheers,
Nick.
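The reference-count theory can also be probed from Python without patching marshal.c. The following is only a sketch under assumptions: the helper name `dump_fresh_and_shared` is hypothetical, and whether the two outputs actually differ depends on the interpreter version and build, so no particular result is claimed here.

```python
import marshal


def dump_fresh_and_shared(parts):
    """Hypothetical helper: marshal two equal strings with different live reference counts."""
    # Temporary string: the call argument may be its only live reference.
    fresh = marshal.dumps("".join(parts))

    # Equal string that other objects also reference while it is marshalled.
    held = "".join(parts)
    keep_alive = [held, held]
    shared = marshal.dumps(held)
    del keep_alive
    return fresh, shared


fresh, shared = dump_fresh_and_shared(["deci", "mal"])
print("fresh :", fresh.hex())
print("shared:", shared.hex())
print("identical:", fresh == shared)
```

If the first byte of the two dumps differs only in the 0x80 bit, that matches the kind of FLAG_REF difference seen in the pyc diff: equal values, serialised differently depending on how many references happened to be alive at the time.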
