[Python-Dev] Re: [Python-checkins] python/dist/src/Python compile.c, 2.311, 2.312
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Log Message: Don't intern the filename of a file being compiled. Hard to believe it ever helped anything, and it hurts finding reference leaks.
The intention was to introduce sharing of the filename object between code objects compiled from the same file (remember that every method is a separate code object). But I believe the sharing won't happen when the code is loaded from a bytecoded file instead, so it is indeed wasted efficiency. Good catch. --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/58a0b/58a0be886f0375938476d3eb7345a8b9d8cdc91e" alt=""
Guido van Rossum wrote:
The intention was to introduce sharing of the filename object between code objects compiled from the same file (remember that every method is a separate code object).
But I believe the sharing won't happen when the code is loaded from a bytecoded file instead, so it is indeed wasted efficiency.
In 2.4, it would: strings that are interned when being marshalled will get interned on unmarshalling. Regards, Martin
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
The intention was to introduce sharing of the filename object between code objects compiled from the same file (remember that every method is a separate code object).
But I believe the sharing won't happen when the code is loaded from a bytecoded file instead, so it is indeed wasted efficiency.
In 2.4, it would: strings that are interned when being marshalled will get interned on unmarshalling.
Aha! So maybe we should reconsider whether mwh's removal of the filename interning in the compiler should be reverted. --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/58a0b/58a0be886f0375938476d3eb7345a8b9d8cdc91e" alt=""
Guido van Rossum wrote:
Aha! So maybe we should reconsider whether mwh's removal of the filename interning in the compiler should be reverted.
I just looked at the sizes of the four largest .pyc files wrt. to this change (ie. the one generated before 2.312, and the one generated after): before after pydoc 84711 91263 cookielib 57353 57353 pickletools 55862 57038 tarfile 54074 58974 So the patch does cause a size increase (and likely also a slowdown because of the extra memory allocations at startup). Whether this is relevant or not, I don't know. I think I would prefer if a different mechanism was found to account for the change in references during an import, e.g. by taking the number of interned strings before and after the import operation. Regards, Martin
data:image/s3,"s3://crabby-images/4c5e0/4c5e094efaa72edc3f091be11b2a2b05a33dd2b6" alt=""
"Martin v. Löwis" <martin@v.loewis.de> writes:
Guido van Rossum wrote:
Aha! So maybe we should reconsider whether mwh's removal of the filename interning in the compiler should be reverted.
I just looked at the sizes of the four largest .pyc files wrt. to this change (ie. the one generated before 2.312, and the one generated after):
before after pydoc 84711 91263 cookielib 57353 57353 pickletools 55862 57038 tarfile 54074 58974
So the patch does cause a size increase (and likely also a slowdown because of the extra memory allocations at startup). Whether this is relevant or not, I don't know.
Me neither. Are you sure you regenerated cookielib? What would it cost to check if all strings could be stored via TYPE_STRINGREF? (Almost nothing in code terms...)
I think I would prefer if a different mechanism was found to account for the change in references during an import, e.g. by taking the number of interned strings before and after the import operation.
Well, this isn't about import, it's more about execfile. I had entirely failed to twig that interning the filename might save space in the .pyc, so didn't see any downsides to this particular workaround. I'd be happy with any other (I mentioned not doing refcounting on dictobject.c's dummy; maybe I should actually see if that's possible :-). Cheers, mwh -- The gripping hand is really that there are morons everywhere, it's just that the Americon morons are funnier than average. -- Pim van Riezen, alt.sysadmin.recovery
data:image/s3,"s3://crabby-images/58a0b/58a0be886f0375938476d3eb7345a8b9d8cdc91e" alt=""
Michael Hudson wrote:
Me neither. Are you sure you regenerated cookielib?
Oops, no. Or, rather, yes - even the first number is already regenerated.
What would it cost to check if all strings could be stored via TYPE_STRINGREF? (Almost nothing in code terms...)
It would cause additional dictionary lookups on marshalling, which in turn would cause hash computation for all strings. Currently, as only interned strings are marshalled through TYPE_STRINGREF, we know that hash values are already computed for all of them, so we only pay a dict lookup per string (and perhaps an allocation if the string is new and the dict needs resizing). In addition, there is a slight semantical change, in that equal strings become shared on unmarshalling. I don't think this is relevant, though, since marshal never worried about shared strings in the first place.
I'd be happy with any other (I mentioned not doing refcounting on dictobject.c's dummy; maybe I should actually see if that's possible :-).
Feel free to change the marshalling to try sharing all strings; it might be easier to revert the change to compile.c, though. Regards, Martin
data:image/s3,"s3://crabby-images/4c5e0/4c5e094efaa72edc3f091be11b2a2b05a33dd2b6" alt=""
"Martin v. Löwis" <martin@v.loewis.de> writes:
I'd be happy with any other (I mentioned not doing refcounting on dictobject.c's dummy; maybe I should actually see if that's possible :-).
Feel free to change the marshalling to try sharing all strings; it might be easier to revert the change to compile.c, though.
I've reverted the change. I just I hope I remember this next time I run regrtest.py with -R... Cheers, mwh -- Clue: You've got the appropriate amount of hostility for the Monastery, however you are metaphorically getting out of the safari jeep and kicking the lions. -- coonec -- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Aha! So maybe we should reconsider whether mwh's removal of the filename interning in the compiler should be reverted.
I just looked at the sizes of the four largest .pyc files wrt. to this change (ie. the one generated before 2.312, and the one generated after):
before after pydoc 84711 91263 cookielib 57353 57353 pickletools 55862 57038 tarfile 54074 58974
Are you sure the cookielib numbers are correct? They show no difference, which would only make sense if there was only a single code object.
So the patch does cause a size increase (and likely also a slowdown because of the extra memory allocations at startup). Whether this is relevant or not, I don't know.
Based on this data I'd like to see the change reverted; the motivation for the change was debuggability of leaks, not a real problem with the changed code.
I think I would prefer if a different mechanism was found to account for the change in references during an import, e.g. by taking the number of interned strings before and after the import operation.
--Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/58a0b/58a0be886f0375938476d3eb7345a8b9d8cdc91e" alt=""
Guido van Rossum wrote:
Are you sure the cookielib numbers are correct? They show no difference, which would only make sense if there was only a single code object.
No. Turns out I did not have "old" bytecode for cookielib. Regards, Martin
participants (3)
-
"Martin v. Löwis"
-
Guido van Rossum
-
Michael Hudson