Mailman 3 [Python-Dev] Re: [Python-checkins] python/dist/src/Python compile.c, 2.311, 2.312 - Python-Dev

[Python-Dev] Re: [Python-checkins] python/dist/src/Python compile.c, 2.311, 2.312

Guido van Rossum

Aug. 3, 2004

2:18 p.m.

Log Message: Don't intern the filename of a file being compiled. Hard to believe it ever helped anything, and it hurts finding reference leaks.

The intention was to introduce sharing of the filename object between code objects compiled from the same file (remember that every method is a separate code object). But I believe the sharing won't happen when the code is loaded from a bytecoded file instead, so it is indeed wasted efficiency. Good catch. --Guido van Rossum (home page: http://www.python.org/~guido/)

Show replies by date

"Martin v. Löwis"

August 2004

2:25 p.m.

New subject: [Python-checkins] python/dist/src/Python compile.c, 2.311, 2.312

Guido van Rossum wrote:

...

The intention was to introduce sharing of the filename object between code objects compiled from the same file (remember that every method is a separate code object).

But I believe the sharing won't happen when the code is loaded from a bytecoded file instead, so it is indeed wasted efficiency.

In 2.4, it would: strings that are interned when being marshalled will get interned on unmarshalling. Regards, Martin

Guido van Rossum

2:34 p.m.

New subject: [Python-checkins] python/dist/src/Python compile.c, 2.311, 2.312

...

...
The intention was to introduce sharing of the filename object between code objects compiled from the same file (remember that every method is a separate code object).

But I believe the sharing won't happen when the code is loaded from a bytecoded file instead, so it is indeed wasted efficiency.

In 2.4, it would: strings that are interned when being marshalled will get interned on unmarshalling.

Aha! So maybe we should reconsider whether mwh's removal of the filename interning in the compiler should be reverted. --Guido van Rossum (home page: http://www.python.org/~guido/)

"Martin v. Löwis"

3:37 p.m.

New subject: [Python-checkins] python/dist/src/Python compile.c, 2.311, 2.312

Guido van Rossum wrote:

...

Aha! So maybe we should reconsider whether mwh's removal of the filename interning in the compiler should be reverted.

I just looked at the sizes of the four largest .pyc files wrt. to this change (ie. the one generated before 2.312, and the one generated after): before after pydoc 84711 91263 cookielib 57353 57353 pickletools 55862 57038 tarfile 54074 58974 So the patch does cause a size increase (and likely also a slowdown because of the extra memory allocations at startup). Whether this is relevant or not, I don't know. I think I would prefer if a different mechanism was found to account for the change in references during an import, e.g. by taking the number of interned strings before and after the import operation. Regards, Martin

Michael Hudson

4:01 p.m.

New subject: [Python-checkins] python/dist/src/Python compile.c, 2.311, 2.312

"Martin v. Löwis" <martin@v.loewis.de> writes:

...

Guido van Rossum wrote:

...
Aha! So maybe we should reconsider whether mwh's removal of the filename interning in the compiler should be reverted.

I just looked at the sizes of the four largest .pyc files wrt. to this change (ie. the one generated before 2.312, and the one generated after):

before after pydoc 84711 91263 cookielib 57353 57353 pickletools 55862 57038 tarfile 54074 58974

So the patch does cause a size increase (and likely also a slowdown because of the extra memory allocations at startup). Whether this is relevant or not, I don't know.

Me neither. Are you sure you regenerated cookielib? What would it cost to check if all strings could be stored via TYPE_STRINGREF? (Almost nothing in code terms...)

...

I think I would prefer if a different mechanism was found to account for the change in references during an import, e.g. by taking the number of interned strings before and after the import operation.

Well, this isn't about import, it's more about execfile. I had entirely failed to twig that interning the filename might save space in the .pyc, so didn't see any downsides to this particular workaround. I'd be happy with any other (I mentioned not doing refcounting on dictobject.c's dummy; maybe I should actually see if that's possible :-). Cheers, mwh -- The gripping hand is really that there are morons everywhere, it's just that the Americon morons are funnier than average. -- Pim van Riezen, alt.sysadmin.recovery

"Martin v. Löwis"

5:12 p.m.

New subject: [Python-checkins] python/dist/src/Python compile.c, 2.311, 2.312

Michael Hudson wrote:

...

Me neither. Are you sure you regenerated cookielib?

Oops, no. Or, rather, yes - even the first number is already regenerated.

...

What would it cost to check if all strings could be stored via TYPE_STRINGREF? (Almost nothing in code terms...)

It would cause additional dictionary lookups on marshalling, which in turn would cause hash computation for all strings. Currently, as only interned strings are marshalled through TYPE_STRINGREF, we know that hash values are already computed for all of them, so we only pay a dict lookup per string (and perhaps an allocation if the string is new and the dict needs resizing). In addition, there is a slight semantical change, in that equal strings become shared on unmarshalling. I don't think this is relevant, though, since marshal never worried about shared strings in the first place.

...

I'd be happy with any other (I mentioned not doing refcounting on dictobject.c's dummy; maybe I should actually see if that's possible :-).

Feel free to change the marshalling to try sharing all strings; it might be easier to revert the change to compile.c, though. Regards, Martin

Michael Hudson

10:53 a.m.

New subject: [Python-checkins] python/dist/src/Python compile.c, 2.311, 2.312

"Martin v. Löwis" <martin@v.loewis.de> writes:

...

...
I'd be happy with any other (I mentioned not doing refcounting on dictobject.c's dummy; maybe I should actually see if that's possible :-).

Feel free to change the marshalling to try sharing all strings; it might be easier to revert the change to compile.c, though.

I've reverted the change. I just I hope I remember this next time I run regrtest.py with -R... Cheers, mwh -- Clue: You've got the appropriate amount of hostility for the Monastery, however you are metaphorically getting out of the safari jeep and kicking the lions. -- coonec -- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html

Guido van Rossum

4:23 p.m.

New subject: [Python-checkins] python/dist/src/Python compile.c, 2.311, 2.312

...

...
Aha! So maybe we should reconsider whether mwh's removal of the filename interning in the compiler should be reverted.

I just looked at the sizes of the four largest .pyc files wrt. to this change (ie. the one generated before 2.312, and the one generated after):

before after pydoc 84711 91263 cookielib 57353 57353 pickletools 55862 57038 tarfile 54074 58974

Are you sure the cookielib numbers are correct? They show no difference, which would only make sense if there was only a single code object.

...

So the patch does cause a size increase (and likely also a slowdown because of the extra memory allocations at startup). Whether this is relevant or not, I don't know.

Based on this data I'd like to see the change reverted; the motivation for the change was debuggability of leaks, not a real problem with the changed code.

...

I think I would prefer if a different mechanism was found to account for the change in references during an import, e.g. by taking the number of interned strings before and after the import operation.

--Guido van Rossum (home page: http://www.python.org/~guido/)

"Martin v. Löwis"

5 p.m.

New subject: [Python-checkins] python/dist/src/Python compile.c, 2.311, 2.312

Guido van Rossum wrote:

...

Are you sure the cookielib numbers are correct? They show no difference, which would only make sense if there was only a single code object.

No. Turns out I did not have "old" bytecode for cookielib. Regards, Martin

7514

Age (days ago)

7515

Last active (days ago)

List overview

Download

8 comments

3 participants

participants (3)

"Martin v. Löwis"
Guido van Rossum
Michael Hudson

[Python-Dev] Re: [Python-checkins] python/dist/src/Python compile.c, 2.311, 2.312

Guido van Rossum

"Martin v. Löwis"

Guido van Rossum

"Martin v. Löwis"

Michael Hudson

"Martin v. Löwis"

Michael Hudson

Guido van Rossum

"Martin v. Löwis"

tags

participants (3)