What do you guys think about the merits of an iterzip() function that behaves like zip() but returns an iterator instead of a list?

Raymond Hettinger

P.S. There is a patch for experimentation at http://www.python.org/sf/549662
What do you guys think about the merits of an iterzip() function that behaves like zip() but returns an iterator instead of a list?
Hm, I'm not particularly enamored of the idea of adding 'iter' versions of everything under the sun. I wish zip() could've been an iterator from the start, but now that it isn't, I don't think it's such a big deal. (An iterator version is easily written as a generator.) In general I'm not keen on increasing the number of builtin functions much.

--Guido van Rossum (home page: http://www.python.org/~guido/)
On Monday 29 April 2002 00:14, Guido van Rossum wrote: ...
such a big deal. (An iterator version is easily written as a generator.)
In general I'm not keen on increasing the number of builtin functions
Maybe we could have a standard module in the library that supplies such "utility" iterators and generators? That avoids adding builtin functions but still provides beginners a good sample of the things they can use iterators and generators for. (If it was a separate module, it might not be hard for somebody to supply a version compatible with 2.2, easing migration and cross-version use). Alex
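For concreteness, a sketch of the kind of "utility iterators" module being described -- the module and function names (iterutils, izip, icount) are placeholders rather than a proposal, and the pure-Python version below should run on 2.2 with the generators future-import:

    # iterutils.py -- illustrative only
    from __future__ import generators

    def izip(*iterables):
        "Like zip(), but yield the tuples lazily instead of building a list."
        iterators = map(iter, iterables)
        while 1:
            yield tuple([it.next() for it in iterators])

    def icount(start=0):
        "Count up from start, forever."
        while 1:
            yield start
            start = start + 1

Something like "for i, line in izip(icount(), open(fname)): ..." then numbers lines without ever materializing a list of pairs.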
Maybe we could have a standard module in the library that supplies such "utility" iterators and generators? That avoids adding builtin functions but still provides beginners a good sample of the things they can use iterators and generators for. (If it was a separate module, it might not be hard for somebody to supply a version compatible with 2.2, easing migration and cross-version use).
Yup, that's much less of a problem IMO. --Guido van Rossum (home page: http://www.python.org/~guido/)
"GvR" == Guido van Rossum <guido@python.org> writes:
GvR> Hm, I'm not particularly enamored of the idea of adding
GvR> 'iter' versions of everything under the sun.  I wish zip()
GvR> could've been an iterator from the start, but now that it
GvR> isn't, I don't think it's such a big deal.  (An iterator
GvR> version is easily written as a generator.)

Of course, zip() wasn't implemented originally as an iterator because it pre-dated them. I agree, it's probably not worth changing it.

-Barry
From: "Guido van Rossum" <guido@python.org>
Hm, I'm not particularly enamored of the idea of adding 'iter' versions of everything under the sun.
I'm already working on a separate module for iterators galore (and will cross-check to Haskell to make sure I didn't miss anything). I posted this one separately because zip() eats memory like crazy and because a Python generator version crawls like a snail. IMHO, this is a better way to loop over multiple sequences and has a chance at becoming the tool of choice. I scanned all of my Python code and found that iterzip() was a better choice in every case except a matrix transpose coded as zip(*mat).
I wish zip() could've been an iterator from the start, but now that it isn't, I don't think it's such a big deal. (An iterator version is easily written as a generator.)
In general I'm not keen on increasing the number of builtin functions much.
Ditto. Any chance of moving functions like map(), reduce(), and filter() to a functional module; pow() and divmod() to the math module; or input() to oblivion? Raymond Hettinger
I'm already working on a separate module for iterators galore (and will cross-check to Haskell to make sure I didn't miss anything).
+1
I posted this one separately because zip() eats memory like crazy and because a Python generator version crawls like a snail.
Do you have use cases where the memory use matters? I.e. where it needs more memory than you have RAM?
IMHO, this is a better way to loop over multiple sequences and has a chance at becoming the tool of choice. I scanned all of my Python code and found that iterzip() was a better choice in every case except a matrix transpose coded as zip(*mat).
Did you time any of these?
In general I'm not keen on increasing the number of builtin functions much.
Ditto. Any chance of moving functions like map(), reduce(), and filter() to a functional module; pow() and divmod() to the math module; or input() to oblivion?
I wish. Since they were there first, it's hard to get rid of them. (If you're truly a masochist, write a PEP and post it to c.l.py to find out how hard. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)
[Raymond Hettinger]
Ditto. Any chance of moving functions like map(), reduce(), and filter() to a functional module; pow() and divmod() to the math module; or input() to oblivion?
[Guido]
I wish. Since they were there first, it's hard to get rid of them. (If you're truly a masochist, write a PEP and post it to c.l.py to find out how hard. :-)
If you agree to approve the PEP in advance, it's easy: Raymond posts it, ignores all responses except to summarize the arguments against in the PEP, and you approve it from your beach mansion on the Cayman Islands. BTW, note that math.pow already exists, and isn't the same as the builtin pow. You can agree in advance to break that too <wink>.
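The pow difference is easy to demonstrate; a quick illustration (behavior I'd expect from any recent CPython, shown with comments rather than captured output):

    import math

    print pow(2, 3)        # 8   -- the builtin keeps integer arguments integral
    print math.pow(2, 3)   # 8.0 -- math.pow coerces everything to float
    print pow(3, 4, 5)     # 1   -- the builtin also has a 3-argument modular form
    # math.pow(3, 4, 5) raises TypeError: the math-module version takes exactly
    # two arguments, so code that switches between the two can break.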
From: "Guido van Rossum" <guido@python.org>
I'm already working on a separate module for iterators galore (and will cross-check to Haskell to make sure I didn't miss anything).
+1
I posted this one separately because zip() eats memory like crazy and because a Python generator version crawls like a snail.
Do you have use cases where the memory use matters? I.e. where it needs more memory than you have RAM?
No. And I'm not proposing a backport to Pippy ;)
IMHO, this is a better way to loop over multiple sequences and has a chance at becoming the tool of choice. I scanned all of my Python code and found that iterzip() was a better choice in every case except a matrix transpose coded as zip(*mat).
Did you time any of these?
I just timed it and am shocked. iterzip() has exactly the same code as zip() except for the final append of the result to the list. So, I expected only a microscopic speed-up. I don't see where the huge performance improvement came from.

First timing:
----------------------
19.439999938 zip
1.43000006676 iterzip
30.1000000238 genzip

Second timing:
----------------------
20.0999999046 zip
1.60000002384 iterzip
29.0599999428 genzip

Timing code:
---------------
# time iterzip
from __future__ import generators
import time

def run(f, iterables):
    start = time.time()
    for tup in f(*iterables):
        pass
    return time.time() - start

def genzip(*iterables):
    iterables = map(iter, iterables)
    while 1:
        yield tuple([i.next() for i in iterables])

n = 1000000
for f in [zip, iterzip, genzip]:
    print run(f, [xrange(n), xrange(n), xrange(n)]), f.__name__

Raymond Hettinger
[Raymond Hettinger]
... I just timed it and am shocked. iterzip() has exactly the same code as zip() except for the final append of the result to the list. So, I expected only a microscopic speed-up. I don't see where the huge performance improvement came from.
Well, in the zip case you're allocating 4 million bytes worth of integers, 4 bytes at a time, and don't get to reuse any of the space for the duration. Likewise you're allocating space for a million 3-element tuples, and 4MB worth of contiguous vector 4 bytes at a time. That's all a real drag.

Cute: I can make the Python genzip run faster than the builtin zip with "the usual" speed tricks:

from time import clock as now

def run(f, iterables):
    start = now()
    for tup in f(*iterables):
        pass
    return now() - start

def genzip(*iterables):
    getters = [iter(i).next for i in iterables]
    t = tuple
    while 1:
        yield t([g() for g in getters])

n = 1000000
for f in [zip, genzip]:
    print run(f, [xrange(n), xrange(n), xrange(n)]), f.__name__

On my box under current CVS, that yields

C:\Code\python\PCbuild>python -O temp.py
12.3587522419 zip
10.4161551484 genzip

Better, I can get genzip down to 8.34 seconds by skipping the tuple() call -- in real life, it doesn't really matter whether it returns a tuple or a list.

One reason zip is faster for me than for you is that pymalloc is enabled by default in 2.3, and can likely pump out space for a million small tuples significantly faster than your platform malloc can do it.
First timing:
----------------------
19.439999938 zip
1.43000006676 iterzip
30.1000000238 genzip
If only I had a use for iterzip <wink>.
Did you time any of these?
I just timed it and am shocked. iterzip() has exactly the same code as zip() except for the final append of the result to the list. So, I expected only a microscopic speed-up. I don't see where the huge performance improvement came from.
Probably the list-append still has a slight quadratic component -- you're doing a million elements here. But I asked something else. How much does the speed difference affect the apps you have? In my experience it's usually used for small lists where the quadratic effect doesn't occur and the timing doesn't matter compared to the rest of the program. --Guido van Rossum (home page: http://www.python.org/~guido/)
From: "Guido van Rossum" <guido@python.org>
Did you time any of these?
I just timed it and am shocked. iterzip() has exactly the same code as zip() except for the final append of the result to the list. So, I expected only a microscopic speed-up. I don't see where the huge performance improvement came from.
Probably the list-append still has a slight quadratic component -- you're doing a million elements here.
The order of magnitude speed boost is consistent through all ranges measurable on my machine:

      n     time   func
      -     ----   ----
  10000     0.11   zip
  10000     0.00   iterzip
  10000     0.38   genzip
 100000     1.05   zip
 100000     0.16   iterzip
 100000     2.31   genzip
1000000    19.77   zip
1000000     1.37   iterzip
1000000    28.62   genzip

And at each range, timing('list(iterzip)') is approx equal to timing('zip').
But I asked something else. How much does the speed difference affect the apps you have? In my experience it's usually used for small lists where the quadratic effect doesn't occur and the timing doesn't matter
I use zip() a lot, but iterzip() only made a speed difference in a few places:

  building dictionaries:
      reachable = dict(zip(map(alpha, wordlist), wordlist))

  forming character pairs on a file read into a string:
      zip(text[:-1], text[1:])

  forming matrices from vectors:
      Mat( zip(xvec, yvec) )

If we have to leave zip untouched, consider adding iterzip anyway. It will almost always be the better choice.

The real issue is which ONE of the following poisons are you most willing to swallow:

1. Forgo iterzip's speed/space improvements
2. Grow the number of builtins by one
3. Deprecate and/or relocate one of: zip, reduce, input, apply, oct
4. Change the behavior of zip to return an iterator (breaking some code)

I vote for either #2 or #3.

Raymond Hettinger

P.S. Hmm, no name controversy? Alex hasn't even suggested a beautiful sounding Italian variant like 'terziparare' <winks and grins>
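As a side note on the character-pair case: a small special-purpose generator avoids both slice copies as well. A hedged sketch (iterpairs is a made-up name, shown only to illustrate the lazy style, and it should work on 2.2 with the generators future-import):

    from __future__ import generators

    def iterpairs(text):
        "Yield adjacent character pairs without slicing two copies of text."
        it = iter(text)
        prev = it.next()
        for ch in it:
            yield prev, ch
            prev = ch

    # e.g. counting digraph frequencies in one pass:
    #   for a, b in iterpairs(text):
    #       counts[a + b] = counts.get(a + b, 0) + 1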
raymond wrote:
The real issue is which ONE of the following poisons are you most willing to swallow:

1. Forgo iterzip's speed/space improvements
2. Grow the number of builtins by one
3. Deprecate and/or relocate one of: zip, reduce, input, apply, oct
4. Change the behavior of zip to return an iterator (breaking some code)
I vote for either #2 or #3.
here's item 5:

5. Put it in a suitable extension module (e.g. raymondTools), ship it
   outside python, let it be "out in the wild" for some suitable time,
   show that a considerable number of real users are using it in real
   life (and that you're writing real stuff yourself, not just constantly
   arguing to change the language just because you can), and add it to
   the library if/when the time is right.

I vote for #5.
'terziparare'
does that mean "false dilemma", or is it just a cool-sounding googlewhack? </F>
The real issue is which ONE of the following poisons are you most willing to swallow:

1. Forgo iterzip's speed/space improvements
2. Grow the number of builtins by one
3. Deprecate and/or relocate one of: zip, reduce, input, apply, oct
4. Change the behavior of zip to return an iterator (breaking some code)
I strongly suggest sticking to the status quo. There are more important fish to fry. --Guido van Rossum (home page: http://www.python.org/~guido/)
From: "Guido van Rossum" <guido@python.org>
I strongly suggest sticking to the status quo. There are more important fish to fry.
Okay. I'll go back to frying fish (working on the docs and an iterator tools module). If you don't mind, I'll leave the patch open in case the desire for it grows. Raymond Hettinger
[Guido]
Probably the list-append still has a slight quadratic component -- you're doing a million elements here.
I get:

  justpush   1.39
   justzip   7.87

from this:

"""
from time import clock as now

n = 1000000
zeroes = [0] * n

def justpush():
    x = []
    push = x.append
    for i in zeroes:
        push(i)

def justzip():
    zip(zeroes)

def run(func):
    start = now()
    func()
    finish = now()
    print "%10s %6.2f" % (func.__name__, finish - start)

run(justpush)
run(justzip)
"""

That is, at first glance it's *much* faster to do a million appends in pure Python than it is to let zip do them at C speed. I'm curious about what people get for this program on other platforms (pymalloc on or off may make a dramatic difference too -- or not, depending on how the platform malloc works).

The cause on Win98SE is clear enough with a little instrumentation: in the justzip case, listobject.c's ins1 ends up *copying* the memory much more often, presumably because there's a new tuple allocation after every append (preventing realloc from growing into that space). justpush copies the memory at sizes (skipping some tiny ones, and where this is a count of the number of elements in the list, not the number of bytes (multiply by 4 to get the latter)):

    247 639 959 2047 45055 163839

and that's all. justzip copies the memory at sizes:

    247 639 959 2047 45055 53247 57343 65535 81919 98303 122879 163839
    196607 229375 262143 294911 360447 393215 458751 589823 622591 688127
    720895 753663 786431 819199 851967 884735 917503 950271

Copying 3-4MB multiple times at the end ain't cheap.

justpush does relatively worse if the order in which they're called is swapped, as then justzip and the custom tuple free list conspire to leave memory in a more-fragmented state.
But I asked something else. How much does the speed difference affect the apps you have? In my experience it's usually used for small lists where the quadratic effect doesn't occur and the timing doesn't matter compared to the rest of the program.
Same here. High volume "vectorized" operations are NumPy's proper domain.
The cause on Win98SE is clear enough with a little instrumentation: in the justzip case, listobject.c's ins1 ends up *copying* the memory much more often, presumably because there's a new tuple allocation after every append (preventing realloc from growing into that space). justpush copies the memory at sizes (skipping some tiny ones, and where this is a count of the number of elements in the list, not the number of bytes (multiply by 4 to get the latter)):
247 639 959 2047 45055 163839
and that's all. justzip copies the memory at sizes:
247 639 959 2047 45055 53247 57343 65535 81919 98303 122879 163839 196607 229375 262143 294911 360447 393215 458751 589823 622591 688127 720895 753663 786431 819199 851967 884735 917503 950271
Copying 3-4MB multiple times at the end ain't cheap.
justpush does relatively worse if the order in which they're called is swapped, as then justzip and the custom tuple free list conspire to leave memory in a more-fragmented state.
This suggests that zip() can be sped up considerably by using a hint for the input sequence sizes. If they all respond properly to PySequence_Length(), zip can preallocate the result list to the correct size. It shouldn't believe the numbers fully, but using them as a hint for initial allocation makes a lot of sense, and should greatly reduce the inefficiency. --Guido van Rossum (home page: http://www.python.org/~guido/)
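A rough Python rendering of that idea -- this is only a sketch of using len() as an untrusted hint, not the actual C change, and zip_with_hint is a made-up name:

    def zip_with_hint(*sequences):
        "Like zip(), but preallocate the result using len() as a hint only."
        if not sequences:
            return []
        try:
            hint = min(map(len, sequences))
        except TypeError:
            hint = 0                      # some input has no usable length
        result = [None] * hint            # preallocate in one step
        iterators = map(iter, sequences)
        i = 0
        while 1:
            try:
                tup = tuple([it.next() for it in iterators])
            except StopIteration:
                del result[i:]            # the hint was too generous
                return result
            if i < hint:
                result[i] = tup           # slot already exists; no growth
            else:
                result.append(tup)        # the hint was too small; grow
            i = i + 1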
[Guido]
This suggests that zip() can be sped up considerably by using a hint for the input sequence sizes. If they all respond properly to PySequence_Length(), zip can preallocate the result list to the correct size. It shouldn't believe the numbers fully, but using them as a hint for initial allocation makes a lot of sense, and should greatly reduce the inefficiency.
I'm afraid I misanalyzed the cause. Giving zip a good guess about the result size and using PyList_SET_ITEM up until then leads (as expected) to no list copying in the test case I posted, but only dropped the runtime from 7.87 to 7.31 seconds (which makes sense looking at the numbers more closely: it doesn't take *that* long to copy a few dozen megabytes of memory unless you're doing it by hand <wink>). Check this out:

"""
from time import clock as now

n = 1000000

def juststore(x):
    for i in x:
        x[i] = 0

def justtups(x):
    for i in x:
        0,

def storetups(x):
    for i in x:
        x[i] = 0,

def run(func):
    x = range(n)
    start = now()
    func(x)
    finish = now()
    print "%10s %6.2f" % (func.__name__, finish - start)

run(juststore)
run(justtups)
run(storetups)
"""

I get:

 juststore   0.93
  justtups   0.58
 storetups   7.61

list.append is out of the picture here. Creating a million tuples goes fast so long as they're recycled, and storing a million things goes fast, but storing a million distinct tuples takes very much longer. It's a Mystery to me so far. How does it act on Linux?
I get:
 juststore   0.93
  justtups   0.58
 storetups   7.61
list.append is out of the picture here. Creating a million tuples goes fast so long as they're recycled, and storing a million things goes fast, but storing a million distinct tuples takes very much longer. It's a Mystery to me so far. How does it act on Linux?
Very reasonable with Python 2.2:

 juststore   0.95
  justtups   0.82
 storetups   1.61

But with Python 2.3 it does just about as poorly as you saw on Windows:

 juststore   0.91
  justtups   0.82
 storetups   9.40

Could it be that pymalloc somehow slows things down here? The 2.3 tupleobject.c is nearly unchanged from 2.2, so that can't be it.

If I double the array size, the 2.3 storetups time almost quadruples:

 juststore   1.82
  justtups   1.62
 storetups  34.11

While with 2.2 it's nicely linear:

 juststore   1.91
  justtups   1.63
 storetups   3.20

Bizarre, but I know you like a challenge. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)
On Mon, Apr 29, 2002 at 03:55:15PM -0400, Guido van Rossum wrote:
I get:
 juststore   0.93
  justtups   0.58
 storetups   7.61
list.append is out of the picture here. Creating a million tuples goes fast so long as they're recycled, and storing a million things goes fast, but storing a million distinct tuples takes very much longer. It's a Mystery to me so far. How does it act on Linux?
Very reasonable with Python 2.2:
 juststore   0.95
  justtups   0.82
 storetups   1.61
Strange. my linux with python2.2.1rc2 (default) gives me:

 juststore   0.93
  justtups   0.83
 storetups  10.13

    __holger
[Tim, on

    def juststore(x):
        for i in x:
            x[i] = 0

    def justtups(x):
        for i in x:
            0,

    def storetups(x):
        for i in x:
            x[i] = 0,

 where x is always range(1000000)]
...
 juststore   0.93
  justtups   0.58
 storetups   7.61
... It's a Mystery to me so far. How does it act on Linux?
Well, it's not pymalloc here. pymalloc is happier recycling memory than grabbing arena after arena of persistent memory (it doesn't do what you think here <wink> -- it creates a pool's free list "lazily", not all at once, and that takes time). If I recompile with pymalloc disabled, the storetups time increases to 9.65 seconds. So despite pymalloc's reluctance to carve up memory until absolutely needed, it's still a lot faster than using the system malloc for small tuples on this box.
"TP" == Tim Peters <tim.one@comcast.net> writes:
TP> Check this out:
TP>
TP> """
TP> from time import clock as now
TP>
TP> n = 1000000
TP>
TP> def juststore(x):
TP>     for i in x:
TP>         x[i] = 0
TP>
TP> def justtups(x):
TP>     for i in x:
TP>         0,
TP>
TP> def storetups(x):
TP>     for i in x:
TP>         x[i] = 0,
TP>
TP> def run(func):
TP>     x = range(n)
TP>     start = now()
TP>     func(x)
TP>     finish = now()
TP>     print "%10s %6.2f" % (func.__name__, finish - start)
TP>
TP> run(juststore)
TP> run(justtups)
TP> run(storetups)
TP> """
TP>
TP> I get:
TP>
TP>  juststore   0.93
TP>   justtups   0.58
TP>  storetups   7.61
TP>
TP> list.append is out of the picture here.  Creating a million
TP> tuples goes fast so long as they're recycled, and storing a
TP> million things goes fast, but storing a million distinct tuples
TP> takes very much longer.  It's a Mystery to me so far.  How does
TP> it act on Linux?

It acts about the same on Linux.

I don't see why it's a mystery. justtups() only uses one tuple at a time; it gets re-used from the free list every time. storetups() has to (py)malloc a new tuple every stinkin' time.

Note that there's a net excess of allocations in the storetup() case, so we invoke the garbage collector every n/700 times through the loop. I've noted before that it doesn't make much sense to invoke GC unless there is at least one deallocation; you can't reclaim anything if there are no DECREFs.

Jeremy
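One way to see the size of the gc effect from Python code (the gc.disable/gc.enable calls are the real gc-module API; the rest is just a harness, and the numbers will of course vary by machine):

    import gc, time

    def make_tuples(n):
        x = range(n)
        for i in x:
            x[i] = (i,)     # a fresh, tracked tuple each time; allocations
                            # outrun deallocations for the whole loop

    def timed(func, n):
        start = time.time()
        func(n)
        return time.time() - start

    n = 1000000
    print "gc enabled:  %.2f" % timed(make_tuples, n)
    gc.disable()
    print "gc disabled: %.2f" % timed(make_tuples, n)
    gc.enable()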
It acts about the same on Linux.
I don't see why it's a mystery. justtups() only uses one tuple at a time; it gets re-used from the free list every time. storetups() has to (py)malloc a new tuple every stinkin' time.
Note that there's a net excess of allocations in the storetup() case, so we invoke the garbage collector every n/700 times through the loop. I've noted before that it doesn't make much sense to invoke GC unless there is at least one deallocation; you can't reclaim anything if there are no DECREFs.
But note the difference between 2.2 and 2.3 that I measured (repeatedly -- I don't know why Holger doesn't see this). Did the GC algorithm change between the two? *Something* must've changed. --Guido van Rossum (home page: http://www.python.org/~guido/)
I don't have a 2.2 checkout handy, but I've got 2.1 and 2.3. I don't see much difference. Note that I've made N smaller by about a factor of 10 to make it run in reasonable time on my (virtual) machine.

localhost:~/src/python/src-release21-maint/build> ./python -O /tmp/iterzip.py
 juststore   0.41
  justtups   0.40
 storetups   2.77

localhost:~/src/python/src/build> ./python -O /tmp/iterzip.py
 juststore   0.32
  justtups   0.29
 storetups   2.44

Don't know what's up with your 2.2 build.

Jeremy
I don't have a 2.2 checkout handy, but I've got 2.1 and 2.3. I don't see much difference. Note that I've made N smaller by about 10 to make it run in reasonable time on my (virtual) machine.
My 2.2 apparently had GC turned off. After a clean rebuild I see the same quadratic behavior too. --Guido van Rossum (home page: http://www.python.org/~guido/)
Jeremy Hylton wrote:
Note that there's a net excess of allocations in the storetup() case, so we invoke the garbage collector every n/700 times through the loop.
That's the dirty culprit. :-)  With the GC disabled:

  justpush   0.81
   justzip   0.75

Perhaps we should raise the default threshold.

Neil
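For anyone who wants to experiment, the knob is already exposed at the Python level; the replacement numbers below are arbitrary, chosen only to show the calls (the default first threshold really is 700):

    import gc

    print gc.get_threshold()           # (700, 10, 10) by default
    gc.set_threshold(10000, 10, 10)    # make gen0 collections much rarer
    # ... run the allocation-heavy code ...
    gc.set_threshold(700, 10, 10)      # restore the default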
"NS" == Neil Schemenauer <nas@python.ca> writes:
NS> Jeremy Hylton wrote:
Note that there's a net excess of allocations in the storetup() case, so we invoke the garbage collector every n/700 times through the loop.
NS> That's the dirty culprit. :-)  With the GC disabled:
NS>
NS>   justpush   0.81
NS>    justzip   0.75
NS>
NS> Perhaps we should raise the default threshold.

I suspect that would be good. Many Python programs chew through objects quickly. Still, for this class of programs, for any large N any small constant will be too small.

If we care much about this class of examples (loops with no function calls and no deallocs), would it make sense to count alloc and dealloc separately? They would both need to exceed some limit before triggering GC.

Jeremy
[Neil Schemenauer]
That's the dirty culprit. :-) With the GC disabled:
  justpush   0.81
   justzip   0.75
Perhaps we should raise the default threshold.
A fixed threshold of any size will leave us vulnerable to quadratic-time cases. Proportional growth wouldn't, though. For example, if a round of gc didn't find anything to collect, or found very little, we could boost the threshold by 25% (that's a right shift by 2 and an add <wink>). Contrarily, when gc finds lots of stuff to collect, reduce the threshold. This adjusts itself to a program's runtime characteristics. I suspect most long-running programs enjoy vast stretches of time over which the second derivative of their gc behavior is relatively constant <wink>.
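A toy model of that policy, just to pin down the arithmetic -- the 25% step, the trash-ratio cutoffs, and the bounds are all placeholders, and this is Python pseudocode for what would really live in gcmodule.c:

    class AdaptiveThreshold:
        "Raise the trigger when gc finds little, lower it when gc finds lots."
        def __init__(self, start=700, lo=100, hi=1000000):
            self.threshold = start
            self.lo = lo
            self.hi = hi

        def after_collection(self, ncollected, nexamined):
            if not nexamined:
                return
            ratio = float(ncollected) / nexamined
            if ratio < 0.05:
                # almost nothing was trash: back off by ~25%
                self.threshold = min(self.hi,
                                     self.threshold + (self.threshold >> 2))
            elif ratio > 0.5:
                # lots of trash: become more eager again
                self.threshold = max(self.lo,
                                     self.threshold - (self.threshold >> 2))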
A fixed threshold of any size will leave us vulnerable to quadratic-time cases. Proportional growth wouldn't, though. For example, if a round of gc didn't find anything to collect, or found very little, we could boost the threshold by 25% (that's a right shift by 2 and an add <wink>). Contrarily, when gc finds lots of stuff to collect, reduce the threshold. This adjusts itself to a program's runtime characteristics. I suspect most long-running programs enjoy vast stretches of time over which the second derivative of their gc behavior is relatively constant <wink>.
Should we worry about programs that don't create any cyclical garbage for a long time, and then suddenly start creating lots of it? The initial GC-free period may bump the threshold up very far, and then it will build up a significant pile of cyclical garbage before GC runs again.

--Guido van Rossum (home page: http://www.python.org/~guido/)
[Guido, on adjusting gc threshold up or down depending on how much garbage each gc pass finds]
Should we worry about programs that don't create any cyclical garbage for a long time, and then suddenly start creating lots of it? The initial GC-free period may bump the threshold up very far, and then it will build up a significant pile of cyclical garbage before GC runs again.
Yes, when the past is a rotten predictor of the future, it's hard to predict the future better than rottenly. Countless megabytes of cyclic structures may be kept alive by a single reference too, so looking for a bump in deallocation rate isn't a cure-all either. A practical compromise would be to allow thresholds to auto-adjust, but only within a fixed range. That would give Neil the opportunity to add more tuning knobs nobody understands <wink>. I'd set the default upper bound pretty high, like a few million excess allocations. Somebody creating oodles of cyclic trash after a long quiet period, *and* who can't afford to let the memory go unreclaimed, would have to learn to import gc and force the issue. The rest of us would enjoy less gc overhead with little potential for harm. I'd also do auto-adjust by more than 25%; *2 and /2 are more like it. Programs don't *ease* their way into different memory behaviors, when it changes it's likely to be a sharp break.
Guido van Rossum <guido@python.org> writes:
Should we worry about programs that don't create any cyclical garbage for a long time, and then suddenly start creating lots of it?
No. So far, there was only one bug report on SF (that I know of) where GC was not triggered early enough. In that case, the problem was that there were large strings attached to the cyclic garbage that consumed lots of memory, but contributed little to the object count. So any application that knows it creates cyclic garbage "on purpose" should explicitly invoke gc.collect from time to time to be safe. Most applications don't need to worry, as they can safely assume that users have enough spare memory for the garbage. Regards, Martin
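In code, "explicitly invoke gc.collect from time to time" amounts to no more than this; build_batch and process are stand-ins for whatever the application actually does:

    import gc

    def handle(batch):
        graph = build_batch(batch)   # may create reference cycles on purpose
        result = process(graph)
        del graph                    # drop our reference to the cyclic structure
        gc.collect()                 # reclaim it now rather than "eventually"
        return result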
[Jeremy]
It acts about the same on Linux.
Also in 2.2.1? Guido presented some evidence there that's getting increasingly hard to believe (in part because holger krekel couldn't verify it, and in larger part because you found the true cause below <wink>).
I don't see why it's a mystery. justtups() only uses one tuple at a time; it gets re-used from the free list every time. storetups() has to (py)malloc a new tuple every stinkin' time.
It's a mystery because it's not credible that falling back to pymalloc is *that* much slower than using the tuple free list.
Note that there's a net excess of allocations in the storetup() case, so we invoke the garbage collector every n/700 times through the loop.
Bingo! This also explains Guido's quadratic-time behavior. If I do gc.disable() at the start of this test, storetups() drops to under 1 second. So, also, the pymalloc burden is indeed comparatively trivial.
I've noted before that it doesn't make much sense to invoke GC unless there is at least one deallocation; you can't reclaim anything if there are no DECREFs.
Let's see whether Neil can resist <wink>.
Jeremy Hylton wrote:
I've noted before that it doesn't make much sense to invoke GC unless there is at least one deallocation; you can't reclaim anything if there are no DECREFs.
It's easy to add a decref counter. It doesn't seem to help much though based on my limited testing. The standard test suite does not trigger it. I can't get our web application to either. Is it worth checking in? Maybe someone can come up with a better trick. Note that we must take into account generations as well. If threshold0 is low then there are lots of quick collections. Setting it larger does not help too much as new objects will eventually be examined (unless they are destroyed before the next collection). Neil
"NS" == Neil Schemenauer <nas@python.ca> writes:
NS> Jeremy Hylton wrote:
I've noted before that it doesn't make much sense to invoke GC unless there is at least one deallocation; you can't reclaim anything if there are no DECREFs.
NS> It's easy to add a decref counter.  It doesn't seem to help much
NS> though based on my limited testing.  The standard test suite
NS> does not trigger it.  I can't get our web application to either.
NS> Is it worth checking in?  Maybe someone can come up with a
NS> better trick.

I'm not sure what your trick is, since you've only described it as a "decref counter." It may not be what I was thinking of. Regardless of whether it is, my idea may not be any good either.

I was imagining a scheme like this: Count increfs and decrefs. Set two thresholds. A collection occurs when both thresholds are exceeded. Perhaps 100 decrefs and 1000 increfs.

NS> Note that we must take into account generations as well.  If
NS> threshold0 is low then there are lots of quick collections.
NS> Setting it larger does not help too much as new objects will
NS> eventually be examined (unless they are destroyed before the
NS> next collection).

How does this come into play in the benchmark in question? It seems like we should have gotten a lot of quick collections, but it was still quite slow.

Jeremy
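A toy rendering of that dual-counter trigger, just to make the proposal concrete -- the class, method names, and limits are placeholders, and the real thing would have to live in C behind the Py_INCREF/Py_DECREF macros:

    class DualCounterTrigger:
        "Collect only once both incref and decref activity pass a limit."
        def __init__(self, incref_limit=1000, decref_limit=100):
            self.incref_limit = incref_limit
            self.decref_limit = decref_limit
            self.increfs = 0
            self.decrefs = 0

        def note_incref(self):
            self.increfs = self.increfs + 1

        def note_decref(self):
            self.decrefs = self.decrefs + 1

        def should_collect(self):
            if (self.increfs > self.incref_limit and
                self.decrefs > self.decref_limit):
                self.increfs = self.decrefs = 0
                return 1
            return 0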
I was imagining a scheme like this: Count increfs and decrefs. Set two thresholds. A collection occurs when both thresholds are exceeded. Perhaps 100 decrefs and 1000 increfs.
I expect you can't literally count increfs and decrefs. These are macros that need to be super-fast, and I think we can't really afford to increment a counter on each macro invocation.

The current thresholds are used to count the number of allocations. Adding the number of deallocations to the mix seems dangerous: when (nearly) all data is tied up in cycles, there may not be any deallocations. It seems hard to distinguish between these two cases:

(a) lots of memory is allocated and kept alive for real by containers

(b) lots of memory is allocated and kept alive accidentally by cycles

The zip example is a case of (a), but the same allocation behavior could ensue from case (b). Only running the collector can determine which case we're seeing. I like Tim's idea of adjusting the thresholds based upon the effectiveness of recent collections.
How does this come into play in the benchmark in question? It seems like we should have gotten a lot of quick collections, but it was still quite slow.
The benchmark has a list with a million elements, and during execution more and more tuples are added to it. I expect that somehow all these tuples are visited every 700 allocations, which gives quadratic behavior. I guess the generational part of the collector doesn't affect this -- the only way to reduce the work would be if the big list somehow was skipped once it was known to be "old". But since it contains references to "new" objects (the 700 tuples allocated last), it probably ends up being visited anyway.

--Guido van Rossum (home page: http://www.python.org/~guido/)
"GvR" == Guido van Rossum <guido@python.org> writes:
GvR> The current thresholds are used to count the number of
GvR> allocations.  Adding the number of deallocations to the mix seems
GvR> dangerous: when (nearly) all data is tied up in cycles, there
GvR> may not be any deallocations.  It seems hard to distinguish
GvR> between these two cases:
GvR>
GvR> (a) lots of memory is allocated and kept alive for real by
GvR>     containers
GvR>
GvR> (b) lots of memory is allocated and kept alive accidentally
GvR>     by cycles
GvR>
GvR> The zip example is a case of (a), but the same allocation
GvR> behavior could ensue from case (b).  Only running the
GvR> collector can determine which case we're seeing.  I like
GvR> Tim's idea of adjusting the thresholds based upon the
GvR> effectiveness of recent collections.

Isn't this a case of "knowing your application"? IOW, you're doing something that the gc isn't well-tuned to handle, by default. That's why we expose its operation through the gc module -- so you can take explicit steps for the hotspots in your application.

Not to say we can't improve the tuning, but it'll never be perfect, so just try to make it good enough for the most common types of programs. Then document situations where it might not do so well.

-Barry
[Barry]
Isn't this a case of "knowing your application"? IOW, you're doing something that the gc isn't well-tuned to handle, by default. That's why we expose its operation through the gc module -- so you can take explicit steps for the hotspots in your application.
The difficulty is with apps that grow a lot of long-lived containers that aren't trash and don't even contain cycles. There's no bound on how often they'll get crawled over looking for trash that ain't there, and the more of those you grow the longer it takes to look at them. When a gen2 collection doesn't find any trash, it should probably become less eager to try looking at the same stuff yet again. Adding more generations could have a similar good effect. Half of a shadow of an idea: at least in my code, it's common to have gazillions of tuples, and they almost always contain just strings and numbers. Since they're immutable, they'll never contain anything else. So they could get unlinked from cyclic gc entirely without ill effect (it's not possible that they could ever be in a cycle). Perhaps a gen2 collection could learn something about this and automagically untrack them.
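On an interpreter new enough to expose gc.is_tracked() (well after the versions discussed here), the effect of this kind of tuple untracking can be observed directly -- shown with comments rather than captured output, and hedged because the details depend on the interpreter actually implementing the optimization:

    import gc

    t = tuple([1, "two", 3.0])   # built at run time, so it starts out tracked
    print gc.is_tracked(t)       # True where all new tuples are tracked
    gc.collect()                 # a full collection gets a look at t ...
    print gc.is_tracked(t)       # False where the untracking optimization
                                 # exists: t holds only atomic values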
The difficulty is with apps that grow a lot of long-lived containers that aren't trash and don't even contain cycles. There's no bound on how often they'll get crawled over looking for trash that ain't there, and the more of those you grow the longer it takes to look at them. When a gen2 collection doesn't find any trash, it should probably become less eager to try looking at the same stuff yet again. Adding more generations could have a similar good effect.
Half of a shadow of an idea: at least in my code, it's common to have gazillions of tuples, and they almost always contain just strings and numbers. Since they're immutable, they'll never contain anything else. So they could get unlinked from cyclic gc entirely without ill effect (it's not possible that they could ever be in a cycle). Perhaps a gen2 collection could learn something about this and automagically untrack them.
Different (complementary) idea: how about having more generations, each being traversed less frequently than the previous one? Maybe a (potentially) infinite number of generations? (Or at least a fixed limit that never gets reached in practice.) Wouldn't this have the same effect as increasing the threshold exponentially? --Guido van Rossum (home page: http://www.python.org/~guido/)
[Guido]
Different (complementary) idea: how about having more generations, each being traversed less frequently than the previous one?
That's what I meant by "Adding more generations could have a similar good effect". It's certainly worth considering.
Maybe a (potentially) infinite number of generations? (Or at least a fixed limit that never gets reached in practice.) Wouldn't this have the same effect as increasing the threshold exponentially?
Yes, but it wouldn't get the effect of decreasing the threshold exponentially when gc suddenly starts getting lots of trash back from the oldest generation. Only adaptive schemes actually adapt <wink>.
Tim Peters wrote:
That's what I meant by "Adding more generations could have a similar good effect". It's certainly worth considering.
Adding a fourth generation drops the time from 5.13 to 2.11 on my machine. Adding a fifth doesn't seem to make a difference. I used 10 as the threshold for both new generations. Neil
[Neil Schemenauer]
Adding a fourth generation drops the time from 5.13 to 2.11 on my machine. Adding a fifth doesn't seem to make a difference. I used 10 as the threshold for both new generations.
Alas, these thresholds are a little hard to work with. For example

    ...
    else if (collections0 > threshold1) {
        ...
        collections1++;
        /* merge gen0 into gen1 and collect gen1 */
        ...
        collections0 = 0;
    }
    else {
        generation = 0;
        collections0++;
        ...
        /* collect gen0 */
        ...
    }

Let's say threshold1 is 10 (because it is <wink>), and we just finished a gen1 collection. Then collections0 is 0. We have to do 11 gen0 collections then before "collections0 > threshold1" succeeds, and that point is actually the 12th time gen0 has filled up since the last time we did a gen1 collection. Similarly for collections1 vs threshold2. This makes it hard to multiply them out in an obvious way <wink>.

Anyway, with 4 generations it takes in the ballpark of 700 * 10 * 10 * 10 = 700,000 excess allocations before a gen3 collection is triggered, so I expect you saw exactly one gen3 collection during the lifetime of the test run (there are about 1,000,000 excess allocations during its run). Also that adding a fifth generation wouldn't matter at all in this test, since you'd still see exactly one gen3 collection, and a gen4 collection would never happen.

Now another ballpark: On the only machine that matters in real life (mine), I'm limited to 2**31 bytes of user address space, and an object participating in gc can rarely be smaller than 40 bytes. That means I can't have more than 2**31/40 ~= 55 million gcable objects alive at once, and that also bounds the aggregate excess of allocations over deallocations. That surprised me. It means the "one million tuple" test is already taxing a non-trivial percentage of this box's theoretical capacity. Indeed, I tried boosting it to 10 million, and after glorious endless minutes of listening to the disk grind itself to dust (with gc disabled, even), Win98 rebooted itself. So another factor-of-10 generation or two would probably move the gross surprises here out of the realm of practical concern. Except, of course, for the programs where it wouldn't <wink>.
Tim Peters <tim.one@comcast.net> writes:
Maybe a (potentially) infinite number of generations? (Or at least a fixed limit that never gets reached in practice.) Wouldn't this have the same effect as increasing the threshold exponentially?
Yes, but it wouldn't get the effect of decreasing the threshold exponentially when gc suddenly starts getting lots of trash back from the oldest generation. Only adaptive schemes actually adapt <wink>.
That is no problem: if the application suddenly starts to create much new garbage, that garbage won't be in the oldest generation. You only run into this case when you release essentially all past objects. In that case, you just have to hope that the application will terminate soon, anyway, or that the next processing cycle will produce the same amount of garbage, triggering a collection in the oldest generation. Regards, Martin
Tim Peters <tim.one@comcast.net> writes:
Half of a shadow of an idea: at least in my code, it's common to have gazillions of tuples, and they almost always contain just strings and numbers. Since they're immutable, they'll never contain anything else. So they could get unlinked from cyclic gc entirely without ill effect (it's not possible that they could ever be in a cycle). Perhaps a gen2 collection could learn something about this and automagically untrack them.
That could happen in gen0 already: when traversing an object, watch whether it contains any tracked objects. If it doesn't, invoke tp_is_immutable, if that returns true, untrack it. Regards, Martin
[Tim]
Half of a shadow of an idea: at least in my code, it's common to have gazillions of tuples, and they almost always contain just strings and numbers. Since they're immutable, they'll never contain anything else. So they could get unlinked from cyclic gc entirely without ill effect (it's not possible that they could ever be in a cycle). Perhaps a gen2 collection could learn something about this and automagically untrack them.
[MvL]
That could happen in gen0 already: when traversing an object, watch whether it contains any tracked objects. If it doesn't, invoke tp_is_immutable, if that returns true, untrack it.
Unfortunately, the visit API doesn't make it easy to watch this; a tuple calls visit() on its items but learns nothing except whether it failed. (I've never seen a visit() implementation that could fail, so I'm not sure even why the return code exists.) --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum <guido@python.org> writes:
That could happen in gen0 already: when traversing an object, watch whether it contains any tracked objects. If it doesn't, invoke tp_is_immutable, if that returns true, untrack it.
Unfortunately, the visit API doesn't make it easy to watch this; a tuple calls visit() on its items but learns nothing except whether it failed. (I've never seen a visit() implementation that could fail, so I'm not sure even why the return code exists.)
However, the visit function could record this result. For example, for move_root_reachable:

struct m_r_r {
    PyGC_Head *reachable;
    int immutable_nonnested;
};

static int
visit_move(PyObject *op, struct m_r_r *tolist)
{
    if(PyObject_IS_GC(op)){
        if(GC_is_tracked(op))
            m_r_r->tolist.immutable_nonnested = 0;
        ...
    }
    return 0;
}

static void
move_root_reachable(PyGC_Head *reachable)
{
    struct m_r_r my_m_r_r;
    my_m_r_r.reachable = reachable;
    for (; gc != reachable; gc=gc->gc.gc_next) {
        /* careful, reachable list is growing here */
        PyObject *op = FROM_GC(gc);
        traverse = op->ob_type->tp_traverse;
        my_m_r_r.immutable_nonnested = 1;
        (void) traverse(op,
                        (visitproc)visit_move,
                        (void *)reachable);
        if(my_m_r_r.immutable_nonnested && op->ob_type->tp_immutable(op)){
            GC_Untrack(op);
        }
    }
}

I'm not sure whether move_root_reachable would be the right place to untrack those objects, but the information is all there.

Regards,
Martin
[Martin v. Loewis]
... However, the visit function could record this result. For example, for move_root_reachable:
I'm afraid it needs more knowledge than this, although it may be surmountable. A problem is that, e.g., tuples get tracked by GC immediately after they're initialized in PyTuple_New(), and they're initialized to all NULL slots. So if gc happens to occur before a tuple T is filled in "for real", traversing T isn't going to call visit_move (because T's slots are still NULL). This would cause gc to untrack T, even though T's creator may eventually get around to making T part of a cycle.
struct m_r_r {
    PyGC_Head *reachable;
    int immutable_nonnested;
};
static int
visit_move(PyObject *op, struct m_r_r *tolist)
{
    if(PyObject_IS_GC(op)){
        if(GC_is_tracked(op))
            m_r_r->tolist.immutable_nonnested = 0;
I assume this was meant to be tolist->immutable_nonnested = 0.
        ...
    }
    return 0;
}
static void
move_root_reachable(PyGC_Head *reachable)
{
    struct m_r_r my_m_r_r;
    my_m_r_r.reachable = reachable;
m_r_r.reachable isn't referenced elsewhere.
    for (; gc != reachable; gc=gc->gc.gc_next) {
        /* careful, reachable list is growing here */
        PyObject *op = FROM_GC(gc);
        traverse = op->ob_type->tp_traverse;
        my_m_r_r.immutable_nonnested = 1;
        (void) traverse(op,
                        (visitproc)visit_move,
                        (void *)reachable);
        if(my_m_r_r.immutable_nonnested && op->ob_type->tp_immutable(op)){
What the heck is tp_immutable()? Are you implicitly assuming the existence of a new typeslot function?

I'd be happy to special-case the snot out of tuples, and leave it at that. For that purpose, it may be enough just to see whether a thing *is* a tuple, and all its slots are non-NULL and filled with objects of the popular scalar immutable types (strings and numbers).

Hmm! I bet we could go a long way leaving gc out of this entirely: BUILD_TUPLE in ceval.c could check the types of the objects it stuffs into a new tuple, and untrack it immediately then if it were safe to do so. And tupleconcat() could mindlessly inherit trackedness from the logical "or" of its inputs' trackedness. Would that get 80% of the potential benefit with 5% of the potential work?
[Tim]
... Hmm! I bet we could go a long way leaving gc out of this entirely: BUILD_TUPLE in ceval.c could check the types of the objects it stuffs into a new tuple, and untrack it immediately then if it were safe to do so. And tupleconcat() could mindlessly inherit trackedness from the logical "or" of its inputs' trackedness. Would that get 80% of the potential benefit with 5% of the potential work?
This looks promising, but needs more work. First off, BUILD_TUPLE gets surprised by the empty tuple: after untracking it once, it gets back the same empty tuple later, and blows up when trying to untrack it again. So I had to break into the GC abstraction to avoid untracking when something was already untracked. That test is a heartbreaker, since it's only the empty tuple that's a potential problem there. Probably better to have tupleobject.c do something special.

After fiddling builtin_zip to untrack the tuples it creates when that's safe (untrack is safe == for each slot s in the tuple, either s is not of a gcable type, or s is of a gcable type but s is already untracked), the justzip() test showed a wonderful surprise: it took *forever*!

This is cute: _PyObject_GC_UNTRACK() doesn't decrease gc's "excess allocation" count, but tuples keep getting created. After "the list" makes it into gen2, the excess allocation count just keeps growing, and collect_generations() keeps getting called. But there's nothing *in* gen0 or gen1 anymore (the tuples get untracked), and collect_generations() neither calls collect(), nor sets allocations back to 0 itself, when a gen0 or gen1 collection occurs and the gen0 & gen1 lists are empty. The upshot is that "allocated" is (almost) always greater than threshold0, so collect_generations() gets called *every* time a tuple gets allocated. Every 10th tuple allocation a useless null gen1 collection gets done then, and about every 100th a gen2 collection gets done. The tuples aren't getting tracked anymore, but the list containing them is, and it crawls over all the list slots each time a gen2 collection gets done (and that happens about 700x more frequently than it's supposed to). So that turned out to be quadratic-time with extreme prejudice <wink>.

Anyway, setting 'allocated' back to 0 at the end of collect_generations cured that, and untracking the tuples did do some good: the runtime of justzip() fell by more than a factor of 2. The gen2 traversals of "the list" still slowed it down considerably, though (over what happens when gc is disabled).

So, if we want to pursue this, perhaps what gc really wants to count is the excess of gc tracks over gc untracks. Right now they're strongly correlated with gc allocations and deallocations, respectively. But if we started untracking "safe" tuples, that connection would be broken. If we did exempt "safe tuples" from gc, and counted the excess of tracks over untracks, gc would never trigger in justzip(). So that's the attraction. I think that much would be principled, since I view the "excess" statistic as trying to guess how many objects are sitting in gen0, triggering a round of cycle gc when gen0 gets big enough; track-untrack is a more direct measure of gen0's size than allocated-deallocated.
Hmm! I bet we could go a long way leaving gc out of this entirely: BUILD_TUPLE in ceval.c could check the types of the objects it stuffs into a new tuple, and untrack it immediately then if it were safe to do so.
But this still takes time, and most tuples never make it into GC, so it's wasted time, isn't it? I'd put such code in tuplevisit (assuming it's safe to call GC_UnTrack there).
And tupleconcat() could mindlessly inherit trackedness from the logical "or" of its inputs' trackedness. Would that get 80% of the potential benefit with 5% of the potential work?
Too sophisticated already. ;-) --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
But this still takes time, and most tuples never make it into GC, so it's wasted time, isn't it?
I agree. The idea I thought of last night was to have the GC special case certain objects. It could know about tuples. After collecting the first generation any surviving tuples that don't refer to objects with the GC flag could be untracked. We might even generalize it by untracking objects that don't have a tp_clear method and don't refer to GC objects.

BTW, I also looked into the train algorithm last night. It's cool because it never examines the entire set of objects (unless they all happen to be a big cyclic garbage structure). Unfortunately I don't think it can be adapted for Python.
I'd put such code in tuplevisit (assuming it's safe to call GC_UnTrack there).
That would be slick. I don't know if it's safe without looking at the code but it could probably be made so. Neil
[Neil Schemenauer]
... The idea I thought of last night was to have the GC special case certain objects. It could know about tuples. After collecting the first generation any surviving tuples that don't refer to objects with the GC flag could be untracked.
Just butting in to be obnoxious and note that a NULL slot is a disqualifier for untracking too.
We might even generalize it by untracking objects that don't have a tp_clear method and don't refer to GC objects.
How do we know which other objects they refer to? How do we know that the set of objects they refer to will never change? And do we have a reason to believe that any type other than tuples makes a lick of difference <wink>?
BTW, I also looked into the train algorithm last night. It's cool because it never examines the entire set of objects (unless they all happen to be a big cyclic garbage struture). Unfortunately I don't think it can be adapted for Python.
Try adapting Python to it <wink>.

[Guido]
I'd put such code in tuplevisit (assuming it's safe to call GC_UnTrack there).
That would be slick. I don't know if it's safe without looking at the code but it could probably be made so.
I believe it would be easy, requiring minor rewrites of all the gc loops that invoke tp_traverse (the gc->gc.gc_next field they rely on to get "the next guy" in the list could go NULL on them across tp_traverse calls).
Tim Peters wrote:
How do we know which other objects they refer to?
We call tp_traverse on them.
How do we know that the set of objects they refer to will never change?
The assumption is that if the object doesn't have tp_clear then it's immutable. That's a little risky I guess.
And do we have a reason to believe that any type other than tuples makes a lick of difference <wink>?
Probably not. [train algorithm]
Try adapting Python to it <wink>.
Can't do it without dropping the reference counting and breaking every extension module (I least I can't figure out a way). Neil
[Tim]
How do we know which other objects they refer to?
[Neil Schemenauer]
We call tp_traverse on them.
We just went thru this for tuples, right? PyTuple_New returns a tracked tuple T all of whose data slots are NULL. If tupletraverse(T, visit, arg) is called when T is in this state, tupletraverse will never call visit. And, AFAICT, whether or not the visit() function is called is the only information you can get out of calling tp_traverse. In which case, calling tp_traverse will tell you if a tuple *does* point to a gcable object, but tells you nothing about the tuple if visit doesn't get called, only that the tuple doesn't currently happen to point to anything interesting (but well may if you try again later).

IOW, "immutable" isn't a truthful description of tuples at this level; tuples are all but guaranteed to mutate after they become tracked, and knowing when a tuple has undergone its final under-the-covers mutation requires exact knowledge of how tuples are laid out and used. s/tuple/xyzobject/g.
How do we know that the set of objects they refer to will never change?
The assumption is that if the object doesn't have tp_clear then it's immutable. That's a little risky I guess.
Seems likely, yes.
Tim Peters <tim.one@comcast.net> writes:
How do we know which other objects they refer to? How do we know that the set of objects they refer to will never change? And do we have a reason to believe that any type other than tuples makes a lick of difference <wink>?
I'd suggest a tp_is_immutable predicate. If a container is immutable and non-cyclic, it can be untracked. A container is non-cyclic if all contained objects are either non-gc, or untracked. Regards, Martin
[martin@v.loewis.de]
I'd suggest a tp_is_immutable predicate. If a container is immutable and non-cyclic, it can be untracked. A container is non-cyclic if all contained objects are either non-gc, or untracked.
And none of the contained objects are NULL (else the scheme doesn't even work for tuples). How do you know which other objects it contains? Calling tp_traverse and seeing whether that ever calls the passed-in visit function doesn't work (see the already-repeated discussion about this for tuples).
Tim Peters <tim.one@comcast.net> writes:
[martin@v.loewis.de]
I'd suggest a tp_is_immutable predicate. If a container is immutable and non-cyclic, it can be untracked. A container is non-cyclic if all contained objects are either non-gc, or untracked.
And none of the contained objects are NULL (else the scheme doesn't even work for tuples).
If a tuple contains a NULL, its tp_is_immutable will return false, hence it won't be untracked, hence it isn't cyclic.
How do you know which other objects it contains? Calling tp_traverse and seeing whether that ever calls the passed-in visit function doesn't work (see the already-repeated discussion about this for tuples).
That's why I propose the tp_is_immutable predicate. Regards, Martin
[Tim]
Hmm! I bet we could go a long way leaving gc out of this entirely: ...
[Guido]
But this still takes time, and most tuples never make it into GC, so it's wasted time, isn't it?
Absolutely. I'm trying to understand the true causes here by fiddling various things. It turned out that "useless" tracing of tuples accounted for more than half the gc burden in this test, but not much more than half (the rest was due to "useless" repeated tracing of the list containing all the tuples).
I'd put such code in tuplevisit (assuming it's safe to call GC_UnTrack there).
I'm pretty sure that would lead to instant segfaults, but the gc loops calling tp_traverse could be fiddled to save a local pointer to "the next guy" before invoking tp_traverse. I'm not sure that's the best place to put it.
On Mon, Apr 29, 2002, Barry A. Warsaw wrote:
Isn't this a case of "knowing your application"? IOW, you're doing something that the gc isn't well-tuned to handle, by default. That's why we expose its operation through the gc module -- so you can take explicit steps for the hotspots in your application.
Not to say we can't improve the tuning, but it'll never be perfect, so just try to make it good enough for the most common types of programs. Then document situations where it might not do so well.
My take is that programs with a million live objects and no cycles are common enough that gc should be designed to handle that smoothly. I don't think that a programmer casually writing such applications (say, processing information from a database) should be expected to understand gc well enough to tune it. Having read the entire discussion so far, and *NOT* being any kind of gc expert, I would say that Tim's adaptive solution makes the most sense to me. For years, we told people with cyclic data to figure out how to fix the problem themselves; now that we have gc available, I don't think we should punish everyone else. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "I used to have a .sig but I found it impossible to please everyone..." --SFJ
[Aahz]
My take is that programs with a million live objects and no cycles are common enough that gc should be designed to handle that smoothly.
Well, millions of live objects is common but isn't a problem. The glitch we're looking at is a surprising slowdown with millions of live *container* objects. The latter isn't so common.
I don't think that a programmer casually writing such applications (say, processing information from a database) should be expected to understand gc well enough to tune it.
People casually writing applications pushing the limits of their boxes are in for more surprises than just this <wink>.
Having read the entire discussion so far, and *NOT* being any kind of gc expert, I would say that Tim's adaptive solution makes the most sense to me. For years, we told people with cyclic data to figure out how to fix the problem themselves; now that we have gc available, I don't think we should punish everyone else.
We're not trying to punish anyone, but innocent users with lots of containers can lose big despite our wishes: if we don't check them for cycles, they can run out of memory; if we do check them for cycles, it necessarily consumes time. As a datapoint, here are the times (in seconds) for justzip() on my box after my checkin to precompute the result size (list.append behavior is irrelevant now):

    gc disabled:  0.64
    gc enabled:   7.32
    magic=2(*):   2.63
    magic=3(*):   2.02

(*) This is gcmodule.c fiddled to add this block after "collections1 = 0;" in the first branch of collect_generations():

    if (n == 0)
        threshold2 *= magic;
    else if (threshold2 > 5)
        threshold2 /= magic;

magic=1 is equivalent to the current code. That's all an "adaptive scheme" need amount to, provided the "*=" part were fiddled to prevent threshold2 from becoming insanely large. Boosting magic above 3 didn't do any more good in this test. At magic=3 it still takes 3+ times longer than with gc disabled, but that's a whale of a lot better than the current 11+ times longer. Note that with gc disabled, any cycle in any of the 1,000,001 containers this test creates would leak forever -- casual users definitely get something back for the time spent.
On Mon, Apr 29, 2002, Tim Peters wrote:
[Aahz]
My take is that programs with a million live objects and no cycles are common enough that gc should be designed to handle that smoothly.
Well, millions of live objects is common but isn't a problem. The glitch we're looking at is a surprising slowdown with millions of live *container* objects. The latter isn't so common.
I don't think that a programmer casually writing such applications (say, processing information from a database) should be expected to understand gc well enough to tune it.
People casually writing applications pushing the limits of their boxes are in for more surprises than just this <wink>.
Fair enough. I hadn't quite understood that it was specifically container objects, but obviously a database result will have lots of tuples, so I think that's a good real-world metric for testing whatever solution is proposed. Here's a question: suppose we've got a database result with 10K rows (I'd say that is fairly common), and we're processing each row with a regex (something that can't be done in SQL). What's a ballpark for gc overhead before and after your fix? (I'm still not set up to compile CVS, so I can't do it myself.) -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "I used to have a .sig but I found it impossible to please everyone..." --SFJ
[Aahz]
... Here's a question: suppose we've got a database result with 10K rows (I'd say that is fairly common), and we're processing each row with a regex (something that can't be done in SQL). What's a ballpark for gc overhead before and after your fix? (I'm still not set up to compile CVS, so I can't do it myself.)
What's a database? grateful-for-the-easy-ones-ly y'rs - tim
Here's a question: suppose we've got a database result with 10K rows (I'd say that is fairly common), and we're processing each row with a regex (something that can't be done in SQL). What's a ballpark for gc overhead before and after your fix? (I'm still not set up to compile CVS, so I can't do it myself.)
For 10K objects, the GC overhead is negligible. Jeremy did 100K tuples and still found that 60% of the time was in malloc. You only get in trouble when you approach a million tuples. Remember, it's quadratic. That means it gets 100x worse with every multiplying factor of 10 -- but also that it gets 100x better with every division by 10 (until you run into other effects, of course). --Guido van Rossum (home page: http://www.python.org/~guido/)
Well, millions of live objects is common but isn't a problem. The glitch we're looking at is a surprising slowdown with millions of live *container* objects. The latter isn't so common.
But millions of tuples are not uncommon. They're probably the only thing to worry about here. --Guido van Rossum (home page: http://www.python.org/~guido/)
"GvR" == Guido van Rossum <guido@python.org> writes:
I was imagining a scheme like this: Count increfs and decrefs. Set two thresholds. A collection occurs when both thresholds are exceeded. Perhaps 100 decrefs and 1000 increfs.
GvR> I expect you can't literally count increfs and decrefs. These GvR> are macros that need to be super-fast, and I think we can't GvR> really afford to increment a counter on each macro invocation. GvR> The current thresholds are used to count the number of GvR> allocations. I was being sloppy. I meant allocations and deallocations. GvR> Adding the number of deallocations to the mix seems dangerous: when GvR> (nearly) all data is tied up in cycles, there may not be any GvR> deallocations. Probably right, although function calls should provide a steady stream of deallocations. Frame, locals, &c. are deallocated on exit. So unless the code is blocked waiting for those cycles to be collected, it ought to eventually make progress. GvR> It seems hard to distinguish between these two cases: GvR> (a) lots of memory is allocated and kept alive for real by GvR> containers GvR> (b) lots of memory is allocated and kept alive accidentally by GvR> cycles GvR> The zip example is a case of (a), but the same allocation GvR> behavior could ensue from case (b). Only running the allocator GvR> can determine which case we're seeing. I like Tim's idea of GvR> adjusting the thresholds based upon the effectiveness of recent GvR> collections. I agree that this sounds interesting.
How does this come into play in the benchmark in question? It seems like we should have gotten a lot of quick collections, but it was still quite slow.
GvR> The benchmark has a list with a million elements, and during GvR> execution more and more tuples are added to it. I expect that GvR> somehow all these tuples are visited every 700 allocations, GvR> which gives quadratic behavior. I guess the generational part GvR> of the collector doesn't affect this -- I guess this is a question for Neil. I assumed that the generational part would affect this. GvR> the only way to reduce the work would be if the big list GvR> somehow was skipped once it was known to be "old". But since GvR> it contains references to "new" objects (the 700 tuples GvR> allocated last), it probably ends up being visited anyway. I thought something was labelled old after it survived some number of collections. Why would the age of the objects it contains have any effect? Jeremy
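To illustrate the two-thresholds idea sketched above, here is a toy version in Python -- this is not what gcmodule.c does, and the counters and constants are invented for illustration:

    import gc

    ALLOC_THRESHOLD = 1000
    DEALLOC_THRESHOLD = 100

    allocations = 0
    deallocations = 0

    def note_allocation():
        global allocations, deallocations
        allocations += 1
        # collect only when *both* counters have tripped since the last run
        if allocations > ALLOC_THRESHOLD and deallocations > DEALLOC_THRESHOLD:
            gc.collect()
            allocations = deallocations = 0

    def note_deallocation():
        global deallocations
        deallocations += 1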
I thought something was labelled old after it survived some number of collections. Why would the age of the objects it contains have any effect?
I thought that old objects were traced less frequently to reduce the amount of time wasted on them. But maybe I don't understand the code. --Guido van Rossum (home page: http://www.python.org/~guido/)
[Guido]
I thought that old objects were traced less frequently to reduce the amount of time wasted on them.
They are, but the set of oldest objects grows without bound in this case, and they still get traced from time to time. That's enough to yield quadratic-time behavior.
But maybe I don't understand the code.
That too <wink>.
Let me channel Neil here. "The list" does move out of the "frequently collected" generation (gen0) the first time that gets collected, and works its way into the least frequently collected generation (gen2). The time for futile attempts to collect the first- and second-most (gen1) frequently collected generations doesn't grow as the program goes on. After things get going, they just contain tuples, and those move from gen0 into gen1 into gen2 at the same rate they're created. The quadratic time is due to gen2 collections: gen2 holds "the list", and an ever-increasing number of tuples. gen0 never holds more than about 700 tuples, and gen1 never more than about 8500. This is easy to see by adding

    import gc
    gc.set_debug(gc.DEBUG_STATS)

at the top of the test case. Here's a snippet from near the end of a run, taken at a time when gen1 and gen2 collections are about to trigger:

    ...
    gc: collecting generation 0...
    gc: objects in each generation: 701 4907 736679
    gc: done.
    gc: collecting generation 0...
    gc: objects in each generation: 701 5608 736679
    gc: done.
    gc: collecting generation 0...
    gc: objects in each generation: 701 6309 736679
    gc: done.
    gc: collecting generation 0...
    gc: objects in each generation: 701 7010 736679
    gc: done.
    gc: collecting generation 1...
    gc: objects in each generation: 0 8412 736679
    gc: done.
    gc: collecting generation 2...
    gc: objects in each generation: 0 0 745792
    ...

If you try this, you'll get a good gut feel for what's happening: the output whizzes by until hitting a gen2 collection, then there's a pause, and the gen2 pause gets longer and longer as the program goes on. So the primary benefit in bad cases from auto-adjusting would come from auto-adjusting the gen2 threshold. That's the only generation that can grow without bound. Before an object moves into gen2, it can get futilely scanned just twice (once in a gen0 collection, and again in a gen1 collection). Boosting the gen0 threshold wouldn't have much effect.
Jeremy Hylton wrote:
I'm not sure what your trick is, since you've only described it as a "decref counter."
Sorry. I'm keeping track of the number of calls to PyObject_GC_Del since the last collection. While it's zero, collection doesn't happen. That makes the justzip function run fast but doesn't seem to help anywhere else.
I was imagining a scheme like this: Count increfs and decrefs. Set two thresholds. A collection occurs when both thresholds are exceeded. Perhaps 100 decrefs and 1000 increfs.
That would cut down on the collections more but I'm not sure how much in practice. In real code it seems like allocations and deallocations are pretty mixed up.
How does this come into play in the benchmark in question? It seems like we should have gotten a lot of quick collections, but it was still quite slow.
The GC cost is paid early and the objects get put in an older generation. Obviously that's a waste of time if they are deallocated in the near future. justpush deallocates as it goes so the GC is never triggered. I just tried measuring the time spent in the GC while loading some nasty web pages in our system (stuff that looks at thousands of objects). I used the Pentium cycle counter since clock(2) seems to have very low resolution. Setting threshold0 to 7500 makes the GC take up twice the amount of time as with the default settings (700). That surprised me. I thought it wouldn't make much difference. Maybe I screwed up. :-) Neil
"NS" == Neil Schemenauer <nas@python.ca> writes:
How does this come into play in the benchmark in question? It seems like we should have gotten a lot of quick collections, but it was still quite slow.
NS> The GC cost is paid early and the objects get put in an older NS> generation. Obviously that's a waste of time if they are NS> deallocated in the near future. justpush deallocates as it goes NS> so the GC is never triggered.
The "0," tuples aren't deallocated until the end of the function, so it seems good to get them out of the current generation.
NS> I just tried measuring the time spent in the GC while loading NS> some nasty web pages in our system (stuff that looks at NS> thousands of objects). I used the Pentium cycle counter since NS> clock(2) seems to have very low resolution. Setting threshold0 NS> to 7500 makes the GC take up twice the amount of time as with NS> the default settings (700). That surprised me. I thought it NS> wouldn't make much difference. Maybe I screwed up. :-)
Here's some data from gprof to be puzzled by. I ran Tim's test with only run(storetups) enabled. Here's the top of the output:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  ms/call  ms/call  name
    60.19      0.62     0.62   112233     0.01     0.01  PyObject_Malloc
     6.80      0.69     0.07   622804     0.00     0.00  visit_move
     4.85      0.74     0.05   621301     0.00     0.00  visit_decref
     4.85      0.79     0.05      168     0.30     0.67  list_traverse
     3.88      0.83     0.04      145     0.28     0.41  move_roots
     2.91      0.86     0.03   567141     0.00     0.00  tupletraverse
     1.94      0.88     0.02   286727     0.00     0.00  gc_list_append
     1.94      0.90     0.02   102050     0.00     0.00  tupledealloc
     1.94      0.92     0.02      145     0.14     0.89  move_root_reachable
     0.97      0.93     0.01   286727     0.00     0.00  gc_list_remove
     0.97      0.94     0.01   113171     0.00     0.00  PyObject_Free
     0.97      0.95     0.01   100043     0.00     0.00  list_ass_item
     0.97      0.96     0.01    32702     0.00     0.00  PyDict_Next
     0.97      0.97     0.01      489     0.02     1.78  eval_frame
     0.97      0.98     0.01      280     0.04     0.04  instancemethod_dealloc
     0.97      0.99     0.01      145     0.07     0.82  subtract_refs
     0.97      1.00     0.01      145     0.07     0.07  update_refs
     0.97      1.01     0.01        7     1.43     1.43  posix_stat
     0.97      1.02     0.01        6     1.67     1.67  member_get
     0.97      1.03     0.01        1    10.00    10.00  PyInt_Fini

So the profile output suggests that it's spending 60% of its time in pymalloc. Jeremy
[Jeremy]
The "0," tuples aren't deallocated until the end of the function, so it seems good to get them out of the current generation.
They move out of gen0 and gen1 regularly and quickly; it's the gen2 collections that consume the time.
... Here's some data from gprof to be puzzled by. I ran Tim's test with only run(storetups) enabled. Here's the top of the output:
I think you also cut the size of the test by about a factor of 10, else you would have seen many more calls to PyObject_Malloc.
Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  ms/call  ms/call  name
    60.19      0.62     0.62   112233     0.01     0.01  PyObject_Malloc
.. So the profile output suggests that it's spending 60% of its time in pymalloc.
Quite possible! If you cut the size of the test to 100,000 tuples, there's only one gen2 collection in the whole run, and it's gen2 collections that are the source of the problems. I expect you're mostly testing what a crappy job vmware does <wink>.
[Neil Schemenauer]
... I just tried measuring the time spent in the GC while loading some nasty web pages in our system (stuff that looks at thousands of objects). I used the Pentium cycle counter since clock(2) seems to have very low resolution. Setting threshold0 to 7500 makes the GC take up twice the amount of time as with the default settings (700). That surprised me. I thought it wouldn't make much difference. Maybe I screwed up. :-)
Did you try reducing threshold0 too? Maybe GC time tends toward zero as threshold0 does <wink>. I wouldn't discount cache effects here. 7500 objects adds up to a lot of bytes, and gc traversal touches stuff all over them. This is one good reason why a gen0 traversal of the 7500 most-recently allocated still-live objects may take significantly longer than 10 gen0 traversals of the 750 suchlike.
On Mon, 29 Apr 2002, Tim Peters wrote:
That is, at first glance it's *much* faster to do a million appends in pure Python than it is to let zip do them at C speed. I'm curious about what people get for this program on other platforms (pymalloc on or off may make a dramatic difference too -- or not, depending on how the platform malloc works).
OS/2 EMX (2.2.1 without pymalloc)        justpush  0.59   justzip   8.85
OS/2 EMX (recent CVS with pymalloc)      justpush  0.54   justzip   8.81
FreeBSD 4.4 (2.1.1 w/o pymalloc)         justpush 89.72   justzip 110.41
FreeBSD 4.4 (recent CVS with pymalloc)   justpush 19.21   justzip  46.32
The FreeBSD box is more mature hardware (P5-166). I'm surprised at the difference in the 2 sets of results on it. AFAIK, the compiler version and switches are identical for the two interpreters (the 2.1.1 is from the binary package on the FreeBSD 4.4 CDs). -- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac@bullseye.apana.org.au | Snail: PO Box 370 andymac@pcug.org.au | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia
[Andrew MacIntyre]
OS/2 EMX (2.2.1 without pymalloc) justpush 0.59 justzip 8.85
OS/2 EMX (recent CVS with pymalloc) justpush 0.54 justzip 8.81
FreeBSD 4.4 (2.1.1 w/o pymalloc) justpush 89.72 justzip 110.41
FreeBSD 4.4 (recent CVS with pymalloc) justpush 19.21 justzip 46.32
The FreeBSD box is more mature hardware (P5-166).
By "more mature" in this context I assume you mean more "obsolete" than "better developed".
I'm surprised at the difference in the 2 sets of results on it. AFAIK, the compiler version and switches are identical for the two interpreters (the 2.1.1 is from the binary package on the FreeBSD 4.4 CDs).
justpush() primarily tests realloc() speed, and pymalloc isn't (yet) involved in managing list-gut memory. So I expect this has much more to do with the libc(s) they're using than with the compiler(s).
On Tue, 30 Apr 2002, Tim Peters wrote:
[Andrew MacIntyre]
{...}
FreeBSD 4.4 (2.1.1 w/o pymalloc) justpush 89.72 justzip 110.41
FreeBSD 4.4 (recent CVS with pymalloc) justpush 19.21 justzip 46.32
The FreeBSD box is more mature hardware (P5-166).
By "more mature" in this context I assume you mean more "obsolete" than "better developed".
Elderly but still reliable.
I'm surprised at the difference in the 2 sets of results on it. AFAIK, the compiler version and switches are identical for the two interpreters (the 2.1.1 is from the binary package on the FreeBSD 4.4 CDs).
justpush() primarily tests realloc() speed, and pymalloc isn't (yet) involved in managing list-gut memory. So I expect this has much more to do with the libc(s) they're using than with the compiler(s).
I was only referring to the two FreeBSD runs, where the same libc is involved. -- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac@bullseye.apana.org.au | Snail: PO Box 370 andymac@pcug.org.au | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia
[Andrew MacIntyre]
FreeBSD 4.4 (2.1.1 w/o pymalloc) justpush 89.72 justzip 110.41
FreeBSD 4.4 (recent CVS with pymalloc) justpush 19.21 justzip 46.32
...
I'm surprised at the difference in the 2 sets of results on it. AFAIK, the compiler version and switches are identical for the two interpreters (the 2.1.1 is from the binary package on the FreeBSD 4.4 CDs).
[Tim]
justpush() primarily tests realloc() speed, and pymalloc isn't (yet) involved in managing list-gut memory. So I expect this has much more to do with the libc(s) they're using than with the compiler(s).
[Andrew]
I was only referring to the two FreeBSD runs,
I know.
where the same libc is involved.
That I didn't know (only the compiler was mentioned above). So you've got your own mystery. What are you going to do about it <wink>? Calling the platform realloc is the only potentially expensive thing justpush() does, pymalloc isn't involved with list realloc, and your 2.1.1 spent 70 extra seconds doing *something*.
"RH" == Raymond Hettinger <python@rcn.com> writes:
RH> IMHO, This is a better way to loop over multiple sequences and RH> has a chance at becoming the tool of choice. I scanned all of RH> my Python code and found that iterzip() was a better choice in RH> every case except a matrix transpose coded as zip(*mat). I'd much rather see a patch that just changed zip() to be an iterator instead of adding an iterzip(). I doubt that much in-field code would break because of it (but write the PEP to find out. ;). RH> Ditto. Any chance of moving functions like map(), reduce(), RH> and filter() to a functional module; pow() and divmod() to the RH> math module; or input() to oblivion?
"GvR" == Guido van Rossum <guido@python.org> writes:
GvR> I wish. Since they were there first, it's hard to get rid of GvR> them. (If you're truly masochist, write a PEP and post it to GvR> c.l.py to find out how hard. :-) The PEP would have to specify a migration plan, i.e. the builtins are identical to the functional module versions, and would need a deprecation schedule, etc. -Barry
----- Original Message ----- From: "Barry A. Warsaw" <barry@zope.com> To: "Raymond Hettinger" <python@rcn.com> Cc: "Guido van Rossum" <guido@python.org>; <python-dev@python.org>
"RH" == Raymond Hettinger <python@rcn.com> writes:
RH> IMHO, This is a better way to loop over multiple sequences and RH> has a chance at becoming the tool of choice. I scanned all of RH> my Python code and found that iterzip() was a better choice in RH> every case except a matrix transpose coded as zip(*mat).
I'd much rather see a patch that just changed zip() to be an iterator instead of adding an iterzip(). I doubt that much in-field code would break because of it (but write the PEP to find out. ;).
I started down this road, by checking code in the Vaults of Parnassus on the assumption that zip() is rarely used outside a for-loop. What may be the killer is the examples of zip() in Python books which demonstrate a stand-alone zip() returning a list -- in some ways, textbook examples are just as important as in-field code. I will write up a PEP to see if the world cares. For all I know, I may be the only one who uses zip() throughout my code. Zip is just new enough that it might not be too late to change it to an iterator.
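A tiny sketch (my illustration, not from any PEP) of what the migration would mean in practice if zip() became an iterator:

    names = ['x', 'y', 'z']
    values = [1, 2, 3]

    # Loop uses are unaffected whether zip() returns a list or an iterator:
    for name, value in zip(names, values):
        pass

    # Stand-alone uses that need a real list (len(), indexing, reuse) would
    # just grow a list() wrapper:
    pairs = list(zip(names, values))
    assert pairs[0] == ('x', 1)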
RH> Ditto. Any chance of moving functions like map(), reduce(), RH> and filter() to a functional module; pow() and divmod() to the RH> math module; or input() to oblivion?
"GvR" == Guido van Rossum <guido@python.org> writes:
GvR> I wish. Since they were there first, it's hard to get rid of GvR> them. (If you're truly masochist, write a PEP and post it to GvR> c.l.py to find out how hard. :-)
The PEP would have to specify a migration plan, i.e. the builtins are identical to the functional module versions, and would need a deprecation schedule, etc.
Will do. I think the resistance to moving the functionals will be fierce. Divmod probably has very few instances in real code. I think __pow__ would need to be left in (as the ** that calls it), but the function itself may be used only rarely. Does anyone know of an automated way to scan a large body of published Python code? I would want some real usage statistics in a PEP but hate pulling modules down one at a time and grepping them. Raymond Hettinger
-Barry
"RH" == Raymond Hettinger <python@rcn.com> writes:
RH> I started down this road, by checking code in the Vaults of RH> Parnassus on the assumption that zip() is rarely used outside RH> a for-loop. What may be the killer is the examples of zip() RH> in Python books which demonstrate a stand-alone zip() RH> returning a list -- in some ways, textbook examples are just RH> as important as in-field code. RH> I will write-up a PEP to see if the world cares. For all I RH> know, I may be the only who uses zip() throughout my code. RH> Zip is just new enough that it might not be too late to change RH> it to an iterator. The question is how the zip() list is used. E.g. if it just ends up being the source of a for-loop sequence, then it makes little difference in practice. RH> Does anyone know of an automated what that I can scan a large RH> body of published Python code. I would want some real usage RH> statistics in a PEP but hate pulling modules down one at a RH> time and grepping them. No, but it would be a really cool tool. And if it's available via cgi or some other commonly available process, Mark Hammond could add it to his neat Mozilla sidebar. Wild thought: write a cron script which does a cvs update every hour, and runs Tools/scripts/eptags.py over the standard library. Then front a simple TAGS file searcher with cgi. Then maybe we can put it up on python.org's cgi-bin directory. -Barry
[Barry Warsaw wrote]
RH> Does anyone know of an automated way to scan a large RH> body of published Python code? I would want some real usage RH> statistics in a PEP but hate pulling modules down one at a RH> time and grepping them.
No, but it would be a really cool tool. And if it's available via cgi or some other commonly available process, Mark Hammond could add it to his neat Mozilla sidebar.
Another thought: set up an LXR for Python's source and another LXR for a bunch of common Python extensions/modules. Trent -- Trent Mick TrentM@ActiveState.com
[Raymond Hettinger]
... Will do. I think the resistance to moving the functionals will be fierce. Divmod probably has very few instances in real code. I think __pow__ would need to be left in (as the ** that calls it), but the function itself may be used only rarely.
I suggest you don't slash your chances of success by expanding the scope beyond what you really want. Python is a pleasant language for integer number-crunching code, and some people (I happen to be one of them) have hundreds of instances of divmod and 3-argument pow() in their code (note that the latter cannot be spelled via **). People have been asking for "something like" enumerate() for 10 years, and that's the right amount of time to test that it's really wanted <wink>. Contrarily, people have only been whining about input() for a couple of years (I use it all the time -- and now that you know that I do, I *dare* you to crack my home box as a result <wink>). This thread is the first time I ever saw anyone complain about divmod(). The confusion about builtin pow() vs math.pow vs ** is old but infrequent. -1-on-gratuitous-fiddling-ly y'rs - tim
Will do. I think the resistance to moving the functionals will be fierce.
You might try to deprecate reduce() though. It's useless.
Divmod probably has very few instances in real code.
Not so sure. It's very handy in converting numbers to weird-radix or mixed-radix systems. E.g. it's a natural for converting posix timestamps to broken-out times, and e.g. <CVSROOT>/nondist/sandbox/datetime/datetime.py is full of it.
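For instance (my own example, not taken from the datetime sandbox code), breaking a count of seconds into a mixed-radix time is just a chain of divmod() calls:

    def hms(seconds):
        # mixed-radix conversion: seconds -> (hours, minutes, seconds)
        minutes, seconds = divmod(seconds, 60)
        hours, minutes = divmod(minutes, 60)
        return hours, minutes, seconds

    assert hms(3725) == (1, 2, 5)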
I think __pow__ would need to be left in (as the ** that calls it), but the function itself may be used only rarely.
Unfortunately the function is needed as the API to 3-arg pow() -- which doesn't easily fit in the math library since it's not a floating point thing.
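The point of the three-argument form is modular exponentiation done without building the huge intermediate power; a small example:

    # pow(base, exp, mod) never materializes the full base ** exp integer
    assert pow(7, 128, 13) == (7 ** 128) % 13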
Does anyone know of an automated way to scan a large body of published Python code? I would want some real usage statistics in a PEP but hate pulling modules down one at a time and grepping them.
You can write something using urllib that pulls things down, and I recommend using the tokenizer module to do the parsing -- much slower than grep, but doesn't get confused by comments etc. See Tools/scripts/finddiv.py for an example. --Guido van Rossum (home page: http://www.python.org/~guido/)
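A rough sketch of the tokenize-based approach suggested above (the function name and the exact heuristic are my own invention; see Tools/scripts/finddiv.py for the real model):

    import tokenize

    def count_calls(filename, names=('zip', 'map', 'reduce', 'filter')):
        """Rough count of apparent calls of the given builtins in one file."""
        counts = dict.fromkeys(names, 0)
        f = open(filename)
        try:
            prev_name = None
            for tok in tokenize.generate_tokens(f.readline):
                tok_type, tok_string = tok[0], tok[1]
                # a NAME token immediately followed by '(' looks like a call;
                # comments and string literals never trigger this
                if tok_string == '(' and prev_name in counts:
                    counts[prev_name] += 1
                if tok_type == tokenize.NAME:
                    prev_name = tok_string
                else:
                    prev_name = None
        finally:
            f.close()
        return counts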
[Barry]
I'd much rather see a patch that just changed zip() to be an iterator instead of adding an iterzip(). I doubt that much in-field code would break because of it (but write the PEP to find out. ;).
You're brave. Changing something incompatibly that's been in the language since 2.1? I wish you luck though -- zip() really should have been an iterator.
GvR> I wish. Since they were there first, it's hard to get rid of GvR> them. (If you're truly masochist, write a PEP and post it to GvR> c.l.py to find out how hard. :-)
The PEP would have to specify a migration plan, i.e. the builtins are identical to the functional module versions, and would a deprecation schedule, etc.
I wasn't serious. There's no way we can deprecate any builtin before Python 3. --Guido van Rossum (home page: http://www.python.org/~guido/)
barry@zope.com (Barry A. Warsaw):
I'd much rather see a patch that just changed zip() to be an iterator instead of adding an iterzip().
Wouldn't it be better as a lazy sequence a la xrange() that produces an iterator, rather than being an iterator itself? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
[Barry]
I'd much rather see a patch that just changed zip() to be an iterator instead of adding an iterzip().
[Greg E]
Wouldn't it be better as a lazy sequence a la xrange() that produces an iterator, rather than being an iterator itself?
No, lazy sequences are a thing from the past, trying to do what iterators are for. (And yes, I know that a lazy sequence has different semantics.) --Guido van Rossum (home page: http://www.python.org/~guido/)
On Monday 29 April 2002 02:16, Raymond Hettinger wrote: ...
I posted this one separately because zip() eats memory like crazy and because a Python generator version crawls like a snail.
IMHO, This is a better way to loop over multiple sequences and has a chance at becoming the tool of choice. I scanned all of my
Just musing -- what should enumerate() do when called with N!=1 arguments? Right now it raises, maybe not the most useful thing. Shouldn't it return an iterator for tuples of N+1 items with zip-like behavior? That would neatly substitute for some of my uses of zip, main use cases left out being of the form dict(zip(s1,s2)). [And enumerate() called without arguments should return a "non terminating" iterator yielding 0, 1, 2, ... -- I have a lot of uses of xrange(sys.maxint) that have no real reason to stop at maxint, and other ways to build such an iter, which could go away. But this has little to do with iterzip]. This wouldn't replace all need for iterzip (but might in many cases) but in any case it would be a natural and useful generalization of enumerate's design. YAGNI doesn't seem to apply since looking at my own code with such a general enumerate in mind does show reasonable need (admittedly I crunch a lot of integers in various combinatorial ways and just love the iterator/unending-stream computing architecture/paradigm).
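A generator sketch of the generalized enumerate() described above -- the name xenumerate and the exact semantics are my own guesses, not a real builtin:

    def xenumerate(*sequences):
        """With no arguments, count 0, 1, 2, ... forever; with N sequence
        arguments, yield (index, item1, ..., itemN) zip-style, stopping
        with the shortest sequence."""
        iterators = [iter(s) for s in sequences]
        index = 0
        while True:
            if iterators:
                row = [index]
                for it in iterators:
                    try:
                        row.append(next(it))
                    except StopIteration:
                        return          # shortest input exhausted
                yield tuple(row)
            else:
                yield index             # no arguments: never terminates
            index += 1

    # e.g. list(xenumerate('ab', [10, 20])) == [(0, 'a', 10), (1, 'b', 20)]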
In general I'm not keen on increasing the number of builtin functions much.
Ditto. Any chance of moving functions like map(), reduce(), and filter() to a functional module; pow() and divmod() to the math module; or input() to oblivion?
The amount of code breakage would be appalling unless some easy way was provided for the programmer to assert "this (module/script/package/entire Python run) needs the following well-known (names/name sets) to be installed as built-in names", defaulting at least for now to functional.{map,reduce,filter}, etc. Quite apart from the math.pow issue (current builtin pow would need to go somewhere other than math -- maybe a new module for "integral-oriented" arithmetic). site and optionally sitecustomize could perhaps help control such a hypothetical "easy way". Easiest in a sense would be to do it directly with import __builtin__ &c, but if that's to remain "strongly discouraged" then some fancy-dress for it, perhaps "unavailable after startup" in the same vein as sys.defaultencoding, could be provided. Perhaps too kludgey, but otherwise it seems to me that no removals from __builtin__ can happen before mythical Python-3000. Alex
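What the "import __builtin__" escape hatch mentioned above might look like in practice -- the `functional` module named here is hypothetical, not something that exists:

    # Sketch: a program (or sitecustomize.py) reinstalling relocated names
    # into the builtin namespace.
    import __builtin__              # spelled `builtins` in Python 3

    def reinstall(module, names):
        for name in names:
            setattr(__builtin__, name, getattr(module, name))

    # hypothetical usage: import functional; reinstall(functional, ['map', 'filter', 'reduce'])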
Alex Martelli <aleax@aleax.it>:
Just musing -- what should enumerate() do when called with N!=1 arguments? Right now it raises, maybe not the most useful thing.
Shouldn't it return an iterator for tuples of N+1 items with zip-like behavior?
Maybe instead of introducing a new function, we could just treat enumerate() as a variation on zip(), e.g. zip(seq, ..., numbered = 1) thereby sidestepping the naming problem altogether! Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
participants (15)
- Aahz
- Alex Martelli
- Andrew MacIntyre
- barry@zope.com
- Fredrik Lundh
- Greg Ewing
- Guido van Rossum
- holger krekel
- Jeremy Hylton
- jeremy@zope.com
- martin@v.loewis.de
- Neil Schemenauer
- Raymond Hettinger
- Tim Peters
- Trent Mick