list() strange behaviour
Cameron Simpson
cs at cskk.id.au
Sun Dec 20 16:09:47 EST 2020
On 20Dec2020 21:00, danilob <tanto at non.va.invalid> wrote:
>I'm an absolute beginner in Python (and in English too ;-)
Well your English is far better than my very poor second language.
>Running this code:
>--------------
># Python 3.9.0
>
>a = [[1, 2, 0, 3, 0],
> [0, 4, 5, 0, 6],
> [7, 0, 8, 0, 9],
> [2, 3, 0, 0, 1],
> [0, 0, 1, 8, 0]]
This is a list of lists.
>b = ((x[0] for x in a))
This is a generator expression (sometimes called a generator
comprehension), and _not_ a list. Explanation below.
>print(list(b))
>print(list(b))
>---------------
>I get this output:
>
>[1, 0, 7, 2, 0]
As you expect.
>[]
As a surprise.
>I don't know why the second print() output shows an empty list.
>Is it possible that the first print() call might have changed the value
>of "b"?
It hasn't, but it has changed its state. Let me explain.
In Python there are 2 visually similar list-like comprehensions:
This is a _list_ comprehension (note the surrounding square brackets):
[ x for x in range(5) ]
It genuinely constructs a list containing:
[ 0, 1, 2, 3, 4 ]
and would behave as you expect in print().
By contrast, this is a generator comprehension (note the round
brackets):
( x for x in range(5) )
This is a "lazy" construct, and is like writing a generator function:
def g():
    for x in range(5):
        yield x
It is a little iterable which _counts_ from 0 through 4 inclusive and
yields each value as requested.
Try putting a:
print(b)
before your other print calls. It will not show a list.
So, what is happening?
b = ((x[0] for x in a))
This makes a generator comprehension. The outermost brackets are
redundant, by the way, and can be discarded:
b = (x[0] for x in a)
Here is what that does (using my simpler example range(5)):
>>> b=(x for x in range(5))
>>> b
<generator object <genexpr> at 0x10539e120>
When you make a list from that:
>>> L = list(b)
>>> L
[0, 1, 2, 3, 4]
the generator _runs_ and emits the values to be used in the list. If you
make another list:
>>> L = list(b)
>>> L
[]
The generator has finished. Using it again produces no values, and so
list() constructs an empty list.
That is what is happening in your print() calls.
If, instead, you had gone:
b = [x[0] for x in a]
Then "b" would be an actual list (a sequence of values stored in memory)
and your prints would have done what you expect.
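To see the two side by side, here is a small sketch using your own data. The names gen, lst, first_pass and so on are just mine for illustration:

```python
a = [[1, 2, 0, 3, 0],
     [0, 4, 5, 0, 6],
     [7, 0, 8, 0, 9],
     [2, 3, 0, 0, 1],
     [0, 0, 1, 8, 0]]

gen = (x[0] for x in a)    # generator expression: can be consumed once
first_pass = list(gen)     # runs the generator to completion
second_pass = list(gen)    # the generator is finished, nothing left

lst = [x[0] for x in a]    # list comprehension: a real list in memory
copy_1 = list(lst)         # copying a list does not use it up
copy_2 = list(lst)

print(first_pass)   # [1, 0, 7, 2, 0]
print(second_pass)  # []
print(copy_1)       # [1, 0, 7, 2, 0]
print(copy_2)       # [1, 0, 7, 2, 0]
```

If you want to go over the values more than once, keeping the list (or making a fresh generator each time) is the usual way.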
Python has several "lazy" operations available, which do not do the
entire computation when they are defined; instead they give you a
"generator" which performs the computation incrementally, running until
the next value is found - when the user asks for the next value, _then_
the generator runs until that value is obtained and "yield"ed.
Supposing your array "a" were extremely large, or perhaps in some way
stored in a database instead of in memory. It might be expensive or very
slow to get _all_ the values. A generator lets you get values as they
are required.
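A rough sketch of that laziness, assuming a made-up generator function first_column() and a list called computed which only records which rows have been touched so far (both names are mine, standing in for whatever expensive work you might really do):

```python
computed = []   # records each row the generator has actually processed

def first_column(rows):
    for row in rows:
        computed.append(row)   # stand-in for some expensive lookup
        yield row[0]

rows = [[1, 2], [0, 4], [7, 0]]
g = first_column(rows)

before = len(computed)   # nothing has run yet, so this is 0
v = next(g)              # runs the generator up to its first yield
after = len(computed)    # only one row has been processed

print(before, v, after)  # 0 1 1
```

Creating the generator does no work at all; each next() does just enough work to produce one value.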
A generator expression like this:
b = ( x for x in range(5) )
counts from 0 through 4 inclusive (or 0 up to 5, excluding 5, which
is how ranges generally work in Python) when asked. As a function it
might look like this:
def b():
    for x in range(5):
        yield x
When you call "list(b)" the list constructor collects values from
"iter(b)", which iterates over "b". Like this, written longhand:
L = []
for value in b:
    L.append(value)
Written even longer:
L = []
b_values = iter(b)
while True:
    try:
        value = next(b_values)
    except StopIteration:
        break
    L.append(value)
which more clearly shows a call to "next()" to run the iterator b_values
once, until it yields a value.
The Python for-statement is a builtin convenient way to write the
while-loop above - it iterates over any iterable like "b".
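You can also make the next() calls by hand to watch a generator run out. A minimal sketch, using a shorter range(3) and variable names of my own choosing:

```python
b = (x for x in range(3))

it = iter(b)
same = it is b   # a generator is its own iterator, so iter(b) is b

values = [next(it), next(it), next(it)]   # 0, then 1, then 2

try:
    next(it)              # a fourth call: there is nothing left
    finished = False
except StopIteration:
    finished = True       # this is how "no more values" is signalled

leftover = list(b)        # the same exhausted generator: empty list

print(same, values, finished, leftover)
```

This is why your second list(b) was empty: it asked an already finished generator for values and got StopIteration straight away.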
Cheers,
Cameron Simpson <cs at cskk.id.au>