split encloser

Tue Apr 22 07:54:53 EDT 2003

<posted & mailed>

Arm wrote:
   ...
>>     [ <expression> for <target> in <iterable> ]
>> 
>> is known in Python as a "List Comprehension".  There is no issue of
>> "with no loop" -- the "for" keyword inside the list comprehension
>> is indicating exactly the fact that the loop is taking place.  (List
   ...
>>     for mo in are.finditer(instring):
>>         ...
>> (whether written out like this, or inside a list comprehension,
>> makes no difference) makes mo assume, one after the other, the
>> values of RE matchobjects for each of those non-overlapping
   ...
> Alex,
> Thanks for the great explanation. 

You're welcome!

> It seems like both List
> Comprehension and Iterators could use more documentation on
> python.org. 

What, and leave us poor book-writers unemployed?-)  Seriously,
though, the slides of my tutorial on iterators and generators
are on http://www.strakt.com/docs/eup02_itgen_alex.pdf and I'd
have no problem fleshing this out into a text article if there
were demand.  And I'm not sure what more there IS to say about
list comprehensions at a tutorial level -- I cover them in a
page in Python in a Nutshell, and, since the page is part of
Chapter 4, which happens to be the sample chapter for the
Nutshell on O'Reilly's site, you can just download
http://www.oreilly.com/catalog/pythonian/chapter/ch04.pdf
and get it all (and much more:-) for free.

> I think it would have been a lot more straightforward and
> easier to comprehend (at least for beginners), if the object of a
> "list comprehension" had been limited to a "sequence" or an
> "iterable". (I would even prefer a different keyword when using
> iterables).

I'm not sure what you mean here by "a different keyword" or even
by "the object" being "limited".  Any sequence is iterable --
just call iter(S) on any sequence S and you'll get an iterator
on it, so, S is iterable.  Some iterables aren't sequences --
i.e. there are objects X such that iter(X) gets you an iterator,
but you cannot perform other sequence operations on X, such
as indexing X[n] for suitable integers n, slicing, etc, etc.
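
To make the distinction concrete, here is a minimal sketch, assuming a
current Python 3 interpreter (squares_gen is a made-up name used only for
illustration).  The list comprehension is equally happy with a list and
with a generator; only indexing tells the two apart:

```python
# Illustration only: squares_gen is a hypothetical helper, not from any library.
def squares_gen(n):
    # A generator: iterable, but not a sequence (no indexing, no slicing).
    for i in range(n):
        yield i * i

seq = [0, 1, 4, 9]        # a sequence: iterable AND indexable
gen = squares_gen(4)      # an iterable that is not a sequence

# The list comprehension only needs its source to be iterable.
from_seq = [x + 1 for x in seq]
from_gen = [x + 1 for x in gen]

# Indexing, on the other hand, works on the list but not on the generator.
try:
    gen[2]
    indexable = True
except TypeError:
    indexable = False
```

Both comprehensions build the same list, while the attempt to index the
generator raises TypeError.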

But in a construct such as "for x in X", whether as a loop
statement or inside a list comprehension, you're only using
the fact that X is iterable -- you're not slicing X, etc etc,
so why would you possibly care in this context whether X is
sliceable etc (i.e. a sequence) or not (a non-sequence iterable)?

And the object you're iterating on MUST be iterable -- no ifs
and buts -- if it isn't you get a suitable error message, e.g.:

>>> [x for x in 42]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: iteration over non-sequence

So, I must be completely missing your point, because I really
do not understand WHAT changes (in the language? in the docs?)
you think "would have been a lot more straightforward and
easier to comprehend (at least for beginners)".  Clarify pls?

> It doesn't make sense to be iterating through an iterator.

Since iterators are iterable, why, sure it does!  Given any
iterator x, iter(x) is x.  E.g.:

>>> x=iter(range(999))
>>> iter(x) is x
True

You can iterate on any iterable, and iterators are iterable,
so _of course_ you can iterate on iterators -- it would be
utterly weird if you *couldn't*.

> What is the gain from allowing the iterator to be placed in the
> position a container should be - given that iterable
> containers/objects must return the iterator anyway? 

Are you asking for usage examples where it's particularly
helpful to use an iterator object directly in loops, rather
than being forced to have a full-fledged sequence in order
to perform iteration?  Please read the above-mentioned sample
chapter (half a page on iterators, one-plus on generators --
it's a Nutshell, so of course it's quite terse;-) and the
above-mentioned tutorial slides (well they're just slides
of course, but...).  Then consider for example a task such
as: I have a function process_body(body) that takes a 'body'
(an iterable whose items are lines) and does wonderful things
by looping on it, e.g.:

def process_body(body):
    for line in body:
        process_line(line)

However what I have is a "message" -- an iterable whose
items are lines, but, before the "body", there's an unknown
amount of lines I don't care about (call them "headers")
terminated by a blank line (a line that's whitespace).  So,
how do I extract the body in order to pass it to the above
function process_body?

Thanks to the fact that I can use a for loop on an iterator
just like on any other iterable, it's dead easy:

def process_message(message):
    it = iter(message)
    for line in it:
        if line.isspace(): break
    process_body(it)

Easy, isn't it?  And we couldn't care less whether 'message'
is a full-fledged sequence (e.g. a list of lines) or an
iterable that's not a sequence (e.g. an open text file) --
since all we do is get its iterator, as long as it's an
iterable, we're in clover.  So why ever would one want to
have to have sequences, distinguish (e.g. with different
keywords) between iterating on sequences and iterating on
other iterables, forbid iterating on iterators (why?!),
and all the other complications that you appear to be
advocating?  Guess I must be missing something here...
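
For anyone who wants to try it, here is a self-contained sketch, assuming
Python 3.  The process_line stand-in and the 'collected' list are
assumptions added just so the result can be inspected, and io.StringIO
plays the part of an open text file:

```python
import io

collected = []  # hypothetical sink, so we can see what process_line received

def process_line(line):
    # Stand-in for the real per-line work: just record the line.
    collected.append(line.rstrip("\n"))

def process_body(body):
    for line in body:
        process_line(line)

def process_message(message):
    it = iter(message)
    # Consume the headers, up to and including the blank separator line.
    for line in it:
        if line.isspace():
            break
    # The same iterator now yields only the remaining (body) lines.
    process_body(it)

text = "From: a\nSubject: b\n\nfirst body line\nsecond body line\n"

# A full-fledged sequence: a list of lines, line endings kept.
process_message(text.splitlines(True))
from_list = collected[:]

# A non-sequence iterable: a file-like object.
del collected[:]
process_message(io.StringIO(text))
from_file = collected[:]
```

In both cases only the two body lines reach process_line, since the
first loop has already used up the headers and the blank line.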

> And it seems like
> in order to implement the "iterating through an iterator" syntax, an
> equally confusing call was added to iterators - "return self".

If and when you choose to implement an iterator class, you
will need it to have (i.e., define or inherit) a special
method named __iter__ that does indeed use "return self".  So
what's confusing about it (quite apart from the fact that there is
absolutely no "call" in this return statement, so, I DO find
it confusing indeed that you choose to call it a "call"!-)?
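
As a minimal sketch of such a class (Countdown is a made-up example, and
the spelling assumes Python 3, where the method that produces the next
item is named __next__):

```python
class Countdown:
    """Hypothetical iterator class yielding start, start-1, ..., 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator is its own iterator: no call here, just a return.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

c = Countdown(3)
same = iter(c) is c        # iter() hands back the very same object
first = next(c)            # take one item by hand...
rest = [n for n in c]      # ...then loop over whatever is left
```

Because __iter__ returns self, the list comprehension picks up exactly
where the explicit next() call left off.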

Just the same "return self" idiom is widespread in many other
special methods (e.g., __iadd__, __imul__, ... just to name
a few), and in a quite widespread programming style (supporting
"method chaining") also in normal methods.  So what's your problem?
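
A small sketch of both uses, assuming Python 3 (the Tally class is
invented purely for illustration):

```python
class Tally:
    """Hypothetical accumulator showing 'return self' in two places."""

    def __init__(self):
        self.total = 0

    def add(self, n):
        # A normal method: returning self lets calls be chained.
        self.total += n
        return self

    def __iadd__(self, n):
        # In-place addition: returning self keeps the name bound
        # to the same, now updated, object.
        self.total += n
        return self

t = Tally()
t.add(1).add(2)            # method chaining
before = t
t += 10                    # calls __iadd__, rebinds t to the returned object
```

After this, t is still the same object as before, and its total is 13.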

Alex