"sins" (aka, acknowledged language problems)

Alex Martelli alex at magenta.com
Sun Dec 26 03:58:06 EST 1999


Neel Krishnaswami writes:

> skaller <skaller at maxtal.com.au> wrote:
> >Alex Martelli wrote:
> >
> >> But why can't I change those 4 lines to, say:
> >>     while line := inp.readline():
> >> using the suggested ":=" operator that I've
> >> seen mentioned now and then?  Or, maybe
> >> even better, "while line from inp.readline()"
> >> or other variants suggested in the past.
> >
> >Ok. You have set me a problem here. I need more cases to examine!
> 
> This is a failure of data structure, not of syntax. What's needed
> is a way to write 
> 
>   for foo in file.readlines():
>       ...
>   
> without allocating an entire list. IOW, we want to write:
> 
>  for foo in file.xreadlines():

That would put the burden on the implementor of the "file" object;
in other words, on the implementor of every object that provides
an "enumerating" method, i.e. one "returning the next element".

For my original problem, I switched to file.readlines(), because
having the whole block of lines around made other things much
easier (recursive expansion of conditional and loop statements
embedded in the template, to be precise).

But, the road to one solution in present-day Python was also
pretty clear from some of the responses I had received (Tim's
and the effbot's, most of all).  "for foo in bar:" internally
expands to calls on bar.__getitem__(i), with i starting at
0 and working upwards, and terminating when __getitem__
raises IndexError.  So, all we need is to implement that in
a wrapper class.
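
The protocol is easy to verify with a throwaway class ("FirstN" is
a name I'm making up just for illustration):

```python
class FirstN:
    # A minimal object speaking the iteration protocol described above:
    # "for" calls __getitem__(0), __getitem__(1), ... until IndexError.
    def __init__(self, n):
        self.n = n
    def __getitem__(self, i):
        if i >= self.n:
            raise IndexError
        return i

collected = []
for x in FirstN(3):
    collected.append(x)
print(collected)   # [0, 1, 2]
```

No __len__, no real sequence behind it -- the for-loop machinery
neither knows nor cares.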

Taking a simple case, where we now write:
    while 1:
        item = container.nextitem()
        if not item:
            break
        # process item

and we'd LIKE to write, say:
    while item from container.nextitem():
        # process item

we can ALREADY write:
    for item in enum(container.nextitem):
        # process item

as long as we have:
    class enum:
        def __init__(self,stepper):
            self.stepper=stepper
        def __getitem__(self,key):
            item=self.stepper()
            if not item:
                raise IndexError
            return item
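
For concreteness, here is the wrapper at work on a made-up stepper
that plays the part of file.readline (the data is inlined rather than
read from a real file; the empty string stands for end-of-file):

```python
class enum:
    # adapt a zero-argument "stepper" callable to the for-loop protocol
    def __init__(self, stepper):
        self.stepper = stepper
    def __getitem__(self, key):
        item = self.stepper()
        if not item:
            raise IndexError
        return item

data = ['alpha\n', 'beta\n', '']    # '' is what readline() returns at EOF
pos = [0]
def fake_readline():
    line = data[pos[0]]
    pos[0] = pos[0] + 1
    return line

seen = []
for line in enum(fake_readline):
    seen.append(line)
print(seen)   # ['alpha\n', 'beta\n']
```

Substitute open('somefile').readline for fake_readline and you get
the lazy line-by-line loop we started from.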

I have asked on this list whether this
kind of wrapper could present some
gotchas, e.g. in performance, but got
no answer, which is promising:-).

In a more general case, we might want
to pass to the "stepper" methods some
arguments, or to subject items to some
other test than Python's "boolean truth".

A natural approach might be to have
enum's __init__ take more (optional)
arguments, to wit, the tuple of args
to pass to stepper (with apply, of course)
and the "tester" function.  This might be
ok, but it could add overhead in the
simple case, since each __getitem__ would
have to check which test to apply.  A simple
alternative is the C++-like "factory
and virtual functions beat repeated tests
every day" idiom:
-- rename class enum to enum0,
-- add a class enum1:

    class enum1:
        def __init__(self,stepper,tester):
            self.stepper=stepper
            self.tester=tester
        def __getitem__(self,key):
            item=self.stepper()
            if not self.tester(item):
                raise IndexError
            return item

and make of enum itself a factory function:
    def enum(stepper,tester=None):
        if tester:
            return enum1(stepper,tester)
        else:
            return enum0(stepper)
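
Assembled, the idiom looks like this (the countdown stepper and the
"nonnegative" tester are my own examples; note that with plain enum0,
the perfectly good item 0 would have ended the loop early):

```python
class enum0:
    # simple case: stop when the item is false
    def __init__(self, stepper):
        self.stepper = stepper
    def __getitem__(self, key):
        item = self.stepper()
        if not item:
            raise IndexError
        return item

class enum1:
    # general case: stop when the tester rejects the item
    def __init__(self, stepper, tester):
        self.stepper = stepper
        self.tester = tester
    def __getitem__(self, key):
        item = self.stepper()
        if not self.tester(item):
            raise IndexError
        return item

def enum(stepper, tester=None):
    # factory: choose the right wrapper once, up front,
    # instead of re-testing on every __getitem__ call
    if tester:
        return enum1(stepper, tester)
    else:
        return enum0(stepper)

counter = [5]
def countdown():
    counter[0] = counter[0] - 1
    return counter[0]

def nonnegative(item):
    return item >= 0

seen = []
for item in enum(countdown, nonnegative):
    seen.append(item)
print(seen)   # [4, 3, 2, 1, 0]
```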

with the obvious generalization, if needed,
to other optional arguments such as tuples
(and/or kw-args dictionaries) to apply to
the stepper and/or tester methods, etc (if
apply has no substantial overhead wrt a
direct call, then it would do just as well to
use it all the time, I guess; not tested or
measured, yet).

> just like we currently write
> 
>  for i in xrange(100):
>     ...
> 
> So the hypothetical xreadlines() method should return an object 
> that reads one additional line from the file each time it is called.

Right; but it need not be hypothetical, see above.


> The general problem that needs fixing is that Python really needs a
> better iteration protocol. (I understand that Guido has worked one
> out, but hasn't yet implemented it. You may want to contact him so

The iteration protocol implemented above seems good enough
to me -- even though I was the one first expressing concern on
this thread.

Maybe putting this 'enum' thingy in the standard distribution
(maybe in some example) would help.

That's part of what I love about Python... it's sharp and limpid
enough that a newbie like me can come up with interesting
solutions such as this one, even without a deep understanding
of the internals.  It just takes a few days' reflection (I hope it
will go down as I move from 'newbie' to 'reasonably skilled
Pythonista', of course:-).


> > ifor k,v in e: ..
> >
> >where k and v are the keys and values of a dictionary,
> >or, the indices and values of a sequence. This is
> >commonly needed, instead of:
> >
> > for k in d.keys():
> >     v = d[k]
> >     ..
> 
> In Cpython, there is the items() method on dictionaries,
> so you can write:
> 
>   for key, val in dict.items():
>      ...
> 
> There's nothing like it for tuples and lists, but that should be
> fixed.

What about just adding a few lines of code to an example...:

    class seqenum:
        def __init__(self,sequence):
            self.sequence=sequence
        def __getitem__(self,key):
            return (key,self.sequence[key])

That's even simpler than the 'enum0' above (the sequence
itself will raise the IndexError when the time comes to do
so...:-).
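
Exercising it on a small list:

```python
class seqenum:
    # pair each index with the corresponding element; the wrapped
    # sequence itself raises IndexError when we run off the end
    def __init__(self, sequence):
        self.sequence = sequence
    def __getitem__(self, key):
        return (key, self.sequence[key])

pairs = []
for k, v in seqenum(['a', 'b', 'c']):
    pairs.append((k, v))
print(pairs)   # [(0, 'a'), (1, 'b'), (2, 'c')]
```

The same wrapper works unchanged on any sequence -- list, tuple,
or string.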


> As a general rule, syntax is a bad thing, to be avoided whenever
> possible. Calls for additional syntax are typically a sign that
> one of the basic operations of the semantics needs generalization.

Or that one has not thought about things enough to realize
that the desired semantics already _are_ there, and may just
need a few lines' worth of wrapper code for syntax sugar?-)

> Additional syntax adds cruft that makes that generalization doubly
> harder to see. One, the immediate sop silences the people being
> bothered, so you won't think any more about the problem until it crops
> up again in a seemingly-different context. Two, adding special syntax
> makes it harder to build the appropriate generalization, because the
> special syntax reduces the regularity of the language.

Good point.  So, we need some *inertia* in changing a language's
definition -- at least enough to make sure that the solution space
possible within a given language IS well explored, before coming
to the determination that the language needs extension.

Fortunately, we have it -- and maybe we have a deeper explanation
for Guido's apparent stonewalling about language changes...?-)


Alex





