Real-world use cases for map's None fill-in feature?

Tue Jan 10 23:32:49 EST 2006

Raymond Hettinger <python at rcn.com> wrote:
>
> > > History of zip()
> > > ----------------
> > > PEP 201 (lock-step iteration) documents that a fill-in feature was
> > > contemplated and rejected for the zip() built-in introduced in Py2.0.
> > > In the years before and after, SourceForge logs show no requests for a
> > > fill-in feature.
> >
> > My perception is that many people view the process
> > of advocating for a library addition as
> > 1. Very time consuming due to the large amount of
> >    work involved in presenting and defending a proposal.
>
> I would characterize it as time consuming due to the amount of
> research, discussion, and analysis it takes to determine whether or not
> a proposal is a good idea.
>
> > 2. Having a very small chance of acceptance.
>
> It is less a matter of chance and more a matter of quality.  Great
> ideas usually make it. Crummy ideas have no chance unless no one takes
> the time to think them through.

Great and crummy are not the problem, since the answer
in those cases is obvious.  It is the middle ground, where
the answer is not clear and different people can hold
different views, that is the problem.

> > I do not know whether this is really the case or even if my
> > perception is correct, but if it is, it could account for the
> > lack of feature requests.
>
> I've been monitoring and adjudicating feature requests for five years.
> Pythonistas are not known for a lack of assertiveness.  If a core
> feature has usability problems, we tend to hear about it quickly.
> Also, at PyCon, people are not shy about discussing issues that have
> arisen.

Yet these are the people both most familiar with the
library as it exists and the most able to easily work
around any limitations, maybe without even thinking
about it.  So I am not surprised that this might not
have come up.

To me, the izip solution for my use case was "obvious".
None of the other solutions posted here were.
Of course that could be fixed with documentation.

> The lack of requests is not a definitive answer; however, it does
> suggest that there is not a strong unmet need.  The lack of examples
> in the standard library and other code scans corroborates that notion.
> This newsgroup query will further serve to gauge the level of interest
> and to ferret out real-world use cases.  The jury is still out.

Comments at end re use cases.

> > How well correlated is the use of map()-with-fill with the
> > (need for) the use of zip/izip-with-fill?
>
> Close to 100%.  A non-iterator version of izip_longest() is exactly
> equivalent to map(None, it1, it2, ...).

Isn't the difference between non-iterator and iterator very
significant?  If I use map(), I can trivially determine the
arguments' lengths and deal with unequal lengths before calling
map().  With iterators that is more difficult.  So I can imagine
many cases where izip might be applicable but map not, and so a
lack of map use cases may not be representative of izip use cases.
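To illustrate the list-versus-iterator point, here is a small sketch written for Python 3, where the fill-in variant eventually shipped as `itertools.zip_longest` (Python 2.6 called it `izip_longest`, with a `fillvalue` keyword rather than the `fillin_value` used in this thread). The helpers `pad_lists` and `countdown` are hypothetical. `pad_lists` mimics what Python 2's `map(None, a, b)` did for lists by measuring and padding up front. That approach needs `len()`, which a generator does not support, whereas `zip_longest` still pads:

```python
from itertools import zip_longest

def pad_lists(a, b, fill=None):
    """List-only approach: measure both lengths first, pad, then zip.
    Roughly what Python 2's map(None, a, b) produced for two lists."""
    n = max(len(a), len(b))
    a = list(a) + [fill] * (n - len(a))
    b = list(b) + [fill] * (n - len(b))
    return list(zip(a, b))

def countdown(n):
    """A generator: len() cannot be called on what it returns."""
    while n > 0:
        yield n
        n -= 1

# With lists, both approaches give the same padded pairs.
print(pad_lists([1, 2, 3], ['a', 'b']))             # [(1, 'a'), (2, 'b'), (3, None)]
print(list(zip_longest([1, 2, 3], ['a', 'b'])))     # [(1, 'a'), (2, 'b'), (3, None)]

# With an iterator of unknown length, only zip_longest still works.
print(list(zip_longest(countdown(3), iter('ab'))))  # [(3, 'a'), (2, 'b'), (1, None)]
```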

> Since "we already got one", the real issue is whether it has been so
> darned useful that it warrants a second variant with two new features
> (returns an iterator instead of a list and allows a user-specifiable
> fill value).

I don't see it as having one and adding a second variant.
I see it as having 1/2 and adding the other 1/2.

> > > FWIW, the OP's use case involved printing files in multiple
> > > columns:
> > >
> > >     for f, g in itertools.izip_longest(file1, file2, fillin_value=''):
> > >         print '%-20s\t|\t%-20s' % (f.rstrip(), g.rstrip())
>  . . .
>
> > Actually my use case did not have quite so much
> > perlish line noise :-)
>
> The code was not intended to recapitulate your thread; instead, it was
> a compact way of summarizing the problem context that first suggested
> some value to izip_longest().

I realize that.  I just thought that having a
lot of extraneous stuff like the formatting made
it look, at first glance, messier than it should.
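For comparison, here is a minimal Python 3 sketch of the two-column use case with the fill-in feature, using `itertools.zip_longest(..., fillvalue='')` in place of the proposed `izip_longest(..., fillin_value='')`. The function name `side_by_side`, its `width` parameter, and the `io.StringIO` objects standing in for real files are all illustrative assumptions:

```python
import io
from itertools import zip_longest

def side_by_side(file1, file2, width=20):
    """Yield formatted rows pairing lines from two files.  The shorter
    file is padded with '' so no lines of the longer file are dropped."""
    for f, g in zip_longest(file1, file2, fillvalue=''):
        # '%-*s' takes the field width from the argument tuple.
        yield '%-*s\t|\t%-*s' % (width, f.rstrip(), width, g.rstrip())

# In-memory stand-ins for two files of unequal length (hypothetical data).
left = io.StringIO('alpha\nbeta\ngamma\n')
right = io.StringIO('one\ntwo\n')
for row in side_by_side(left, right):
    print(row)
```

The third row still shows `gamma`, with an empty right-hand column, which is exactly what plain `zip` would have lost.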

> >     for i1, i2 in itertools.izip (iterable_1, iterable_2):
> >           print '%-20s\t|\t%-20s' % (i1.rstrip(), i2.rstrip())
> >
> > can be replaced by:
> >     while 1:
> >         i1 = iterable_1.next()
> >         i2 = iterable_2.next()
> >         print '%-20s\t|\t%-20s' % (i1.rstrip(), i2.rstrip())
> >
> > yet that was not justification for rejecting izip()'s
> > inclusion in itertools.
>
> Two thoughts:
>
> 1) The easily-coded-simple-alternative argument applies less strongly
> to common cases (equal sequence lengths and finite sequences mixed with
> infinite suppliers) than it does to less common cases (unequal sequence
> lengths where order is important and missing data elements have
> meaning).
>
> 2) The replacement code is not quite accurate -- the StopIteration
> exception needs to be trapped.

Yes, but I don't think that negates the point.
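To make the point concrete, here is a Python 3 sketch of the hand-written replacement for the `izip()` loop, with the `StopIteration` trapped as Raymond notes it must be. The function name `columns_shortest` is hypothetical, and it returns the rows instead of printing them so the behaviour is easy to check. Like `izip()`, it stops at the shorter input:

```python
import io

def columns_shortest(iterable_1, iterable_2):
    """Hand-written equivalent of the izip() loop: stop at the shorter input."""
    it1, it2 = iter(iterable_1), iter(iterable_2)
    rows = []
    while True:
        try:
            i1 = next(it1)
            i2 = next(it2)
        except StopIteration:
            # Either input ran out: end the loop instead of letting the
            # exception escape, which the quoted sketch did not handle.
            break
        rows.append('%-20s\t|\t%-20s' % (i1.rstrip(), i2.rstrip()))
    return rows

# Three lines versus two: only two rows come out.
for row in columns_shortest(io.StringIO('alpha\nbeta\ngamma\n'),
                            io.StringIO('one\ntwo\n')):
    print(row)
```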

> > The other use case I had was a simple file diff.
> > All I cared about was if the files were the same or
> > not, and if not, what were the first differing lines.
>
> Did you look at difflib?

Yes, but it was way overkill for what I needed.
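For the simple file-diff case, here is a minimal Python 3 sketch using `itertools.zip_longest`. The name `first_difference` is hypothetical. Lines read from a file are never `None`, so the default `None` fill unambiguously means "this file has no line here". With a shortest-matching `zip`, a file that merely has extra trailing lines would wrongly be reported as identical:

```python
import io
from itertools import zip_longest

def first_difference(file1, file2):
    """Return None if both files have identical lines; otherwise return
    (line_number, line_from_file1, line_from_file2) for the first mismatch.
    A file that ends early contributes None for its missing line."""
    for lineno, (a, b) in enumerate(zip_longest(file1, file2), start=1):
        if a != b:
            return lineno, a, b
    return None

# Hypothetical in-memory files: the second one has an extra line.
print(first_difference(io.StringIO('a\nb\n'), io.StringIO('a\nb\nc\n')))
```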

> Raymond

~~~
Thanks for your response, but I'm curious why you
mailed it rather than posting it?

I am still left with a difficult-to-express feeling of
dissatisfaction with this process.

Please try to see it from the point of view of
someone who is not an expert at Python:

Here is izip().
My conception is that it takes two sequence generators
and matches up the items from each.  (I am talking about
overall conceptual models here, not details.)
Here is my problem.
I have two files that produce lines and I want to
compare each line.
Seems like a perfect fit.

So I read that izip() only goes to the shortest iterable,
and I think, "Why only the shortest?  Why not the longest?
What's so special about the shortest?"
At this point, explanations involving a lack of use cases
are not very convincing.  I have a use.  All the
alternative solutions are more code, less clear, less
obvious, less right.  But most importantly, there
seems to be a symmetry between the two cases
(shortest vs longest) that makes the lack of
support for matching-to-longest somehow a
defect.

Now if there is something fundamental about
matching items in parallel lists that makes it a
sensible thing to do only for equal lists (or to the
shortest list), that's fine.  You seem to imply that's
the case by referencing Haskell, ML, etc.  If so,
that needs to be pointed out in izip's docs.
(Though nothing I have read in this thread has
been convincing.)

If it is the case that a matching-longest izip is easily
handled by adding a line or two to code using izip-shortest,
that should be pointed out in the docs.

But if the answer is to write out an equivalent generator
in basic Python, I cannot see izip as anything but
excessively specialized, and in need of fixing.
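For reference, here is roughly what "an equivalent generator in basic Python" amounts to. This is a Python 3 sketch, close in spirit to the "roughly equivalent" code in the Python documentation for `itertools.zip_longest`. The name `my_zip_longest` is hypothetical. The generator keeps running until every input is exhausted and substitutes `fillvalue` for inputs that have already run out:

```python
from itertools import repeat

def my_zip_longest(*iterables, fillvalue=None):
    """Pure-Python sketch of a 'match to the longest' zip."""
    iterators = [iter(it) for it in iterables]
    active = len(iterators)
    if not active:
        return
    while True:
        row = []
        for i, it in enumerate(iterators):
            try:
                row.append(next(it))
            except StopIteration:
                active -= 1
                if not active:
                    # Every input is exhausted: stop without a padding-only row.
                    return
                # Swap the exhausted iterator for an endless supply of fill values.
                iterators[i] = repeat(fillvalue)
                row.append(fillvalue)
        yield tuple(row)

print(list(my_zip_longest([1, 2, 3], 'ab')))  # [(1, 'a'), (2, 'b'), (3, None)]
```

That is a dozen or so lines of careful bookkeeping, compared with a single call when the feature is built in.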

Re use-cases...

Use cases seem to be sought from readers
of c.l.p. and python-dev.  That is a pretty small
percentage of python users, and those that
choose to respond are self-selecting.  I would
expect the distribution of responders to be
skewed toward advanced users for example.
The other source seems to be a search of
the standard library, but isn't that also likely
to be unrepresentative of all the code out in the
wild?

Also, can anyone really remember their code
well enough to recall when some proposed
enhancement would be beneficial?

What I am suggesting is that use cases are
important, but it should also be realized that
they may not always give an accurate quantitative
picture, and that some things might still be good
ideas even without use cases (and the converse, of
course), not because the use cases don't exist,
but because they may not be seen by the current
use case solicitation process.