[ python-Bugs-1517509 ] filter() implementation does not match docs

SourceForge.net noreply at sourceforge.net
Fri Jul 14 00:35:30 CEST 2006


Bugs item #1517509, was opened at 2006-07-05 08:09
Message generated for change (Comment added) made by rhettinger
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1517509&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Interpreter Core
Group: Python 2.5
Status: Closed
Resolution: Wont Fix
Priority: 5
Submitted By: Collin Winter (collinwinter)
Assigned to: Nobody/Anonymous (nobody)
Summary: filter() implementation does not match docs

Initial Comment:
The docs for the built-in function filter() claim that
"filter(function, list) is equivalent to [item for item
in list if function(item)] if function is not None and
[item for item in list if item] if function is None".

>>> class infinite_str(str):
...     def __getitem__(self, index):
...         return "a"
...
>>> filter(None, infinite_str("1234"))
'aaaa'


Now, if we translate this to a listcomp according to
the docs:

>>> [x for x in infinite_str("1234") if x]

The listcomp version proceeds to chew up memory until
it exhausts the system resources or is killed by the user.

If the docs are to be believed, the filter() version
should do the same thing. 

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2006-07-13 17:35

Message:
Logged In: YES 
user_id=80475

It's a speed hack.  The result list for filter() is
pre-allocated to the input length and then resized downward
to the actual length.

Stop posting patches for non-bugs.


----------------------------------------------------------------------

Comment By: Collin Winter (collinwinter)
Date: 2006-07-13 14:26

Message:
Logged In: YES 
user_id=1344176

I've posted a patch for the __len__ issue; it's SF #1522016.

----------------------------------------------------------------------

Comment By: Collin Winter (collinwinter)
Date: 2006-07-05 10:16

Message:
Logged In: YES 
user_id=1344176

I wasn't just reviewing the docs; I came upon this while
re-implementing filter() in pure Python for a compatibility
module I'm working on. I discovered this particular
implementation detail when my naive implementation (as a
listcomp) didn't pass test_builtin.

My question is why filter() even cares about __len__, ie,
why it doesn't simply use the iterator protocol. This is
especially curious since filter() actively ignores the
iterator protocol when dealing with subclasses of str,
unicode and tuple:

>>> class badstr(str):
...     def __len__(self):
...         return 4
...     def __getitem__(self, index):
...         return "a"
...     def __iter__(self):
...         for i in range(5):
...             yield "b"
...
>>> filter(None, badseq("cccc"))
aaaa
>>> [x for x in badseq("cccc") if x]
['b', 'b', 'b', 'b', 'b']

Ignore the difference in return types (which is expected),
these two do not do the same thing; filter() uses a
combination of __len__ and __getitem__ while the listcomp
version uses the iterator protocol.

Clearly, in this case, "is equivalent to" means "bears only
a superficial relationship to".

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2006-07-05 09:58

Message:
Logged In: YES 
user_id=80475

Please take a larger view when reviewing the docs. 
The listcomp analogy is very helpful in explaining what 
filter() does and readers would not benefit by its removal.

Throughtout the docs, the phrase "is equivalent to" does 
not mean "is identical to" or "exactly the same as".   In 
this case, you have isolated a non-guaranteed 
implementation detail that is almost always irrelevant.  
When an object such as infinite_str lies about its length, 
the consequent behavior is undefined.  It is not hard to 
produce weird results when objects violate basic 
invariants such as len(istr)!=len(list(istr)) or the 
expected relation between __eq__ and __hash__.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1517509&group_id=5470


More information about the Python-bugs-list mailing list