[Python-ideas] fnmatch.filter_false

Wolfgang Maier wolfgang.maier at biologie.uni-freiburg.de
Sat May 20 07:05:18 EDT 2017


On 19.05.2017 20:01, tritium-list at sdamon.com wrote:
> 
> 
>> -----Original Message-----
>> From: Python-ideas [mailto:python-ideas-bounces+tritium-
>> list=sdamon.com at python.org] On Behalf Of Wolfgang Maier
>> Sent: Friday, May 19, 2017 10:03 AM
>> To: python-ideas at python.org
>> Subject: Re: [Python-ideas] fnmatch.filter_false
>>
>> On 05/17/2017 07:55 PM, tritium-list at sdamon.com wrote:
>>> Top posting, apologies.
>>>
>>> I'm sure there is a better way to do it, and there is a performance hit,
>>> but it's negligible.  This is also a three-line delta of the function.
>>>
>>> from fnmatch import _compile_pattern, filter as old_filter
>>> import os
>>> import os.path
>>> import posixpath
>>>
>>>
>>> data = os.listdir()
>>>
>>> def filter(names, pat, *, invert=False):
>>>     """Return the subset of the list NAMES that match PAT.
>>>
>>>     With invert=True, return the names that do not match instead.
>>>     """
>>>     result = []
>>>     pat = os.path.normcase(pat)
>>>     match = _compile_pattern(pat)
>>>     if os.path is posixpath:
>>>         # normcase on posix is NOP. Optimize it away from the loop.
>>>         for name in names:
>>>             if bool(match(name)) == (not invert):
>>>                 result.append(name)
>>>     else:
>>>         for name in names:
>>>             if bool(match(os.path.normcase(name))) == (not invert):
>>>                 result.append(name)
>>>     return result
>>>
>>> if __name__ == '__main__':
>>>     import timeit
>>>     print(timeit.timeit(
>>>         "filter(data, '__*')",
>>>         setup="from __main__ import filter, data"
>>>     ))
>>>     print(timeit.timeit(
>>>         "filter(data, '__*')",
>>>         setup="from __main__ import old_filter as filter, data"
>>>     ))
>>>
>>> The first test (modified code) timed at 22.492161903402575, whereas the
>>> second test (unmodified) timed at 19.555531892032324.
>>>
>>
>> If you don't care about slow-downs in this range, you could use this
>> pattern:
>>
>> excluded = set(filter(data, '__*'))
>> result = [item for item in data if item not in excluded]
>>
>> It seems to be slowed down by about the same amount, although the
>> slow-down is not constant but depends on the size of the set you need
>> to generate.
>>
>> Wolfgang
>>
> 
> 
> If I didn't care about performance, I wouldn't be using filter - the only
> reason to use filter over a list comprehension is performance.  The standard
> library has a performant inclusion filter, but does not have a performant
> exclusion filter.
> 

I'm sorry, but then your statement above doesn't make any sense to me:
"I'm sure there is a better way to do it, and there is a performance 
hit, but it's negligible."
I'm proposing an alternative that times very similarly to your own 
suggestion, without copying or modifying stdlib code.
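
For concreteness, the set-based pattern quoted above could be wrapped up
roughly as follows. This is only a sketch: the name filter_false is
borrowed from the subject line and is hypothetical, not an existing
fnmatch function, and the list() copy is an added assumption so that any
iterable can be passed in.

```python
import fnmatch


def filter_false(names, pat):
    """Return the names that do NOT match pat (hypothetical helper).

    Only the public fnmatch.filter is used; order and duplicates of the
    input are preserved.
    """
    names = list(names)  # the input is traversed twice, so materialize it
    # Let the stdlib do the (fast) inclusion filtering ...
    excluded = set(fnmatch.filter(names, pat))
    # ... then keep everything that was not matched.
    return [name for name in names if name not in excluded]


if __name__ == '__main__':
    print(filter_false(['__init__.py', 'a.py', '__pycache__', 'b.txt'], '__*'))
```

The extra cost compared to a plain fnmatch.filter call is building the
set and one more pass over the names, which is why the slow-down grows
with the number of matches.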

That said, I still like your idea of adding the exclude functionality to 
fnmatch. I just thought you might be interested in a solution that works 
right now.
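
Another option that works today and avoids the private _compile_pattern
helper is to go through the public fnmatch.translate. Again just a
sketch: filter_false_re is a made-up name, and unlike the stdlib filter
this version calls os.path.normcase for every name, even on posix where
it is a no-op, so no claim is made here about how it times.

```python
import fnmatch
import os.path
import re


def filter_false_re(names, pat):
    """Return the names that do NOT match pat (hypothetical helper)."""
    # fnmatch.translate turns the shell-style pattern into a regex string.
    match = re.compile(fnmatch.translate(os.path.normcase(pat))).match
    # Keep a name only if the compiled pattern fails to match it.
    return [name for name in names
            if match(os.path.normcase(name)) is None]


if __name__ == '__main__':
    print(filter_false_re(['__init__.py', 'a.py', '__pycache__'], '__*'))
```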


