[Python-ideas] fnmatch.filter_false
Wolfgang Maier
wolfgang.maier at biologie.uni-freiburg.de
Sat May 20 07:05:18 EDT 2017
On 19.05.2017 20:01, tritium-list at sdamon.com wrote:
>
>
>> -----Original Message-----
>> From: Python-ideas [mailto:python-ideas-bounces+tritium-list=sdamon.com at python.org] On Behalf Of Wolfgang Maier
>> Sent: Friday, May 19, 2017 10:03 AM
>> To: python-ideas at python.org
>> Subject: Re: [Python-ideas] fnmatch.filter_false
>>
>> On 05/17/2017 07:55 PM, tritium-list at sdamon.com wrote:
>>> Top posting, apologies.
>>>
>>> I'm sure there is a better way to do it, and there is a performance hit,
>>> but it's negligible. This is also a three-line delta of the function.
>>>
>>> from fnmatch import _compile_pattern, filter as old_filter
>>> import os
>>> import os.path
>>> import posixpath
>>>
>>>
>>> data = os.listdir()
>>>
>>> def filter(names, pat, *, invert=False):
>>>     """Return the subset of the list NAMES that match PAT
>>>     (or, with invert=True, that do NOT match PAT)."""
>>>     result = []
>>>     pat = os.path.normcase(pat)
>>>     match = _compile_pattern(pat)
>>>     if os.path is posixpath:
>>>         # normcase on posix is NOP. Optimize it away from the loop.
>>>         for name in names:
>>>             if bool(match(name)) == (not invert):
>>>                 result.append(name)
>>>     else:
>>>         for name in names:
>>>             if bool(match(os.path.normcase(name))) == (not invert):
>>>                 result.append(name)
>>>     return result
>>>
>>> if __name__ == '__main__':
>>>     import timeit
>>>     print(timeit.timeit(
>>>         "filter(data, '__*')",
>>>         setup="from __main__ import filter, data"
>>>     ))
>>>     print(timeit.timeit(
>>>         "filter(data, '__*')",
>>>         setup="from __main__ import old_filter as filter, data"
>>>     ))
>>>
>>> The first test (modified code) timed at 22.492161903402575, while the
>>> second test (unmodified) timed at 19.555531892032324.
>>>
>>
>> If you don't care about slow-downs in this range, you could use this
>> pattern:
>>
>> excluded = set(filter(data, '__*'))
>> result = [item for item in data if item not in excluded]
>>
>> It seems to add about the same overhead, although the slow-down is not
>> constant but depends on the size of the set you need to generate.
>>
>> Wolfgang
>>
>
>
> If I didn't care about performance, I wouldn't be using filter - the only
> reason to use filter over a list comprehension is performance. The standard
> library has a performant inclusion filter, but does not have a performant
> exclusion filter.
>
I'm sorry, but then your statement above doesn't make any sense to me:
"I'm sure there is a better way to do it, and there is a performance
hit, but it's negligible."
I'm proposing an alternative that times very similarly to your own
suggestion, without copying or modifying stdlib code.
That said, I still like your idea of adding the exclude functionality to
fnmatch. I just thought you might be interested in a solution that works
right now.
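For illustration, here is a minimal sketch of two exclusion approaches that work with the current stdlib. The first is the set-difference workaround quoted above. The second, `filter_false`, is a hypothetical helper (the name is borrowed from the thread subject; it is not an existing fnmatch function) that assumes the public `fnmatch.translate` plus `re.compile` are an acceptable substitute for the private `_compile_pattern` used in the patch. No timings are claimed for either.

```python
import fnmatch
import os
import re


def exclude_via_set(names, pat):
    """The workaround from the thread: names that do NOT match pat."""
    # Collect the matches once, then keep everything not in that set.
    excluded = set(fnmatch.filter(names, pat))
    return [name for name in names if name not in excluded]


def filter_false(names, pat):
    """Hypothetical exclusion filter built only on public fnmatch/re API.

    Like fnmatch.filter, it applies os.path.normcase to the pattern and to
    each name (a no-op on POSIX), but returns the original, unnormalized
    names that do NOT match.
    """
    match = re.compile(fnmatch.translate(os.path.normcase(pat))).match
    return [name for name in names if not match(os.path.normcase(name))]


if __name__ == '__main__':
    data = ['__init__.py', 'setup.py', '__main__.py', 'README']
    print(exclude_via_set(data, '__*'))  # ['setup.py', 'README']
    print(filter_false(data, '__*'))     # ['setup.py', 'README']
```

The set-based version needs no new code in fnmatch but builds an intermediate set whose size depends on how many names match; the translate-based version makes a single pass, at the cost of re-implementing the normcase handling that fnmatch.filter already does.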