[Python-ideas] fnmatch.filter_false

tritium-list at sdamon.com tritium-list at sdamon.com
Fri May 19 14:01:27 EDT 2017



> -----Original Message-----
> From: Python-ideas [mailto:python-ideas-bounces+tritium-
> list=sdamon.com at python.org] On Behalf Of Wolfgang Maier
> Sent: Friday, May 19, 2017 10:03 AM
> To: python-ideas at python.org
> Subject: Re: [Python-ideas] fnmatch.filter_false
> 
> On 05/17/2017 07:55 PM,
> tritium-list at sdamon.com wrote:
> > Top posting, apologies.
> >
> > I'm sure there is a better way to do it, and there is a performance hit,
> > but it's negligible.  This is also a three-line delta of the function.
> >
> > from fnmatch import _compile_pattern, filter as old_filter
> > import os
> > import os.path
> > import posixpath
> >
> >
> > data = os.listdir()
> >
> > def filter(names, pat, *, invert=False):
> >      """Return the subset of the list NAMES that match PAT (or, if
> >      invert is true, the subset that does not match)."""
> >      result = []
> >      pat = os.path.normcase(pat)
> >      match = _compile_pattern(pat)
> >      if os.path is posixpath:
> >          # normcase on posix is NOP. Optimize it away from the loop.
> >          for name in names:
> >              if bool(match(name)) == (not invert):
> >                  result.append(name)
> >      else:
> >          for name in names:
> >              if bool(match(os.path.normcase(name))) == (not invert):
> >                  result.append(name)
> >      return result
> >
> > if __name__ == '__main__':
> >      import timeit
> >      print(timeit.timeit(
> >          "filter(data, '__*')",
> >          setup="from __main__ import filter, data"
> >       ))
> >      print(timeit.timeit(
> >          "filter(data, '__*')",
> >          setup="from __main__ import old_filter as filter, data"
> >      ))
> >
> > The first test (modified code) timed at 22.492161903402575, whereas the
> > second test (unmodified) timed at 19.555531892032324.
> >
> 
> If you don't care about slow-downs in this range, you could use this
> pattern:
> 
> excluded = set(filter(data, '__*'))
> result = [item for item in data if item not in excluded]
> 
> It seems to take about as much extra time, although the slow-down is not
> constant but depends on the size of the set you need to generate.
> 
> Wolfgang
> 


If I didn't care about performance, I wouldn't be using fnmatch.filter in
the first place - the only reason to use filter over a list comprehension
is performance.  The standard library has a performant inclusion filter,
but it does not have a performant exclusion filter.


