[Tutor] Is there a simpler way to remove from a set?

Peter Otten __peter__ at web.de
Fri May 7 17:00:19 EDT 2021


On 07/05/2021 02:52, Leam Hall wrote:
> Base code:
> 
> ###
> def build_dir_set(my_list, exclude_dirs):
>    my_set  = set()
>    for item in my_list:
>      path = Path(item)
>      parent = str(path.parent)
>      my_set.add(parent)
>    my_new_set = set()
>    for exclude_dir in exclude_dirs:
>      for parent in my_set:
>        if re.match(exclude_dir, parent):
>          my_new_set.add(parent)
>    return my_set - my_new_set
> ###
> 
> Where "my_list" and "exclude_dirs" are lists of directories

> Can this be made cleaner?

When I want to make things "clean" I usually sweep the "dirty" parts under 
the rug -- or rather into helper functions. The function above would become

def build_dir_set(children, exclude_dirs):
     parents = (get_parent(child) for child in children)
     return {
         dir for dir in parents if not is_excluded(dir, exclude_dirs)
     }

At this point I might reason about parents -- how likely are duplicate 
parents? If there are many, or is_excluded() is costly I might create a 
set instead of the generator expression before applying the filter, i.e.

     parents = {get_parent(child) for child in children}

Now for the helper functions -- get_parent() is obvious:

def get_parent(path):
     return str(Path(path).parent)

However, the Path object seems superfluous if you want strings:

def get_parent(path):
     return os.path.dirname(path)

OK, that's a single function call, so it could be inlined.

Now for is_excluded(), using almost your code:

def is_excluded(dir, exclude_dirs):
     for exclude_dir in exclude_dirs:
         if re.match(exclude_dir, dir):
             return True
     return False

To me it seems clearer because it is just a check and does not itself 
build a set. If you are happy* with the test you can consolidate the 
regex to

def is_excluded(dir, exclude_dirs):
     return re.compile("|".join(exclude_dirs)).match(dir)

or, using startswith() as suggested by Alan,

def is_excluded(dir, exclude_dirs):
     assert isinstance(exclude_dirs, tuple), "startswith() wants a tuple"
     return dir.startswith(exclude_dirs)

Again we have a single function call in the body that we might inline. The 
final function then becomes

def build_dir_set(children, exclude_dirs):
     exclude_dirs = tuple(exclude_dirs)
     parents = {os.path.dirname(child) for child in children}
     return {
         dir for dir in parents if not dir.startswith(exclude_dirs)
     }

(*) I think I would /not/ be happy with startswith -- I'd cook up 
something based on os.path.commonpath() to avoid partial matches of 
directory names; at the moment /foo/bar in exclude_dirs would suppress 
/foo/barbaz from the result set.
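
A rough sketch of what such a commonpath()-based check might look like. 
This is only one way to do it, and it assumes the paths are either all 
absolute or all relative (os.path.commonpath() raises ValueError if they 
are mixed):

```python
import os.path


def is_excluded(dir, exclude_dirs):
    # dir is excluded if it equals an excluded directory or lies below it.
    # commonpath() compares whole path components, not single characters,
    # so /foo/bar does not match /foo/barbaz.
    for exclude_dir in exclude_dirs:
        common = os.path.commonpath([dir, exclude_dir])
        if common == os.path.normpath(exclude_dir):
            return True
    return False
```

With this version /foo/bar in exclude_dirs suppresses /foo/bar and 
/foo/bar/baz, but leaves /foo/barbaz alone.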

Also, I'd like to remind you that paths may contain characters that have 
a special meaning in regular expressions -- if you are using them, at 
least apply re.escape() before you feed a path to re.compile().
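
If you stay with regular expressions, the escaping could be sketched like 
this. The empty-list guard is my own addition: an empty pattern matches 
every string, which would exclude everything.

```python
import re


def is_excluded(dir, exclude_dirs):
    exclude_dirs = list(exclude_dirs)
    if not exclude_dirs:
        # "|".join([]) is "", and the empty pattern matches any string.
        return False
    # Escape each path so that characters like "." or "+" match literally,
    # then join the alternatives into a single pattern.
    pattern = "|".join(re.escape(exclude_dir) for exclude_dir in exclude_dirs)
    return re.match(pattern, dir) is not None
```

Note that this is still a prefix match, so it has the same partial-match 
problem as startswith().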

PS: All code untested.


