[Tutor] Is there a simpler way to remove from a set?
Peter Otten
__peter__ at web.de
Fri May 7 17:00:19 EDT 2021
On 07/05/2021 02:52, Leam Hall wrote:
> Base code:
>
> ###
> def build_dir_set(my_list, exclude_dirs):
>     my_set = set()
>     for item in my_list:
>         path = Path(item)
>         parent = str(path.parent)
>         my_set.add(parent)
>     my_new_set = set()
>     for exclude_dir in exclude_dirs:
>         for parent in my_set:
>             if re.match(exclude_dir, parent):
>                 my_new_set.add(parent)
>     return my_set - my_new_set
> ###
>
> Where "my_list" and "exclude_dirs" are lists of directories
> Can this be made cleaner?
When I want to make things "clean" I usually put the "dirty" parts under
the rug -- or rather into helper functions. The function above would become
def build_dir_set(children, exclude_dirs):
    parents = (get_parent(child) for child in children)
    return {
        dir for dir in parents if not is_excluded(dir, exclude_dirs)
    }
At this point I might reason about parents -- how likely are duplicate
parents? If there are many, or is_excluded() is costly I might create a
set instead of the generator expression before applying the filter, i.e.
    parents = {get_parent(child) for child in children}
Now for the helper functions -- get_parent() is obvious:
def get_parent(path):
    return str(Path(path).parent)
However, the Path object seems superfluous if you want strings:
def get_parent(path):
    return os.path.dirname(path)
OK, that's a single function, so it could be inlined.
Now for is_excluded(): using almost your code:
def is_excluded(dir, exclude_dirs):
    for exclude_dir in exclude_dirs:
        if re.match(exclude_dir, dir):
            return True
    return False
To me it seems clearer because it is just a check and does not itself
build a set. If you are happy* with the test you can consolidate the
regex to
def is_excluded(dir, exclude_dirs):
    return re.compile("|".join(exclude_dirs)).match(dir)
or, using startswith() as suggested by Alan,
def is_excluded(dir, exclude_dirs):
    assert isinstance(exclude_dirs, tuple), "startswith() wants a tuple"
    return dir.startswith(exclude_dirs)
Again we have a single function in the body that we might inline. The
final function then becomes
def build_dir_set(children, exclude_dirs):
    exclude_dirs = tuple(exclude_dirs)
    parents = {os.path.dirname(child) for child in children}
    return {
        dir for dir in parents if not dir.startswith(exclude_dirs)
    }
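For illustration, here is a small self-contained run of that final function, with get_parent() inlined as os.path.dirname(). The sample paths are made up and only meant to show what goes in and what comes out.

```python
import os

def build_dir_set(children, exclude_dirs):
    # str.startswith() accepts a tuple of prefixes, not a list
    exclude_dirs = tuple(exclude_dirs)
    # Set comprehension: duplicate parents collapse before filtering
    parents = {os.path.dirname(child) for child in children}
    return {dir for dir in parents if not dir.startswith(exclude_dirs)}

# Hypothetical sample data, not taken from the original question
children = [
    "/home/user/a.txt",
    "/home/user/b.txt",
    "/tmp/cache/x.bin",
    "/var/log/syslog",
]
print(sorted(build_dir_set(children, ["/tmp"])))
# ['/home/user', '/var/log']
```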
(*) I think I would /not/ be happy with startswith -- I'd cook up
something based on os.path.commonpath() to avoid partial matches of
directory names; at the moment /foo/bar in exclude_dirs would suppress
/foo/barbaz from the result set.
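A minimal sketch of what such a commonpath()-based check might look like. It assumes that all paths are of the same kind (all absolute or all relative, and on Windows on the same drive), because os.path.commonpath() raises ValueError otherwise. The normpath() calls are my own addition to cope with trailing slashes.

```python
import os.path

def is_excluded(dir, exclude_dirs):
    # True if dir is one of the excluded directories or lies below one.
    # Assumption: paths are all absolute or all relative; mixing them
    # makes os.path.commonpath() raise ValueError.
    dir = os.path.normpath(dir)
    for exclude_dir in exclude_dirs:
        exclude_dir = os.path.normpath(exclude_dir)
        # commonpath() compares whole path components, so /foo/bar
        # is not treated as a prefix of /foo/barbaz
        if os.path.commonpath([dir, exclude_dir]) == exclude_dir:
            return True
    return False
```

With this version /foo/bar in exclude_dirs still suppresses /foo/bar and /foo/bar/baz, but leaves /foo/barbaz alone.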
Also, I'd like to remind you that paths may contain characters that have
a special meaning in regular expressions -- if you are using them at
least apply re.escape() before you feed a path to re.compile().
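A sketch of the consolidated regex with escaping applied, assuming the entries in exclude_dirs are meant as literal paths rather than patterns. The guard for an empty list is my own addition: an empty pattern matches every string, which would exclude everything.

```python
import re

def is_excluded(dir, exclude_dirs):
    # Escape each path so that characters like "." or "+" match literally
    escaped = [re.escape(exclude_dir) for exclude_dir in exclude_dirs]
    if not escaped:
        # "|".join([]) is "", and an empty regex matches any string
        return False
    return re.match("|".join(escaped), dir) is not None
```

Note that this is still a prefix match, like startswith(), so it shares the partial-match problem described above.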
PS: All code untested.