[Python-Dev] Updates to PEP 471, the os.scandir() proposal

Paul Moore p.f.moore at gmail.com
Wed Jul 9 15:12:34 CEST 2014


On 9 July 2014 13:48, Ben Hoyt <benhoyt at gmail.com> wrote:
> Okay folks -- please respond: option #1 as per the current PEP 471, or
> option #2 with Ethan's multi-level thing tweaks as per the above?

I'm probably about 50/50 at the moment. What will swing it for me is
likely error handling, so let's try both approaches with some error
handling:

Rules are that we calculate the total size of all files in a tree (as
returned from lstat), with files that fail to stat being logged and
their size assumed to be 0.

Option 1:

def get_tree_size(path):
    total = 0
    for entry in os.scandir(path):
        try:
            isdir = entry.is_dir()
        except OSError:
            logger.warning("Cannot stat {}".format(entry.full_name))
            continue
        if isdir:
            total += get_tree_size(entry.full_name)
        else:
            try:
                total += entry.lstat().st_size
            except OSError:
                logger.warning("Cannot stat {}".format(entry.full_name))
    return total

Option 2:

def log_err(exc):
    logger.warning("Cannot stat {}".format(exc.filename))

def get_tree_size(path):
    total = 0
    for entry in os.scandir(path, info='lstat', onerror=log_err):
        if entry.is_dir:
            total += get_tree_size(entry.full_name)
        else:
            total += entry.lstat.st_size
    return total

On this basis, #2 wins. However, I'm slightly uncomfortable using the
filename attribute of the exception in the logging, as there is
nothing in the docs saying that this will give a full pathname. I'd
hate to see "Unable to stat __init__.py"!!!
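
To illustrate the concern: for a failed os call in CPython, the exception's filename attribute is the path that was handed to that call, so whether it is a full path depends on what the caller supplied. A minimal sketch follows; the helper name failing_filename is hypothetical, and it uses plain os.lstat rather than the proposed scandir API.

```python
import os


def failing_filename(path):
    """Return exc.filename from a failed lstat, or None if lstat succeeds."""
    try:
        os.lstat(path)
    except OSError as exc:
        # filename echoes the argument given to os.lstat; it is not
        # expanded into an absolute path.
        return exc.filename
    return None


# A relative path comes back exactly as it was passed in.
print(failing_filename(os.path.join("no_such_dir_xyz", "__init__.py")))
```

If the failing call were made with a bare name, the log message would contain only that bare name.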

So maybe the onerror function should also receive the DirEntry object
- which will only have the name and full_name attributes, but that's
all that is needed.
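
A rough sketch of that idea, under stated assumptions: it uses os.scandir as it later shipped in Python 3.5+ (where the full_name attribute discussed here is spelled entry.path, and is_dir and stat are methods), with the context-manager form from 3.6+. The two-argument onerror(exc, entry) callback is hypothetical and is dispatched by the sketch itself, not by os.scandir.

```python
import logging
import os

logger = logging.getLogger(__name__)


def log_err(exc, entry):
    # Hypothetical callback signature: the entry is passed alongside the
    # exception, so the full path is available even if exc.filename is not.
    logger.warning("Cannot stat %s: %s", entry.path, exc)


def get_tree_size(path, onerror=log_err):
    """Total lstat size of all files under path; failures count as 0."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            try:
                if entry.is_dir(follow_symlinks=False):
                    total += get_tree_size(entry.path, onerror)
                else:
                    total += entry.stat(follow_symlinks=False).st_size
            except OSError as exc:
                # Report the failure and treat the entry's size as 0.
                onerror(exc, entry)
    return total
```

The main loop stays close to the option #2 shape, while the logging no longer depends on what exc.filename happens to contain.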

OK, looks like option #2 is now my preferred option. My gut instinct
still rebels over an API that deliberately throws information away in
the default case, even though there is now an option to ask it to keep
that information, but I see the logic and can learn to live with it.

Paul

