[Python-Dev] Updates to PEP 471, the os.scandir() proposal

Ben Hoyt benhoyt at gmail.com
Wed Jul 9 21:03:20 CEST 2014

This is just getting way too complex ... further thoughts below.

>> This is an interesting idea, but it's just getting more and more
>> complex, and I'm guessing that being able to change the attributes of
>> DirEntry will make the C implementation more complex.
>> Also, I'm not sure it's very workable. For log_err above, you'd
>> actually have to do something like this, right?
>> def log_err(exc, entry):
>>      logger.warn("Cannot stat {}".format(exc.filename))
>>      entry.lstat = os.stat_result((0, 0, 0, 0, 0, 0, 0, 0, 0, 0))
>>      return entry
> I would imagine we would provide a helper function:
>   def stat_result(st_size=0, st_atime=0, st_mtime=0, ...):
>       return os.stat_result((st_size, st_atime, st_mtime, ...))
> and then in onerror
>       entry.lstat = stat_result()
>> Unless there's another simple way around this issue, I'm back to
>> loving the simplicity of option #1, which avoids this whole question.
> Too simple is just as bad as too complex, and properly handling errors is
> rarely a simple task.  Either we provide a clean way to deal with errors in
> the API, or we force every user everywhere to come up with their own system.
> Also, just because we provide it doesn't force people to use it, but if we
> don't provide it then people cannot use it.

So here are the ways in which option #2 is now more complicated than option #1:

1) it has an additional "info" argument, the values of which have to
be documented ('os', 'type', 'lstat', and what each one means)
2) it has an additional "onerror" argument, whose signature and
fairly complicated return value are non-obvious and have to be
documented
3) it requires user modification of the DirEntry object, which needs
documentation, and is potentially hard to implement
4) because the DirEntry object now allows modification, you need a
stat_result() helper function to help you build your own stat values

I'm afraid points 3 and 4 here add way too much complexity.
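
To make point 4 concrete, here is a rough sketch of the kind of helper
that would be needed. The name make_stat_result and its keyword
interface are hypothetical and not part of the os module; the only
real API used is os.stat_result, which accepts a 10-item sequence in
the fixed order st_mode, st_ino, st_dev, st_nlink, st_uid, st_gid,
st_size, st_atime, st_mtime, st_ctime.

```python
import os

# Field order expected by os.stat_result when built from a 10-tuple.
_STAT_FIELDS = (
    "st_mode", "st_ino", "st_dev", "st_nlink", "st_uid",
    "st_gid", "st_size", "st_atime", "st_mtime", "st_ctime",
)


def make_stat_result(**fields):
    """Hypothetical helper: build an os.stat_result from keyword values.

    Any field not given defaults to 0. Unknown names are rejected so
    that typos do not silently produce an all-zero result.
    """
    unknown = set(fields) - set(_STAT_FIELDS)
    if unknown:
        raise TypeError("unknown stat fields: {}".format(sorted(unknown)))
    return os.stat_result(tuple(fields.get(name, 0) for name in _STAT_FIELDS))
```

Even this small helper has to pin down the field order, the defaults,
and what happens with unknown names, all of which would need
documenting.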

Remind me why all this is better than the PEP 471 approach again? It
handles all of these problems, is very direct, and uses built-in
Python constructs (method calls and try/except error handling).

And it's also simple to document -- much simpler than the above 4
things, which could be a couple of pages in the docs. Here's the doc
required for the PEP 471 approach:

"Note about caching and error handling: The is_X() and lstat()
functions may perform an lstat() on first call if the OS didn't
already fetch this data when reading the directory. So if you need
fine-grained error handling, catch OSError exceptions around these
method calls. After the first call, the is_X() and lstat() functions
cache the value on the DirEntry."
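
As an illustration of that note, here is a minimal sketch of
fine-grained error handling with plain try/except. It assumes the API
as it was eventually released in Python 3.5, where the draft's
lstat() is spelled entry.stat(follow_symlinks=False); the function
name tree_size and the errors list are made up for this example.

```python
import os


def tree_size(path, errors):
    """Return the total size in bytes of the files under path.

    Entries that cannot be examined are skipped, and the OSError for
    each one is appended to the caller-supplied errors list.
    """
    total = 0
    for entry in os.scandir(path):
        try:
            # is_dir() and stat() may each trigger a system call on
            # first use, so both sit inside the try block. A failure
            # to open a subdirectory in the recursive call is caught
            # here as well.
            if entry.is_dir(follow_symlinks=False):
                total += tree_size(entry.path, errors)
            else:
                total += entry.stat(follow_symlinks=False).st_size
        except OSError as exc:
            errors.append(exc)
    return total
```

A caller that does not care about individual failures can pass a
throwaway list; one that does can inspect exc.filename on each
collected error afterwards.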

