[Python-ideas] find-like functionality in pathlib

Thu Jan 7 04:08:45 EST 2016

The PyFilesystem filesystem abstraction APIs may also have, or be in need
of, a sensible .walk() API:
http://pyfilesystem.readthedocs.org/en/latest/path.html#module-fs.path

http://pyfilesystem.readthedocs.org/en/latest/interface.html

  walk() Like listdir() but descends into sub-directories

  walkdirs() Returns an iterable of paths to sub-directories

  walkfiles() Returns an iterable of file paths in a directory, and its sub-directories
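For comparison, here is a rough sketch of what walkdirs()/walkfiles()-style helpers might look like on top of pathlib and os.walk(). The function names are borrowed from the PyFilesystem summary above, but these are hypothetical stand-alone helpers, not PyFilesystem's implementation and not an existing pathlib API.

```python
import os
from pathlib import Path
from typing import Iterator


def walkdirs(root: Path) -> Iterator[Path]:
    """Yield every sub-directory below root (hypothetical helper)."""
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            yield Path(dirpath, name)


def walkfiles(root: Path) -> Iterator[Path]:
    """Yield every file in root and its sub-directories (hypothetical helper)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield Path(dirpath, name)
```

Both helpers are generators, so a caller can stop early without walking the whole tree; the traversal order is whatever os.walk() produces.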
On Jan 7, 2016 3:03 AM, "Wes Turner" <wes.turner at gmail.com> wrote:

> A bit OT, possibly, but this may be a long way around (to a cached *graph*
> of paths and metadata) with similar use cases:
>
> path.py#walk(), NetworkX edge, node dicts
>
> https://github.com/westurner/pyleset/blob/249a0837/structp/structp.py
>
> def walk_path_into_graph(g, path_, errors='warn'):
>     """
>     """
>
> This stats and reads limited image format metadata as CSV, TSV, JSON:
> https://github.com/westurner/image_size/blob/ab46de73/get_image_size.py
>
> I suppose because of race conditions this metadata should actually be
> stored in a filesystem triplestore with extended attributes and also
> secontext attributes.
>
> (... gnome-tracker reads filesystem stat data into RDF, for SPARQL).
>
> BSP vertex messaging can probably handle cascading cache invalidation
> (with supersteps).
>
> On Jan 6, 2016 4:44 PM, "Guido van Rossum" <guido at python.org> wrote:
>
>> I couldn't help myself and coded up a prototype for the StatCache design
>> I sketched. See http://bugs.python.org/issue26031. Feedback welcome! On
>> my Mac it only seems to offer limited benefits though...
>>
>> On Wed, Jan 6, 2016 at 8:48 AM, Guido van Rossum <guido at python.org>
>> wrote:
>>
>>> On Wed, Jan 6, 2016 at 8:11 AM, Random832 <random832 at fastmail.com>
>>> wrote:
>>>
>>>> On Tue, Jan 5, 2016, at 16:04, Guido van Rossum wrote:
>>>> > One problem with stat() caching is that Path objects are considered
>>>> > immutable, and two Path objects referring to the same path are
>>>> > completely interchangeable. For example, {pathlib.Path('/a'),
>>>> > pathlib.Path('/a')} is a set of length 1: {PosixPath('/a')}. But if
>>>> > we had e.g. Path('/a', cache_stat=True), the behavior of two
>>>> > instances of that object might be observably different (if they were
>>>> > instantiated at times when the contents of the filesystem were
>>>> > different). So maybe stat-caching Path instances should be considered
>>>> > unequal, or perhaps unhashable. Or perhaps they should only be
>>>> > considered equal if their stat() values are actually equal (i.e. if
>>>> > the file's stat() info didn't change).
>>>>
>>>> What about a global cache?
>>>
>>>
>>> It would have to use a weak dict so if the last reference goes away it
>>> discards the cached stats for a given path, otherwise you'd have trouble
>>> containing the cache size.
>>>
>>> And caching Path objects should still not be comparable to non-caching
>>> Path objects (which we will still need, to preserve the semantics that
>>> repeatedly calling stat() on a Path object created the default way will
>>> always redo the syscall). The main advantage would be that caching Path
>>> objects could be compared safely.
>>>
>>> It could still cause unexpected results. E.g. if you have just traversed
>>> some big tree using caching, and saved some results (so hanging on to some
>>> paths and hence their stat() results), and then you make some changes and
>>> traverse it again to look for something else, you might accidentally be
>>> seeing stale (i.e. cached) stat() results.
>>>
>>> Maybe there's a middle ground, where the user can create a StatCache
>>> object and pass it into Path creation and traversal operations. Paths with
>>> the same StatCache object (or both None) compare equal if their path
>>> components are equal. Paths with different StatCache objects never compare
>>> equal (but otherwise are ordered by path as usual -- the StatCache object's
>>> identity is only used when the paths are equal).
>>>
>>> Are you (or anyone still reading this) interested in implementing this
>>> idea?
>>>
>>> --
>>> --Guido van Rossum (python.org/~guido)
>>>
>>
>>
>>
>> --
>> --Guido van Rossum (python.org/~guido)
>>
>
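To make the quoted StatCache idea concrete, here is a minimal sketch of the equality rule it describes: paths sharing the same StatCache object (or both having none) compare equal when their components are equal, and paths with different StatCache objects never compare equal. This is not the prototype attached to issue26031. The CachingPath wrapper, its attributes, and the StatCache.stat() method are hypothetical names chosen for illustration, and ordering is left out.

```python
import os
from pathlib import Path
from typing import Dict, Optional


class StatCache:
    """Hypothetical cache mapping path strings to os.stat_result objects."""

    def __init__(self) -> None:
        self._results: Dict[str, os.stat_result] = {}

    def stat(self, path: Path) -> os.stat_result:
        key = str(path)
        if key not in self._results:
            # Only the first lookup for a given path reaches the filesystem.
            self._results[key] = path.stat()
        return self._results[key]


class CachingPath:
    """Hypothetical wrapper pairing a Path with an optional StatCache."""

    def __init__(self, path, cache: Optional[StatCache] = None) -> None:
        self.path = Path(path)
        self.cache = cache

    def stat(self) -> os.stat_result:
        if self.cache is None:
            # Default behaviour: every call redoes the syscall.
            return self.path.stat()
        return self.cache.stat(self.path)

    def __eq__(self, other):
        if not isinstance(other, CachingPath):
            return NotImplemented
        # Equal only when both use the very same cache object (or both None).
        return self.cache is other.cache and self.path == other.path

    def __hash__(self):
        return hash((self.path, id(self.cache)))
```

With this shape, two CachingPath('/a', cache) instances built from one cache collapse to a single set element, while the same path attached to a second cache stays distinct. The stale-result risk described in the thread remains: a cached stat() is only as fresh as the first lookup made through that cache.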