[Import-SIG] Loading Resources From a Python Module/Package

Brett Cannon brett at python.org
Mon Feb 2 15:18:05 CET 2015


On Sun Feb 01 2015 at 12:28:46 AM Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 1 February 2015 at 08:27, Brett Cannon <brett at python.org> wrote:
> > As I said above, I partially feel like the desire for this support is to
> > work around some API decisions that are somewhat poor.
> >
> > How about this: get_path(package, path, *, real=False) or
> > get_path(package, filename, *, real=False) -- depending on whether Barry
> > and I get our way about paths or you do, Donald -- where 'real' is a flag
> > specifying whether the path has to work as a path argument to
> > builtins.open() and thus fails accordingly (in instances where it won't
> > work it can fail immediately and so loader implementers only have two
> > lines of code to care about to manage it). Then loaders can keep their
> > get_data() method without issue and the API for loaders only grows by 1
> > (or stays constant depending on whether we want/can have it subsume
> > get_filename() long-term).
>
> Jumping in here, since I specifically object to the "real=<boolean
> flag>" API design concept (on the grounds that the presence of that
> kind of flag means you have two different methods trying to get out).
> This thread is already quite long and there are several different
> aspects I'd like to comment on :)
>
> * I like the overall naming suggestion of referring to this as a
> "resources" API. That not only has precedent in pkg_resources, but is
> also the standard terminology for referring to this kind of thing in
> rich client applications. (see
> https://msdn.microsoft.com/en-us/library/windows/apps/hh465241.aspx
> for example)
>
> * I think the PEP 302 approach of referring to resource anchors as
> "paths" is inherently confusing, especially when the most common
> anchor is __file__. As a result, I think we should refer to "resource
> anchors" and "relative paths", rather than the current approach of
> trying to create and pass around "absolute paths" (which then end up
> only working properly when packages are installed to a real
> filesystem).
>

Yes, which is what has made this whole discussion "fun". =)
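
To make the "absolute paths only work on a real filesystem" point
concrete, here is a rough sketch rather than anything proposed in the
thread. The package name zipdemo_pkg and the helper names are made up
for illustration. It builds a zip archive, imports a package from it,
and then shows that a path glued onto __file__ is invisible to os.path
while the loader's get_data() can still interpret the same string.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_zipped_package(directory):
    """Write a zip archive holding a package 'zipdemo_pkg' plus one data file."""
    archive = os.path.join(directory, "bundle.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("zipdemo_pkg/__init__.py", "")
        zf.writestr("zipdemo_pkg/data.txt", "hello from the zip")
    return archive


def compare_access(archive):
    """Return (path_exists_on_disk, bytes_read_via_loader) for the data file."""
    sys.path.insert(0, archive)
    try:
        importlib.invalidate_caches()
        pkg = importlib.import_module("zipdemo_pkg")
        # The "absolute path" style: glue a relative path onto __file__.
        fake_path = os.path.join(os.path.dirname(pkg.__file__), "data.txt")
        # os.path (and open()) cannot see inside the archive ...
        on_disk = os.path.exists(fake_path)
        # ... but the module's loader can interpret the same string.
        data = pkg.__spec__.loader.get_data(fake_path)
        return on_disk, data
    finally:
        sys.path.remove(archive)
        sys.modules.pop("zipdemo_pkg", None)


def demo():
    """Run the comparison against a throwaway archive."""
    with tempfile.TemporaryDirectory() as directory:
        return compare_access(build_zipped_package(directory))
```

Calling demo() returns the pair described in compare_access(); the
string only has meaning to the loader, which is why treating it as a
"path" is misleading.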


>
> * I think Donald's overview at https://bpaste.net/show/0c490aa07c07 is
> a good summary of the functionality we should aim to provide (naming
> bikesheds aside)
>
> * I agree we should treat extraction and loading of C extension
> modules (and shared libraries in general) as out of scope for the
> resource API. They face several restrictions that don't apply to
> pure data files.
>

I'm not even willing to go there with that. You can talk to Thomas Wouters
at PyCon if you want to hear how he had tried to deal with it at Google.


>
> * I agree that the resource APIs should be for read-only access only.
> Images, localisation strings, application templates, those are the
> kinds of things this API is aimed at: they're an essential part of the
> application, and hence it's appropriate to bundle them with it in a
> way that still works for single-file zip archive applications, but
> they're not Python code.
>
> * For the "must exist as a real shareable filesystem artefact, fail
> immediately if that isn't possible" API, I think we should support
> both implicit cleanup *and* explicit context managers for
> deterministic resource control. "Make this available until I'm done
> with it, regardless of where I use it" and "make this available for
> this defined region of code" are different use cases. Depending on how
> these objects are modelled in the API (more on that below), we could
> potentially drop the atexit handler in favour of suitable
> weakref.finalize() calls (which would then clean them up once the last
> reference to the resource was dropped, rather than always waiting
> until the end of the process - "keep this resource available until the
> process ends" would then be a matter of referencing it from the
> appropriate module globals or some other similarly long lived data
> structure). Leaks due to process crashes would then be cleaned up by
> normal OS tempfile management processes.
>
> * I don't think we should couple the concept of resource anchors
> directly to package names (as discussed, it doesn't work for namespace
> packages, for example). I think we *should* be able to *look up*
> resource anchors by package name, although this may fail in some cases
> (such as namespace packages), and that the top level API should do
> that lookup implicitly (allowing package names to be passed wherever
> an anchor is expected). A module object should also be usable as its
> own anchor. I believe we should disallow the use of filesystem paths
> as resource anchors, as that breaks the intended abstraction (looking
> resources up relative to the related modules), and the API behaviour
> is clearer if strings are always assumed to be referring to
> package/module names.
>

Not quite following here. So are you saying we should define the location
as ('foo.bar', 'baz/file.txt') or as ('foo.bar.baz', 'file.txt')? You say
you "don't think we should couple the concept of resource anchors directly"
but then say "we should disallow the use of filesystem paths".
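
For what it's worth, one possible reading of the quoted bullet can be
sketched as below. This is an assumption about the intent, not an
agreed design: the anchor is always a module (given by name or as a
module object), strings are never treated as filesystem paths, and the
lookup is allowed to fail for things like namespace packages. The
function name resolve_anchor is hypothetical. Whether the relative path
that accompanies the anchor may contain directory parts is left open.

```python
import importlib.util
import os
import types


def resolve_anchor(anchor):
    """Return the module spec to use as a resource anchor.

    `anchor` may be a module object or a dotted package/module name.
    Strings are always treated as names, never as filesystem paths.
    """
    if isinstance(anchor, types.ModuleType):
        spec = anchor.__spec__
    elif isinstance(anchor, str):
        if os.sep in anchor or "/" in anchor:
            raise ValueError("filesystem paths are not valid anchors: %r" % anchor)
        # Note: for dotted names, find_spec() imports the parent package.
        spec = importlib.util.find_spec(anchor)
    else:
        raise TypeError("anchor must be a module or a module name")
    if spec is None:
        raise LookupError("no module found for anchor %r" % (anchor,))
    if not spec.has_location:
        # Namespace packages (and built-in modules) have no single location.
        raise LookupError("%s cannot serve as a resource anchor" % spec.name)
    return spec
```

With this shape, resolve_anchor("json") and resolve_anchor(json) name
the same anchor, while a string containing a path separator is rejected
up front.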


>
> * I *don't* think it's a good idea to incorporate this idea directly
> onto the existing module Loader API. Better to create a new
> "ResourceLoader" abstraction, such that we can easily provide a
> default LocationResourceLoader. Reusing module Loader instances across
> modules would still be permitted, reusing ResourceLoader instances
> *would not*. This allows the resource anchor to be specified when
> creating the resource loader, rather than on every call.
>

You do realize that importlib.abc.ResourceLoader
<https://docs.python.org/3/library/importlib.html#importlib.abc.ResourceLoader>
already exists, right? Otherwise I'm rather confused by the terminology. =)

And are you saying that we should have special rules for
LocationResourceLoader instances such that you do not have to specify the
anchoring package and thus force loader creators to provide unique
instances per package? Or are you talking about some new thing that is tied
to specs?


>
> * As a consequence of the previous point, the ResourceLoader instance
> would be linked *from the module spec* (and perhaps from the module
> globals), rather than from the module loader instance. (This is how we
> would support using a module as its own anchor). Having a resource
> loader defined in the spec would be optional, making it clear that
> namespace modules (for example), don't provide a resource access API -
> if you want to store resources inside a namespace package, you need to
> create a submodule or self-contained subpackage to serve as the
> resource anchor.
>

So are you suggesting we add a new attribute to specs which would store a
certain ABC subclass which implements an API for loading resources?


>
> * As a consequence of making a suitably configured resource loader
> available through the module spec as part of the module finding
> process it would become possible to access module relative resources
> *without actually loading the module itself*.
>

OK, you are suggesting adding a new object type and attribute to specs. Can
we call them "resource readers" so we don't conflate the "loader" term?

And doing it through specs also means that requiring the file name to have
no directory parts adds no extra overhead.


>
> * If the import system gets a module spec where "spec.has_location" is
> set and Loader.get_data is available, but the new
> "spec.resource_loader" attribute is set to None, then it will set it
> to "LocationResourceLoader(spec.origin)", which will rely solely on
> Loader.get_data() for content access
>

This is a little finicky. Are we going to simply say that we assume
spec.origin is some path that works with os.path functions? Will Windows be
okay if someone decided to standardize on / as a path separator instead of
\ ? I get this buys us support from older loader implementations but I just
want to make sure that it will work 80% of the time before we add more
implicit magic to importlib.
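
A minimal sketch of that fallback, under the assumptions questioned
above: spec.origin is treated as a string that os.path can split, and
the loader's get_data() is assumed to accept strings built from it. The
spec.resource_loader attribute does not exist today and is looked up
with getattr() purely for illustration. Unlike the quoted
LocationResourceLoader(spec.origin), this version also takes the loader
explicitly so that it is self-contained.

```python
import importlib.util
import io
import os


class LocationResourceLoader:
    """Hypothetical fallback reader built from a spec's origin.

    Assumes `origin` is a path-like string that os.path can split and
    that `loader.get_data()` accepts strings built from it.
    """

    def __init__(self, origin, loader):
        self._location = os.path.dirname(origin)
        self._loader = loader

    def get_location(self):
        return self._location

    def get_bytes(self, relative_path):
        # Rely solely on the PEP 302 get_data() hook for content access.
        return self._loader.get_data(os.path.join(self._location, relative_path))

    def get_bytestream(self, relative_path):
        return io.BytesIO(self.get_bytes(relative_path))


def resource_loader_for(spec):
    """Return a reader for `spec`, or None when the fallback cannot apply."""
    existing = getattr(spec, "resource_loader", None)  # hypothetical attribute
    if existing is not None:
        return existing
    if spec.has_location and hasattr(spec.loader, "get_data"):
        return LocationResourceLoader(spec.origin, spec.loader)
    return None
```

Because the reader is built from the spec alone, something like
resource_loader_for(importlib.util.find_spec("some_pkg")) can read a
resource of a top-level package without importing it.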


>
> * We'd also provide an optimised FilesystemResourceLoader for use with
> actual installed packages where the resources already exist on disk
> and don't need to be copied to memory or a temporary directory to
> provide a suitable API.
>
> * For abstract data access at the ResourceLoader API level, I like
> "get_anchor()" (returning a suitably descriptive string such that
> "os.path.join(anchor, <relative path>)" will work with get_data() on
> the corresponding module Loader),


I would rather call it get_location(), since using 'anchor' in get_anchor()
seems to conflate what an anchor represents.


> "get_bytes(<relative path>)",
> "get_bytestream(<relative path>)" and "get_filesystem_path(<relative
> path>)". get_anchor() would be the minimum API, with default
> implementations of the other three based on Loader.get_data(), BytesIO
> and tempfile (this would involve suitable use of lazy or on-demand
> imports for the latter two, as we'd need access to these from
> importlib._bootstrap, but wouldn't want to load them on every
> interpreter startup).
>
> * For the top-level API, I similarly favour
> importlib.resources.get_bytes(), get_bytestream() and
> get_filesystem_path(). However, I would propose that the latter be an
> object implementing a to-be-defined subset of the pathlib Path API,
> rather than a string. Resource listing, etc, would then be handled
> through the existing Path abstraction, rather than defining a new one.
> In the standard library, because we'd just be using a temporary
> directory, we could use real Path objects (although we'd need to add
> weakref support to them to implement the weakref.finalize suggestion I
> make above)
>

Seems reasonable to me to start getting Path objects into the stdlib more.
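
As a rough illustration of the "implicit cleanup *and* explicit context
manager" idea for the filesystem-path case, here is a sketch. The class
name ExtractedResource is made up, and it wraps a real pathlib.Path in a
small handle object instead of adding weakref support to Path itself,
which is a simplification and not what the quoted text proposes.

```python
import pathlib
import shutil
import tempfile
import weakref


class ExtractedResource:
    """Hypothetical handle for a resource copied to a real temporary file."""

    def __init__(self, data, filename):
        tmpdir = tempfile.mkdtemp(prefix="resource-")
        self.path = pathlib.Path(tmpdir) / filename
        self.path.write_bytes(data)
        # Implicit cleanup: runs when this handle is garbage collected, or
        # at interpreter exit if it is still alive then.
        self._finalizer = weakref.finalize(
            self, shutil.rmtree, tmpdir, ignore_errors=True)

    def close(self):
        # Explicit cleanup; calling a finalizer a second time is a no-op.
        self._finalizer()

    def __enter__(self):
        return self.path

    def __exit__(self, *exc_info):
        self.close()
```

Used as "with ExtractedResource(data, 'logo.png') as path:", the file
lives for the block only. Kept in a module global, it lives until the
handle is dropped or the process ends.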

-Brett


>
> > As for importlib.resources, that can provide a higher-level API for a
> > file-like object along with some way to say whether the file must be
> > addressable on the filesystem to know if tempfile.NamedTemporaryFile()
> > may be backing the file-like object or if io.BytesIO could provide the
> > API.
> >
> > This gets me a clean API for loaders and importlib and gets you your real
> > file paths as needed.
>
> Yep, as you can see above, I agree there are two APIs to be designed
> here - the high level user facing one, and the one between the import
> machinery and plugin authors.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>