Mailman 3 Re: Resource imports (as strings/bytes) - Python-ideas

Re: Resource imports (as strings/bytes)

Steven D'Aprano

Jan. 20, 2020

12:54 a.m.

On Sun, Jan 19, 2020 at 08:09:32PM -0300, Soni L. wrote:

...

We have importlib. We have importlib.resources. We can import modules. We cannot (yet) import resources using the same-ish module import machinery.

Actually we can, using importlib.resources, which you already mentioned. The machinery is right there, although the documentation lacks a good tutorial or How To. Nevertheless, I got it to work correctly the first time after less than a minute's reading of the docs:

* create a package in the PYTHONPATH

    spam/
    +-- __init__.py

* add a text file "config" under the spam directory

* run:

    import importlib.resources
    import spam
    rsrc = importlib.resources.read_text(spam, "config")

and it Just Works.

What we can't (easily) do is import resources using the import *syntax*.

I am led to believe that there is such a thing as "import hooks" that let you customize how and what gets imported. I have no idea how these hooks work; perhaps there's a good How To or guide somewhere? But as I understand it, you could write a hook which allows you to use the import syntax to load anything you want from anywhere you want, so long as it fits into the existing import syntax.
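The three steps above can be tried end to end with a throwaway package. This is only a sketch: the package name "spam_demo" and the file contents are made up, the package is built in a temporary directory instead of a real PYTHONPATH entry, and it uses importlib.resources.files() (Python 3.9+) where the message uses read_text().

```python
import importlib
import importlib.resources
import pathlib
import sys
import tempfile

# Build a throwaway package on disk: spam_demo/__init__.py plus a data file.
# The name "spam_demo" and the file contents are invented for this sketch.
root = pathlib.Path(tempfile.mkdtemp())
package_dir = root / "spam_demo"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("", encoding="utf-8")
(package_dir / "config").write_text("debug = true\n", encoding="utf-8")

# Make the package importable, as if its parent directory were on PYTHONPATH.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import spam_demo

# files() is the Python 3.9+ spelling; the message's equivalent call is
# importlib.resources.read_text(spam_demo, "config").
resource = importlib.resources.files(spam_demo).joinpath("config")
rsrc = resource.read_text(encoding="utf-8")
print(rsrc)
```

The resource is looked up relative to the imported package object, so the calling code never has to build a path from __file__.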

...

from foo.bar import resources "foo.txt" as foo, "bar.txt" as bar

The downsides of this proposal include:

(1) It requires a new keyword, one which I'm sure will clash with about a thousand bazillion modules that already use "resources" as a variable.

(2) It requires a change to the syntax of import statements to allow quoted file names, or else it requires you to name your data files with names which are legal identifiers. E.g. "foo" rather than "foo.txt".

I don't think this functionality is important enough or common enough to warrant not one but two new pieces of syntax, not when it is so easy to get the same functionality using the functional syntax

    rsrc = importlib.resources.read_text(spam, "config")

But perhaps you can start with an import hook that lets you write:

    import spam.config as rsrc

and leave the functional syntax for file names which aren't valid identifiers.
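One way such a hook might look is a meta path finder that runs after the normal finders and turns a data file into a small module. Everything named here (TextResourceFinder, TextResourceLoader, the "text" attribute, the "spam_hook_demo" package) is hypothetical, and the sketch simplifies the idea: rsrc ends up bound to a module whose .text attribute holds the file contents, not to the string itself.

```python
import importlib
import importlib.abc
import importlib.util
import pathlib
import sys
import tempfile


class TextResourceLoader(importlib.abc.Loader):
    """Hypothetical loader: exposes a data file's contents as module.text."""

    def __init__(self, path):
        self.path = path

    def create_module(self, spec):
        return None  # fall back to the default module object

    def exec_module(self, module):
        module.text = self.path.read_text(encoding="utf-8")


class TextResourceFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder: maps pkg.name to a file called 'name' inside pkg."""

    def find_spec(self, fullname, path, target=None):
        if path is None:
            return None  # top-level imports are left to the normal finders
        name = fullname.rpartition(".")[2]
        for entry in path:
            candidate = pathlib.Path(entry) / name
            if candidate.is_file():
                loader = TextResourceLoader(candidate)
                return importlib.util.spec_from_loader(fullname, loader)
        return None


# Appended last, so real modules and packages are always found first.
sys.meta_path.append(TextResourceFinder())

# Demo package in a temporary directory (names and contents are made up).
root = pathlib.Path(tempfile.mkdtemp())
package_dir = root / "spam_hook_demo"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("", encoding="utf-8")
(package_dir / "config").write_text("debug = true\n", encoding="utf-8")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import spam_hook_demo.config as rsrc

print(rsrc.text)
```

Because the finder only sees names that are legal identifiers, a file such as "foo.txt" would still need the functional syntax, as the message says.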

...

or maybe:

foo = f"{from foo.bar import 'foo.txt'}"

An f-string is a disguised eval expression that returns a string. Using the same f-string syntax as a disguised import statement is a great example of the "Golden Hammer" anti-pattern applied to language design. "F-strings are cool, so we should make completely unrelated things like loading resources look like an f-string!".

https://en.wikipedia.org/wiki/Law_of_the_instrument

This likewise has the disadvantage that it requires new syntax. F-strings are expressions and they require all the evaluated terms to be expressions, not statements:

    py> f'{import math}'
      File "<fstring>", line 1
        (import math)
         ^
    SyntaxError: invalid syntax

so your proposal would require f-strings to:

* accept import and from...import statements as a special case;

* rather than eval these terms, exec them, while still eval'ing everything else;

* but using a special, implicit, version of import which calls importlib.resources.read_text rather than the standard import;

* leading to confusion when people try things like

    f'{from math import pi}'

  and get a surprising TypeError that math is not a package.

--
Steven
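The expression-versus-statement point can be checked without typing the failing f-string into a source file, by handing it to compile(). This is a small sketch; the variable names are made up.

```python
import math

# A statement inside the braces is rejected when the f-string is compiled.
try:
    compile("f'{import math}'", "<demo>", "eval")
except SyntaxError:
    statement_allowed = False
else:
    statement_allowed = True

# An expression is accepted, even one that happens to perform an import.
via_expression = f"{__import__('math').pi}"

print(statement_allowed)
print(via_expression == str(math.pi))
```

The second form works only because __import__() is an ordinary function call that returns a value, which is the same distinction the functional importlib.resources API relies on.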

Soni L.

January 2020

8:56 a.m.

On 2020-01-20 5:54 a.m., Steven D'Aprano wrote:

...

On Sun, Jan 19, 2020 at 08:09:32PM -0300, Soni L. wrote:

...
We have importlib. We have importlib.resources. We can import modules. We cannot (yet) import resources using the same-ish module import machinery.

Actually we can, using importlib.resources, which you already mentioned. The machinery is right there, although the documentation lacks a good tutorial or How To. Nevertheless, I got it to work correctly the first time after less than a minute's reading of the docs:

* create a package in the PYTHONPATH

    spam/
    +-- __init__.py

* add a text file "config" under the spam directory

* run:

    import importlib.resources
    import spam
    rsrc = importlib.resources.read_text(spam, "config")

and it Just Works.

What we can't (easily) do is import resources using the import *syntax*.

I am led to believe that there is such a thing as "import hooks" that let you customize how and what gets imported. I have no idea how these hooks work; perhaps there's a good How To or guide somewhere? But as I understand it, you could write a hook which allows you to use the import syntax to load anything you want from anywhere you want, so long as it fits into the existing import syntax.

...
from foo.bar import resources "foo.txt" as foo, "bar.txt" as bar

The downsides of this proposal include:

(1) It requires a new keyword, one which I'm sure will clash with about a thousand bazillion modules that already use "resources" as a variable.

(2) It requires a change to the syntax of import statements to allow quoted file names, or else it requires you to name your data files with names which are legal identifiers. E.g. "foo" rather than "foo.txt".

I don't think this functionality is important enough or common enough to warrant not one but two new pieces of syntax, not when it is so easy to get the same functionality using the functional syntax

rsrc = importlib.resources.read_text(spam, "config")

But perhaps you can start with an import hook that lets you write:

import spam.config as rsrc

and leave the functional syntax for file names which aren't valid identifiers.

...
or maybe:

foo = f"{from foo.bar import 'foo.txt'}"

An f-string is a disguised eval expression that returns a string. Using the same f-string syntax as a disguised import statement is a great example of the "Golden Hammer" anti-pattern applied to language design. "F-strings are cool, so we should make completely unrelated things like loading resources look like an f-string!".

https://en.wikipedia.org/wiki/Law_of_the_instrument

This likewise has the disadvantage that it requires new syntax. F-strings are expressions and they require all the evaluated terms to be expressions, not statements:

    py> f'{import math}'
      File "<fstring>", line 1
        (import math)
         ^
    SyntaxError: invalid syntax

so your proposal would require f-strings to:

* accept import and from...import statements as a special case;

* rather than eval these terms, exec them, while still eval'ing everything else;

* but using a special, implicit, version of import which calls importlib.resources.read_text rather than the standard import;

* leading to confusion when people try things like

f'{from math import pi}'

and get a surprising TypeError that math is not a package.

that pi is not a resource*

Steven D'Aprano

12:13 p.m.

Soni, you seem to be using Thunderbird as a mail client. As far as I remember from my time with Thunderbird, it allows, and makes it quite simple, to trim your quoting. There's no need to quote an entire 100 line message to add a single one sentence comment at the end: it is not nice to your readers to force them to scroll past paragraph after paragraph of text they have already read that isn't directly relevant to your reply.

I wrote:

...

...
* but using a special, implicit, version of import which calls importlib.resources.read_text rather than the standard import;

* leading to confusion when people try things like

f'{from math import pi}'

and get a surprising TypeError that math is not a package.

And Soni replied:

...

that pi is not a resource*

I certainly hope not! The error message should be the *direct* cause of the error. In this case, pi is not a resource because math is not a package, and so we should report that error.

Think about how you would fix the problem: before you can add a "pi" resource, you first have to make math a package. Likewise, if we wrote something like this:

    f'{from blibble_dibble import pi}'

we ought to get something like:

    ModuleNotFoundError: No module named 'blibble_dibble'

rather than waste the caller's time trying to diagnose why the import system can't see the resource "pi" when the actual problem is that the import system can't even see the package itself. (Perhaps it is not on the path, perhaps it is misspelled.)

--
Steven
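The existing functional API already reports the direct cause in the missing-package case. The sketch below assumes Python 3.9+ for importlib.resources.files(); "blibble_dibble" is the deliberately nonexistent name from the message, and the stdlib json package stands in for "a package that exists but has no such resource".

```python
import importlib.resources

# Case 1: the package itself cannot be imported, so that is the error raised.
error = None
try:
    importlib.resources.files("blibble_dibble").joinpath("pi").read_text()
except ModuleNotFoundError as exc:
    error = exc

print(type(error).__name__, error)

# Case 2: the package exists (json, from the standard library) but contains
# no resource called "pi"; only here is "no such resource" the direct cause.
pi_is_missing = not importlib.resources.files("json").joinpath("pi").is_file()
print(pi_is_missing)
```

What happens when the anchor is a plain module such as math, rather than a package, is not shown here.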

Soni L.

4:13 p.m.

On 2020-01-20 5:13 p.m., Steven D'Aprano wrote:

...

Soni, you seem to be using Thunderbird as a mail client. As far as I remember from my time with Thunderbird, it allows, and makes it quite simple, to trim your quoting. There's no need to quote an entire 100 line message to add a single one sentence comment at the end: it is not nice to your readers to force them to scroll past paragraph after paragraph of text they have already read that isn't directly relevant to your reply.

I generally do that but I was frustrated and exhausted. That is, that may have been slightly passive-aggressive... Sorry.

...

I wrote:

...
...
* but using a special, implicit, version of import which calls importlib.resources.read_text rather than the standard import;

* leading to confusion when people try things like

f'{from math import pi}'

and get a surprising TypeError that math is not a package.

And Soni replied:

...
that pi is not a resource*

I certainly hope not! The error message should be the *direct* cause of the error. In this case, pi is not a resource because math is not a package, and so we should report that error.

Think about how you would fix the problem: before you can add a "pi" resource, you first have to make math a package. Likewise, if we wrote something like this:

f'{from blibble_dibble import pi}'

we ought to get something like:

ModuleNotFoundError: No module named 'blibble_dibble'

rather than waste the caller's time trying to diagnose why the import system can't see the resource "pi" when the actual problem is that the import system can't even see the package itself. (Perhaps it is not on the path, perhaps it is misspelled.)

    f'{from math import pi}'
                        ^
    SyntaxError: invalid syntax

(at compile-time)

    f'{from math import "pi"}'
    ResourceNotFoundError: No resource named 'pi' @ 'math'; 'math' is not a package

(at runtime)

in any case the whole thing I'm arguing for in this thread, is to *draw parallels* between module imports and resource imports. ppl talk about it like it would be "confusingly similar" but I argue that it would be *non-confusingly* similar instead. because the whole point of this syntax *is* to look similar to the other syntax. it helps show the users the parallels between the two - they both use the import machinery, they do similar things (and, they *do* do similar things - they both load stuff from a package!), etc.

I *want* the parallels to be drawn. I *want* the similarities to be highlighted. this is where using importlib.resources breaks down because it *doesn't* highlight those similarities. how many ppl know about importlib.resources? I still have my (large-ish) strings (such as built-in HTML and TOML templates, etc) shoved straight into my code instead of loading them with importlib.resources, altho that's mostly because I haven't refactored it to use importlib.resources.

The python tutorial doesn't even touch on managing and importing resources. If it had its own syntax, the tutorial would be forced to talk about it. I think this would be a huge win for everyone.

Christopher Barker

9:03 a.m.

On Mon, Jan 20, 2020 at 4:15 PM Soni L. <fakedme+py@gmail.com> wrote:

...

I generally do that but I was frustrated and exhausted. That is, that may have been slightly passive-aggressive... Sorry.

and yet you barely trimmed this one ;-)

...

in any case the whole thing I'm arguing for in this thread, is to *draw parallels* between module imports and resource imports.

The problem I see with all this is that there isn't much of a parallel -- modules can only contain Python objects, and, like the title of this thread indicates, the only two python objects that directly map to a file are strings and bytes.

You *may* be able to do something directly with strings, but most likely you'll pass it off to something else: a template renderer, JSON parser, what have you. And I can’t think of a single instance where you would just want the bytes in a file without processing them into a Python object. Given that you have to do that next step anyway, I don't see much gain here.

That is: what’s wrong with hard-coding, say, a template into Python source and assigning it a string?

And we DO have importlib and setuptools solutions already -- if you really think they're useful, then better documentation is in order.

That being said, I have wanted to put resources in with my Python code, for two reasons:

1) It’s easier to bundle them up with the package (I used to use py2exe and the like a lot)

2) it’s nice to be able to simply import something and have the Python object I want right away.

For (1): this is made easier by the current packaging solutions — it could be a bit cleaner (messing around with __file__ is pretty ugly) — so by all means explore the available solutions and maybe make a better one.

For (2) — see above— I want the relevant Python object, not just the string or bytes. A good example of this is the utilities (img2py I think) that come with wxPython: They bundle up a set of images into a Python module that creates wxImages on the fly. So you can import the module, and have a set of wxImages ready to go. That is pretty nice.

I do think there could be some utilities to make that kind of thing easier, but making it easy, with a built in, to simply get the bytes or text from a file wouldn’t buy us much.

-CHB

--
Christopher Barker, PhD

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython

David Mertz

9:38 a.m.

On Tue, Jan 21, 2020 at 12:15 PM Christopher Barker <pythonchb@gmail.com> wrote:

...

You *may* be able to do something directly with strings, but most likely you'll pass it off to something else: a template renderer, JSON parser, what have you. And I can’t think of a single instance where you would just want the bytes in a file without processing them into a Python object. Given that you have to do that next step anyway, I don't see much gain here.

I've gotten in the habit of putting resources inside modules, moderately often. But what I've been doing lately is developing training materials, which I think has a different use case. I'm not sure I'd want to do that nearly so much in production code.

I absolutely do not get the desire to have the import mechanism create string objects, let alone stick them in f-strings. That just seems strange and unnatural to me.

The sort of thing I find myself doing is:

    from resources import data1, data2, data3

Then later in the training notebooks, I'll demonstrate doing various things with the datasets. But the data objects are not raw string objects, they are NumPy arrays, or Pandas DataFrames, or nested dictionaries, or some other more complex data that is relevant to what I am teaching. Underneath that are steps like:

    # resources.py
    import special_lib
    data1 = special_lib.reader(src, option1=foo, option2=bar)
    touch_up(data1, with_=this, also=that)

Where special_lib is something like numpy, or simplejson, or xarray, or whatever. The actual data may or may not live inside the same resources.py file.

Even if I was showing off text processing on a string, this same pattern works fine. One of those data objects can be a string or bytes object. But again, I mostly do this because *for a particular lesson*, I want to draw attention to working with that particular data rather than to the specifics of the loading mechanism. In "real code" I'd still put those few lines in the context where the data was processed, usually.

--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
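A self-contained version of that resources.py pattern might look like the sketch below. It swaps the stdlib csv module in for special_lib, keeps the raw data in the same file, and uses made-up sample rows; _load and the column names are hypothetical.

```python
# resources.py (hypothetical module, following the pattern in the message above)
import csv
import io

# Sample data invented for this sketch; it could equally live in a separate file.
_RAW = "city,population\nOslo,700000\nBergen,290000\n"


def _load(src):
    """Parse CSV text into a list of dicts, then apply a small touch-up."""
    rows = list(csv.DictReader(io.StringIO(src)))
    for row in rows:
        row["population"] = int(row["population"])  # touch-up: text to int
    return rows


# Importers get a ready-made Python object, not a raw string.
data1 = _load(_RAW)

print(data1[0])
```

A lesson notebook would then only need "from resources import data1" and could go straight to working with the rows.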

Andrew Barnert

11:22 a.m.

On Jan 21, 2020, at 09:13, Christopher Barker <pythonchb@gmail.com> wrote:

...

For (2) — see above— I want the relevant Python object, not just the string or bytes. A good example of this is the utilities (img2py I think) that come with wxPython: They bundle up a set of images into a Python module that creates wxImages on the fly. So you can import the module, and have a set of wxImages ready to go. That is pretty nice.

I do think there could be some utilities to make that kind of thing easier, but making it easy, with a built in, to simply get the bytes or text from a file wouldn’t buy us much.

I just had a thought on this. What if you had an importhooker module that allowed you to do this in your top-level script or module:

    from importhooker import hook
    import json
    from PIL import Image
    import mything

    @hook(text=True)
    def json(f):
        return json.load(f)

    @hook(names="png jpg jpeg gif".split())
    def image(f):
        return Image.open(f)

    hook(mything.import, text=True, names=["my"])

Note that the last one could actually be a custom Python dialect that adds some new DSL or macro system or whatever; you’d still have to write the hard part of the code that duplicates the normal py importer but adds in a text/token/ast/bytecode transformer or whatever in the middle, but you’d get all the boilerplate (the part that’s very hard to figure out how to write the first time, and uninteresting after that) for free.

But the other ones are the real point; they make it trivial to load image resources, etc., because there is no hard part beyond the boilerplate.

The import hook effectively does this pseudo code:

    packagename, basename = split(name)
    package = load_package(packagename)
    for hook in registry:
        filename = f"{basename}.{hook.ext}"
        try:
            f = importlib.resources.open_file(hook.text)
        except ?:
            continue
        with f:
            return hook.callback(f)

And now, in any of your other modules, you could do this:

    # load cheese.png as a Pillow image named cheese
    from spam.images import cheese
    # load devconfig.json as a dict named config
    from spam import devconfig as config
    # load thing.my as a module named thing
    from spam import thing

The API is just off the top of my head. You could add more arguments to control encoding (otherwise it does the normal importlib thing of looking for a coding declaration and falling back to UTF-8, which could be very wrong for some file types), .pyc caching, sys.modules caching, etc.; I haven’t thought of what might be useful and doable.

Anyway, this would still have all the limitations of import hooks. For example, you can’t have a cheese.png and cheese.py or cheese.json in the same package. And using it would have a performance cost (e.g., with all the hooks I added, a failed import is going to take at least 3x as long to fail…). It just means that you don’t have to write and install an import hook, figure out how to use the normal machinery for the parts you’re not trying to change, worry about the annoying error handling stuff (last I checked you have to manually construct the ImportError out of the inner exception while doing some clunky stuff to make it skip over irrelevant machinery in the callstack), etc.

Obviously you’d have to require importhooker off PyPI (and it would require 3.7+ or 3.4+ with a resources backport), which isn’t a problem for things you’re distributing on PyPI or packaging up with py2exe or whatever but might be limiting for people who want to create complicated packages and use them right out of the build tree. But if this were useful enough, it could be added into importlib in some future Python version.

Anyway, I think this would be a lot more useful than anything that just gives you the raw bytes or text of a resource file. I’m not sure it’s useful enough to be worth building. But it might be fun enough to try even if I can’t think of a good use. :)
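The registry loop in the pseudo code can be tried out as an ordinary function, without installing any import hook. This is a sketch, not the importhooker API (which does not exist): hook, registry, load_resource and the demo package "spam_registry_demo" are all made up, and it relies on importlib.resources.files() from Python 3.9+.

```python
import importlib
import importlib.resources
import json
import pathlib
import sys
import tempfile

registry = []  # (extension, text_mode, callback) tuples


def hook(ext, text=True):
    """Register a callback for resources ending in the given extension."""
    def register(callback):
        registry.append((ext, text, callback))
        return callback
    return register


@hook("json")
def load_json(f):
    return json.load(f)


@hook("bin", text=False)
def load_bytes(f):
    return f.read()


def load_resource(package, basename):
    """Try each registered extension in turn, as in the pseudo code above."""
    root = importlib.resources.files(package)
    for ext, text, callback in registry:
        candidate = root.joinpath(f"{basename}.{ext}")
        if not candidate.is_file():
            continue
        f = candidate.open("r", encoding="utf-8") if text else candidate.open("rb")
        with f:
            return callback(f)
    raise FileNotFoundError(f"no registered resource type matches {basename!r}")


# Demo package in a temporary directory (names and contents are made up).
root = pathlib.Path(tempfile.mkdtemp())
package_dir = root / "spam_registry_demo"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("", encoding="utf-8")
(package_dir / "devconfig.json").write_text('{"debug": true}', encoding="utf-8")
(package_dir / "blob.bin").write_bytes(b"\x00\x01")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

config = load_resource("spam_registry_demo", "devconfig")
blob = load_resource("spam_registry_demo", "blob")
print(config, blob)
```

Wrapping load_resource in a meta path finder would give the import-statement spelling, at the cost of the limitations listed in the message (name clashes between cheese.png and cheese.py, slower failed imports).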

Andrew Barnert

9:50 a.m.

On Jan 20, 2020, at 16:17, Soni L. <fakedme+py@gmail.com> wrote:

...

in any case the whole thing I'm arguing for in this thread, is to *draw parallels* between module imports and resource imports. ppl talk about it like it would be "confusingly similar" but I argue that it would be *non-confusingly* similar instead. because the whole point of this syntax *is* to look similar to the other syntax.

But it doesn’t work like the other syntax.

The difference between using an import statement and using an importlib function when importing packages or modules or objects out of modules is that the statement binds things to names, while the function returns values. You want something that returns values, but you want it to look like the statement instead of like the function. That’s the false parallel you’re drawing. The fact that it leads you to do other things like create a statement that has a value and can be used in places that statements can’t be used and so on, that’s all secondary and follows from the root problem.

This is why at least two people have suggested an import hook. It would allow you to write `from spam import eggs` to get the contents of spam.eggs bound to a variable named eggs (and cached, unless you deliberately circumvent that). It would be using the import statement as an import statement. I even suggested that if you put the import hook on PyPI and show people using it, you could propose adding it to Python, but you’ve still ignored it.

The other thing worth noting is that the function allows you to do more. For example, if I want to read a PNG file, I don’t want to import it as bytes and wrap it in a BytesIO to pass to my image library, I want to get an open file object to pass to my image library. Or, even better, I want to wrap that in a function that gets an open file object, passes it to my image library, closes it, and returns the image object. That’s a two-liner with the function (and it would be easy to add to an import hook), but it couldn’t be done with your syntax.
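The "two-liner" wrapper might look like load_parsed below. It is a sketch with assumptions: json.load stands in for an image library's open call so that the example needs only the standard library, the helper name and the demo package "spam_open_demo" are made up, and importlib.resources.files() requires Python 3.9+.

```python
import importlib
import importlib.resources
import json
import pathlib
import sys
import tempfile


def load_parsed(package, name, parser):
    """Open a resource as a binary file, hand it to parser, then close it."""
    with importlib.resources.files(package).joinpath(name).open("rb") as f:
        return parser(f)


# Demo package in a temporary directory (names and contents are made up).
root = pathlib.Path(tempfile.mkdtemp())
package_dir = root / "spam_open_demo"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("", encoding="utf-8")
(package_dir / "eggs.json").write_text('{"count": 12}', encoding="utf-8")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# json.load accepts a binary file object, much as an image library would.
eggs = load_parsed("spam_open_demo", "eggs.json", json.load)
print(eggs)
```

The parser receives an open file object, never an intermediate bytes value, which is the extra flexibility the message attributes to the functional form.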

...

I *want* the parallels to be drawn. I *want* the similarities to be highlighted. this is where using importlib.resources breaks down because it *doesn't* highlight those similarities. how many ppl know about importlib.resources? I

How many people know about any new feature? How many people would know about your new import expression if they had to get Python 3.10 or later and use a future directive until 3.12 to use it? Not many. People have been using pkgutil or setuptools for decades to do this. The next time they look up the resources docs in the PyPA documentation and it tells them to consider using importlib instead if they can require newer Python, they will learn about it. Of course old blogs, StackOverflow answers, etc. will take longer to mention it. But all of this would be the same with your proposal, except that it would all start with 3.10 instead of 3.7, and it would come with more caveats.

...

still have my (large-ish) strings (such as built-in HTML and TOML templates, etc) shoved straight into my code instead of loading them with importlib.resources, altho that's mostly because I haven't refactored it to use importlib.resources.

You’d have to refactor it the same way to use your new import expression. And I don’t know why you haven’t been using pkgutil/setuptools resources to do this since the start. Presumably you never read the PyPA documentation that recommended them. Which means presumably you wouldn’t have read newer PyPA documentation that recommended something new. So you’d discover it after the fact and then have to refactor.

...

The python tutorial doesn't even touch on managing and importing resources. If it had its own syntax, the tutorial would be forced to talk about it.

No it wouldn’t. There’s all kinds of stuff in Python that the tutorial doesn’t cover—even syntax like yield from, yield expressions, async/await, etc. It doesn’t explain namespace packages. It doesn’t explain how to properly structure a package, how to write a setup.py to use with setuptools, etc. It’s just meant to get you started with the language, give you a tour of highlights of other features, and point you to other documentation, not to cover everything you could possibly want to do.

And even if everyone agrees that the tutorial should cover some new feature X, that doesn’t happen automatically, someone has to write the new docs and fit them into the existing structure and shepherd it through all the annoying bikeshedding. (Look at the venv feature for a good example; I’m pretty sure that included adding a new section to the tutorial, which meant reorganizing the chapter on pip, and included an argument about where it should go and whether to include forward references to it earlier in the tutorial, etc.) Maybe something similar should have been done with the importlib.resources change, but nobody volunteered to do it, much less argued to convince everyone, so nothing changed in the tutorial. As with most changes in Python.

In this case, the relevant other documentation is “Distributing Python Modules”. Which is not called out explicitly anywhere, and unfortunately named for anyone who finished the tutorial and wants to learn how to do more complicated stuff with packages if that complicated stuff doesn’t include putting them on PyPI. And, even if you find it, it’s just a stripped-down skeleton of the obsolete docs with pointers to the external PyPA docs. Probably it would be better if the tutorial chapter on modules referenced the PyPA docs and gave some hints of the kinds of things you can do. But it doesn’t.

I think a bigger hole in the documentation is that there’s no howto on import hooks, with the result that very few people know how to write them and everyone thinks it’s really difficult. And the fact that most of the examples you can find out there are meant to work with all of 2.7 and early 3.x and modern 3.x (which actually is hard…) doesn’t help. The only docs that explain the point of import hooks are in the PEP that added the 2.x feature, and even if you find that, trying to figure out how to do it in modern Python just by reading through the library reference docs for the helper tools is a nightmare.

In a world with ideal docs, you would have already guessed that you could do what you want with an import hook, written it, published it to PyPI, and then come here with either a proposal to add your import hook (which is now being used in the wild) to the standard import system—or an explanation of the unsolvable flaws in that approach and how a new language feature could solve them (a la the @ operator proposal), instead of a vague proposal to “use the import system” without clearly specifying what that means and without an answer to why you can’t just do it with the existing syntax beyond just “if there were syntactic sugar for this function it would be easier”, which is true for every function in the language. And, once one or the other was done, the PyPA docs and maybe even the tutorial in that ideal world would have already included an introduction to importlib.resources, so all you’d have to do is update them to show the simpler new way.

In the real world, I still think that’s the right approach to take. It’s much more of a pain, but that doesn’t mean there’s a better answer that isn’t a pain.

Stephen J. Turnbull

9:39 p.m.

Soni L. writes:

...

in any case the whole thing I'm arguing for in this thread, is to *draw parallels* between module imports and resource imports.

The only parallel I see is that you read a file found on a path.

Modules are *special* because they're the only built-in object that normally lives in a file "somewhere" (except for the interpreter itself, which is pretty meta). All the other Python objects live in modules. That's why modules have a keyword for accessing them and binding names (three keywords, in fact!)

Because a Python program is composed from a versioned language and an environment and an application and libraries, finding the right module can be complex. Once found, "import" goes and does module-specific things, some of which are also moderately complex. All this complexity is enough to justify three keywords!

Other resources are mostly specific to applications. They don't need the complexity of path search all that often (though not all that rarely either). When they do, it's not obvious that the right way to search is going to be the same as searching for a module (certainly you'll use a different path and the test for whether a file appears to contain a valid resource will be resource-specific). Once you identify a file containing a resource, what you do with it will be completely different from the normal import process (unless it's Python code you're intending to import, in which case why not just use an import statement?)

Finally, import is a statement because it changes the environment of the program globally. The imported defs and globals become part of the program, and those objects are linked to the program by the top-level code. What other resources do that? It's a matter of style to do that in a statement which can't be included in an expression, but I think that it's a good thing that Python does it that way.

I see some parallels, but I'm definitely in the camp of "confusingly similar" rather than "instructively similar", and I definitely don't see a need for a syntax change to enable importing anything inside expressions.

...

The python tutorial doesn't even touch on managing and importing resources.

Why would it? Resource management is an application-level concept which is far more general than Python, and far more diverse than the standard suite of Python objects. It is not a language-level concept.

The language manages code and data objects, and the tutorial explains the basics of the language-level facilities for working with them *in Python*. To the extent that objects may be contained in external files, those files are read with open() and the methods on the resulting file object. The tutorial explains those, and even goes on to describe object interchange with JSON, and mention Python-specific data persistence via pickle. That's as far as it should go, IMO.

Steve

Steven D'Aprano

3 a.m.

On Wed, Jan 22, 2020 at 02:39:57PM +0900, Stephen J. Turnbull wrote:

...

I see some parallels, but I'm definitely in the camp of "confusingly similar" rather than "instructively similar", and I definitely don't see a need for a syntax change to enable importing anything inside expressions.

+1 QOTW

importlib.resources already provides this functionality in an easy to use functional form. We don't need a weird, unusual magic f-string variant, or even a new form of the import statement for this.

--
Steven
