Idea: Importing from arbitrary filenames

Hi all,

First of all, please excuse me if I'm presenting this idea in the wrong way or at the wrong time - I'm new to this mailing list and haven't seen anyone propose a new idea on it yet, so I don't know the customs.

I have an idea for importing files with arbitrary names. Currently, the "official" way to import arbitrary files is to use the "imp" module, as shown by this answer: https://stackoverflow.com/a/3137914/6605349

However, this method takes two function calls and is not as (aesthetically pleasing? is that the word?) as a simple "import" statement. Therefore, my idea is to allow the "import" statement to accept one of three targets.

First, the normal "import":

    import antigravity

which simply imports from sys.path.

Second, importing with a string literal specifying the path to a file:

    import '/home/pi/anti-gravity.py' as antigravity

Note the "as antigravity" in this statement - this is to avoid ambiguities when choosing the global name to bind to. Should "import '/home/pi/anti-gravity.py'" import to the name "/home/pi/anti-gravity.py", "anti-gravity.py", "anti-gravity", or "anti_gravity"? None of those are really ideal. Therefore, when the import target is a string literal, the statement must include "as NAME".

Third, importing with an expression providing a value castable to a string, specifying the path to a file:

    def file_in_home(filename):
        return '/home/pi/' + filename

    import $file_in_home('anti-gravity.py') as antigravity

Once again, for the same reasons, import statements like this must include "as NAME" to avoid ambiguities. Notice that the expression is preceded by a dollar sign ($) to indicate that what follows is an expression rather than a name - imagine a scenario like this:

    antigravity_file = '/home/pi/anti-gravity.py'
    import antigravity_file as antigravity

Should it look for a sys.path module with the name "antigravity_file" or should it use the value of the variable "antigravity_file"? Looking for the sys.path module first before trying a variable's value would waste processing time and potentially be unexpected behavior. Trying a variable's value first before looking for a sys.path module would be even less expected behavior. Therefore, a dollar sign must come before expression imports to indicate that the import target is an expression.

Side note: the dollar sign was chosen because it mimics other languages' conventions of preceding variable names with dollar signs, but any arbitrary character not present at the start of an expression would work.

One more thing about expression imports: if the final returned value of the expression is not a string, I believe the statement should raise a TypeError (the same way that __repr__ or __str__ raise TypeError if they return a non-string). Why? If the statement attempted to cast the return value to a string, and the return value's __str__ method raised an error, then should the statement allow the error to pass through, or should it attempt to use a parent class's __str__ method? Allowing the error to pass through would almost certainly be unexpected behavior; attempting to use a parent class's __str__ method would take more time and more processing power (though it would eventually reach "object"'s __str__ method and succeed). Therefore, non-string expression values should raise TypeError.

What are your thoughts?

Regards,
Ken Hilton

I'm fairly certain similar changes have been discussed in the past. Someone else can probably find / link / rehash the reasons why imports deliberately use dot notation instead of path?

I can think of a few:

1) Portability. Dotted imports looked up from sys.path are platform-portable.

    import os
    import sys

    # $HOME/this/__init__.py
    sys.path.append(os.path.expanduser(os.path.join('~', 'this')))

2) Flexibility. importlib Loaders (3.1+) are abstract; they only know what to do with dotted paths. They can import from the filesystem, zip files, ... git repos,

- https://docs.python.org/3/library/importlib.html
- https://docs.python.org/3/library/importlib.html#importlib.abc.SourceLoader
- https://pypi.org/search/?q=importlib
- https://docs.python.org/3/library/imp.html#imp.find_module
- https://docs.python.org/3/library/imp.html#imp.load_module

sys.path (`python -m site`) can be configured with:

- $PYTHONSTARTUP='~/.pythonrc.py'
- modules.py and dirs/__init__.py in site-packages/
- .pth files in site-packages/
- idempotent sys.path config at the top of a .py source file
- sys.USER_SITE in sys.path
  - ~/.local/lib/pythonX.Y/site-packages
  - ~/Library/Python/X.Y/lib/python/site-packages
  - %APPDATA%\Python\PythonXY\site-packages
- https://docs.python.org/3/library/site.html
- https://docs.python.org/3/using/cmdline.html#envvar-PYTHONSTARTUP
- https://docs.python.org/3/using/cmdline.html#envvar-PYTHONUSERBASE
- https://docs.python.org/3/library/site.html#site.USER_SITE

Is there a good write-up of how, where, and in what order sys.path is configured, by default, in Python?

TLDR: dotted names are preferable for sharing code with people who don't have the same paths, os.pathsep, or sys.platform.

(Stevedore and Jupyter Notebook take different approaches to handling plugins, if that's your use case?)

Though I could just be arguing for the status quo; there are probably good reasons to consider changing EVERYTHING.

On Friday, April 13, 2018, Ken Hilton <kenlhilton@gmail.com> wrote:

On 14 April 2018 at 13:28, Ken Hilton <kenlhilton@gmail.com> wrote:
Modules aren't required to be stored on the filesystem, so we have no plans to offer this.

`runpy.run_path()` exists to let folks run arbitrary Python files and collect the resulting namespace, while if folks really want to implement pseudo-imports based on filenames, we expose the necessary building blocks in importlib (https://docs.python.org/3/library/importlib.html#importing-a-source-file-dir...)

The fact that run_path() has a nice straightforward invocation model, and the import emulation recipe doesn't, is intended as a hint :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
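To make the contrast between the two approaches concrete, here is a minimal sketch. The temporary file, the module name "anti_gravity", and the helper import_from_path are all made up for illustration; the helper follows the general shape of the importlib recipe for importing a source file, as best I understand it, rather than quoting it.

```python
import importlib.util
import runpy
import sys
import tempfile
from pathlib import Path


def import_from_path(module_name, file_path):
    """Hypothetical helper: load a source file as a module under module_name."""
    spec = importlib.util.spec_from_file_location(module_name, str(file_path))
    module = importlib.util.module_from_spec(spec)
    # Register the module before executing it, as the import system would.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module


# A throwaway file whose name is not a valid identifier.
script = Path(tempfile.mkdtemp()) / "anti-gravity.py"
script.write_text("ALTITUDE = 42\n\ndef fly():\n    return 'up'\n")

# Option 1: run the file and collect its namespace as a plain dict.
namespace = runpy.run_path(str(script))

# Option 2: emulate an import, yielding a real module object in sys.modules.
anti_gravity = import_from_path("anti_gravity", script)
```

The first option is a single call and leaves sys.modules alone; the second takes several steps and requires choosing a module name by hand, which is roughly the ambiguity the proposed "as NAME" clause was meant to resolve.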

On 14/04/2018 06:27, Nick Coghlan wrote:
I generally love the current import system for "just working" regardless of platform, installation details, etc., but what I would like to see is a clear "import local" mechanism (as opposed to "import from wherever you can find something to satisfy the request"). This is the one thing that I miss from C/C++, where #include <x> searches the system include paths and #include "x" searches a different set of include paths (if used well).

--
Steve (Gadget) Barnes
Any opinions in this message are my personal opinions and do not reflect those of my employer.

On 14 April 2018 at 19:22, Steve Barnes <gadgetsteve@live.co.uk> wrote:
For the latter purpose, we prefer that folks use either explicit relative imports (if they want to search the current package specifically), or else direct manipulation of package.__path__.

That is, if you do:

    from . import custom_imports  # Definitely from your own project
    custom_imports.__path__[:] = (some_directory, some_other_directory)

then:

    from .custom_imports import name

will search those directories for packages & modules to import, while still cleanly mapping to a well-defined location in the module namespace for the process as a whole (and hence being able to use all the same caches as other imports, without causing name conflicts or other problems).

If you want to do this dynamically relative to the current module, then it's possible to do:

    global __path__
    __path__[:] = (some_directory, some_other_directory)
    custom_mod = importlib.import_module(".name", package=__name__)

The discoverability of these kinds of techniques could definitely stand to be improved, but the benefit of adopting them is that they work on all currently supported versions of Python (even importlib.import_module exists in Python 2.7 as a convenience wrapper around __import__), rather than needing to wait for new language level syntax for them.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
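The __path__ technique can be tried in isolation with a self-contained sketch like the one below. It is an assumption-laden variation rather than the exact code above: instead of an existing project package, it builds an empty "custom_imports" package on the fly, and the helper make_path_package, the temporary directory, and the greeter module are all invented for the example.

```python
import importlib
import importlib.machinery
import importlib.util
import sys
import tempfile
from pathlib import Path


def make_path_package(name, directories):
    """Hypothetical helper: register an empty package searching the given dirs."""
    spec = importlib.machinery.ModuleSpec(name, None, is_package=True)
    # submodule_search_locations becomes the package's __path__.
    spec.submodule_search_locations.extend(str(d) for d in directories)
    package = importlib.util.module_from_spec(spec)
    sys.modules[name] = package
    return package


# A directory that is NOT on sys.path, holding one module.
plugin_dir = Path(tempfile.mkdtemp())
(plugin_dir / "greeter.py").write_text(
    "def greet():\n    return 'hello from plugin'\n"
)

package = make_path_package("custom_imports", [plugin_dir])

# Submodule lookups consult custom_imports.__path__ rather than sys.path.
greeter = importlib.import_module("custom_imports.greeter")
```

The imported module ends up at the well-defined name custom_imports.greeter in sys.modules, so later imports of the same name reuse the cached module.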

On 15/04/2018 08:12, Nick Coghlan wrote:
Thanks Nick,

As you say, not too discoverable at the moment - I have just reread PEP 328 & https://docs.python.org/3/library/importlib.html but did not find any mention of these mechanisms, or even that setting __path__ from outside the package was a possibility.

Maybe a documentation enhancement proposal would be in order?

--
Steve (Gadget) Barnes
Any opinions in this message are my personal opinions and do not reflect those of my employer.

On 16 April 2018 at 03:45, Steve Barnes <gadgetsteve@live.co.uk> wrote:
Yeah, the fact that "packages are ultimately just modules with a __path__ attribute that works like sys.path" tends to get obscured by the close association between package hierarchies and file system layouts in the default filesystem importer. The docs for that are all the way back in PEP 302: https://www.python.org/dev/peps/pep-0302/#packages-and-the-role-of-path
> Maybe a documentation enhancement proposal would be in order?

If we're not covering explicit __path__ manipulation anywhere, we should definitely mention that possibility. https://docs.python.org/3/library/pkgutil.html#pkgutil.extend_path does talk about it, but only in the context of scanning sys.path for matching names, not in the context of building a package from an arbitrary set of directory names.

I'm not sure where we could put an explanation of some of the broader implications of that fact, though - while __path__ manipulation is usually fairly safe, we're always a little hesitant about encouraging too many dynamic modifications to the import system state, since it can sometimes have odd side effects based on whether imports happen before or after that state is adjusted.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 16 April 2018 at 17:22, Nick Coghlan <ncoghlan@gmail.com> wrote:
It's quite possible that we're not.
One of the problems with PEP 302 was that there was no really good place in the documentation to put all the information that was present (certainly not in the version of the docs that was around when we wrote it). So a lot of the important details remained buried in PEP 302. Since then, a lot of the details ended up in the docs, mostly in the importlib sections, but I don't recall ever seeing anything about __path__ (and particularly not the nice summary you gave, "packages are ultimately just modules with a __path__ attribute that works like sys.path").

Paul

The documentation is pretty opaque or non-existent on other aspects of importlib use, too. If I enable warnings, I see this (and many more like it). I've read PEP 302 a couple times, read the code in importlib that detects the warning and searched down several rabbit holes, only to come up empty...

    T:\Python36\lib\importlib\_bootstrap.py:219: ImportWarning: can't resolve package from __spec__ or __package__, falling back on __name__ and __path__

My thoughts when I see it: "Ok. So what does that mean? Is it bad? It must be bad, otherwise I wouldn't get a warning. How do I reconcile __spec__ and __package__? Which one is missing and/or incorrect?"

On Mon, Apr 16, 2018 at 9:36 AM, Paul Moore <p.f.moore@gmail.com> wrote:

On Mon, 16 Apr 2018 at 09:58 Eric Fahlgren <ericfahlgren@gmail.com> wrote:
> The documentation is pretty opaque or non-existent on other aspects of importlib use, too.

Well, we are diving into the dark corners of import here. (Details can be found in the language reference: https://docs.python.org/3/reference/import.html).

It means that the mechanisms import typically uses to calculate the importing module's name in order to resolve relative imports weren't where they should be, and so we fell back to the Python 2 way of doing it.

> Is it bad?

Eh, it isn't ideal. ;)

> It must be bad, otherwise I wouldn't get a warning. How do I reconcile __spec__ and __package__?

You should be setting __spec__.parent, but we will fall back to __package__ if that doesn't exist (and raise a different warning). :)

> Which one is missing and/or incorrect?"

Both are missing. :)

-Brett
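For anyone chasing the same warnings, a small sketch like the following may help show what a normally imported module carries. The helper describe_package_info is invented for illustration, and json.decoder is used only because it is a convenient standard-library submodule.

```python
import importlib


def describe_package_info(module):
    """Hypothetical helper: collect the attributes used to resolve relative imports."""
    spec = getattr(module, "__spec__", None)
    return {
        "name": module.__name__,
        # __spec__.parent is the preferred source of the containing package name.
        "spec_parent": spec.parent if spec is not None else None,
        # __package__ is the older attribute that import falls back to.
        "package": getattr(module, "__package__", None),
    }


info = describe_package_info(importlib.import_module("json.decoder"))
```

Running the same helper over modules that were created or patched by hand (for example, ones with a manually assigned __package__) should make any missing or mismatched values easy to spot.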

On Mon, Apr 16, 2018 at 10:23 AM, Brett Cannon <brett@python.org> wrote:
Thanks, Brett, I'll read through that and see where I get. Those corners /are/ pretty dark.

The backstory is that I'm doing the final port from Py2 to Py3 (it's been a long time coming, mostly years of waiting for extension modules to get ported, notably wxPython and VTK). In Py2, all warnings were enabled and disallowed, so big surprise on first run, hundreds of lines of the aforementioned one and "ImportWarning: __package__ != __spec__.parent". We have manually defined "__package__" all over the place, for reasons lost in the fog of time, which I believe to be the culprit for the latter warning.

Eric

On 4/13/2018 11:28 PM, Ken Hilton wrote:
Alex Martelli intentionally put that in quotes.
> way to import arbitrary files is to use the "imp" module, as shown by this answer: https://stackoverflow.com/a/3137914/6605349

Read the first comment -- the above is deprecated. There was always the __import__(name) function, but importlib.import_module is recommended now.

> However, this method takes two function calls and is not as (aesthetically pleasing? is that the word?) as a simple "import" statement.

Only one is needed for most purposes. importlib has separate find and load functions, which are used by 'import', and which are available to those who need them.

> Second, importing with a string literal specifying the path to a file:
> import '/home/pi/anti-gravity.py' as antigravity

    antigravity = import_module('/home/pi/anti-gravity.py')

--
Terry Jan Reedy
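One caveat worth checking before relying on the one-liner above: as far as the importlib documentation describes it, import_module() takes a module name (absolute, or relative with a package argument), not a filesystem path. The sketch below uses json.decoder purely as a stand-in module and records what happens when a path is passed instead.

```python
import importlib

# Absolute dotted name.
decoder_abs = importlib.import_module("json.decoder")

# Relative name, resolved against the package given as the anchor.
decoder_rel = importlib.import_module(".decoder", package="json")

# A filesystem path is treated as an (unfindable) module name.
path_error = None
try:
    importlib.import_module("/home/pi/anti-gravity.py")
except ImportError as exc:
    path_error = exc
```

Both dotted forms return the same cached module object, while the path form fails with an ImportError, so loading a file by path still goes through the find/load building blocks mentioned above.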

participants (8)

- Brett Cannon
- Eric Fahlgren
- Ken Hilton
- Nick Coghlan
- Paul Moore
- Steve Barnes
- Terry Reedy
- Wes Turner