python -m quality of life improvements

currently python -m requires you to cd to the desired package root. I'd like to suggest the ability to do `python -m relative/path/to/package/root/module.submodule` and `python -m /absolute/path/to/package/root/module.submodule`. thoughts?

On Fri, Jan 10, 2020 at 11:53:10PM -0300, Soni L. wrote:
What do you mean? `python -m` ought to run any module in the path, regardless of whether it is a .py file or a package. (Or for that matter, a .pyc file.) If `import spam.eggs` will find it, `python -m spam.eggs` ought to run it. If it doesn't, that's a bug. The only tricky thing is that to run a top-level package, you need `__main__.py` as the entry point, not `__init__.py`. But running submodules of a package should just work.

To test it, I created a package "spam" on the PYTHONPATH:

    spam
    +-- __init__.py
    +-- __main__.py
    +-- eggs.py

where `__init__.py` is an empty file, and the other two contain:

    if __name__ == '__main__':
        import __main__
        print(__main__.__file__)

then cd'ed to a location off the PYTHONPATH, and ran these:

    [steve@ando tmp]$ python3.5 -m spam.eggs
    /home/steve/python/spam/eggs.py
    [steve@ando tmp]$ python3.5 -m spam
    /home/steve/python/spam/__main__.py

Have I misunderstood what you are trying to describe?

-- Steven

On Fri, Jan 10, 2020 at 11:53:10PM -0300, Soni L. wrote:
Oh, a further thought comes to mind... if your modules aren't on the PYTHONPATH, just drop the -m and you can execute any legal Python file, even if the name isn't a legal identifier:

    [steve@ando tmp]$ cp ~/python/spam/eggs.py /tmp/"some name".foo
    [steve@ando tmp]$ python3.5 "some name.foo"
    some name.foo

So as I understand it, the functionality you want already exists, both for running scripts on the path using -m and scripts with arbitrary file names without -m. Have I missed some subtlety in your proposal?

-- Steven

On Jan 10, 2020, at 20:40, Steven D'Aprano <steve@pearwood.info> wrote:
Well, there are definitely subtle differences. If it’s a single file rather than a package, running it as a script means you have to include the extension; with -m you can’t. If you -m a package, argv[0] is the path to its __main__.py; if you run a package as a script it’s the path to the package. If you use -m, the parent directory of the package obviously has to be on sys.path, but if you run it as a script, it generally won’t be.

There might also be differences with which paths (argv[0], __file__, __cached__, etc.) get abspathed on each platform (I can never remember the rules for that anyway). Do they both work with namespace packages? Fire the same auditing events? Interact the same way with the Windows py.exe launcher and shebang lines? Does -m ignore case on *nix on case-insensitive filesystems? What if you’ve got weird stuff in your meta path from a site-installed import hook? I don’t know the answers for most of these. Of course part of the reason I don’t know the answers is that I don’t think I’ve ever had a case where any of these differences mattered. But I could imagine there might be one.

But really, 80% of the time, people who are asking for anything like this actually have a perfect use case for building a setuptools-installable package (maybe even with a generated entry-point script) and just don’t know it. A lot of people think it’s really hard if they’ve never done it, or that it’s only appropriate for stuff you want to publish on PyPI, or whatever.

Anyway, I don’t really like the proposal as given even if there is a use for it. Mixing the module hierarchy and the file system hierarchy like that seems like a recipe for confusion—if not in the interpreter (and the shell’s tab completer) then at least in the human reader. Is a\b.c/d.e module e in package d in directory a\b.c\, or module e in (invalid) package c/d in package b in directory a\, or what? In this case, I think only one of the ambiguous ways you could read that is actually valid, but surely the rule can’t be “parse it every way and pick one of the valid ones arbitrarily”. (And imagine the confusion if your package has a module named py or pyw.)

Wouldn’t it be simpler to just use a separate arg to add a path to the sys.path, or to cd to, or to use in place of sys.path for finding the main module? (Without knowing the intended use I don’t know which of those is appropriate, but I’m guessing one of them is.)
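A quick way to check several of these differences empirically is a tiny probe module (a hypothetical illustration, not from the thread; the file and package names are placeholders). Drop it into a package as probe.py, or into a plain directory as __main__.py, and compare the two styles of invocation:

    # probe.py -- run both as "python -m pkg.probe" and as "python pkg/probe.py"
    # and compare what each invocation reports.
    import sys

    print("argv[0]     =", sys.argv[0])
    print("__file__    =", __file__)
    print("__package__ =", repr(__package__))
    print("sys.path[0] =", sys.path[0])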

On Sat, Jan 11, 2020, at 02:06, Andrew Barnert via Python-ideas wrote:
speaking of executing packages from the command line... I noticed a different set of anomalies (I was going to suggest this as a solution or workaround, but it turned out to be non-viable). If you specify a directory name as the primary script argument to python *without* -m:

- __main__.py will be executed
- argv[0] will be the dir name as-is, not a path to __main__.py
- sys.path[0] is the given directory
- __package__ is *the empty string*. [under normal circumstances for a module not in a package, __package__ is None. However, the empty string here does not allow from . import to work]

I expected:

- __main__.py will be executed
- I had no particular expectations regarding argv[0], but I believe this turns out to be the *only* circumstance under which it can name something other than a Python file
- sys.path[0] to be the parent directory of the given directory
- __package__ to be the name of the directory

Also note that executing a zip file on the command line behaves similarly to the actual behavior described above, and there is no way to achieve behavior analogous to what I expected because there is no clear way to treat a zip file as a package. I have not considered the implications regarding zip files further.

On Fri, Jan 10, 2020 at 11:06:25PM -0800, Andrew Barnert wrote:
On Jan 10, 2020, at 20:40, Steven D'Aprano <steve@pearwood.info> wrote:
[...]
Have I missed some subtlety in your proposal?
Well, there are definitely subtle differences.
I'm sure there are, but it isn't clear to me that the proposed enhancement has anything to do with the differences you mention.
If it’s a single file rather than a package, running it as a script means you have to include the extension; with -m you can’t.
Well, sure, but when running as a script you are specifying a location, and if you leave the extension off (assuming the script has one, which it may not) then you are specifying a different, and probably non-existing, location. When using -m, you are telling Python to search for and run the named module, and you have to use the same rules for legal module names as you have to use inside the interpreter. And that means no extensions:

    import math  # not math.dll

I'm not concerned that the signatures are different:

    python pathname    # run the entity specified by the pathname
    python -m module   # run the module specified by the identifier

but whether, between those two mechanisms (pick one or the other), Soni's issue is solved. I think it is. Soni suggested that you can't run a submodule in a package unless you cd into the package, but I think that's wrong. I think that between the two mechanisms for calling modules/packages/submodules, and the "__main__.py" entry point to packages, this should cover pretty much all the obvious use-cases for calling Python modules from the command line. (Modulo any potential bugs in the implementation.)

I'm not worried about all the methods using precisely the same interface, nor am I worried about subtle differences in the way the various special variables argv[0], `__file__` etc. are filled in. (Those differences can be used by scripts/modules etc. to identify how they are called, in case they care.) If Soni is worried about those things, they should say so, but I suspect that they merely don't know about the `__main__.py` entry point. Or perhaps I have completely misunderstood their proposal and I'm talking nonsense.
Wouldn’t it be simpler to just use a separate arg to add a path to the sys.path, or to cd to
Or you can learn to use your shell effectively. Does Python need to offer equivalent ways of doing everything the OS shell provides? I hope not :-)

In POSIX systems you can just set the PYTHONPATH for a single invocation of the interpreter:

    PYTHONPATH="/tmp" python whatever

As for the "cd to" option, you can get that with parentheses:

    [steve@ando tmp]$ pwd
    /tmp
    [steve@ando tmp]$ (cd /; python3.5 -c "import os; print(os.getcwd())")
    /
    [steve@ando tmp]$

I assume Windows has equivalent features to the above. If not, then perhaps we ought to consider what you say, but otherwise, let's not duplicate shell features in the interpreter if we don't have a good reason to do so.

-- Steven

On 2020-01-11 4:06 a.m., Andrew Barnert via Python-ideas wrote:
    PYTHONPATH=foo/bar python -m baz.qux

becomes

    python -m foo/bar/baz.qux

which is less of a kludge. and baz.qux can still import stuff on the foo/bar "package root", so e.g. it could "import baz" or "import something_else_entirely". This is for development (or running things straight off the dev environment), not deployment.

On Sat, Jan 11, 2020 at 11:27:51AM -0300, Soni L. wrote:
Sorry Soni, I completely disagree with you. The status quo `PYTHONPATH=foo/bar python -m baz.qux` is explicit about changing the PYTHONPATH and it uses a common, standard shell feature. It takes two well-designed components that work well, and can be understood in isolation, and plugs them together. The first part of the command explicitly sets the PYTHONPATH, the second part of the command searches the PYTHONPATH for the named module. Far from being a kludge, I think this is elegant, effective design.

It seems to me that your proposed syntax is the kludge: it mixes pathnames and module identifiers into a complex, potentially ambiguous "half path, half module spec" hybrid:

    foo/bar/baz.qux

* foo/bar/ is a pathname
* baz.qux is a fully-qualified module identifier, not a file name

The reader has to read that and remember that even though it looks exactly like a pathname, it isn't; it does not refer to the file "baz.qux" in directory "foo/bar/". It means:

* temporarily add "foo/bar/" to the PYTHONPATH
* find package "baz" (which could be anywhere in the PYTHONPATH)
* run the module baz.qux (which might not be qux.py)

-- Steven

I just want python foo/bar/baz/qux/__main__.py but with imports that actually work. -m works, but requires you to cd. -m with a path would be a more than huge improvement. and it absolutely should look for the given module in the given path, not "anywhere in the PYTHONPATH". On 2020-01-11 2:21 p.m., Steven D'Aprano wrote:

Soni,

Perhaps what you're looking for is available by writing a short Python program with a shebang? Then PYTHONPATH would be set to the directory of the program (many small projects include a `run.py` in the project's base directory). You can also place the program in ~/bin if it does `export PYTHONPATH`.

Then, I have this alias for one of my home-brewed tools, and it works as I want:

    alias chubby='PYTHONPATH=~/chubby ~/.virtualenvs/chubby/bin/python -Oum chubby'

I too think that the semantics of `python -m` are fine.

On Sat, Jan 11, 2020 at 1:46 PM Soni L. <fakedme+py@gmail.com> wrote:
-- Juancarlo *Añez*

why are we allowed to have fancy `python /path/to/foo.py` but not fancy `python -m /path/to/foo`? if `python` was capable of detecting modules and automatically deciding package roots, none of this would even be an argument and I'd just use `python /path/to/module/submodule/__main__.py` (with "module" having an __init__.py) and be done with it. but python can't do that because backwards compatibility and whatnot. so I propose we shove the solution into python -m instead. why's that so bad? it's simply ergonomics. On 2020-01-11 6:28 p.m., Juancarlo Añez wrote:

Soni,

Others have explained it already. `python -m` expects a _module_ as parameter, and that module is searched for by the rules `import` follows under `PYTHONPATH`. What you're asking for is that `python` sets `PYTHONPATH` before executing a module. Maybe another option to `python`?

    python -p /path/to -m foo

I would agree that would be nice.

On Sat, Jan 11, 2020 at 6:07 PM Soni L. <fakedme+py@gmail.com> wrote:
-- Juancarlo *Añez*

except I don't want to override PYTHONPATH. when you run a module with "python -m", it uses "." as one of the path elements. when you run a script with "python" it *doesn't use "." as one of the path elements*, instead replacing it with the path to the script.

ideally "python -m" would also be able to check that you're running what you think you're running. maybe "python -m module.submodule@/path"? and then it'd check that "/path/module/submodule.py" or "/path/module/submodule/__main__.py" exists, and use "/path" instead of "." in sys.path.

I want a "python -m" clone with "python" semantics, basically. it makes development easier all around. and "python -m" provides a much nicer project structure than "python" IMO and I'd like to encourage people to switch their "python" projects to "python -m" projects.

On 2020-01-11 7:28 p.m., Juancarlo Añez wrote:
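For concreteness, that "@" form could be emulated today with a small stdlib-only wrapper. This is a rough sketch under assumed semantics (the parsing rule, the name run_at.py, and the error message are placeholders, not part of the proposal):

    # run_at.py -- hypothetical stand-in for "python -m module.submodule@/path";
    # invoked as "python run_at.py module.submodule@/path [args...]"
    import runpy
    import sys
    from pathlib import Path

    spec, _, root = sys.argv[1].partition("@")
    root = root or "."
    candidate = Path(root, *spec.split("."))
    # check that you're running what you think you're running:
    if not (candidate.with_suffix(".py").exists()
            or (candidate / "__main__.py").exists()):
        sys.exit(f"could not find {spec} under {root}")
    sys.path.insert(0, root)  # use the given root instead of "." on sys.path
    del sys.argv[0]           # shift args; runpy sets argv[0] to the module's file
    runpy.run_module(spec, run_name="__main__", alter_sys=True)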

Hi Soni, For the last time, `python -m` takes a _module identifier_ as argument, not a path. For a module identifier to make sense, the `PYTHONPATH` must be set, or assumed. Remember that you're free to write your own `mypython` script which takes whatever arguments you want, with whatever actions you need. Cheers! On Sat, Jan 11, 2020 at 7:02 PM Soni L. <fakedme+py@gmail.com> wrote:
-- Juancarlo *Añez*

On Jan 11, 2020, at 14:09, Soni L. <fakedme+py@gmail.com> wrote:
why are we allowed to have fancy `python /path/to/foo.py` but not fancy `python -m /path/to/foo`?
There’s nothing fancy about the first one. It’s a path, and it’s up to your OS what a path means. It’s exactly the same as passing a path to sh or cmd or ffmpeg or any other tool. The argument to -m is also not fancy, because it’s not a path at all, it’s a module name. Which means Python searches the PYTHONPATH for a module with that name.

What you’re asking for is something that’s a path up to some point (even though your shell can’t tell) and then switches to being a module name at some unspecified point in the middle. For your trivial case it’s obvious where you want that point to be, but what’s the rule that handles non-trivial cases that aren’t obvious? And, even if there is a rule that’s easily understandable and unambiguous (to humans, to Python, to shell completers, etc.), why is that better than having separate arguments for the path and the module name in the first place?
if `python` was capable of detecting modules and automatically deciding package roots, none of this would even be an argument and I'd just use `python /path/to/module/submodule/__main__.py` (with "module" having an __init__.py) and be done with it. but python can't do that because backwards compatibility and whatnot.
It’s nothing to do with backward compatibility. Any directory is a package. (There are also packages that aren’t even directories, but ignore that extra complexity here.) If it doesn’t have an __init__.py, it’s a namespace package, which is still a package. That same path is just as validly interpreted five different ways, from path.to.module.submodule.__main__ in directory / to __main__ in directory /path/to/module/submodule. What possible rule could tell Python which of those five you’re intending, short of reading your mind?

A human can do some limited mind reading. Usually you don’t have modules named “__main__” except as the main script of a package; “module” sounds like a good name for a module but “to” doesn’t; people rarely use hierarchies more than 2 or 3 deep for packages but often do use deep hierarchies for filesystem paths; etc. But even a human can’t guess for a case like spam/eggs/cheese.py which one of those is the root.

A solution that made it easier to tell Python that something is a package would solve your problem without requiring magic telepathy, and without breaking modules. The obvious way to do that is to put its parent directory on the PYTHONPATH. I don’t understand why you don’t like that, but I suppose you could just as easily have a mechanism that says “this directory is a package just as if its parent were on the PYTHONPATH even though it isn’t”. But I don’t see why making Python and the reader and the shell split a hybrid path into hierarchical fs path and hierarchical module path, according to some rule you can’t even tell us, is supposed to make that any easier for anyone.

On 2020-01-11 8:33 p.m., Andrew Barnert wrote:
algorithm for -m idea:

    from pathlib import Path

    def find_package_root_and_module_name(argm: Path):
        return (argm.parent, argm.name)

algorithm for non -m idea:

    def find_package_root_and_module_name(argv0: Path):
        argv0 = argv0.resolve()
        package_root = argv0.parent  # or however we resolve the package_root today
        if argv0.suffix != ".py":    # note: pathlib suffixes include the leading dot
            return (package_root, None)
        module_name = argv0.stem  # may be __main__!
        while package_root.exists():
            if (package_root / '__init__.py').exists():
                module_name = f"{package_root.name}.{module_name}"
                package_root = package_root.parent
            else:
                return (package_root, module_name)
        # this means every directory up to the root/drive/filesystem root/whatever
        # is a package. there is no package root, so just raise.
        raise ValueError(f"no package root above {argv0}")
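To illustrate how the non -m walk resolves things (my example layout, not from the thread): given

    proj/
    +-- pkg/
        +-- __init__.py
        +-- sub/
            +-- __init__.py
            +-- mod.py

calling it with Path("proj/pkg/sub/mod.py") climbs past the two __init__.py files and returns the resolved proj/ directory together with the module name "pkg.sub.mod", because the walk stops at the first ancestor that is not a package.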

On Sat, Jan 11, 2020 at 02:46:14PM -0300, Soni L. wrote:
That's what you said in your first post. In my first response, I said it works for me. Unless I have misunderstood you, I think you are mistaken about needing to cd into the package root. Can you give a clear (and simple) example of a package in the PYTHONPATH where python -m package.module doesn't work?
and it absolutely should look for the given module in the given path. not "anywhere in the PYTHONPATH".
If you know the precise location of the module, and you don't want to search the PYTHONPATH, why are you using -m?

`python -m module` is for searching the PYTHONPATH when you don't know or care precisely where module is located. Think of it as a hammer.

`python filename` is for running the module when you do know and care precisely which file you are running. Think of it as a screwdriver.

You seem to be asking to weld a screwdriver head to a hammer so that you can have the hammer behave like a screwdriver. If you know the precise path you want to run, why are you using

    # your proposed syntax
    python -m spam/eggs/cheese.aardvark

when you could just as easily run this and get the effect you want?

    python spam/eggs/cheese/aardvark.py

This is not a rhetorical question. As far as I can tell from your explanation so far, what you want to do is possible *right now* if you just stop typing `-m` after the `python` command and use the right tool for the job. If you see a difference that I don't, please explain what that difference is.

-- Steven

On 2020-01-11 9:01 p.m., Steven D'Aprano wrote:
but those are *not* equivalent. the first one has spam/eggs/ as the package root, with cheese.aardvark as the module (the thing passed to "import" (except with the __main__ hacks that I don't wanna go into)). the second one has spam/eggs/cheese/ as the package root, and "aardvark" isn't even loaded as a module! if you did "import cheese.aardvark" it'd fail, and if you did "import aardvark" it would load the module twice, once under "__main__" and once under "aardvark".

On Sun, Jan 12, 2020 at 11:10 AM Steven D'Aprano <steve@pearwood.info> wrote:
The biggest difference is that scripts can't do relative imports. So here's a counter-proposal: Allow "from . import modulename" to import "modulename.py" from the directory that contains the script that Python first executed (or, for interactive Python, the current directory as Python started). ChrisA
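For reference, this is the behaviour the counter-proposal would change. Today a plain script attempting a relative import fails (a hypothetical session; output abbreviated from a recent CPython 3.x):

    $ echo 'x = 1' > helper.py
    $ echo 'from . import helper' > main.py
    $ python3 main.py
    Traceback (most recent call last):
      File "main.py", line 1, in <module>
        from . import helper
    ImportError: attempted relative import with no known parent package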

Understanding "script" as a free standing ".py"... I loved your suggestion because of it's convenience, but I worry it could be a security hole. I have scripts in ~/bin, in ./bin, and in ./scripts. And for ./scripts, it would be most useful if `from .. import` was allowed, so there was no more `import sys; sys.path.insert(0, '.'). I believe the above is addressed on a PEP (which number I don't remember). The discussion that has followed validates the OP's concern (even if the originally proposed solution is not right). It's hard to write scripts in Python that have access to the modules they intuitively should have access to, without patching (`sys.path.insert(...)` is awful) -- Juancarlo *Añez*

On Sun, Jan 12, 2020 at 12:30 PM Juancarlo Añez <apalala@gmail.com> wrote:
How is this a security hole? If anything, it's LESS of a security hole. Consider:

    $ echo 'print("You got p0wned")' >re.py
    $ echo 'import re' >demo.py
    $ python3 demo.py
    You got p0wned

That can ALREADY happen, and it's possible for a script to shadow something from the stdlib. You can even trigger this indirectly - for instance, attempting to import argparse will cause re to be imported.

Using "from . import modulename" implies that you specifically do NOT want to load up a module based on sys.path, but are looking for one in the current package's directory. That almost certainly means that such a file will indeed exist, so there's a few attack surfaces closed off.

(For the record, I'm not complaining about the status quo, nor proposing this as any sort of "solution" to a "problem". Just saying that it isn't creating any problem that doesn't already exist.)

ChrisA

On Sun, Jan 12, 2020 at 11:59:20AM +1100, Chris Angelico wrote:
The biggest difference is that scripts can't do relative imports.
How is that relevant? People keep mentioning minor differences between different ways of executing different kinds of entities (scripts, packages, submodules etc) but not why those differences are important or why they would justify any change in the way -m works.

A script can't do relative imports because it isn't a package and so doesn't have any submodules that it could do a relative import of. If relative imports are important to you, why not use a package? (I don't want to keep saying this, so I will say it only once: none of my questions in this post are rhetorical questions.) Or have the script adjust the path itself.

Say I want to package up a bunch of related scripts and libraries, so I create this structure:

    myscripts/
    +-- spam.py
    +-- eggs.py
    +-- cheese.py

but I want spam to be able to import eggs and cheese, and vice versa. The obvious solutions (in order of decreasing obviousness) are:

* Just use a package.

* Set the PYTHONPATH from the command line:

    PYTHONPATH='myscripts/' python -m spam

* Have each module insert its parent directory on the path:

    import pathlib, sys
    sys.path.insert(0, str(pathlib.Path(__file__).parent.absolute()))

There may be pros and cons of each approach, there may even be slight differences in behaviour, but is there some reason why none of these are satisfactory and the only solution is special support from the interpreter?
I wish people wouldn't jump straight to proposing solutions before they have articulated the specific nature of the problem they wish to solve first :-( (This isn't aimed *specifically* at Chris, it is a general observation.) Even if the problem is blindingly obvious to (generic) you, it may not be obvious to all of us. We shouldn't get twenty posts into a thread and have the problem being solved still be unclear.

Why is it necessary to use a relative import in a non-package script? Is this a way to solve name collisions? "I want a script spam.py that depends on a module X.py, but X clashes with a standard library module, and I need both the stdlib X and the local X." I think a dotted import would work (but so would renaming X) and the obvious way to make the dotted import work is to put spam.py and X.py into a package.

-- Steven

On Sun, Jan 12, 2020 at 6:38 PM Steven D'Aprano <steve@pearwood.info> wrote:
Have done so. The net result is often that there's a shim script that has no purpose but to adjust the path and import a module, and an extra directory level between that and the "real" code.
The problem is that there's a stark division between scripts and modules, and a directory is kinda a package, but kinda not. Compare: On Sun, Jan 12, 2020 at 10:34 AM Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Any directory is a package. ... If it doesn’t have an __init__.py, it’s a namespace package, which is still a package.
So if a directory is a package, but a script in a directory isn't a module in a package, what gives? Wouldn't it be cleaner if a script in a directory could do a relative import, given that directories ARE capable of behaving as packages?
Fair point. Hopefully I've clarified it above.
It's about clarity and potential forward compatibility. Imagine if someone had a "secrets.py" containing their encryption keys and whatnot, and then used "import secrets" to load it up. This code would continue to run successfully when the stdlib gained its own secrets module, but could potentially break depending on import order and other imports. If, instead, the normal thing to do is "from . import secrets", then there's no conflict ever, and the intent is very clear - this should be imported from the surrounding package directory, not searched for along sys.path.

Remember, sys.modules is global, so "I need both stdlib and local" might actually mean "I need my local module, but something that I import indirectly needs the stdlib one". And clashes are hard to debug.

ChrisA

Consider a directory structure like this one:

    ganarchy/
    ├── cli
    │   ├── debug.py
    │   ├── __init__.py
    │   └── __pycache__
    ├── config.py
    ├── __init__.py
    ├── __main__.py
    └── __pycache__
    abdl/
    ├── exceptions.py
    ├── __init__.py
    ├── _parser.py
    ├── __pycache__
    ├── validators.py
    └── _vm.py

If one cd'd to the package root (in this case ./, or ~/.git/selfhosted/ganarchy), one can do "python -m ganarchy", which can then "import abdl" and it works. If one does "python ganarchy/__main__.py", nothing works anymore. This means one must use -m with this project structure. Which is okay, except for the part where there's no way to set the package root outside environment variables.

(And yes this is a real project structure. Also note that there's no setup.py because setup.py has too many unnecessary knobs, and also note that this works with all the tools you want: pylint, pytest, etc except pip.)

I just wanna be able to "python -m ~/.git/selfhosted/ganarchy/ganarchy" or "python -m ganarchy@~/.git/selfhosted/ganarchy" or something.

On 2020-01-12 5:06 a.m., Chris Angelico wrote:

On Jan 12, 2020, at 06:10, Soni L. <fakedme+py@gmail.com> wrote:
And yes this is a real project structure. Also note that there's no setup.py because setup.py has too many unnecessary knobs
Who cares what knobs it has? The ones you don’t need to twiddle don’t matter. For most projects, writing an appropriate setup.py is trivial (and when it isn’t—e.g., because you need to dynamically generate some Cython code and compile it—it’s still usually simpler than any other way to get the code to run). Refusing to use it because it has options you aren’t using is like refusing to use the normal Python interpreter because it has even more options. (You could write a site.py that installs import hooks that change the way .py files are compiled, and then your module wouldn’t work. But the answer is just: then don’t turn that knob, not: you can’t use Python.)

On 2020-01-11 23:34, Steven D'Aprano wrote:
I don't think I condone the details of the OP's proposal, but I do agree that the process for executing Python files has some irritating warts. In fact, I would say the problem is precisely that a difference exists between running a "script" and a "module". So let me explain why I think this is annoying.

The pain point is relative imports. The docs at https://docs.python.org/3/reference/import.html#packages say:

"You can think of packages as the directories on a file system and modules as files within directories, but don’t take this analogy too literally since packages and modules need not originate from the file system."

The basic problem is that the overwhelming majority of packages and modules DO originate from the filesystem, and so people naturally want to be able to use the filesystem directly to represent package structure, REGARDLESS OF HOW OR WHETHER THE FILES ARE RUN OR IMPORTED. I'm sorry to put that in caps but that is really the fundamental issue. People want to be able to write something like "from . import stuff" in a file, and know that that will work purely based on the filesystem location in which that file is situated, regardless of how the file is "accessed" by Python (i.e., as a module, script, program, whatever you want to call it).

In other words, what non-expert users expect is that if there is a directory called `foo` with a subdirectory `bar` with some more files, that alone should be sufficient to establish that `foo` is a package with `bar` as a subpackage and the other files available as modules like `foo.stuff` and `foo.bar.morestuff`. (Some users perhaps understand that the folders should have an __init__.py to be considered part of the package, but I think even this is less well understood in the era of namespace packages.) It should not matter exactly how you "get to" these files in the first place --- that is, it should not matter whether you are importing a file or running one "as a script" or "as a module", nor should it matter precisely which file you run. The mere fact that a file "a.py" exists and is in the same directory with a file called "b.py" should be enough for "a.py" to use "from . import b" and have it work, always.

Now, I realize that there are various reasons why it doesn't work this way. Basically these reasons boil down to the fact that although most packages are transparently represented by their file/directory structure, there also exist namespace packages, which can have a more diffuse file/directory structure, and it's also possible to create "virtual" packages that have no filesystem representation at all. But the documentation is a long, long way from making this clear. For instance, it says this:

"For example, the following file system layout defines a top level parent package with three subpackages:"

But that's not true! The filesystem layout itself does not define the package! For relative import purposes, it only "counts" as a package if it's imported, not if a file in it is run directly. Otherwise it's just some files on disk, and if you run one of them "as a script", no package exists as far as Python is concerned.

The documentation does go on to describe how __main__ works and how the file's __name__ is set if it's run, and so on. But it does all this using the term "package", which is a trap for the unwary, because they already think package means "a directory with a certain structure" and not "something you get via the `import` statement".
Ultimately, the problem is that users (especially beginners) want to be able to put some files in a folder and have it work as a package as long as they are working locally in that folder --- without messing with sys.path or "installing" anything. In other words they want to create a directory and put "my_script.py" in there, and then put "mylib.py" in there and have the former use relative imports to get stuff from the latter. But they can't.

Personally, I am in agreement that this behavior is extremely bothersome. (In particular, the fact that __name__ becomes __main__ when the script is run, but is set to its usual name when it is imported, was a poor design decision that creates confusing asymmetries between the run and import cases.) It makes it unnecessarily difficult to write small, self-contained programs which make use of relative imports. Yes, it is better to write a setup.py and specify the dependencies, and blah blah, but for small tasks people often simply don't want to do that. They want to unzip their files into a directory and have it work, without notifying Python about installing anything or putting anything on the path.

As far as solutions, I think an idea worth considering would be a new command-line option similar to "-m" which effectively says "run this FILE that I am telling you, but pretend it is in whatever package it seems to be in based on the directory structure". So like suppose the option is -f for "file as module". It means if I do "python -f script.py", it would run that file, but correctly set up __package__ and so on so that "script.py" (and other files it imports) would be able to use relative imports. Maybe that would mean they could unexpectedly import higher than their level (i.e., use relative-import dots going above the actual top level of the package), or maybe the relative imports would be local to the directory where "script.py" is located, or maybe you could even specify the relative import "root" in a separate option, like "python -f script.py -r my/package/root".

The basic point is that people want to use relative imports without including boilerplate code to put themselves on sys.path, and without caring about whether the file is run directly or imported as a module, and without "installing" anything, and in general without thinking about anything except the local directory structure in which the file they are running is situated.

I realize that in many ways this is sloppy and you could say "don't do that", but I think if that is the position, the documentation needs to be seriously tightened up. In particular it needs to be made clear --- at every single mention! --- that "package" refers only to something that is imported and not to a file's "identity" based on its filesystem location. Just over six years ago I wrote an answer about this on StackOverflow (https://stackoverflow.com/questions/14132789/relative-imports-for-the-billio...) that continues to get upvotes and comments of the form "wow why isn't this explained in the documentation" almost daily. I hope it is clear that, even if we want to leave the behavior exactly as it is, there is a major problem with how people think they can use relative imports based on the official documentation.

-- Brendan Barnwell

"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
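To make the -f idea concrete, here is a rough stdlib-only sketch of what such an option might do. It is essentially the same walk Soni sketched earlier in the thread; the file name and the __init__.py-based rule are assumptions on my part (real namespace packages would complicate the test):

    # run_as_module.py -- hypothetical emulation of "python -f script.py":
    # walk up past __init__.py files to find the package root, then run the
    # file as a submodule of that package so relative imports work.
    import runpy
    import sys
    from pathlib import Path

    target = Path(sys.argv[1]).resolve()
    parts = [target.stem]
    root = target.parent
    while (root / "__init__.py").exists():  # climb while still inside a package
        parts.insert(0, root.name)
        root = root.parent
    sys.path.insert(0, str(root))  # the package root stands in for "."
    del sys.argv[0]                # expose the script's own arguments
    runpy.run_module(".".join(parts), run_name="__main__", alter_sys=True)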

On Fri, Jan 10, 2020 at 11:53:10PM -0300, Soni L. wrote:
What do you mean? `python -m` ought to run any module in the path, regardless of whether it is a .py file or a package. (Or for that matter, a .pyc file.) If `import spam.eggs` will find it, `python -m spam.eggs` ought to run it. If it doesn't, that's a bug. The only tricky thing is that to run a top level package, you need `__main__.py` as the entry point, not `__init__.py`. But running submodules of a package should just work. To test it, I creates a package "spam" on the PYTHONPATH: spam +-- __init__.py +-- __main__.py +-- eggs.py where `__init__.py` is an empty file, and the other two contain: if __name__ == '__main__': import __main__ print(__main__.__file__) the cd'ed to a location off the PYTHONPATH, and ran these: [steve@ando tmp]$ python3.5 -m spam.eggs /home/steve/python/spam/eggs.py [steve@ando tmp]$ python3.5 -m spam /home/steve/python/spam/__main__.py Have I misunderstood what you are trying to describe? -- Steven

On Fri, Jan 10, 2020 at 11:53:10PM -0300, Soni L. wrote:
Oh, a further thought comes to mind... if your modules aren't on the PYTHONPATH, just drop the -m and you can execute any legal Python file, even if the name isn't a legal identifier. [steve@ando tmp]$ cp ~/python/spam/eggs.py /tmp/"some name".foo [steve@ando tmp]$ python3.5 "some name.foo" some name.foo So as I understand it, the functionality you want already exists, both for running scripts on the path using -m and scripts with arbitrary file names without -m. Have I missed some subtlety in your proposal? -- Steven

On Jan 10, 2020, at 20:40, Steven D'Aprano <steve@pearwood.info> wrote:
Well, there are definitely subtle differences. If it’s a single file rather than a package, running it as a script means you have to include the extension; with -m you can’t. If you -m a package, argv[0] is the path to its __main__.py; if you run a package as a script it’s the path to the package. If you -m the parent directory of the package obviously has to be on sys.path, but if you run it as a script, it generally won’t be. There might also be differences with which paths (argv[0], __file__, __cached__, etc.) get abspathed on each platform (I can never remember the rules for that anyway). Do they both work with namespace packages? Fire the same auditing events? Interact the same way with the Windows py.exe launcher and shbang lines? Does -m ignore case on *nix on case-insensitive filesystems? What if you’ve got weird stuff in your metapath from a site-installed import hook? I don’t know the answers for most of these. Of course part of the reason I don’t know the answers is that I don’t think I’ve ever had a case where any of these differences mattered. But I could imagine there might be one. But really, 80% of the time, people who are asking for anything like this actually have a perfect use case for building a setuptools-installable package (maybe even with a generated entry-point script) and just don’t know it. A lot of people think it’s really hard if they’ve never done it, or that it’s only appropriate for stuff you want to publish on PyPI, or whatever. Anyway, I don’t really like the proposal as given even if there is a use for it. Mixing the module hierarchy and the file system hierarchy like that seems like a recipe for confusion—if not in the interpreter (and the shell’s tab completer) then at least in the human reader. Is a\b.c/d.e module e in package d in directory a\b.c\, or module e in (invalid) package c/d in package b in directory a\, or what? In this case, I think only one of the ambiguous ways you could read that is actually valid, but surely the rule can’t be “parse it every way and pick one of the valid ones arbitrarily”. (And imagine the confusion if your package has a module named py or pyw.) Wouldn’t it be simpler to just use a separate arg to add a path to the sys.path, or to cd to, or to use in place of sys.path for finding the main module? (Without knowing the intended use I don’t know which of those is appropriate, but I’m guessing one of them is.)

On Sat, Jan 11, 2020, at 02:06, Andrew Barnert via Python-ideas wrote:
speaking of executing packages from the command line... I noticed a different set of anomalies (I was going to suggest this as a solution or workaround, but it turned out to be non-viable). If you specify a directory name as the primary script argument to python *without* -m: - __main__.py will be executed - argv[0] will be the dir name as-is, not a path to __main__.py - sys.path[0] is the given directory - __package__ is *the empty string*. [under normal circumstances for a module not in a package, __package__ is None. However, the empty string here does not allow from . import to work] I expected: - __main__.py will be executed - I had no particular expectations regarding argv[0], but I believe this turns out to be *only* circumstance under which it can name something other than a python file - sys.path[0] to be the parent directory of the given directory - __package__ to be the name of the directory Also note that executing a zip file on the command line behaves similarly to the actual behavior described above, and there is no way to achieve behavior analogous to what I expected because there is no clear way to treat a zip file as a package. I have not considered the implications regarding zip files further.

On Fri, Jan 10, 2020 at 11:06:25PM -0800, Andrew Barnert wrote:
On Jan 10, 2020, at 20:40, Steven D'Aprano <steve@pearwood.info> wrote:
[...]
Have I missed some subtlety in your proposal?
Well, there are definitely subtle differences.
I'm sure there are, but it isn't clear to me that the proposed enhancement has anything to do with the differences you mention.
If it’s a single file rather than a package, running it as a script means you have to include the extension; with -m you can’t.
Well, sure, but when running as a script you are specifying a location, and if you leave the extension off (assuming the script has one, which it may not) then you are specifying a different, and probably non-existing, location. When using -m, you are telling Python to search for and run the named module, and you have to use the same rules for legal module names as you have to use inside the interpreter. And that means no extensions: import math # not math.dll I'm not concerned that the signatures are different: python pathname # run the entity specified by the pathname python -m module # run the module specified by the identifier but whether between those two mechanisms (pick one or the other) it solves Soni's issue. I think they might. Soni suggested that you can't run a submodule in a package unless you cd into the package, but I think that's wrong. I think that between the two mechanisms for calling modules/packages/submodules, and the "__main__.py" entry point to packages, this should cover pretty much all the obvious use-cases for calling Python modules from the command line. (Modulo any potential bugs in the implementation.) I'm not worried about all the methods using precisely the same interface, nor am I worried about subtle differences in the way the various special variables argv[0], `__file__` etc are filled in. (Those differences can be used by scripts/modules etc to identify how they are called, in case they care.) If Soni is worried about those things, they should say so, but I suspect that they merely don't know about the `__main__.py` entry point. Or perhaps I have completely misunderstood their proposal and I'm talking nonsense.
Wouldn’t it be simpler to just use a separate arg to add a path to the sys.path, or to cd to
Or we can learn to use your shell effectively. Does Python need to offer equivalent ways of doing everything the OS shell provides? I hope not :-) In POSIX systems you can just set the PYTHONPATH for a single invocation of the interpreter: PYTHONPATH="/tmp" python whatever As for the "cd to" option, you can get that with parentheses: [steve@ando tmp]$ pwd /tmp [steve@ando tmp]$ (cd /; python3.5 -c "import os; print(os.getcwd())") / [steve@ando tmp]$ I assume Windows has equivalent features to the above. If not, then perhaps we ought to consider what you say, but otherwise, let's not duplicate shell features in the interpreter if we don't have a good reason to do so. -- Steven

On 2020-01-11 4:06 a.m., Andrew Barnert via Python-ideas wrote:
PYTHONPATH=foo/bar python -m baz.qux becomes python -m foo/bar/baz.qux which is less of a kludge. and baz.qux can still import stuff on the foo/bar "package root", so e.g. it could "import baz" or "import something_else_entirely". This is for development (or running things straight off the dev environment), not deployment.

On Sat, Jan 11, 2020 at 11:27:51AM -0300, Soni L. wrote:
Sorry Soni, I completely disagree with you. The status quo `PYTHONPATH=foo/bar python -m baz.qux` is explicit about changing the PYTHONPATH and it uses a common, standard shell feature. This takes two well-designed components that work well, and can be understood in isolation, and plugging them together. The first part of the command explicitly sets the PYTHONPATH, the second part of the command searches the PYTHONPATH for the named module. Far from being a kludge, I think this is elegant, effective design. It seems to me that your proposed syntax is the kludge: it mixes pathnames and module identifiers into a complex, potentially ambiguous "half path, half module spec" hybrid: foo/bar/baz.qux * foo/bar/ is a pathname * baz.qux is a fully-qualified module identifier, not a file name The reader has to read that and remember that even though it looks exactly like a pathname, it isn't, it does not refer to the file "baz.qux" in directory "foo/bar/". It means: * temporarily add "foo/bar/" to the PYTHONPATH * find package "baz" (which could be anywhere in the PYTHONPATH) * run the module baz.qux (which might not be qux.py) -- Steven

I just want python foo/bar/baz/qux/__main__.py but with imports that actually work. -m works, but requires you to cd. -m with path would be an more than huge improvement. and it absolutely should look for the given module in the given path. not "anywhere in the PYTHONPATH". On 2020-01-11 2:21 p.m., Steven D'Aprano wrote:

Soni, Perhaps what you're looking for is available by writing a short Python program with a shebang? Then PYTHONPATH would be set to the directory of the program (many small projects include a `run.py` in the project's base directory). You can also place the program in ~/bin if it does `export PYTHONPATH`. Then, I have this alias for one of my home-brewed tools, and it works as I want: alias chubby='PYTHONPATH=~/chubby ~/.virtualenvs/chubby/bin/python -Oum chubby' I too think that the semantics of `python -m` are fine. On Sat, Jan 11, 2020 at 1:46 PM Soni L. <fakedme+py@gmail.com> wrote:
-- Juancarlo *Añez*

why are we allowed to have fancy `python /path/to/foo.py` but not fancy `python -m /path/to/foo`? if `python` was capable of detecting modules and automatically deciding package roots, none of this would even be an argument and I'd just use `python /path/to/module/submodule/__main__.py` (with "module" having an __init__.py) and be done with it. but python can't do that because backwards compatibility and whatnot. so I propose we shove the solution into python -m instead. why's that so bad? it's simply ergonomics. On 2020-01-11 6:28 p.m., Juancarlo Añez wrote:

Soni, Others have explained it already. `python -m` expects a _module_ as parameter, and that module is searched by the rules `import` follows under `PYTHONPATH`. What you're asking for is that `python` sets `PYTHONPATH` before executing a module. Maybe another option to `python`? python -p /path/to -m foo I would agree that would be nice. On Sat, Jan 11, 2020 at 6:07 PM Soni L. <fakedme+py@gmail.com> wrote:
-- Juancarlo *Añez*

except I don't want to override PYTHONPATH. when you run a module with "python -m", it uses "." as one of the path elements. when you run a script with "python" it *doesn't use "." as one of the path elements*, instead replacing it with the path to the script. ideally "python -m" would also be able to check that you're running what you think you're running. maybe "python -m module.submodule@/path"? and then it'd check that "/path/module/submodule.py" or "/path/module/submodule/__main__.py" exists. and use "/path" instead of "." in the sys.path. I want a "python -m" clone with "python" semantics, basically. it makes development easier all around. and "python -m" provides a much nicer project structure than "python" IMO and I'd like to encourage ppl to switch their "python" projects to "python -m" projects. On 2020-01-11 7:28 p.m., Juancarlo Añez wrote:

Hi Soni, For the last time, `python -m` takes a _module identifier_ as argument, not a path. For a module identifier to make sense, the `PYTHONPATH` must be set, or assumed. Remember that you're free to write your own `mypython` script which takes whatever arguments you want, with whatever actions you need. Cheers! On Sat, Jan 11, 2020 at 7:02 PM Soni L. <fakedme+py@gmail.com> wrote:
-- Juancarlo *Añez*

On Jan 11, 2020, at 14:09, Soni L. <fakedme+py@gmail.com> wrote:
why are we allowed to have fancy `python /path/to/foo.py` but not fancy `python -m /path/to/foo`?
There’s nothing fancy about the first one. It’s a path, and it’s up to your OS what a path means. It’s exactly the same as passing a path to sh or cmd or ffmpeg or any other tool. The argument to -m is also not fancy, because it’s not a path at all, it’s a module name. Which means Python searches the PYTHONPATH for a module with that name. What you’re asking for is something that’s a path up to some point (even though your shell can’t tell) and then switches to being a module name at some unspecified point in the middle. For your trivial case it’s obvious where you want that point to be, but what’s the rule that handles non-trivial cases that aren’t obvious? And, even if there is a rule that’s easily understandable and unambiguous (to humans, to Python, to shell completers, etc.), why is that better than having separate arguments for the path and the module name in the first place?
if `python` was capable of detecting modules and automatically deciding package roots, none of this would even be an argument and I'd just use `python /path/to/module/submodule/__main__.py` (with "module" having an __init__.py) and be done with it. but python can't do that because backwards compatibility and whatnot.
It’s nothing to do with backward compatibility. Any directory is a package. (There are also packages that aren’t even directories, but ignore that extra complexity here.) If it doesn’t have an __init__.py, it’s a namespace package, which is still a package. That same path is just as validly interpreted five different ways, from path.to.module.submodule.__main__ in directory / to __main__ in directory /path/to/module/submodule. What possible rule could tell Python which of those five you’re intending, short of reading your mind? A human can do some limited mind reading. Usually you don’t have modules named ”__main__” except as the main script of a package; “module” sounds like a good name for a module but “to” doesn’t; people rarely use hierarchies more than 2 or 3 deep for packages but often do use deep hierarchies for filesystem paths; etc. But even a human can’t guess for a case like spam/eggs/cheese.py which one of those is the root. A solution that made it easier to tell Python that something is a package would solve your problem without requiring magic telepathy, and without breaking modules. The obvious way to do that is to put its parent directory on the PYTHONPATH. I don’t understand why you don’t like that, but I suppose you could just as easily have a mechanism that says “this directory is a package just as if its parent were on the PYTHONPATH even though it isn’t”. But I don’t see why making Python and the reader and the shell split a hybrid path into hierarchical fs path and hierarchical module path according to some rule you can’t even tell us is supposed to make that any easier for anyone.

On 2020-01-11 8:33 p.m., Andrew Barnert wrote:
algorithm for -m idea: def find_package_root_and_module_name(argm: Path): return (argm.parent, argm.name) algorithm for non -m idea: def find_package_root_and_module_name(argv0: Path): argv0 = argv0.resolve() package_root = argv0.parent # or however we resolve the package_root today if argv0.suffix != "py": return (package_root, None) module_name = argv0.stem # may be __main__! while package_root.exists(): if (package_root / '__init__.py').exists(): module_name = f"{package_root.name}.{module_name}" package_root = package_root.parent else: return (package_root, module_name) raise ??? # this means every directory up to the root/drive/filesystem root/whatever is a package. there is no package root so just raise.

On Sat, Jan 11, 2020 at 02:46:14PM -0300, Soni L. wrote:
That's what you said in your first post. In my first response, I said it works for me. Unless I have misunderstood you, I think you are mistaken about needing to cd into the package root. Can you give a clear (and simple) example of a package in the PYTHONPATH where python -m package.module doesn't work?
and it absolutely should look for the given module in the given path. not "anywhere in the PYTHONPATH".
If you know the precise location of the module, and you don't want to search the PYTHONPATH, why are you using -m? `python -m module` is for searching the PYTHONPATH when you don't know or care precisely where module is located. Think of it as a hammer. `python filename` is for running the module when you do know and care precisely which file you are running. Think of it as a screwdriver. You seem to be asking to weld a screwdriver head to a hammer so that you can have the hammer behave like a screwdriver. If you know the precise path you want to run, why are you using # your proposed syntax python -m spam/eggs/cheese.aardvark when you could just as easily run this and get the effect you want? python spam/eggs/cheese/aardvark.py This is not a rhetorical question. As far as I can tell from your explanation so far, what you want to do is possible *right now* if you just stop typing `-m` after the `python` command and use the right tool for the job. If you see a difference that I don't, please explain what that difference is. -- Steven

On 2020-01-11 9:01 p.m., Steven D'Aprano wrote:
but those are *not* equivalent. the first one has spam/eggs/ as the package root, with cheese.aardvark as the module (the thing passed to "import" (except with the __main__ hacks that I don't wanna go into)). the second one has spam/eggs/cheese/ as the package root, and "aardvark" isn't even loaded as a module! if you did "import cheese.aardvark" it'd fail, and if you did "import .aardvark" it would load the module twice, once under "__main__" and once under "aardvark".

On Sun, Jan 12, 2020 at 11:10 AM Steven D'Aprano <steve@pearwood.info> wrote:
The biggest difference is that scripts can't do relative imports. So here's a counter-proposal: Allow "from . import modulename" to import "modulename.py" from the directory that contains the script that Python first executed (or, for interactive Python, the current directory as Python started). ChrisA

Understanding "script" as a free standing ".py"... I loved your suggestion because of it's convenience, but I worry it could be a security hole. I have scripts in ~/bin, in ./bin, and in ./scripts. And for ./scripts, it would be most useful if `from .. import` was allowed, so there was no more `import sys; sys.path.insert(0, '.'). I believe the above is addressed on a PEP (which number I don't remember). The discussion that has followed validates the OP's concern (even if the originally proposed solution is not right). It's hard to write scripts in Python that have access to the modules they intuitively should have access to, without patching (`sys.path.insert(...)` is awful) -- Juancarlo *Añez*

On Sun, Jan 12, 2020 at 12:30 PM Juancarlo Añez <apalala@gmail.com> wrote:
How is this a security hole? If anything, it's LESS of a security hole. Consider: $ echo 'print("You got p0wned")' >re.py $ echo 'import re' >demo.py $ python3 demo.py You got p0wned That can ALREADY happen, and it's possible for a script to shadow something from the stdlib. You can even trigger this indirectly - for instance, attempting to import argparse will cause re to be imported. Using "from . import modulename" implies that you specifically do NOT want to load up a module based on sys.path, but are looking for one in the current package's directory. That almost certainly means that such a file will indeed exist, so there's a few attack surfaces closed off. (For the record, I'm not complaining about the status quo, nor proposing this as any sort of "solution" to a "problem". Just saying that it isn't creating any problem that doesn't already exist.) ChrisA

On Sun, Jan 12, 2020 at 11:59:20AM +1100, Chris Angelico wrote:
The biggest difference is that scripts can't do relative imports.
How is that relevent? People keep mentioning minor differences between different ways of executing different kinds of entities (scripts, packages, submodules etc) but not why those differences are important or why they would justify any change in the way -m works. A script can't do relative imports because they aren't packages and so don't have any submodules that they could do a relative import of. If relative imports are important to you, why not use a package? (I don't want to keep saying this, so I will say it only once: none of my questions in this post are rhetorical questions.) Or have the script adjust the path itself. Say I want to package up a bunch of related scripts and libraries, so I create this structure: myscripts/ +-- spam.py +-- eggs.py +-- cheese.py but I want spam to be able to import eggs and cheese, and vice versa, the obvious solutions (in order of decreasing obviousness) are: * Just use a package. * Set the PYTHONPATH from the command line: PYTHONPATH='myscripts/' python -m spam * Have each module insert its parent directory on the path: import pathlib, sys sys.path.insert(0, pathlib.Path(__file__).parent.absolute()) There may be pros and cons of each approach, there may even be slight differences in behaviour, but is there some reason why none of these are satisfactory and the only solution is special support from the interpreter?
I wish people wouldn't jump straight to proposing solutions before they have articulated the specific nature of the problem they wish to solve first :-( (This isn't aimed *specifically* at Chris, it is a general observation.) Even if the problem is blindingly obvious to (generic) you, it may not be obvious to all of us. We shouldn't get twenty posts into a thread and have the problem being solved still be unclear. Why is it necessary to use a relative import in a non-package script? Is this a way to solve name collisions? "I want a script spam.py that depends on a module X.py, but X clashes with a standard library module, and I need both the stdlib X and the local X." I think a dotted import would work (but so would renaming X) and the obvious way to make the dotted import work is to put spam.py and X.py into a package. -- Steven

On Sun, Jan 12, 2020 at 6:38 PM Steven D'Aprano <steve@pearwood.info> wrote:
Have done so. The net result is often that there's a shim script that has no purpose but to adjust the path and import a module, and an extra directory level between that and the "real" code.
The problem is that there's a stark division between scripts and modules, and a directory is kinda a package, but kinda not. Compare:

On Sun, Jan 12, 2020 at 10:34 AM Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Any directory is a package. ... If it doesn’t have an __init__.py, it’s a namespace package, which is still a package.
So if a directory is a package, but a script in a directory isn't a module in a package, what gives? Wouldn't it be cleaner if a script in a directory could do a relative import, given that directories ARE capable of behaving as packages?
Fair point. Hopefully I've clarified it above.
It's about clarity and potential forward compatibility. Imagine if someone had a "secrets.py" containing their encryption keys and whatnot, and then used "import secrets" to load it up. This code would continue to run successfully when the stdlib gained its own secrets module, but could potentially break depending on import order and other imports.

If, instead, the normal thing to do is "from . import secrets", then there's no conflict ever, and the intent is very clear - this should be imported from the surrounding package directory, not searched for along sys.path.

Remember, sys.modules is global, so "I need both stdlib and local" might actually mean "I need my local module, but something that I import indirectly needs the stdlib one". And clashes are hard to debug.

ChrisA
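That failure mode is easy to reproduce; a hedged sketch (the file names are invented, and the stdlib secrets module dates from Python 3.6):

    # secrets.py -- a local file shadowing the stdlib module:
    API_KEY = "hunter2"

    # app.py, run as a script from the same directory:
    import secrets                # silently picks up the local file above,
                                  # since the script's directory comes first
                                  # on sys.path
    print(secrets.token_hex(16))  # AttributeError: module 'secrets' has no
                                  # attribute 'token_hex'

With the local file inside a package and imported as "from . import secrets", the two modules coexist: the local one is cached in sys.modules under its package-qualified name, the stdlib one under "secrets".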

On 2020-01-12 5:06 a.m., Chris Angelico wrote:

Consider a directory structure like this one:

ganarchy/
├── cli
│   ├── debug.py
│   ├── __init__.py
│   └── __pycache__
├── config.py
├── __init__.py
├── __main__.py
└── __pycache__

abdl/
├── exceptions.py
├── __init__.py
├── _parser.py
├── __pycache__
├── validators.py
└── _vm.py

If one cd's to the package root (in this case ./, or ~/.git/selfhosted/ganarchy), one can do "python -m ganarchy", which can then "import abdl" and it works. If one does "python ganarchy/__main__.py", nothing works anymore. This means one must use -m with this project structure. Which is okay, except for the part where there's no way to set the package root outside environment variables.

(And yes, this is a real project structure. Also note that there's no setup.py because setup.py has too many unnecessary knobs, and also note that this works with all the tools you want - pylint, pytest, etc. - except pip.)

I just wanna be able to "python -m ~/.git/selfhosted/ganarchy/ganarchy" or "python -m ganarchy@~/.git/selfhosted/ganarchy" or something.
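Until something like that exists, one workaround is a tiny launcher; this sketch (the launcher itself is hypothetical) leans on the stdlib runpy module and the paths above:

    # run_ganarchy.py -- hypothetical launcher
    import os
    import runpy
    import sys

    # Put the package root on sys.path, where -m would expect to find it.
    sys.path.insert(0, os.path.expanduser("~/.git/selfhosted/ganarchy"))

    # Equivalent to running "python -m ganarchy" from that directory;
    # for a package, runpy executes its __main__.py.
    runpy.run_module("ganarchy", run_name="__main__", alter_sys=True)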

On Jan 12, 2020, at 06:10, Soni L. <fakedme+py@gmail.com> wrote:
And yes this is a real project structure. Also note that there's no setup.py because setup.py has too many unnecessary knobs
Who cares what knobs it has? The ones you don’t need to twiddle don’t matter. For most projects, writing an appropriate setup.py is trivial (and when it isn’t—e.g., because you need to dynamically generate some Cython code and compile it—it’s still usually simpler than any other way to get the code to run). Refusing to use it because it has options you aren’t using is like refusing to use the normal Python interpreter because it has even more options. (You could write a site.py that installs import hooks that change the way .py files are compiled, and then your module wouldn’t work. But the answer is just: then don’t turn that knob, not: you can’t use Python.)
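For what it's worth, the trivial case really is short; a minimal sketch, with the name and version invented:

    # setup.py
    from setuptools import setup, find_packages

    setup(
        name="myproject",          # hypothetical project name
        version="0.1",
        packages=find_packages(),  # finds every directory with an __init__.py
    )

After "pip install -e ." in the project directory, "python -m ganarchy" (or whatever the top-level package is called) works from anywhere, with no sys.path patching.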

On 2020-01-11 23:34, Steven D'Aprano wrote:
I don't think I condone the details of the OP's proposal, but I do agree that the process for executing Python files has some irritating warts. In fact, I would say the problem is precisely that a difference exists between running a "script" and a "module". So let me explain why I think this is annoying. The pain point is relative imports.

The docs at https://docs.python.org/3/reference/import.html#packages say:

"You can think of packages as the directories on a file system and modules as files within directories, but don’t take this analogy too literally since packages and modules need not originate from the file system."

The basic problem is that the overwhelming majority of packages and modules DO originate from the filesystem, and so people naturally want to be able to use the filesystem directly to represent package structure, REGARDLESS OF HOW OR WHETHER THE FILES ARE RUN OR IMPORTED. I'm sorry to put that in caps but that is really the fundamental issue. People want to be able to write something like "from . import stuff" in a file, and know that it will work purely based on the filesystem location in which that file is situated, regardless of how the file is "accessed" by Python (i.e., as a module, script, program, whatever you want to call it).

In other words, what non-expert users expect is that if there is a directory called `foo` with a subdirectory `bar` with some more files, that alone should be sufficient to establish that `foo` is a package with `bar` as a subpackage and the other files available as modules like `foo.stuff` and `foo.bar.morestuff`. (Some users perhaps understand that the folders should have an __init__.py to be considered part of the package, but I think even this is less well understood in the era of namespace packages.) It should not matter exactly how you "get to" these files in the first place --- that is, it should not matter whether you are importing a file or running one "as a script" or "as a module", nor should it matter precisely which file you run. The mere fact that a file "a.py" exists and is in the same directory as a file called "b.py" should be enough for "a.py" to use "from . import b" and have it work, always.

Now, I realize that there are various reasons why it doesn't work this way. Basically these reasons boil down to the fact that although most packages are transparently represented by their file/directory structure, there also exist namespace packages, which can have a more diffuse file/directory structure, and it's also possible to create "virtual" packages that have no filesystem representation at all. But the documentation is a long, long way from making this clear. For instance, it says this:

"For example, the following file system layout defines a top level parent package with three subpackages:"

But that's not true! The filesystem layout itself does not define the package! For relative import purposes, it only "counts" as a package if it's imported, not if a file in it is run directly. Otherwise it's just some files on disk, and if you run one of them "as a script", no package exists as far as Python is concerned.

The documentation does go on to describe how __main__ works and how the file's __name__ is set if it's run, and so on. But it does all this using the term "package", which is a trap for the unwary, because they already think package means "a directory with a certain structure" and not "something you get via the `import` statement".
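The asymmetry is easy to observe directly; a minimal sketch (the pkg layout is invented):

    # pkg/__init__.py  -- empty
    # pkg/probe.py:
    print("__name__    =", __name__)
    print("__package__ =", __package__)

    # $ python -m pkg.probe     (from the directory containing pkg/)
    # __name__    = __main__
    # __package__ = pkg
    #
    # $ python pkg/probe.py
    # __name__    = __main__
    # __package__ = None    <- no parent package, so relative imports fail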
Ultimately, the problem is that users (especially beginners) want to be able to put some files in a folder and have it work as a package as long as they are working locally in that folder --- without messing with sys.path or "installing" anything. In other words, they want to create a directory and put "my_script.py" in there, and then put "mylib.py" in there and have the former use relative imports to get stuff from the latter. But they can't.

Personally, I am in agreement that this behavior is extremely bothersome. (In particular, the fact that __name__ becomes __main__ when the script is run, but is set to its usual name when it is imported, was a poor design decision that creates confusing asymmetries between the run and import cases.) It makes it unnecessarily difficult to write small, self-contained programs which make use of relative imports. Yes, it is better to write a setup.py and specify the dependencies, and blah blah, but for small tasks people often simply don't want to do that. They want to unzip their files into a directory and have it work, without notifying Python about installing anything or putting anything on the path.

As for solutions, I think an idea worth considering would be a new command-line option similar to "-m" which effectively says "run this FILE that I am telling you, but pretend it is in whatever package it seems to be in based on the directory structure". Suppose the option is -f for "file as module". Then "python -f script.py" would run that file, but correctly set up __package__ and so on, so that "script.py" (and other files it imports) would be able to use relative imports. Maybe that would mean they could unexpectedly import higher than their level (i.e., use relative-import dots going above the actual top level of the package), or maybe the relative imports would be local to the directory where "script.py" is located, or maybe you could even specify the relative-import "root" in a separate option, like "python -f script.py -r my/package/root". (A sketch of how such an option could be emulated today follows after this message.)

The basic point is that people want to use relative imports without including boilerplate code to put themselves on sys.path, and without caring about whether the file is run directly or imported as a module, and without "installing" anything, and in general without thinking about anything except the local directory structure in which the file they are running is situated.

I realize that in many ways this is sloppy and you could say "don't do that", but I think if that is the position, the documentation needs to be seriously tightened up. In particular, it needs to be made clear --- at every single mention! --- that "package" refers only to something that is imported, and not to a file's "identity" based on its filesystem location. Just over six years ago I wrote an answer about this on StackOverflow (https://stackoverflow.com/questions/14132789/relative-imports-for-the-billio...) that continues to get upvotes and comments of the form "wow why isn't this explained in the documentation" almost daily. I hope it is clear that, even if we want to leave the behavior exactly as it is, there is a major problem with how people think they can use relative imports based on the official documentation.

-- Brendan Barnwell

"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
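That promised sketch: a hedged emulation of the proposed -f using the stdlib runpy module, assuming regular packages marked by __init__.py files (namespace packages would need more care; the runfile.py wrapper is invented for illustration, not an actual CPython option):

    # runfile.py -- hypothetical "-f" emulation:
    #   python runfile.py path/to/pkg/script.py [args...]
    import os
    import runpy
    import sys

    path = os.path.abspath(sys.argv[1])
    root, filename = os.path.split(path)
    parts = [os.path.splitext(filename)[0]]

    # Walk upward while __init__.py files mark enclosing packages.
    while os.path.exists(os.path.join(root, "__init__.py")):
        root, pkg = os.path.split(root)
        parts.insert(0, pkg)

    sys.path.insert(0, root)   # the first directory that is NOT a package
    sys.argv.pop(0)            # let the target see its own argv
    runpy.run_module(".".join(parts), run_name="__main__", alter_sys=True)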
participants (7)

- Andrew Barnert
- Brendan Barnwell
- Chris Angelico
- Juancarlo Añez
- Random832
- Soni L.
- Steven D'Aprano