__dir__ in which folder is this py file
![](https://secure.gravatar.com/avatar/48aae91146c6aa266ff2c9b3d1ab5d8b.jpg?s=120&d=mm&r=g)
Hi Ideas,

I often need to reference a script's current directory. I end up writing:

    import os
    SRC_DIR = os.path.dirname(__file__)

But I would prefer to have a new dunder for that. I propose: "__dir__". I was wondering if others would find it convenient to include such a shortcut.

Here are some examples of dirname(__file__) in prominent projects.

https://github.com/tensorflow/models/search?l=Python&q=dirname&type=
https://github.com/django/django/search?l=Python&q=dirname&type=
https://github.com/nose-devs/nose/search?l=Python&q=dirname&type=

Reasons not to add __dir__:

* There already is one way to do it and it's clear and fairly short.
* Avoid the bikeshed discussion of __dir__, __folder__, and other candidates.

Reasons to add it:

* os.path.dirname(__file__) returns the empty string when you're in the same directory as the script. Luckily, os.path.join understands an empty string as a ".", but this still is suboptimal for logging where it might be surprising to find the empty string. __dir__ could be implemented to contain a "." in that case.
* I would save about 20 characters and a line from 50% of my python scripts.
* This is such a common construct that everyone giving it their own name seems suboptimal for communicating. Common names include: here, path, dirname, module_dir.

Cheers,
Yuval Greenfield

P.S. Node.js has it - https://nodejs.org/docs/latest/api/modules.html#modules_dirname

Also, I apologize if this has been suggested before - my googling didn't find a previous thread.
![](https://secure.gravatar.com/avatar/5615a372d9866f203a22b2c437527bbb.jpg?s=120&d=mm&r=g)
On Sun, May 06, 2018 at 06:53:11AM +0000, Yuval Greenfield wrote:
Hi Ideas,
I often need to reference a script's current directory. I end up writing:
    import os
    SRC_DIR = os.path.dirname(__file__)
But I would prefer to have a new dunder for that. I propose: "__dir__". I was wondering if others would find it convenient to include such a shortcut.
Not really, no. If I'm doing file name processing such that I need the script's directory, I already need to import os, so providing this pre-calculated would only save at most a single line. Not every one-liner needs to be a built-in.

I don't strongly oppose this, but given how easy it is, I don't really see the point. New features add a cost, and while these costs are individually tiny:

- one more thing to document
- one more thing for people to learn and memorise
- every script pays the cost of calculating this dirname whether it is needed or not

etc, the corresponding benefit is also tiny, so it is not clear to me that the benefit from having this is greater than the cost of having it.

If you can demonstrate a clear, significant benefit, the balance would shift in favour of this proposal, but as it stands, it seems like a mere matter of personal taste. So in the absence of any clear, non-trivial benefit, I'm vaguely -0 on this.

However, I am opposed to the use of __dir__ as the dunder name, since __dir__ is already used as the dunder method for the dir() builtin. Even though strictly speaking there is no conflict between a method and a module global, conceptually they would be better kept distinct.

If this is approved, I suggest __dirname__ instead.
Reasons not to add __dir__: * There already is one way to do it and it's clear and fairly short.
Indeed.
Reasons to add it: * os.path.dirname(__file__) returns the empty string when you're in the same directory as the script. Luckily, os.path.join understands an empty string as a ".", but this still is suboptimal for logging where it might be surprising to find the empty string.
Can you give an example where the empty string actually is a real problem, rather than just "might be"? Bonus points if it was an actual problem in real code, not just a hypothetical problem made up as an example.

For what it's worth, if there is such a genuine problem, that would shift me to +0.5 on the proposal:

    SRC_DIR = os.path.dirname(__file__) or '.'

versus

    __dirname__

--
Steve
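For readers who want to see the case under discussion concretely, here is a minimal sketch. It uses a hypothetical relative script name, `script.py`, in place of `__file__` (whether `__file__` is actually relative depends on the Python version and on how the script was launched). It shows the empty-string result and the `or '.'` workaround.

```python
import os.path

# Hypothetical value standing in for a relative __file__.
relative_file = "script.py"

src_dir = os.path.dirname(relative_file)
print(repr(src_dir))  # the empty string, which reads oddly in a log line

# os.path.join skips the empty component, so the joined path is still usable.
data_path = os.path.join(src_dir, "data.txt")
print(data_path)

# The one-line workaround: fall back to "." when dirname() returns "".
src_dir_or_dot = os.path.dirname(relative_file) or "."
print(src_dir_or_dot)
```

The workaround costs one extra expression, which is the baseline any new dunder would be compared against.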
![](https://secure.gravatar.com/avatar/3000e268c155dc2ca5c6e5f71af92e8a.jpg?s=120&d=mm&r=g)
With PEP 562, the name __dir__ is off limits for this.

Cody
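For context on the PEP 562 objection: since Python 3.7 a module may define a function named `__dir__` that customises what `dir(module)` returns, so the name already has a meaning at module level. The sketch below builds a throwaway module object rather than a real file on disk; the module name `demo` and its attributes are made up for illustration.

```python
import types

# A throwaway module object; "demo" and its attributes are illustrative only.
demo = types.ModuleType("demo")
demo.public_name = 1
demo._private_name = 2

def _module_dir():
    # PEP 562: dir(module) calls a module-level __dir__ function if one exists.
    return ["public_name"]

# Equivalent to writing "def __dir__(): ..." at the top level of demo.py.
demo.__dir__ = _module_dir

listing = dir(demo)
print(listing)
```

A module global called `__dir__` holding a directory string would collide with this hook, which is presumably why the name is considered taken.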
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
![](https://secure.gravatar.com/avatar/a22bccc97ef1f69516894f1d150b81a2.jpg?s=120&d=mm&r=g)
2018-05-06 15:28 GMT+02:00 Cody Piersall <cody.piersall@gmail.com>:
With PEP 562, the name __dir__ is off limits for this.
Cody
Hi,

I would give +1 for __dirname__

George
![](https://secure.gravatar.com/avatar/d67ab5d94c2fed8ab6b727b62dc1b213.jpg?s=120&d=mm&r=g)
On Mon, May 7, 2018 at 1:05 AM, George Fischhof <george@fischhof.hu> wrote:
On Sun, May 6, 2018, 1:54 AM Yuval Greenfield <ubershmekel@gmail.com> wrote:
Hi Ideas,
I often need to reference a script's current directory. I end up writing:
    import os
    SRC_DIR = os.path.dirname(__file__)
I would give +1 for __dirname__
Something to keep in mind: making this available to every module, whether it's wanted or not, means that the Python interpreter has to prepare that just in case it's wanted. That's extra work as part of setting up a module. Which, in turn, means it's extra work for EVERY import, and consequently, slower Python startup. It might only be a small slowdown, but it's also an extremely small benefit.

-1.

ChrisA
![](https://secure.gravatar.com/avatar/f3ba3ecffd20251d73749afbfa636786.jpg?s=120&d=mm&r=g)
On 7 May 2018 at 03:44, Chris Angelico <rosuav@gmail.com> wrote:
On Mon, May 7, 2018 at 1:05 AM, George Fischhof <george@fischhof.hu> wrote:
On Sun, May 6, 2018, 1:54 AM Yuval Greenfield <ubershmekel@gmail.com> wrote:
Hi Ideas,
I often need to reference a script's current directory. I end up writing:
    import os
    SRC_DIR = os.path.dirname(__file__)
I would give +1 for __dirname__
Something to keep in mind: making this available to every module, whether it's wanted or not, means that the Python interpreter has to prepare that just in case it's wanted. That's extra work as part of setting up a module. Which, in turn, means it's extra work for EVERY import, and consequently, slower Python startup. It might only be a small slowdown, but it's also an extremely small benefit.
It also makes the name show up in dir(mod) for every module, and we're currently looking for ways to make that list *shorter*, not longer.

So I have a different suggestion: perhaps it might make sense to propose promoting a key handful of path manipulation operations to the status of being builtins?

Specifically, the ones I'd have in mind would be:

- dirname (aka os.path.dirname)
- joinpath (aka os.path.join)
- abspath (aka os.path.abspath)

Why those 3? Because with just those three operations you can locate other files relative to `__file__`, the current working directory [1], and arbitrary absolute paths, as well as remove path traversal notation like ".." and "." from the resulting paths (since abspath() internally calls normpath()).

    _launch_dir = abspath('')
    def open_from_launch_dir(relpath, mode='r'):
        return open(abspath(joinpath(_launch_dir, relpath)), mode)

    _script_dir = dirname(abspath(__file__))
    def open_from_script_dir(relpath, mode='r'):
        return open(abspath(joinpath(_script_dir, relpath)), mode)

You'd still need to import pathlib or os.path for more complex path manipulations, but they generally wouldn't be needed any more if all you're doing is reading and/or writing a handful of specific files.

Cheers,
Nick.

[1] abspath can stand in for os.getcwd(), since you can spell the latter as abspath('.') or abspath(''), and we could potentially even make it so you can retrieve the cwd via just abspath()

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
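Nick's helpers assume the three proposed builtins exist. As a minimal sketch of the same idea in current Python, the names can be imported from os.path (`joinpath` is the proposed spelling, aliased here from `os.path.join`). The helper name `open_from_dir`, the temporary directory and `example.txt` are stand-ins chosen so the snippet does not depend on where it is run.

```python
import tempfile
from os.path import abspath, join as joinpath

def open_from_dir(base_dir, relpath, mode="r"):
    # abspath() also normalises "." and ".." segments in the joined path.
    return open(abspath(joinpath(base_dir, relpath)), mode)

# Stand-in for dirname(abspath(__file__)): a temporary directory.
with tempfile.TemporaryDirectory() as script_dir:
    with open_from_dir(script_dir, "example.txt", "w") as f:
        f.write("hello")
    # "sub/.." is collapsed as plain text before open() is called,
    # so this reaches the same file even though "sub" does not exist.
    with open_from_dir(script_dir, joinpath("sub", "..", "example.txt")) as f:
        content = f.read()

print(content)
```

In a real script the base directory would come from `dirname(abspath(__file__))` or `abspath('')`, as in the message above.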
![](https://secure.gravatar.com/avatar/d67ab5d94c2fed8ab6b727b62dc1b213.jpg?s=120&d=mm&r=g)
On Mon, May 7, 2018 at 12:13 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
So I have a different suggestion: perhaps it might make sense to propose promoting a key handful of path manipulation operations to the status of being builtins?
Specifically, the ones I'd have in mind would be:
- dirname (aka os.path.dirname) - joinpath (aka os.path.join)
These two are the basics of path manipulation. +1 for promoting to builtins, unless pathlib becomes core (which I suspect isn't happening).
- abspath (aka os.path.abspath)
Only +0.5 on this, as it has to do file system operations. It may be worthwhile, instead, to promote os.path.normpath, which (like the others) is purely processing the string form of the path. It'll return the same value regardless of the file system.

But yes, I'd much rather see path manipulation based on __file__ and builtins rather than injecting yet another module-level attribute that's derived from what we already have.

ChrisA
![](https://secure.gravatar.com/avatar/f3ba3ecffd20251d73749afbfa636786.jpg?s=120&d=mm&r=g)
On 7 May 2018 at 12:35, Chris Angelico <rosuav@gmail.com> wrote:
On Mon, May 7, 2018 at 12:13 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
So I have a different suggestion: perhaps it might make sense to propose promoting a key handful of path manipulation operations to the status of being builtins?
Specifically, the ones I'd have in mind would be:
- dirname (aka os.path.dirname) - joinpath (aka os.path.join)
These two are the basics of path manipulation. +1 for promoting to builtins, unless pathlib becomes core (which I suspect isn't happening).
pathlib has too many dependencies to ever make the type available as a builtin:

    $ ./python -X importtime -c pass 2>&1 | wc -l
    25
    $ ./python -X importtime -c "import pathlib" 2>&1 | wc -l
    53

It's a good way of unifying otherwise scattered standard library APIs, but it's overkill if all you want to do is to calculate and resolve some relative paths.
- abspath (aka os.path.abspath)
Only +0.5 on this, as it has to do file system operations. It may be worthwhile, instead, to promote os.path.normpath, which (like the others) is purely processing the string form of the path. It'll return the same value regardless of the file system.
My rationale for suggesting abspath() over any of its component parts is based on a few key considerations:

- "make the given path absolute" is a far more common path manipulation activity than "normalise the given path" (most users wouldn't even know what the latter means - the only reason *I* know what it means is because I looked up the docs for abspath while writing my previous comment)
- __file__ isn't always absolute (especially in __main__), so you need to be able to do abspath(__file__) in order to reliably apply dirname() more than once
- it can stand in for both os.getcwd() (when applied to the empty string or os.curdir) and os.path.normpath() (when the given path is already absolute), so we get 3 new bits of builtin functionality for the price of one new builtin name
- I don't want to read "normpath(joinpath(getcwd(), relpath))" when I could be reading "abspath(relpath)" instead

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
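The relationship between abspath(), normpath() and getcwd() described in this exchange can be checked directly. This is a small sketch with made-up relative paths; `posixpath` is used for the pure-string example so the printed result is the same on every platform.

```python
import os
import posixpath
from os.path import abspath, join, normpath

# normpath() is pure string processing: it never consults the file system.
normalised = posixpath.normpath("pkg/./data/../config.ini")
print(normalised)  # pkg/config.ini

# abspath() on a relative path matches the longer spelling from the message.
relpath = join("data", "..", "config.ini")
spelled_out = normpath(join(os.getcwd(), relpath))
print(abspath(relpath) == spelled_out)

# Applied to the empty string, abspath() stands in for os.getcwd().
print(abspath("") == normpath(os.getcwd()))
```

The only step that depends on the environment is the call to os.getcwd(), which is the "file system operation" Chris's +0.5 refers to.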
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
Spit-balling: how about __filepath__ as a lazily-created-on-first-access pathlib.Path(__file__)?

Promoting os.path stuff to builtins just as pathlib is emerging as TOOWTDI makes me a bit uncomfortable.

--
Nathaniel J. Smith -- https://vorpus.org
![](https://secure.gravatar.com/avatar/f3ba3ecffd20251d73749afbfa636786.jpg?s=120&d=mm&r=g)
On 7 May 2018 at 13:33, Nathaniel Smith <njs@pobox.com> wrote:
Spit-balling: how about __filepath__ as a lazily-created-on-first-access pathlib.Path(__file__)?
Promoting os.path stuff to builtins just as pathlib is emerging as TOOWTDI makes me a bit uncomfortable.
pathlib *isn't* TOOWTDI, since it takes almost 10 milliseconds to import it, and it introduces a higher level object-oriented abstraction that's genuinely distracting when you're using Python as a replacement for shell scripting.

While lazy imports could likely help with the import time problem (since 6.5 of those milliseconds are from importing fnmatch), I think there's also a legitimate argument for a two tier system here, where we say "If you can't handle your filesystem manipulation task with just open, dirname, abspath, and joinpath, then reach for the higher level pathlib abstraction".

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Sun, May 6, 2018 at 8:47 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 7 May 2018 at 13:33, Nathaniel Smith <njs@pobox.com> wrote:
Spit-balling: how about __filepath__ as a lazily-created-on-first-access pathlib.Path(__file__)?
Promoting os.path stuff to builtins just as pathlib is emerging as TOOWTDI makes me a bit uncomfortable.
pathlib *isn't* TOOWTDI, since it takes almost 10 milliseconds to import it, and it introduces a higher level object-oriented abstraction that's genuinely distracting when you're using Python as a replacement for shell scripting.
Hmm, the feedback I've heard from at least some folks teaching intro-python-for-scientists is like, "pathlib is so great for scripting that it justifies upgrading to python 3".

How is

    data_path = __filepath__.parent / "foo.txt"

more distracting than

    data_path = joinpath(dirname(__file__), "foo.txt")

? And the former gives you far more power: the full Path interface, not just 2-3 common operations.

Import times are certainly a consideration, but I'm uncomfortable with jumping straight to adding things to builtins based on current import times, without at least exploring options for speeding that up...

-n

--
Nathaniel J. Smith -- https://vorpus.org
![](https://secure.gravatar.com/avatar/f3ba3ecffd20251d73749afbfa636786.jpg?s=120&d=mm&r=g)
On 7 May 2018 at 14:33, Nathaniel Smith <njs@pobox.com> wrote:
On Sun, May 6, 2018 at 8:47 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 7 May 2018 at 13:33, Nathaniel Smith <njs@pobox.com> wrote:
Spit-balling: how about __filepath__ as a lazily-created-on-first-access pathlib.Path(__file__)?
Promoting os.path stuff to builtins just as pathlib is emerging as TOOWTDI makes me a bit uncomfortable.
pathlib *isn't* TOOWTDI, since it takes almost 10 milliseconds to import it, and it introduces a higher level object-oriented abstraction that's genuinely distracting when you're using Python as a replacement for shell scripting.
Hmm, the feedback I've heard from at least some folks teaching intro-python-for-scientists is like, "pathlib is so great for scripting that it justifies upgrading to python 3".
How is
data_path = __filepath__.parent / "foo.txt"
more distracting than
data_path = joinpath(dirname(__file__), "foo.txt")
Fair point :)

In that case, perhaps the right answer here would be to adjust the opening examples section in the pathlib docs, showing some additional common operations like:

    _script_dir = Path(__file__).parent
    _launch_dir = Path.cwd()
    _home_dir = Path.home()

And perhaps in a recipes section:

    def open_file_from_dir(dir_path, rel_path, *args, **kwds):
        return open(Path(dir_path) / rel_path, *args, **kwds)

(Now that open() accepts path objects natively, I'm inclined to recommend that over the pathlib-specific method spelling)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
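The recipe suggested above can be exercised as follows. This is a minimal sketch: a temporary directory stands in for `Path(__file__).parent`, and `notes.txt` is a made-up file name, so the snippet runs regardless of where it is saved.

```python
import tempfile
from pathlib import Path

def open_file_from_dir(dir_path, rel_path, *args, **kwds):
    # open() accepts path objects directly (Python 3.6+).
    return open(Path(dir_path) / rel_path, *args, **kwds)

launch_dir = Path.cwd()  # directory the process was started from

# Stand-in for Path(__file__).parent: a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    script_dir = Path(tmp)
    with open_file_from_dir(script_dir, "notes.txt", "w") as f:
        f.write("hello")
    text = (script_dir / "notes.txt").read_text()

print(text)
```

Because the helper wraps `dir_path` in `Path()`, it accepts either a string or an existing path object as the base directory.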
![](https://secure.gravatar.com/avatar/5615a372d9866f203a22b2c437527bbb.jpg?s=120&d=mm&r=g)
On Sun, May 06, 2018 at 09:33:03PM -0700, Nathaniel Smith wrote:
How is
data_path = __filepath__.parent / "foo.txt"
more distracting than
data_path = joinpath(dirname(__file__), "foo.txt")
Why are you dividing by a string? That's weird.

[looks up the pathlib docs]

Oh, that's why. It's still weird.

So yes, it's very distracting.

First I have to work out what __filepath__ is, then I have to remember the differences between all the various flavours of pathlib.<whatever>Path and suffer a moment or two of existential dread as I try to work out whether or not *this* specific flavour is the one I need. This might not matter for heavy users of pathlib, but for casual users, it's a big, intimidating API with:

- an important conceptual difference between pure paths and concrete paths;
- at least six classes;
- about 50 or so methods and properties

As far as performance goes, I don't think it matters that we could technically make pathlib imported lazily. Many people put all their pathname manipulations at the beginning of their script, so lazy or not, the pathlib module is going to be loaded *just after* startup.

For many scripts, this isn't going to matter, but for those who want to avoid the overhead of pathlib, making it lazy doesn't help. That just delays the overhead, it doesn't remove it.

--
Steve
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Mon, May 7, 2018, 03:45 Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, May 06, 2018 at 09:33:03PM -0700, Nathaniel Smith wrote:
How is
data_path = __filepath__.parent / "foo.txt"
more distracting than
data_path = joinpath(dirname(__file__), "foo.txt")
Why are you dividing by a string? That's weird.
[looks up the pathlib docs]
Oh, that's why. It's still weird.
So yes, its very distracting.
Well, yes, you do have to know the API to use it, and if you happen to have learned the os.path API but not the pathlib API then of course the os.path API will look more familiar. I'm not sure what this is supposed to prove.
First I have to work out what __filepath__ is, then I have to remember the differences between all the various flavours of pathlib.<whatever>Path and suffer a moment or two of existential dread as I try to work out whether or not *this* specific flavour is the one I need. This might not matter for heavy users of pathlib, but for casual users, it's a big, intimidating API with:
- an important conceptual difference between pure paths and concrete paths; - at least six classes;
The docs could perhaps be more beginner friendly. For casual users, the answer is always "you want pathlib.Path".
- about 50 or so methods and properties
Yeah, filesystems have lots of operations. That's why before pathlib users had to learn about os and os.path and shutil and glob and maybe some more I'm forgetting.
As far as performance goes, I don't think it matters that we could technically make pathlib imported lazily. Many people put all their pathname manipulations at the beginning of their script, so lazy or not, the pathlib module is going to be loaded *just after* startup.
For many scripts, this isn't going to matter, but for those who want to avoid the overhead of pathlib, making it lazy doesn't help. That just delays the overhead, it doesn't remove it.
AFAIK there were two situations where laziness has been mentioned in this thread:

- my suggestion that we delay loading pathlib until someone accesses __filepath__. I don't actually know how to implement this so it was mostly intended to try to spur new ideas, but if we could do it, the point of the laziness would be so that scripts that didn't use __filepath__ wouldn't pay for it.
- Nick's observation that pathlib could load faster if it loaded fnmatch lazily. Since this is only used for a few methods, this would benefit any script that didn't use those methods. (And for scripts that do need fnmatch's functionality, without pathlib they'd just be importing it directly, so pathlib importing it isn't really an extra cost.)

It's true that laziness isn't a silver bullet, though, yeah. We should also look for ways to speed things up.

-n
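Nothing in the thread settles how a lazy `__filepath__` could be implemented. Purely as an illustration of the laziness idea, and not as a proposed design, the sketch below uses the PEP 562 module-level `__getattr__` hook (Python 3.7+) on a throwaway module object; the module name `demo` and its `__file__` value are made up. One limitation is worth stating: the hook fires for attribute access from outside (`demo.__filepath__`), not for a bare `__filepath__` name used inside the module's own code, so it does not deliver the dunder as originally imagined.

```python
import types

demo = types.ModuleType("demo")
demo.__file__ = "/srv/app/demo.py"  # made-up location for illustration

def _lazy_getattr(name):
    # Called only when normal attribute lookup on the module fails.
    if name == "__filepath__":
        import pathlib  # deferred: paid for only on first access
        value = pathlib.PurePosixPath(demo.__file__)
        demo.__dict__[name] = value  # cache so the hook is not hit again
        return value
    raise AttributeError(f"module 'demo' has no attribute {name!r}")

# Equivalent to defining "def __getattr__(name): ..." inside demo.py.
demo.__getattr__ = _lazy_getattr

first = demo.__filepath__
second = demo.__filepath__
print(first.parent, first is second)
```

Modules that never touch the attribute never run the `import pathlib` line, which is the property the first bullet above is after.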
![](https://secure.gravatar.com/avatar/f3ba3ecffd20251d73749afbfa636786.jpg?s=120&d=mm&r=g)
On 7 May 2018 at 21:42, Nathaniel Smith <njs@pobox.com> wrote:
On Mon, May 7, 2018, 03:45 Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, May 06, 2018 at 09:33:03PM -0700, Nathaniel Smith wrote:
How is
data_path = __filepath__.parent / "foo.txt"
more distracting than
data_path = joinpath(dirname(__file__), "foo.txt")
Why are you dividing by a string? That's weird.
[looks up the pathlib docs]
Oh, that's why. It's still weird.
So yes, its very distracting.
Well, yes, you do have to know the API to use it, and if you happen to have learned the os.path API but not the pathlib API then of course the os.path API will look more familiar. I'm not sure what this is supposed to prove.
I think it strongly suggests that *magically* introducing a path object into a module's namespace would be a bad idea, since it harms readability (since merely having `path` in the name isn't a strong enough hint that the object in question is a `pathlib.Path` instance).

Your original point is still valid though: given the boilerplate reduction already available via "from pathlib import Path; _this_dir = Path(__file__).parent", it's the pathlib version that needs to be taken as the baseline for how verbose the status quo really is, not the lower level os.path API (no matter how accustomed some of us may still be to using the latter).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
![](https://secure.gravatar.com/avatar/5615a372d9866f203a22b2c437527bbb.jpg?s=120&d=mm&r=g)
On Mon, May 07, 2018 at 11:42:00AM +0000, Nathaniel Smith wrote:
On Mon, May 7, 2018, 03:45 Steven D'Aprano <steve@pearwood.info> wrote:
[...]
So yes, its very distracting.
Well, yes, you do have to know the API to use it, and if you happen to have learned the os.path API but not the pathlib API then of course the os.path API will look more familiar. I'm not sure what this is supposed to prove.
Apologies for not being more clear. I'm arguing that for some people, your preferred syntax *is* more distracting and hard to comprehend than the more self-descriptive version with named functions. And it's not just a matter of *learning* the API, it is a matter of using it so often that it ceases to look weird and looks natural.[1]

There's a school of thought that says that operator overloading is a bad idea, that operators should never be overridden to do something aside from their common meaning (e.g. + should always mean plus, / should always mean numeric division, etc). From that perspective, using / to mean something kinda-sorta like string concatenation, only path separator aware, is precisely the sort of thing that makes some people dislike operator overloading.

http://cafe.elharo.com/programming/operator-overloading-considered-harmful/
https://blog.jooq.org/2014/02/10/why-everyone-hates-operator-overloading/

I am not going to go quite that far. I think operator overloading has its uses. I'm not going to argue that pathlib's use of / was "bad" or a mistake or harmful. I called it *weird* and that's as far as I'll go. I use lots of weird things, and I even like some of them.

But if you think it isn't distracting, I think you are mistaken, and I think we ought to show caution in making it a built-in or an official part of the module API.

Your earlier comment (which I redacted):

    Hmm, the feedback I've heard from at least some folks teaching intro-python-for-scientists is like, "pathlib is so great for scripting that it justifies upgrading to python 3".

felt to me awfully close to "pathlib! it's the future!" I know that's not what you said, or even meant, but I felt it was important to remind people that not everyone knows pathlib or finds its API clearer than the explicitly named functions of os.path.

joinpath() may be longer than / but it is self-descriptive and easier to look up. help(joinpath) will tell you exactly what it does. help("/") is almost surely going to talk about numeric division, and it probably won't even mention strings or path objects at all. I say that because we've had + for string concatenation since way back in Python 1.5 or older, and yet as late as 3.6 help("+") still doesn't say a thing about string, list or tuple concatenation.

As a Linux user, I'm used to paths containing slashes:

    $HOMEDIR/spam/eggs

but putting quotes around the components looks unusual and is a hint that something unusual is going on (namely a shell escape). But writing something like:

    HOMEDIR / "spam" / "eggs"

doesn't even look like a path, just looks *wrong*. It looks like I'm escaping the wrong parts of the path: instead of escaping the spaces, I've escaped the parts with no spaces. It looks wrong as a Linux path, it looks wrong as a Windows path, and it looks wrong as division.

So, yes, it is distracting. I'm not saying that everyone will feel the same way, or that I cannot or will not learn to accept / as I've learned to accept % for string interpolation despite it looking like percentage. But I'm saying it's not a slam-dunk usability win to move to pathlib.
First I have to work out what __filepath__ is, then I have to remember the differences between all the various flavours of pathlib.<whatever>Path and suffer a moment or two of existential dread as I try to work out whether or not *this* specific flavour is the one I need. This might not matter for heavy users of pathlib, but for casual users, it's a big, intimidating API with:
- an important conceptual difference between pure paths and concrete paths; - at least six classes;
The docs could perhaps be more beginner friendly. For casual users, the answer is always "you want pathlib.Path".
That might be what I want, but it isn't what I get:

    py> p = pathlib.Path('/')
    py> p
    PosixPath('/')

I know what a PosixPath is. But the point is, even beginners have to deal with the complexity of the pathlib API the moment they print a path object in the interactive interpreter.

[1] I've been using % for string interpolation for two decades now, and it still looks like a misplaced percentage sign every single time.

--
Steve
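The behaviour in that interpreter session comes from `pathlib.Path` returning a platform-specific subclass when instantiated. A short sketch that shows this without depending on which platform it runs on:

```python
import pathlib

p = pathlib.Path("/")

# Path() picks a concrete subclass for the running platform, so the repr
# names PosixPath or WindowsPath rather than Path.
class_name = type(p).__name__
print(class_name)

# Either way the object is still a Path (and a PurePath).
print(isinstance(p, pathlib.Path), isinstance(p, pathlib.PurePath))
```

So code written against `pathlib.Path` works unchanged, but the class name a beginner sees in the REPL differs from the one they typed.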
![](https://secure.gravatar.com/avatar/95fb3c56e2e5322b0f9737fbb1eb9bce.jpg?s=120&d=mm&r=g)
On 2018-05-07 09:17, Steven D'Aprano wrote:
I'm arguing that for some people, your preferred syntax *is* more distracting and hard to comprehend than the more self-descriptive version with named functions. And it's not just a matter of *learning* the API, it is a matter of using it so often that it ceases to look weird and looks natural. [1] <snip> But if you think it isn't distracting, I think you are mistaken, and I think we ought to show caution in making it a built-in or an official part of the module API.
As an aside, this has some parallels with the recent thread about "objectively quantifying readability". Saying things like "you are mistaken" implies that there is an objective ground truth about what is distracting and what is not. And personally I agree that there is such an objective ground truth, and that it is based on facts about human psychology (although I don't think I agree with you about this particular case). Of course, there may be differences in how individuals react to things, but there is a real sense in which different syntaxes, constructs, etc., have something like a "mean level of confusion" which represents how easy to deal with people in general find them on average, and by which they can be meaningfully compared.

I'm not sure how to proceed to uncover this (unless the PSF starts funding psychological experiments!), but I do think it would be good if we could find ways to get at something like hard evidence for claims about whether things "really are" distracting, readable, unreadable, intuitive, etc.

-- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
On Mon, May 7, 2018 at 9:17 AM, Steven D'Aprano <steve@pearwood.info> wrote:
I'm arguing that for some people, your preferred syntax *is* more distracting and hard to comprehend than the more self-descriptive version with named functions.
then use Path.joinpath() if you want.
From that perspective, using / to mean something kinda-sorta like string
concatenation, only path separator aware, is precisely the sort of thing that makes some people dislike operator overloading.
The time for this argument was when the pathlib API was designed -- and I'm sure there was plenty of argument -- but using "/" to join paths was just too nifty to ignore :-)

But we are doing everyone a disservice if we essentially say: This very useful standard library API was poorly designed, so let's stick with the old, ugly painful way... (OK, I'm being a bit hyperbolic there ...)

TOOWTDI is a really good principle -- we never should have added pathlib if we weren't going to try to make it as useful and standard as possible.

felt to me awfully close to
"pathlib! it's the future!"
I know that's not what you said, or even meant, but I felt it was important to remind people that not everyone knows pathlib or finds its API clearer than the explicitly named functions of os.path.
no -- but it IS clearer and easier once we get all the common functionality in there, as opposed to having to poke around in os.path, os, and shutil for what you need.

so I'll say it, even if Nathaniel didn't: pathlib! it's the future! :-)

- CHB

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/f3ba3ecffd20251d73749afbfa636786.jpg?s=120&d=mm&r=g)
On 7 May 2018 at 20:44, Steven D'Aprano <steve@pearwood.info> wrote:
First I have to work out what __filepath__ is, then I have to remember the differences between all the various flavours of pathlib.<whatever>Path and suffer a moment or two of existential dread as I try to work out whether or not *this* specific flavour is the one I need. This might not matter for heavy users of pathlib, but for casual users, it's a big, intimidating API with:
- an important conceptual difference between pure paths and concrete paths; - at least six classes; - about 50 or so methods and properties
Right, but that's why I think this may primarily be a docs issue, as for simple use cases, only one pathlib class matters, and that's "pathlib.Path" (which is the appropriate concrete path type for the running platform), together with its alternate constructors "Path.cwd()" and "Path.home()".

So if you spell out the OP's original example with pathlib instead of os.path, you get:

    from pathlib import Path
    SRC_DIR = Path(__file__).parent

And then SRC_DIR is a rich path object that will mostly let you avoid importing any of:

- os
- os.path
- stat
- glob
- fnmatch
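As a rough illustration of the "rich path object" point, the sketch below uses only Path methods where os.path, glob and stat would otherwise be imported. The describe_dir helper and the file names are hypothetical, and a temporary directory stands in for Path(__file__).parent so the snippet runs anywhere.

```python
import tempfile
from pathlib import Path


def describe_dir(src_dir):
    """Summarise a directory using only Path methods (no os, glob or stat imports)."""
    src_dir = Path(src_dir)
    return {
        # replaces os.path.isdir
        "is_dir": src_dir.is_dir(),
        # replaces glob.glob
        "py_files": sorted(p.name for p in src_dir.glob("*.py")),
        # replaces os.listdir plus os.stat
        "sizes": {p.name: p.stat().st_size for p in src_dir.iterdir() if p.is_file()},
    }


# In a real script this would be Path(__file__).parent.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp)
    (src / "a.py").write_text("x = 1\n")
    (src / "notes.txt").write_text("hi")
    summary = describe_dir(src)
```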
As far as performance goes, I don't think it matters that we could technically make pathlib imported lazily. Many people put all their pathname manipulations at the beginning of their script, so lazy or not, the pathlib module is going to be loaded *just after* startup.
It's the fnmatch and re module imports *inside* pathlib that may be worth making lazy, as those currently account for a reasonable chunk of the import time but are only used to implement PurePath.match and _WildcardSelector. That means making them lazy may allow folks to avoid those imports if they don't use any of the wildcard matching features.
For many scripts, this isn't going to matter, but for those who want to avoid the overhead of pathlib, making it lazy doesn't help. That just delays the overhead, it doesn't remove it.
That's fine - it's not uncommon for folks looking to minimise startup overhead to have to opt in to using a lower level API for exactly that reason. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
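For readers unfamiliar with the "lazy import" idea being discussed, a minimal sketch follows. It is not how pathlib is implemented; name_matches is a hypothetical helper that simply defers the fnmatch import until the wildcard feature is first used.

```python
def name_matches(name, pattern):
    """Glob-style match that imports fnmatch only when first called."""
    # Deferred import: the cost is paid on first use, not when the
    # enclosing module is imported.
    import fnmatch
    return fnmatch.fnmatch(name, pattern)


is_python_file = name_matches("setup.py", "*.py")
```

Code that never calls name_matches never pays for the fnmatch import (and the re import behind it). To see where import time actually goes, CPython 3.7 and later accept `python -X importtime -c "import pathlib"`.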
![](https://secure.gravatar.com/avatar/d67ab5d94c2fed8ab6b727b62dc1b213.jpg?s=120&d=mm&r=g)
On Mon, May 7, 2018 at 8:44 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, May 06, 2018 at 09:33:03PM -0700, Nathaniel Smith wrote:
How is
data_path = __filepath__.parent / "foo.txt"
more distracting than
data_path = joinpath(dirname(__file__), "foo.txt")
Why are you dividing by a string? That's weird.
[looks up the pathlib docs]
Oh, that's why. It's still weird.
So yes, it's very distracting.
Isn't it strange how we can divide a path by a string, and that works, and we can take the remainder after you divide a string by a string, and that works as long as there's exactly one "%s" in the string, but nobody's interested in having "foo bar spam ham"/" " ==> ["foo","bar","spam","ham"] ? Just sayin', it ain't all that strange. ChrisA
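Both behaviours come down to operator overloading via `__truediv__`. The sketch below shows the real pathlib join next to a hypothetical SplitStr class (not something anyone in the thread proposed adding) that implements the string "division" from the message above.

```python
from pathlib import PurePosixPath


class SplitStr(str):
    """Hypothetical str subclass where '/' splits, mirroring how paths overload '/' to join."""

    def __truediv__(self, sep):
        if not isinstance(sep, str):
            return NotImplemented
        return self.split(sep)


# The joke from the message above, made real.
words = SplitStr("foo bar spam ham") / " "

# pathlib does the same trick: "/" is sugar for joinpath().
joined = PurePosixPath("spam") / "eggs"
same = PurePosixPath("spam").joinpath("eggs")
```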
![](https://secure.gravatar.com/avatar/e6e28dcae5e3df0190e0760e96f7d8ab.jpg?s=120&d=mm&r=g)
On 2018-05-06 19:13, Nick Coghlan wrote:
Specifically, the ones I'd have in mind would be:
- dirname (aka os.path.dirname)
- joinpath (aka os.path.join)
- abspath (aka os.path.abspath)

Yes, I end up importing those in most scripts currently. Just "join" has worked fine, although I could imagine someone getting confused about it.
-Mike
![](https://secure.gravatar.com/avatar/3b73b776444fa777acfa37bbdcff23fe.jpg?s=120&d=mm&r=g)
On Sun, May 6, 2018 at 9:30 PM, Mike Miller <python-ideas@mgmiller.net> wrote:
On 2018-05-06 19:13, Nick Coghlan wrote:
Specifically, the ones I'd have in mind would be:
- dirname (aka os.path.dirname) - joinpath (aka os.path.join) - abspath (aka os.path.abspath)
Yes, I end up importing those in most scripts currently. Just "join" has worked fine, although I could imagine someone getting confused about it.
Our homebuilt pre-pathlib package has an 'abs_path' parameter in join, so that could easily eliminate the abspath function itself:
joinpath('.', abs_path=True) <cwd>
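The homebuilt package itself isn't shown, so the following is only a guess at the shape of such a helper, built on os.path. The joinpath name and abs_path keyword follow the description above; everything else is an assumption.

```python
import os.path


def joinpath(*parts, abs_path=False):
    """Join path components, optionally returning the absolute form.

    Hypothetical helper modelled on the description above; it is not a
    standard library function.
    """
    joined = os.path.join(*parts)
    return os.path.abspath(joined) if abs_path else joined


here = joinpath(".", abs_path=True)      # the current working directory
relative = joinpath("spam", "eggs.txt")  # left relative
```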
![](https://secure.gravatar.com/avatar/4c01705256aa2160c1354790e8c154db.jpg?s=120&d=mm&r=g)
06.05.18 09:53, Yuval Greenfield wrote:
I often need to reference a script's current directory. I end up writing:
import os SRC_DIR = os.path.dirname(__file__)
But I would prefer to have a new dunder for that. I propose: "__dir__". I was wondering if others would find it convenient to include such a shortcut.
Here are some examples of dirname(__file__) in prominent projects.
https://github.com/tensorflow/models/search?l=Python&q=dirname&type= https://github.com/django/django/search?l=Python&q=dirname&type= https://github.com/nose-devs/nose/search?l=Python&q=dirname&type=
Reasons not to add __dir__:
* There already is one way to do it and it's clear and fairly short.
* Avoid the bikeshed discussion of __dir__, __folder__, and other candidates.

* Additional burden on maintainers of import machinery. It is already too complex, and __file__ is set in multiple places. Don't forget about third-party implementations. See also issue33277: "Deprecate __loader__, __package__, __file__, and __cached__ on modules" (https://bugs.python.org/issue33277).
* More complex user code, because you have to handle different cases:
  - __file__ is set, but __dir__ is not set.
  - __file__ and __dir__ are set, but are not consistent.
Reasons to add it:
Are you aware of importlib.resources? https://docs.python.org/3.7/whatsnew/3.7.html#importlib-resources
![](https://secure.gravatar.com/avatar/dd4761743695d5efd3692f2a3b35d37d.jpg?s=120&d=mm&r=g)
On Mon, May 7, 2018 at 7:14 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
* Additional burden on maintainers of import machinery. It is already too complex, and __file__ is set in multiple places. Don't forget about third-party implementations.
See also issue33277: "Deprecate __loader__, __package__, __file__, and __cached__ on modules" (https://bugs.python.org/issue33277).
Thanks for mentioning all this, Serhiy. :) That said, it *may* be worth considering a method on ModuleSpec (aka "globals()['__spec__']"). One (rough) possibility:

    def dirname(self):
        """Return the absolute path to the directory the module is in.

        This will return None for modules that do not have __file__ or
        where "directory" does not make sense (e.g. extension modules).
        """
        if self.origin is None:  # XXX ...or self.origin isn't a filename.
            return None
        import os.path  # This "lazy" import is necessary in this case.
        filename = os.path.abspath(self.origin)
        return os.path.dirname(filename)

Putting this on the module spec has several advantages:

1. __spec__ is a single source of truth (though tied to how a module was "found" rather than to anything that happened when "loaded")
2. encourages folks to rely on __spec__ (where we'd like to head, as demonstrated by the issue Serhiy referenced above)
3. does not add any overhead to import performance (i.e. cost only incurred when needed)
4. does not add complexity to any other part of the import machinery

I'm not necessarily saying we should add ModuleSpec.dirname(), but it (or something like it) is what I'd advocate for *if* we were to add a convenient shortcut to the directory a module is in. FWIW, I'd probably use it.

-eric
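The same idea can be tried today without touching ModuleSpec. The sketch below is a standalone function, not the proposed method: spec_dirname is a hypothetical name, and the has_location check is an added assumption that covers the "origin isn't a filename" case flagged in the XXX comment.

```python
import importlib.util
import os.path


def spec_dirname(spec):
    """Return the absolute directory a module was found in, or None.

    Standalone take on the rough ModuleSpec.dirname() idea above. The
    has_location check is an assumption added here to skip origins such
    as 'built-in' that are not file names.
    """
    if spec is None or spec.origin is None or not spec.has_location:
        return None
    return os.path.dirname(os.path.abspath(spec.origin))


json_dir = spec_dirname(importlib.util.find_spec("json"))  # a package on disk
sys_dir = spec_dirname(importlib.util.find_spec("sys"))    # built-in: no directory
```

Inside a module, the equivalent call would be spec_dirname(__spec__), with the caveat that __spec__ is None when code is run interactively.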
![](https://secure.gravatar.com/avatar/4c01705256aa2160c1354790e8c154db.jpg?s=120&d=mm&r=g)
07.05.18 17:42, Eric Snow wrote:
I'm not necessarily saying we should add ModuleSpec.dirname(), but it (or something like it) is what I'd advocate for *if* we were to add a convenient shortcut to the directory a module is in. FWIW, I'd probably use it.
The question is *why* you need the absolute path to the directory the module is in? Taking into account the availability of importlib.resources etc.
![](https://secure.gravatar.com/avatar/e8600d16ba667cc8d7f00ddc9f254340.jpg?s=120&d=mm&r=g)
On Mon, 7 May 2018 at 08:17 Serhiy Storchaka <storchaka@gmail.com> wrote:
07.05.18 17:42, Eric Snow wrote:
I'm not necessarily saying we should add ModuleSpec.dirname(), but it (or something like it) is what I'd advocate for *if* we were to add a convenient shortcut to the directory a module is in. FWIW, I'd probably use it.
The question is *why* you need the absolute path to the directory the module is in? Taking into account the availability of importlib.resources etc.
And just "why", and "how often"? I'm sure we have all done it before, but it isn't something that comes up *constantly*. And duplicating part of the details that __spec__.origin contains just to save an import and a line to strip off the file seems unnecessary. Plus, this doesn't take into consideration the fact that not every module is going to exist in a directory (e.g. what if I loaded from a sqlite database?). IOW I'm -1 on this addition to modules as I don't think it's difficult enough or used enough to warrant adding the overhead of providing it.
![](https://secure.gravatar.com/avatar/01aa7d6d4db83982a2f6dd363d0ee0f3.jpg?s=120&d=mm&r=g)
Yuval Greenfield wrote:
I often need to reference a script's current directory. I end up writing:
import os SRC_DIR = os.path.dirname(__file__)
The question I have is, why do you want to reference the script's current directory? If the answer is to access other files in that directory, then please consider using importlib.resources (for Python 3.7) and importlib_resources (for Python 2.7, 3.4-3.6). __file__ simply isn't safe, and pkg_resources can be a performance killer. The problem of course is that if you're writing an application and *any* of your dependencies use either technique, you are going to pay for it. This is exactly why Brett and I wrote importlib.resources. We wanted a consistent API, that allows custom loaders to play along, and which is about as efficient as possible, uses Python's import machinery, and is safe for uses like zipapps. now-you-don't-have-to-attend-my-pycon-talk-ly y'rs, -Barry
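To make the zip-safety point concrete, here is a small sketch that builds a throwaway zipped package and reads a data file from it both ways. The package name demo_pkg_zipped and the file names are made up for the example. It uses importlib.resources.files(), which was added in Python 3.9, later than the 3.7 API discussed in this thread, so treat it as an assumption about a newer interpreter.

```python
import importlib
import importlib.resources
import os
import sys
import tempfile
import zipfile


def build_demo_zip(directory):
    """Create a zip holding a tiny package with one data file."""
    archive = os.path.join(directory, "bundle.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("demo_pkg_zipped/__init__.py", "")
        zf.writestr("demo_pkg_zipped/greeting.txt", "hello from the zip\n")
    return archive


def load_both_ways():
    """Return (does the __file__-based path exist?, text read via importlib.resources)."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = build_demo_zip(tmp)
        sys.path.insert(0, archive)
        try:
            importlib.invalidate_caches()
            pkg = importlib.import_module("demo_pkg_zipped")
            # __file__ points *inside* the archive, so this is not a real file path.
            naive = os.path.join(os.path.dirname(pkg.__file__), "greeting.txt")
            naive_exists = os.path.exists(naive)
            # The resources API asks the loader, so it works for zips too.
            resource = importlib.resources.files(pkg).joinpath("greeting.txt")
            text = resource.read_text(encoding="utf-8")
            return naive_exists, text
        finally:
            sys.path.remove(archive)
            sys.modules.pop("demo_pkg_zipped", None)


naive_exists, text = load_both_ways()
```

The dirname(__file__) path does not exist on disk, while the loader-aware lookup still finds the data.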
![](https://secure.gravatar.com/avatar/75885e6fcc2e500ce9fd41f021bf5d1c.jpg?s=120&d=mm&r=g)
While importlib.resources looks very good, I'm certain that it can't replace every use of __file__ for accessing files relative to your Python code. Consider a mini-Web-server written in Python (there are, of course, lots of these) that needs to serve static files. Users of the Web server will expect to be able to place these static files somewhere relative to the directory their code is in, because the files are version-controlled along with the code. If you make developers configure an absolute path, they'll probably use __file__ anyway to get that path, so that it works on more systems than their own without an installer or a layer of configuration management. If I understand the importlib.resources documentation, it won't give you a way of accessing your static files directory unless you place an '__init__.py' file in each subdirectory, and convert conventional locations such as "assets/css/main.css" into path(mypackage.assets.css, 'main.css'). That's already a bit awkward. But do you even want __init__.py to be in your static directory? Even if you tell the mini-server to ignore __init__.py, when you upgrade to a production-ready server like Nginx and point it at the same directory, it won't know anything about this and it'll serve your __init__.py files as static files, leaking details of your system. So you probably wouldn't do this. This is one example; there are other examples of non-Python directories that you need to be able to access from Python code, where adding a file named __init__.py to the directory would cause undesired changes in behavior. Again, importlib.resources is a good idea. I will look into using it in the cases where it applies. But the retort of "well, you shouldn't be using __file__" doesn't hold up when sometimes you do need to use __file__, and there's no universal replacement for it. 
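The mini-server case might look roughly like the sketch below. The function names, the "assets" directory and the escape check are all assumptions made for illustration; none of this is taken from Flask, Tornado or any other real server.

```python
from pathlib import Path


def static_root(module_file, subdir="assets"):
    """Directory of static files kept next to the code that serves them."""
    return Path(module_file).resolve().parent / subdir


def resolve_static(root, requested):
    """Map a request path onto a file under root, refusing escapes such as '../'."""
    root = Path(root).resolve()
    candidate = (root / requested.lstrip("/")).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError("path escapes the static directory")
    return candidate


# A server module would pass its own __file__ here; a made-up path is used
# so the sketch does not depend on where it is run from.
root = static_root(Path.cwd() / "server.py")
css = resolve_static(root, "/css/main.css")
```

Note that the plain directory layout ("assets/css/main.css") is preserved and no __init__.py is needed anywhere under the static root, which is the property being asked for.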
(Also, every Python programmer I've met who's faced with the decision would choose "well, we need to use __file__, so don't zip things" over "well, we need to zip things, so don't use __file__". Yes, it's bad that Python programmers even have to make this choice, and then on top of that they make the un-recommended choice, but that's how things are.)
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
![](https://secure.gravatar.com/avatar/01aa7d6d4db83982a2f6dd363d0ee0f3.jpg?s=120&d=mm&r=g)
On May 15, 2018, at 14:03, Rob Speer <rspeer@luminoso.com> wrote:
Consider a mini-Web-server written in Python (there are, of course, lots of these) that needs to serve static files. Users of the Web server will expect to be able to place these static files somewhere relative to the directory their code is in, because the files are version-controlled along with the code. If you make developers configure an absolute path, they'll probably use __file__ anyway to get that path, so that it works on more systems than their own without an installer or a layer of configuration management.
You don’t need an absolute path, since you don’t pass file system paths to importlib.resources, and even if you relative import a module, you can pass that module to the APIs and it will still work, since the loaders know where they got the modules from.
If I understand the importlib.resources documentation, it won't give you a way of accessing your static files directory unless you place an '__init__.py' file in each subdirectory, and convert conventional locations such as "assets/css/main.css" into path(mypackage.assets.css, 'main.css').
That is correct. Note that we’re not necessarily saying that we won’t add hierarchical path support to the `resource` attributes of the various APIs, but they do complicate the semantics and implementation. It’s also easier to add features if the use cases warrant, than remove features that are YAGNI.
That's already a bit awkward. But do you even want __init__.py to be in your static directory? Even if you tell the mini-server to ignore __init__.py, when you upgrade to a production-ready server like Nginx and point it at the same directory, it won't know anything about this and it'll serve your __init__.py files as static files, leaking details of your system. So you probably wouldn't do this.
Are you saying that servers like Nginx or whatever your mini-server uses don’t have a way to blanket ignore files? That would surprise me, and it seems like a lurking security vulnerability regardless of importlib.resources or __init__.py files. I would think that you’d want to whitelist file extensions, and that `.py` would not be in that list. Is this a problem you’ve actually encountered or is it theoretical?
This is one example; there are other examples of non-Python directories that you need to be able to access from Python code, where adding a file named __init__.py to the directory would cause undesired changes in behavior.
Can you provide more examples?
Again, importlib.resources is a good idea. I will look into using it in the cases where it applies. But the retort of "well, you shouldn't be using __file__" doesn't hold up when sometimes you do need to use __file__, and there's no universal replacement for it.
(Also, every Python programmer I've met who's faced with the decision would choose "well, we need to use __file__, so don't zip things" over "well, we need to zip things, so don't use __file__". Yes, it's bad that Python programmers even have to make this choice, and then on top of that they make the un-recommended choice, but that's how things are.)
We certainly see a ton of __file__ usage, but I’m not sure whether it’s the case because most developers aren’t aware of the implications, don’t know of the alternatives, or just use the simplest thing possible. Using __file__ in your application, personal web service, or private library is fine. The problem is exacerbated when you use __file__ in your publicly released libraries, because not only can’t *you* use them in zip files, but nothing that depends on your library can use zip files. Given how popular pex is (and hopefully shiv will be), that will cause pain up the Python food chain, and it may mean that other people won’t be able to use your library. It’s certainly a trade-off, but it’s important to keep this in mind. If hierarchical resource paths are important to you, I invite you to submit an issue to our GitLab project: https://gitlab.com/python-devs/importlib_resources/issues Cheers, -Barry
![](https://secure.gravatar.com/avatar/75885e6fcc2e500ce9fd41f021bf5d1c.jpg?s=120&d=mm&r=g)
Are you saying that servers like Nginx or whatever your mini-server uses don’t have a way to blanket ignore files? That would surprise me, and it seems like a lurking security vulnerability regardless of importlib.resources or __init__.py files. I would think that you’d want to whitelist file extensions, and that `.py` would not be in that list.
"Whitelisting file extensions" is very uncommon. You just put the files you intend to serve in your static directory, and don't put the files you don't intend to serve there. Mixing code and static data is usually seen as a sign of muddy PHP-like thinking.

From what I can tell, if you wanted to exclude '__init__.py' from Nginx in particular, you would have to write an unconventional Nginx configuration, where you determine whether a path refers to a static file according to a regex that excludes things that end in '__init__.py'. Anything is possible, but this would be a significant discouragement to using importlib.

In practice, Flask's built-in server has its own logic about where to find files (which doesn't involve importlib, and I don't know what it actually does). Tornado appears to ask for an absolute path, so users mostly use __file__ to discover that path.
Is this a problem you’ve actually encountered or is it theoretical?
I had a situation where I wanted to have files that were both served by Flask as static files, and resources that I could load in my tests. Making this work with pkg_resources took a few tries. It sounds like importlib won't really improve the situation.
![](https://secure.gravatar.com/avatar/3b73b776444fa777acfa37bbdcff23fe.jpg?s=120&d=mm&r=g)
On Tue, May 15, 2018 at 10:11 PM Rob Speer <rspeer@luminoso.com> wrote:
From what I can tell, if you wanted to exclude '__init__.py' from Nginx in particular, you would have to write an unconventional Nginx configuration, where you determine whether a path refers to a static file according to a regex that excludes things that end in '__init__.py'. Anything is possible, but this would be a significant discouragement to using importlib.
Ok, I haven't dug into the details ("It should be easy!" :) ), but couldn't you implement a Finder that based its search on, say, 'data.toc' instead of '__init__.py' and graft it into importlib.resources?
![](https://secure.gravatar.com/avatar/bf9ecde8f5e286de6ce5c80206cf9dd6.jpg?s=120&d=mm&r=g)
There are very few programs that never use any path operation.

Opening a file is such a common one we have a built-in for it with open(), but you usually need to do some manipulation to get the file path in the first place.

We have __file__, but the most common usage is to get the parent dir, with os or pathlib.

Websites open static files and configuration files.

GUIs open files to be processed.

Data processing opens data source files.

Terminal apps often pass files as parameters.

All those paths you may resolve, turn absolute, check against and so on. So much so that pathlib.Path is one of the things I always put in a PYTHONSTARTUP since you need it so often.

I think Path fits the bill for being a built-in; I feel it's used more often than any/all or zip, and maybe enumerate.

Besides, it would help to make people use it, as I regularly meet devs who keep importing os.path out of habit or because of tutorials, books, docs, etc.
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
Sorry for the top-post; iPhone email sucks.

But: in regard to the whole "what paths to use to find resource files" issue:

The "current working directory" concept can be very helpful. You put your files in a directory tree somewhere; it could be inside the package, it could be anywhere else. Then all paths in your app are relative to the root of that location. So all your app needs to do is set the cwd on startup, and you're good to go. So you may use __file__ once at startup (or not, depending on configuration settings).

Alternatively, in the simple web server example, you have a root path that gets tacked on automatically in your app, so again, you use relative paths everywhere.

The concept of non-python-code resources being accessible within a package is really a separate issue from generic data files, etc. that you may want to access and serve a different way.

In short, if you have a collection of files that you want to access from Python, and also might want to serve up with another application, you don't want to use a python resources system.

Now I'm a bit confused about the topic of the thread, but I do like the idea of putting Path in a more accessible place (though I'm a bit concerned about startup time if it were a built-in).

-CHB

Sent from my iPhone
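A minimal sketch of the "set the cwd once, then use relative paths" approach follows. The working_directory helper and the settings.txt file are hypothetical; an application would more likely call os.chdir() once at startup, but a context manager keeps the example from leaving the process in a different directory.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily make `path` the current working directory."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


# A temporary directory stands in for the application's data root.
with tempfile.TemporaryDirectory() as data_root:
    with open(os.path.join(data_root, "settings.txt"), "w") as f:
        f.write("debug=1")
    with working_directory(data_root):
        # Relative paths now resolve against data_root.
        with open("settings.txt") as f:
            loaded = f.read()
```

The trade-off is that the working directory is process-wide state, so any library that also changes it, or any thread that relies on it, can interfere.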
![](https://secure.gravatar.com/avatar/5615a372d9866f203a22b2c437527bbb.jpg?s=120&d=mm&r=g)
On Tue, Jun 05, 2018 at 03:30:35PM +0200, Michel Desmoulin wrote:
There are very few programs that never use any path operation.
On the contrary, there are many programs that never use any path operations. I have many programs which take input and provide output and no files are involved at all. Of course path manipulation is common. But hyperbole about how common it is does not help your case.
Opening a file is such a common one we have a built-in for it with open(), but you usually need to do some manipulation to get the file path in the first place. We have __file__, but the most common usage is to get the parent dir, with os or pathlib.
Parent directory of what? Are you talking about the parent directory of the script? I almost never care about the script directory. I sometimes care about file names passed in by the user, and maybe ten percent of the time I care about the parent directory of those file names. I sometimes care about the current working directory. But I can't think of the last time I've cared about __file__. In my experience, that's an uncommon need.
You keep making absolute claims about what is "most common". What is your evidence for these absolute claims? Have you done a survey of all the Python software in existence? Or do you mean that this is *your* most common usage? Because it isn't *my* most common usage. [...]
So much that pathlib.Path is one of the things I always put in a PYTHONSTARTUP since you need it so often.
Please don't speak for me. I don't need it at all, and even if I did, putting it in *your* startup file doesn't help me.
I think Path fits the bill for being a built-in, I feel it's used more often than any/all or zip, and maybe enumerate.
This is a quick and dirty survey of my code:
    [steve@ando python]$ grep Path *.py */*.py */*/*.py | wc -l
    21
    [steve@ando python]$ grep "enumerate(" *.py */*.py */*/*.py | wc -l
    307
    [steve@ando python]$ grep "zip(" *.py */*.py */*/*.py | wc -l
    499
    [steve@ando python]$ grep "any(" *.py */*.py */*/*.py | wc -l
    96
    [steve@ando python]$ grep "all(" *.py */*.py */*/*.py | wc -l
    224
So I would say that Path is used about 25 times less often than zip, and I wouldn't consider zip to be an essential builtin. I use math.sqrt about 15 times more often than Path.
Besides, it would help to make people use it, as I regularly meet dev that keep import os.path because of habits, tutorials, books, docs, etc.
Why do you want to *make* people use it? Why shouldn't people use os.path if it meets their needs? -- Steve
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
On Tue, Jun 5, 2018 at 4:42 PM, Steven D'Aprano <steve@pearwood.info> wrote:
This is a quick and dirty survey of my code:
    [steve@ando python]$ grep Path *.py */*.py */*/*.py | wc -l
    21
    [steve@ando python]$ grep "enumerate(" *.py */*.py */*/*.py | wc -l
    307
    [steve@ando python]$ grep "zip(" *.py */*.py */*/*.py | wc -l
    499
    [steve@ando python]$ grep "any(" *.py */*.py */*/*.py | wc -l
    96
    [steve@ando python]$ grep "all(" *.py */*.py */*/*.py | wc -l
    224
I'm not saying I agree with the OP, but this is not a fair comparison at all -- Path is pretty new, and its being functional with most of the stdlib is even newer.
I do a lot of path manipulations in my code, but hardly ever use Path -- only brand new code uses it.
So I think you'd need to grep for os.path (and probably shutil, too) to get a meaningful answer.
But key here is that there is no consensus that Path is the new "obvious way to do it", and adding it to builtins would be essentially making that statement.
-CHB
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
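As a rough sketch of the broader survey suggested above, the following counts matching lines across a source tree for several names at once, including os.path and shutil. It is not the method anyone in the thread used: unlike the grep commands, it recurses to any depth and uses word boundaries, so its numbers would not be directly comparable. The `PATTERNS` table and the function name are illustrative assumptions.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical set of names to survey; extend as needed.
PATTERNS = {
    "Path": re.compile(r"\bPath\b"),
    "os.path": re.compile(r"\bos\.path\b"),
    "shutil": re.compile(r"\bshutil\b"),
    "enumerate(": re.compile(r"\benumerate\("),
    "zip(": re.compile(r"\bzip\("),
}


def count_matching_lines(root, patterns=PATTERNS):
    """Count, per pattern, the lines in *.py files under `root` that match it."""
    counts = Counter({name: 0 for name in patterns})
    for source in Path(root).rglob("*.py"):
        text = source.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            for name, pattern in patterns.items():
                if pattern.search(line):
                    counts[name] += 1
    return counts
```

Like `grep ... | wc -l`, this counts lines rather than individual occurrences, and it cannot tell code from comments or strings.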
![](https://secure.gravatar.com/avatar/5615a372d9866f203a22b2c437527bbb.jpg?s=120&d=mm&r=g)
On Tue, Jun 05, 2018 at 11:23:55PM -0700, Chris Barker wrote:
On Tue, Jun 5, 2018 at 4:42 PM, Steven D'Aprano <steve@pearwood.info> wrote:
This is a quick and dirty survey of my code: [snip grepping]
I'm not saying I agree with the OP, but this is not a fair comparison at all -- Path is pretty new, and its being functional with most of the stdlib is even newer.
I do a lot of path manipulations in my code, but hardly ever use Path -- only brand new code uses it.
So I think you'd need to grep for os.path (and probably shutil, too) to get a meaningful answer.
Why? The OP isn't asking for os.path and shutil to be builtins. The OP's statement wasn't "file manipulations of any sort, using any technique including Path, os.path, shutil and string processing, is more common than enumerate etc". (For *my own code* I'd disagree with that claim too, but others' experience may vary.) It was specifically that Path was more common than enumerate. Maybe it is for him, but that isn't a universal fact.
But key here is that there is no consensus that Path is the new "obvious way to do it", and adding it to builtins would be essentially making that statement.
Indeed. I think there are at least three hurdles to overcome before Path could become a builtin:
- consensus, or at least a BDFL ruling, that path manipulation is important enough to be a builtin. (If we're voting, I'd rather have sqrt as a builtin. But maybe that's just me :-)
- agreement that Path is the One Obvious Way that should be officially promoted over os.path;
- and determination that making Path a builtin would not cause an excessive or onerous burden on the core developers;
- or a serious regression in interpreter startup. (pathlib is a reasonably big library, over 1000 LOC, which relies on over a dozen other modules.)
-- Steve
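One way to get a feel for the startup concern in the last hurdle is to list which modules a fresh interpreter loads when it imports pathlib. This is only a sketch: the helper name is hypothetical, the `-I -S` flags are used so that site customisation does not preload anything, and the resulting list varies between Python versions, so no figures are claimed here.

```python
import subprocess
import sys

# Code run in a child interpreter: record sys.modules before and after the import.
PROBE = (
    "import sys\n"
    "before = set(sys.modules)\n"
    "import pathlib\n"
    "print('\\n'.join(sorted(set(sys.modules) - before)))\n"
)


def modules_loaded_by_pathlib():
    """Return the names of modules newly loaded by `import pathlib`."""
    result = subprocess.run(
        [sys.executable, "-I", "-S", "-c", PROBE],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()
```

Running `python -X importtime -c "import pathlib"` (Python 3.7+) gives per-module timings rather than just names.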
![](https://secure.gravatar.com/avatar/b26d5579ba992e23ebccfca44bdfd093.jpg?s=120&d=mm&r=g)
For the startup time, you could keep it around as builtin but save the import time until someone actually uses it.
While I agree sqrt should be a builtin as well, I think there's a good argument to be made for Path too. I just switched to it this past month, and I'm liking it a lot over constructs like (real code example):
os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "filename")
(could probably be improved by changing to __path__ and removing the dirname? But the current version works...)
sqrt isn't used as much in situations I've been in, and when it was, I generally got a giant heap of data to process and was doing that with numpy anyway.
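For comparison, here is the os.path construct quoted above next to one possible pathlib spelling. Both are wrapped in hypothetical helper functions that take the script path as a parameter, so they can be tried outside a real script; the function names are illustrative only.

```python
import os.path
from pathlib import Path


def sibling_of_parent_os(script_file, name):
    """The os.path spelling quoted above, with the script path as a parameter."""
    return os.path.join(os.path.dirname(os.path.abspath(script_file)), "..", name)


def sibling_of_parent_pathlib(script_file, name):
    """A pathlib spelling: go up two levels from the file, then append `name`."""
    return Path(script_file).absolute().parent.parent / name
```

The two are not byte-for-byte identical: the os.path version keeps a literal `..` component until it is passed through `os.path.normpath`, while the pathlib version never contains one. Using `resolve()` instead of `absolute()` would additionally follow symlinks.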
![](https://secure.gravatar.com/avatar/d67ab5d94c2fed8ab6b727b62dc1b213.jpg?s=120&d=mm&r=g)
On Wed, Jun 6, 2018 at 7:51 PM, Jacco van Dorp <j.van.dorp@deonet.nl> wrote:
For the startup time, you could keep it around as builtin but save the import time until someone actually uses it.
That would mean creating a system of lazy imports, which is an entirely separate proposal. ChrisA
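To illustrate the gap between a one-off trick and a general system, here is a minimal sketch of a proxy that defers a single import until the name is first used. `LazyName` is a hypothetical class written for this illustration; it is not an existing Python feature and not what anyone in the thread proposed.

```python
import importlib


class LazyName:
    """Proxy that imports `module_name` only when the name is first used."""

    def __init__(self, module_name, attr_name):
        self._module_name = module_name
        self._attr_name = attr_name
        self._target = None

    def _resolve(self):
        if self._target is None:
            module = importlib.import_module(self._module_name)
            self._target = getattr(module, self._attr_name)
        return self._target

    def __call__(self, *args, **kwargs):
        return self._resolve()(*args, **kwargs)

    def __getattr__(self, name):
        # Only reached for attributes not found on the proxy itself.
        # Guard the proxy's own fields to avoid infinite recursion.
        if name in ("_module_name", "_attr_name", "_target"):
            raise AttributeError(name)
        return getattr(self._resolve(), name)


# pathlib is not imported until Path is called or an attribute is accessed.
Path = LazyName("pathlib", "Path")
```

The proxy is not the real class, so `isinstance(x, Path)` and subclassing do not behave as they would with `pathlib.Path`. Handling such cases for every expensive builtin is the kind of work a general lazy-import mechanism would have to take on.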
![](https://secure.gravatar.com/avatar/b26d5579ba992e23ebccfca44bdfd093.jpg?s=120&d=mm&r=g)
2018-06-06 14:51 GMT+02:00 Chris Angelico <rosuav@gmail.com>:
On Wed, Jun 6, 2018 at 7:51 PM, Jacco van Dorp <j.van.dorp@deonet.nl> wrote:
For the startup time, you could keep it around as builtin but save the import time until someone actually uses it.
That would mean creating a system of lazy imports, which is an entirely separate proposal.
ChrisA
It's that complicated? I know it's not exactly properties on a class, but I thought there were other cases, even if I couldn't name one. Don't mind me, then.
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
For the startup time, you could keep it around as builtin but save the import time until someone actually uses it.
That would mean creating a system of lazy imports, which is an entirely separate proposal.
It's that complicated? I know it's not exactly properties on a class, but I thought there were other cases, even if I couldn't name one. Don't mind me, then.
It wouldn’t be THAT hard to write lazy-import code for pathlib. But there has been a lot of discussion lately about Python startup time. One approach is to create a lazy-import system that could be generally used to help startup time. So I expect that an expensive-to-import builtin will not get added unless that problem is generically solved.
And as for Steven’s other points: there has been a fair bit of discussion here and on python-dev about pathlib. The fact is that it is still not ready to be a full-featured replacement for os.path, etc. And a number of core devs aren’t all that interested in it becoming the “one obvious way”. So I think we are nowhere near it becoming a builtin.
But if you like it, you can help the efforts to make it even more useful, which would be good in itself, but is also the Path (pun intended) to making it the “one obvious way”. If it’s useful enough, people will use it, even if they have to import it. There was a recent thread about adding functionality to the Path object that seems to have petered out; maybe contribute to that effort?
One more point: a major step in making pathlib useful was adding the __fspath__ protocol, and then adding support for it in most (all) of the standard library. Another step would be to make any paths in the stdlib (such as __file__) Path objects (as suggested in this thread), but that would bring up the startup costs problem. I wonder if a Path-lite with the core functionality, but less startup cost, would be useful here?
-CHB
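The file system path protocol mentioned above (`os.PathLike` and its `__fspath__` method, added in Python 3.6) is what lets standard library functions accept path objects without knowing about pathlib. A minimal sketch, using a hypothetical `ResourcePath` class that is not part of pathlib and is not the "Path-lite" idea itself:

```python
import os


class ResourcePath:
    """Hypothetical minimal path-like class; not part of pathlib."""

    def __init__(self, *parts):
        self._parts = parts

    def __fspath__(self):
        # Called by os.fspath(), open(), os.path functions, and so on.
        return os.path.join(*self._parts)

    def __repr__(self):
        return "ResourcePath({})".format(", ".join(map(repr, self._parts)))
```

Any object with `__fspath__` is accepted by `open()`, `os.fspath()` and most path-taking stdlib functions, and `isinstance(obj, os.PathLike)` reports it as path-like without explicit inheritance.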
![](https://secure.gravatar.com/avatar/b68cda4e0d04e1b966cfa5657bbec53d.jpg?s=120&d=mm&r=g)
I assume the idea is that everybody has Path available without the need to do the import dance first.
If it's for personal convenience you can always do this trick, that is used by gettext to make _ a builtin.
    import pathlib
    import builtins
    builtins.__dict__['Path'] = pathlib.Path
Now Path *is* a builtin for the rest of the code.
Barry
![](https://secure.gravatar.com/avatar/5615a372d9866f203a22b2c437527bbb.jpg?s=120&d=mm&r=g)
On Wed, Jun 06, 2018 at 07:05:35PM +0100, Barry Scott wrote:
I assume the idea is that everybody has Path available without the need to do the import dance first.
If it's for personal convenience you can always do this trick, that is used by gettext to make _ a builtin.
    import pathlib
    import builtins
    builtins.__dict__['Path'] = pathlib.Path
The public API for getting the namespace of an object is vars():
    vars(builtins)['Path']
but since builtins is just a module, the best way to add a new attribute to it is:
    builtins.Path = pathlib.Path
-- Steve
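Putting the two messages above together, a small runnable sketch of the trick for personal use. The helper `in_data_dir` and the `data` directory name are hypothetical, included only to show the name being found without a local import.

```python
import builtins
import pathlib

# vars(obj) returns obj.__dict__, so both spellings reach the same namespace.
assert vars(builtins) is builtins.__dict__

# Plain attribute assignment, as recommended above.
builtins.Path = pathlib.Path


def in_data_dir(name):
    # `Path` is not defined in this module's globals; the lookup falls
    # through to builtins and finds the name installed above.
    return Path("data") / name
```

This affects every module in the running process, which is convenient in a personal startup file but surprising in shared code, since readers and linters will not see where `Path` comes from.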
participants (20)
- Barry Scott
- Barry Warsaw
- Brendan Barnwell
- Brett Cannon
- Chris Angelico
- Chris Barker
- Chris Barker - NOAA Federal
- Cody Piersall
- Eric Fahlgren
- Eric Snow
- George Fischhof
- Jacco van Dorp
- Michel Desmoulin
- Mike Miller
- Nathaniel Smith
- Nick Coghlan
- Rob Speer
- Serhiy Storchaka
- Steven D'Aprano
- Yuval Greenfield