Re: [Python-ideas] PEP 428 - object-oriented filesystem paths

On Saturday 13 October 2012 at 19:47 +1000, Nick Coghlan wrote:
The question is: why do you want to do that? I know there are a limited bunch of special cases where Posix filesystem paths may be case-insensitive, but nobody really cares about them today, and I don't expect many people to bother tomorrow. Playing with individual parameters of path semantics sounds like a theoretical bother more than a practical one.

A possibility would be to expose the Flavour classes, which until now are an internal implementation detail. That would first require better defining their API, though. Then people could write e.g.:

    class PosixCaseInsensitiveFlavour(pathlib.PosixFlavour):
        case_sensitive = False

    class MyPath(pathlib.PosixPath):
        flavour = PosixCaseInsensitiveFlavour()

But I would consider it extra icing on the cake, not a requirement for a Path API.

Regards

Antoine.

--
Software development and contracting: http://pro.pitrou.net
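A minimal sketch of the class-based idea above, assuming a simplified flavour object that exposes only a case_sensitive flag. pathlib's real flavour objects are private and look different, so every class name here (PosixFlavour, SketchPath and so on) is hypothetical. It shows the cost of this design: changing one behaviour means defining both a flavour subclass and a path subclass.

```python
# Hypothetical sketch: pathlib's flavour objects are private, so the names
# PosixFlavour, PosixCaseInsensitiveFlavour, SketchPath and
# CaseInsensitiveSketchPath are illustrative assumptions, not pathlib API.
# Roots/anchors are ignored for brevity ("/a" and "a" compare equal).


class PosixFlavour:
    sep = "/"
    case_sensitive = True

    def compare_key(self, parts):
        # Case-insensitive flavours compare components after case folding.
        if self.case_sensitive:
            return tuple(parts)
        return tuple(p.casefold() for p in parts)


class PosixCaseInsensitiveFlavour(PosixFlavour):
    case_sensitive = False


class SketchPath:
    # The flavour is fixed per class through a class attribute.
    flavour = PosixFlavour()

    def __init__(self, text):
        self.parts = tuple(p for p in text.split(self.flavour.sep) if p)

    def __eq__(self, other):
        if not isinstance(other, SketchPath):
            return NotImplemented
        return (type(self.flavour) is type(other.flavour)
                and self.flavour.compare_key(self.parts)
                == other.flavour.compare_key(other.parts))

    def __hash__(self):
        return hash((type(self.flavour), self.flavour.compare_key(self.parts)))


class CaseInsensitiveSketchPath(SketchPath):
    # Changing behaviour means subclassing both the flavour and the path.
    flavour = PosixCaseInsensitiveFlavour()


print(SketchPath("src/Main.py") == SketchPath("src/main.py"))  # False
print(CaseInsensitiveSketchPath("src/Main.py")
      == CaseInsensitiveSketchPath("src/main.py"))  # True
```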

On 2012-10-13 12:06, Antoine Pitrou wrote:
If you want to do that, and that is a big if, it might be better to give keyword arguments to Path(), so that the class signature would look like:

    class Path:
        def __init__(self, *args, sep=os.path.sep,
                     casesensitive=os.path.casesensitive,
                     expanduser=False):
            ...

This would make PosixPath and WindowsPath partial classes with certain keyword arguments filled in. Notice that os.path.casesensitive is not (yet) present in Python.

Regards,
TB
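A rough sketch of the keyword-argument alternative, with functools.partial standing in for the "partial classes". KwPath, PosixLikePath and WindowsLikePath are hypothetical names. Since os.path.casesensitive does not exist, the defaults are plain literals here, and expanduser is left out to keep the sketch short.

```python
import functools


class KwPath:
    # Hypothetical class: the keyword names follow the proposal above, but
    # this is not a real pathlib API. os.path.casesensitive does not exist,
    # so the defaults are plain literals; expanduser is omitted.
    def __init__(self, *args, sep="/", casesensitive=True):
        self.sep = sep
        self.casesensitive = casesensitive
        self.parts = tuple(p for p in sep.join(args).split(sep) if p)

    def _key(self):
        parts = self.parts if self.casesensitive else tuple(
            p.casefold() for p in self.parts)
        return (self.sep, self.casesensitive, parts)

    def __eq__(self, other):
        if not isinstance(other, KwPath):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())

    def __str__(self):
        return self.sep.join(self.parts)


# "Partial classes": constructors with some keyword arguments pre-filled.
PosixLikePath = functools.partial(KwPath, sep="/", casesensitive=True)
WindowsLikePath = functools.partial(KwPath, sep="\\", casesensitive=False)

print(str(WindowsLikePath("docs", "Index.txt")))  # docs\Index.txt
print(WindowsLikePath("docs", "Index.txt")
      == WindowsLikePath("DOCS", "index.txt"))  # True
```

One consequence worth noting: a functools.partial object is not a class, so isinstance(p, WindowsLikePath) raises TypeError; only isinstance(p, KwPath) works.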

On Sat, Oct 13, 2012 at 8:06 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
It's a useful trick for writing genuinely cross-platform code: when I'm writing cross-platform code on *nix, I want my paths to behave like posix paths in every respect *except* I want them to complain somehow if any of my names only differ by case. I've been burnt in the past by checking in conflicting names on a Linux machine and then wondering why the Windows checkouts were broken.

The only real way to deal with that is to avoid relying on filesystem case sensitivity for correct behaviour of your application, even when the underlying OS *permits* case sensitivity. This becomes even *more* important if NFS and CIFS filesystems are being shared between *nix and Windows systems, but it applies any time a file system may be shared (e.g. creating archive files, checking in to a source control system, etc.). I have the luxury right now of only needing to care about Linux systems, but I've had to deal with the mess in the past and "act case insensitive everywhere" is the only sanity-preserving option. Python itself deals with this mostly via the stylistic rule of "always use lowercase module and package names", but it would be nice if a new path abstraction allowed the problem to be handled *properly*.

On the Windows side, it would be nice to be able to request the use of "/" as the directory separator when converting to a string. Using "\" has the potential to cause interoperability problems (e.g. with regular expressions).

If you don't like the implicit nature of contexts (a perfectly reasonable complaint), then I suggest going for an explicit strategy pattern with flavours rather than requiring classes. With this approach, the flavour would be specified on a *per-instance* basis (with the default behaviour being determined by the OS). The main class hierarchy would just be PurePath <-- Path and there would be a separate PathFlavor ABC with PosixFlavor and WindowsFlavor subclasses (public Python stdlib APIs generally follow US spelling and drop the 'u').
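The case-clash problem described at the start of this message can at least be detected after the fact, for instance before checking files in. The helper below is a hypothetical sketch (case_collisions is not part of any library) that groups names that are equal under str.casefold(). Case folding only approximates the rules of case-insensitive file systems such as NTFS, so treat it as a heuristic.

```python
from collections import defaultdict
from typing import Iterable, List


def case_collisions(names: Iterable[str]) -> List[List[str]]:
    """Return groups of names that differ only by case.

    str.casefold() only approximates what case-insensitive file systems
    actually do, so this is a heuristic check, not a guarantee.
    """
    groups = defaultdict(set)
    for name in names:
        groups[name.casefold()].add(name)
    # Exact duplicates collapse in the set; only genuine clashes remain.
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(case_collisions(["README", "setup.py", "readme", "Makefile"]))
# [['README', 'readme']]
```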
The main classes would then *delegate* the flavour dependent operations like parsing, conversion to a string and equality comparisons to the flavour objects. It's really the public use of the strategy pattern that prevents the combinatorial explosion - you can just have a single OS-based default (as is already the case with PurePath.__new__ and Path.__new__ playing type selection games), rather than allowing the default to be configured per thread. The decimal-style thread-based dynamic contexts are more useful when you want to change the behaviour *without* either copying or mutating objects, which I agree is overkill for path manipulation.

Since pathlib already uses the Flavor objects as strategies internally, it should just be a matter of switching from the use of inheritance to specify the flavour to using a keyword-only argument in the constructor. The "case-insensitive posix path" example would then look like:

    class PosixCaseInsensitiveFlavor(pathlib.PosixFlavor):
        case_sensitive = False

    def my_path(*args):
        return Path(*args, flavor=PosixCaseInsensitiveFlavor)

You can add as many new flavours as you want, and it's only one class per flavour rather than up to 3 (the flavour itself, the pure variant and the concrete variant). This class hierarchy is also more amenable to the introduction of MutablePath as a second subclass of PurePath - a path variant with mutable properties still sounds potentially attractive to me (over a wide variety of return-a-modified-copy methods for various cases).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
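A minimal sketch of the per-instance strategy design described above, assuming flavor objects that own parsing, string conversion and comparison keys. All class names are hypothetical and far simpler than pathlib's internals (no drives, roots or anchors). Unlike the snippet in the message, it passes a flavor instance rather than the class.

```python
import os


class PathFlavor:
    """Strategy object: owns parsing, str conversion and comparison keys."""
    sep = "/"
    altsep = None
    case_sensitive = True

    def parse(self, text):
        if self.altsep:
            text = text.replace(self.altsep, self.sep)
        return tuple(p for p in text.split(self.sep) if p)

    def to_str(self, parts):
        return self.sep.join(parts)

    def compare_key(self, parts):
        if self.case_sensitive:
            return parts
        return tuple(p.casefold() for p in parts)


class PosixFlavor(PathFlavor):
    pass


class WindowsFlavor(PathFlavor):
    sep = "\\"
    altsep = "/"  # "/" is accepted on input and normalised to "\"
    case_sensitive = False


class FlavoredPath:
    """One path class; OS-dependent behaviour is delegated per instance."""

    def __init__(self, *args, flavor=None):
        if flavor is None:
            # Default strategy chosen from the running OS.
            flavor = WindowsFlavor() if os.name == "nt" else PosixFlavor()
        self.flavor = flavor
        self.parts = flavor.parse(flavor.sep.join(args))

    def __str__(self):
        return self.flavor.to_str(self.parts)

    def _key(self):
        return (type(self.flavor), self.flavor.compare_key(self.parts))

    def __eq__(self, other):
        if not isinstance(other, FlavoredPath):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


class PosixCaseInsensitiveFlavor(PosixFlavor):
    case_sensitive = False


def my_path(*args):
    return FlavoredPath(*args, flavor=PosixCaseInsensitiveFlavor())


print(my_path("src", "Main.py") == my_path("src/main.py"))  # True
print(FlavoredPath("a/b", flavor=WindowsFlavor()))  # a\b
```

In this sketch, adding a behaviour costs exactly one flavor subclass, and there is a single path class rather than a pure and a concrete variant per flavour.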

On 13 October 2012 16:37, Nick Coghlan <ncoghlan@gmail.com> wrote:
I don't disagree with your points, but I want to point out that IO is something Python has to make *really basic* because it's one of the first things newbies use, and Python is a newbie-friendly language. If you're recommending flavours and whatnot, I recommend you do it in a way that makes it very much optional and not at all the direct focus of the docs.

The nice thing about the class idea for the uninitiated was that there were only two options, and newbies only ever had one obvious choice. Contexts using "with", I think, seem newbie-friendly too. So does having default flavours, with an "expert" option to override the default classes, possibly in a sub-module.

I'm no expert, but I think it's worth bearing in mind.

On Sunday 14 October 2012 at 01:37 +1000, Nick Coghlan wrote:
But that's not cross-platform. Under Windows you must also care about reserved file names (CON, NUL, etc.). Also, you can create Posix filenames with backslashes in them, but under Windows they will be treated as directory separators. Mercurial learnt this the hard way: http://selenic.com/repo/hg-stable/file/605fe310691f/mercurial/store.py#l124
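A rough check for the two Windows pitfalls mentioned above. windows_portability_problems is a hypothetical helper, and the reserved-name list covers the commonly documented device names but is not exhaustive (Windows has further rules, e.g. about trailing dots and spaces and other forbidden characters).

```python
# Commonly documented reserved device names on Windows; not exhaustive.
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def windows_portability_problems(name):
    """Return reasons why a POSIX file name may not survive on Windows."""
    problems = []
    # "nul.txt" is treated like "NUL": adding an extension does not help.
    stem = name.split(".", 1)[0]
    if stem.upper() in RESERVED_NAMES:
        problems.append("reserved device name")
    if "\\" in name:
        problems.append("backslash acts as a directory separator")
    return problems


print(windows_portability_problems("nul.txt"))    # ['reserved device name']
print(windows_portability_problems("a\\b.txt"))   # ['backslash acts as a directory separator']
print(windows_portability_problems("notes.txt"))  # []
```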
The PEP mentions the .as_posix() method, which does exactly that. (use of regular expressions on whole paths sounds like a weird idea, but hey :-))
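For reference, as_posix() survived into the standard library pathlib, where the pure Windows class is called PureWindowsPath rather than the PureNTPath of this draft. Pure paths do no I/O, so this runs on any host OS:

```python
from pathlib import PureWindowsPath

p = PureWindowsPath(r"C:\Users\me\notes.txt")
print(str(p))        # C:\Users\me\notes.txt
print(p.as_posix())  # C:/Users/me/notes.txt
```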
If you s/would/could/, I have nothing against it, but I certainly don't understand why you dislike the approach of providing dedicated classes *by default*. IMO, having separate classes is simpler to use, easier to type, more discoverable (using pydoc or help() or tab-completion at the prompt), and it has an educational value that a keyword-only "flavour" argument doesn't have.
Which they already do :) Here is the code:

    class PurePosixPath(PurePath):
        _flavour = _posix_flavour
        __slots__ = ()

    class PureNTPath(PurePath):
        _flavour = _nt_flavour
        __slots__ = ()

(https://bitbucket.org/pitrou/pathlib/src/f6df458aaa89/pathlib.py?at=default#...)
Not only overkill, but incorrect and dangerous!
Yes, you can. That doesn't preclude offering separate classes by default, though :-)
I'm very cold on offering both mutable and non-mutable paths. That's just complicated and confusing. Since an immutable type is very desirable for use in associative containers, I think immutability is the right choice.

Regards

Antoine.

--
Software development and contracting: http://pro.pitrou.net
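Antoine's associative-container argument, illustrated with the standard library pathlib that eventually shipped: pure paths are hashable, equivalent spellings compare and hash alike, and "modifying" a path returns a new object.

```python
from pathlib import PurePosixPath

sizes = {PurePosixPath("docs/index.txt"): 120}

# Spurious slashes are collapsed, so an equivalent spelling finds the entry.
print(sizes[PurePosixPath("docs//index.txt")])  # 120

p = PurePosixPath("docs/index.txt")
# with_name() returns a new path; the original is untouched.
q = p.with_name("about.txt")
print(p, q)  # docs/index.txt docs/about.txt

try:
    p.name = "other.txt"  # attributes are read-only properties
except AttributeError:
    print("read-only")
```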

On Sun, Oct 14, 2012 at 2:28 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Factory functions would make more sense to me than separate classes - they're not really a different type, they're the same type using a different strategy for the OS dependent bits.
Sure, if we're only offering one of them, then immutable is definitely the right choice. However, I think this is analogous to the bytes vs bytearray distinction - while bytes objects are more useful in general, using the mutable bytearray when appropriate is vastly superior to slicing and copying bytes objects.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
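A small illustration of the bytes/bytearray distinction Nick draws. Nothing here is path-specific and no MutablePath exists; the analogy to paths is his argument, and this only shows the copy versus in-place behaviour it relies on.

```python
chunks = [b"GET ", b"/index.html", b" HTTP/1.0"]

# Immutable: every concatenation builds a brand-new bytes object.
data = b""
for chunk in chunks:
    data = data + chunk

# Mutable: bytearray grows in place, so it stays the same object.
buf = bytearray()
buf_id = id(buf)
for chunk in chunks:
    buf += chunk
assert id(buf) == buf_id

print(data == bytes(buf))  # True
```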

On Sun, 14 Oct 2012 02:52:18 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
I find them less helpful. isinstance() calls won't work. Deriving won't work. It makes things a bit more opaque. However, we are definitely talking about a secondary style issue. (note how the threading module moved away from factory functions to regular classes :-))
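Antoine's point about isinstance() and derivation can be checked against the standard library pathlib; posix_path below is a hypothetical factory function written only for contrast.

```python
import pathlib


def posix_path(*args):
    """Hypothetical factory function returning a path object."""
    return pathlib.PurePosixPath(*args)


p = posix_path("a", "b")

# With real classes, type checks and subclassing work as usual.
print(isinstance(p, pathlib.PurePath))  # True


class MyPath(pathlib.PurePosixPath):
    pass


print(isinstance(MyPath("x"), pathlib.PurePosixPath))  # True

# A factory function is not a type: isinstance() rejects it outright.
try:
    isinstance(p, posix_path)
except TypeError:
    print("posix_path is not a type")
```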
bytearray was only added after a lot of experience with the 2.x str type. I don't think we should add a mutable path API before significant experience has been gathered about the cost and performance-criticality of path manipulation operations. Offering both mutable and immutable types makes learning the API harder for beginners ("which type should I use? what happens when I combine them?").

Regards

Antoine.

--
Software development and contracting: http://pro.pitrou.net

Nick Coghlan wrote:
I don't see how this problem can be solved purely by adjusting path object behaviour. What you want is to get a complaint whenever you try to create a file in a directory that already contains another name that is case-insensitively equal. That would have to be built into the file system access functions.

--
Greg
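A sketch of Greg's point that the check belongs in the file system access layer rather than in the path object. create_file_no_case_clash is a hypothetical helper, and the check-then-create sequence is not atomic, so a concurrent process could still create a clashing name in between.

```python
import os
import tempfile


def create_file_no_case_clash(directory, name):
    """Hypothetical helper: create an empty file, refusing names that clash
    case-insensitively with an existing entry in the same directory."""
    folded = name.casefold()
    for existing in os.listdir(directory):
        if existing != name and existing.casefold() == folded:
            raise FileExistsError(
                f"{name!r} clashes with existing {existing!r}")
    path = os.path.join(directory, name)
    # Mode "x" fails with FileExistsError if the exact name already exists.
    with open(path, "x"):
        pass
    return path


with tempfile.TemporaryDirectory() as d:
    create_file_no_case_clash(d, "README")
    try:
        create_file_no_case_clash(d, "readme")
    except FileExistsError as exc:
        print("refused:", exc)
```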

participants (5)
- Antoine Pitrou
- Greg Ewing
- Joshua Landau
- Nick Coghlan
- T.B.