PEP 428 - object-oriented filesystem paths

Hello,

This PEP is a resurrection of the idea of having object-oriented filesystem paths in the stdlib. It comes with a general API proposal as well as a specific implementation (*). The implementation is young and discussion is quite open.

(*) http://pypi.python.org/pypi/pathlib/

Regards

Antoine.

PS: You can all admire my ASCII-art skills.


PEP: 428
Title: The pathlib module -- object-oriented filesystem paths
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <solipsis@pitrou.net>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-July-2012
Python-Version: 3.4
Post-History:


Abstract
========

This PEP proposes the inclusion of a third-party module, `pathlib`_, in the standard library. The inclusion is proposed under the provisional label, as described in :pep:`411`. Therefore, API changes can be done, either as part of the PEP process, or after acceptance in the standard library (and until the provisional label is removed).

The aim of this library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them.

.. _`pathlib`: http://pypi.python.org/pypi/pathlib/


Related work
============

An object-oriented API for filesystem paths has already been proposed and rejected in :pep:`355`. Several third-party implementations of the idea of object-oriented filesystem paths exist in the wild:

* The historical `path.py module`_ by Jason Orendorff, Jason R. Coombs and others, which provides a ``str``-subclassing ``Path`` class;

* Twisted's slightly specialized `FilePath class`_;

* An `AlternativePathClass proposal`_, subclassing ``tuple`` rather than ``str``;

* `Unipath`_, a variation on the str-subclassing approach with two public classes, an ``AbstractPath`` class for operations which don't do I/O and a ``Path`` class for all common operations.

This proposal attempts to learn from these previous attempts and the rejection of :pep:`355`.

.. _`path.py module`: https://github.com/jaraco/path.py

.. _`FilePath class`: http://twistedmatrix.com/documents/current/api/twisted.python.filepath.FileP...

.. _`AlternativePathClass proposal`: http://wiki.python.org/moin/AlternativePathClass

.. _`Unipath`: https://bitbucket.org/sluggo/unipath/overview


Why an object-oriented API
==========================

The rationale to represent filesystem paths using dedicated classes is the same as for other kinds of stateless objects, such as dates, times or IP addresses. Python has been slowly moving away from strictly replicating the C language's APIs to providing better, more helpful abstractions around all kinds of common functionality. Even if this PEP isn't accepted, it is likely that another form of filesystem handling abstraction will be adopted one day into the standard library.

Indeed, many people will prefer handling dates and times using the high-level objects provided by the ``datetime`` module, rather than using numeric timestamps and the ``time`` module API.

Moreover, using a dedicated class allows desirable behaviours to be enabled by default, for example the case insensitivity of Windows paths.
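As a rough illustration of the case-insensitivity point, the sketch below uses the ``pathlib`` module as it was eventually shipped in Python 3.4, where the NT flavour is named ``PureWindowsPath`` rather than ``PureNTPath``. It is not the draft API described in this PEP, only a comparison point.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows-flavoured paths compare case-insensitively, and both forward and
# backward slashes are accepted as separators, so these objects are equal.
assert PureWindowsPath("C:/Windows/System32") == PureWindowsPath(r"c:\windows\system32")

# POSIX-flavoured paths compare case-sensitively.
assert PurePosixPath("/etc/Passwd") != PurePosixPath("/etc/passwd")

# The original spelling is kept for display; only comparison ignores case.
print(PureWindowsPath("C:/Windows"))  # prints C:\Windows
```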
Proposal
========

Class hierarchy
---------------

The `pathlib`_ module implements a simple hierarchy of classes::

                        +----------+
                        |          |
               ---------| PurePath |--------
               |        |          |       |
               |        +----------+       |
               |             |             |
               |             |             |
               v             |             v
        +---------------+    |    +------------+
        |               |    |    |            |
        | PurePosixPath |    |    | PureNTPath |
        |               |    |    |            |
        +---------------+    |    +------------+
               |             v             |
               |          +------+         |
               |          |      |         |
               |   -------| Path |------   |
               |   |      |      |     |   |
               |   |      +------+     |   |
               |   |                   |   |
               |   |                   |   |
               v   v                   v   v
          +-----------+          +--------+
          |           |          |        |
          | PosixPath |          | NTPath |
          |           |          |        |
          +-----------+          +--------+

This hierarchy divides path classes along two dimensions:

* a path class can be either pure or concrete: pure classes support only operations that don't need to do any actual I/O, which are most path manipulation operations; concrete classes support all the operations of pure classes, plus operations that do I/O.

* a path class is of a given flavour according to the kind of operating system paths it represents. `pathlib`_ implements two flavours: NT paths for the filesystem semantics embodied in Windows systems, POSIX paths for other systems (``os.name``'s terminology is re-used here).

Any pure class can be instantiated on any system: for example, you can manipulate ``PurePosixPath`` objects under Windows, ``PureNTPath`` objects under Unix, and so on. However, concrete classes can only be instantiated on a matching system: indeed, it would be error-prone to start doing I/O with ``NTPath`` objects under Unix, or vice-versa.

Furthermore, there are two base classes which also act as system-dependent factories: ``PurePath`` will instantiate either a ``PurePosixPath`` or a ``PureNTPath`` depending on the operating system. Similarly, ``Path`` will instantiate either a ``PosixPath`` or an ``NTPath``.

It is expected that, in most uses, using the ``Path`` class is adequate, which is why it has the shortest name of all.


No confusion with builtins
--------------------------

In this proposal, the path classes do not derive from a builtin type.
This contrasts with some other Path class proposals which were derived from ``str``. They also do not pretend to implement the sequence protocol: if you want a path to act as a sequence, you have to look up a dedicated attribute (the ``parts`` attribute).

By not passing themselves off as builtin types, the path classes minimize the potential for confusion if they are combined by accident with genuine builtin types.


Immutability
------------

Path objects are immutable, which makes them hashable and also prevents a class of programming errors.


Sane behaviour
--------------

Little of the functionality from os.path is reused. Many os.path functions are tied by backwards compatibility to confusing or plain wrong behaviour (for example, the fact that ``os.path.abspath()`` simplifies ".." path components without resolving symlinks first).

Also, using classes instead of plain strings helps make system-dependent behaviours natural. For example, comparing and ordering Windows path objects is case-insensitive, and path separators are automatically converted to the platform default.


Useful notations
----------------

The API tries to provide useful notations all the while avoiding magic. Some examples::

    >>> p = Path('/home/antoine/pathlib/setup.py')
    >>> p.name
    'setup.py'
    >>> p.ext
    '.py'
    >>> p.root
    '/'
    >>> p.parts
    <PosixPath.parts: ['/', 'home', 'antoine', 'pathlib', 'setup.py']>
    >>> list(p.parents())
    [PosixPath('/home/antoine/pathlib'), PosixPath('/home/antoine'), PosixPath('/home'), PosixPath('/')]
    >>> p.exists()
    True
    >>> p.st_size
    928


Pure paths API
==============

The philosophy of the ``PurePath`` API is to provide a consistent array of useful path manipulation operations, without exposing a hodge-podge of functions like ``os.path`` does.


Definitions
-----------

First a couple of conventions:

* All paths can have a drive and a root. For POSIX paths, the drive is always empty.

* A relative path has neither drive nor root.

* A POSIX path is absolute if it has a root.
  A Windows path is absolute if it has both a drive *and* a root. A Windows UNC path (e.g. ``\\some\\share\\myfile.txt``) always has a drive and a root (here, ``\\some\\share`` and ``\\``, respectively).

* A path which has either a drive *or* a root is said to be anchored. Its anchor is the concatenation of the drive and root. Under POSIX, "anchored" is the same as "absolute".


Construction and joining
------------------------

We will present construction and joining together since they expose similar semantics.

The simplest way to construct a path is to pass it its string representation::

    >>> PurePath('setup.py')
    PurePosixPath('setup.py')

Extraneous path separators and ``"."`` components are eliminated::

    >>> PurePath('a///b/c/./d/')
    PurePosixPath('a/b/c/d')

If you pass several arguments, they will be automatically joined::

    >>> PurePath('docs', 'Makefile')
    PurePosixPath('docs/Makefile')

Joining semantics are similar to os.path.join, in that anchored paths ignore the information from the previously joined components::

    >>> PurePath('/etc', '/usr', 'bin')
    PurePosixPath('/usr/bin')

However, with Windows paths, the drive is retained as necessary::

    >>> PureNTPath('c:/foo', '/Windows')
    PureNTPath('c:\\Windows')
    >>> PureNTPath('c:/foo', 'd:')
    PureNTPath('d:')

Calling the constructor without any argument creates a path object pointing to the logical "current directory"::

    >>> PurePosixPath()
    PurePosixPath('.')

A path can be joined with another using the ``__getitem__`` operator::

    >>> p = PurePosixPath('foo')
    >>> p['bar']
    PurePosixPath('foo/bar')
    >>> p[PurePosixPath('bar')]
    PurePosixPath('foo/bar')

As with constructing, multiple path components can be specified at once::

    >>> p['bar/xyzzy']
    PurePosixPath('foo/bar/xyzzy')

A join() method is also provided, with the same behaviour. It can serve as a factory function::

    >>> path_factory = p.join
    >>> path_factory('bar')
    PurePosixPath('foo/bar')


Representing
------------

To represent a path (e.g.
to pass it to third-party libraries), just call ``str()`` on it::

    >>> p = PurePath('/home/antoine/pathlib/setup.py')
    >>> str(p)
    '/home/antoine/pathlib/setup.py'
    >>> p = PureNTPath('c:/windows')
    >>> str(p)
    'c:\\windows'

To force the string representation with forward slashes, use the ``as_posix()`` method::

    >>> p.as_posix()
    'c:/windows'

To get the bytes representation (which might be useful under Unix systems), call ``bytes()`` on it, or use the ``as_bytes()`` method::

    >>> p = PurePosixPath('/home/antoine/pathlib/setup.py')
    >>> bytes(p)
    b'/home/antoine/pathlib/setup.py'


Properties
----------

Five simple properties are provided on every path (each can be empty)::

    >>> p = PureNTPath('c:/pathlib/setup.py')
    >>> p.drive
    'c:'
    >>> p.root
    '\\'
    >>> p.anchor
    'c:\\'
    >>> p.name
    'setup.py'
    >>> p.ext
    '.py'


Sequence-like access
--------------------

The ``parts`` property provides read-only sequence access to a path object::

    >>> p = PurePosixPath('/etc/init.d')
    >>> p.parts
    <PurePosixPath.parts: ['/', 'etc', 'init.d']>

Simple indexing returns the individual path component as a string, while slicing returns a new path object constructed from the selected components::

    >>> p.parts[-1]
    'init.d'
    >>> p.parts[:-1]
    PurePosixPath('/etc')

Windows paths handle the drive and the root as a single path component::

    >>> p = PureNTPath('c:/setup.py')
    >>> p.parts
    <PureNTPath.parts: ['c:\\', 'setup.py']>
    >>> p.root
    '\\'
    >>> p.parts[0]
    'c:\\'

(separating them would be wrong, since ``C:`` is not the parent of ``C:\\``).

The ``parent()`` method returns an ancestor of the path::

    >>> p = PureNTPath('c:/python33/bin/python.exe')
    >>> p.parent()
    PureNTPath('c:\\python33\\bin')
    >>> p.parent(2)
    PureNTPath('c:\\python33')
    >>> p.parent(3)
    PureNTPath('c:\\')

The ``parents()`` method automates repeated invocations of ``parent()``, until the anchor is reached::

    >>> p = PureNTPath('c:/python33/bin/python.exe')
    >>> for parent in p.parents(): parent
    ...
    PureNTPath('c:\\python33\\bin')
    PureNTPath('c:\\python33')
    PureNTPath('c:\\')


Querying
--------

``is_relative()`` returns True if the path is relative (see definition above), False otherwise.

``is_reserved()`` returns True if a Windows path is a reserved path such as ``CON`` or ``NUL``. It always returns False for POSIX paths.

``match()`` matches the path against a glob pattern::

    >>> PureNTPath('c:/PATHLIB/setup.py').match('c:*lib/*.PY')
    True

``relative()`` returns a new relative path by stripping the drive and root::

    >>> PurePosixPath('setup.py').relative()
    PurePosixPath('setup.py')
    >>> PurePosixPath('/setup.py').relative()
    PurePosixPath('setup.py')

``relative_to()`` computes the relative difference of a path to another::

    >>> PurePosixPath('/usr/bin/python').relative_to('/usr')
    PurePosixPath('bin/python')

``normcase()`` returns a case-folded version of the path for NT paths::

    >>> PurePosixPath('CAPS').normcase()
    PurePosixPath('CAPS')
    >>> PureNTPath('CAPS').normcase()
    PureNTPath('caps')


Concrete paths API
==================

In addition to the operations of the pure API, concrete paths provide additional methods which actually access the filesystem to query or mutate information.


Constructing
------------

The classmethod ``cwd()`` creates a path object pointing to the current working directory in absolute form::

    >>> Path.cwd()
    PosixPath('/home/antoine/pathlib')


File metadata
-------------

The ``stat()`` method caches and returns the file's stat() result; ``restat()`` forces refreshing of the cache.
``lstat()`` is also provided, but doesn't have any caching behaviour::

    >>> p.stat()
    posix.stat_result(st_mode=33277, st_ino=7483155, st_dev=2053, st_nlink=1, st_uid=500, st_gid=500, st_size=928, st_atime=1343597970, st_mtime=1328287308, st_ctime=1343597964)

For ease of use, direct attribute access to the fields of the stat structure is provided over the path object itself::

    >>> p.st_size
    928
    >>> p.st_mtime
    1328287308.889562

Higher-level methods help examine the kind of the file::

    >>> p.exists()
    True
    >>> p.is_file()
    True
    >>> p.is_dir()
    False
    >>> p.is_symlink()
    False

The file owner and group names (rather than numeric ids) are queried through matching properties::

    >>> p = Path('/etc/shadow')
    >>> p.owner
    'root'
    >>> p.group
    'shadow'


Path resolution
---------------

The ``resolve()`` method makes a path absolute, resolving any symlink on the way. It is the only operation which will remove "``..``" path components.


Directory walking
-----------------

Simple (non-recursive) directory access is done by iteration::

    >>> p = Path('docs')
    >>> for child in p: child
    ...
    PosixPath('docs/conf.py')
    PosixPath('docs/_templates')
    PosixPath('docs/make.bat')
    PosixPath('docs/index.rst')
    PosixPath('docs/_build')
    PosixPath('docs/_static')
    PosixPath('docs/Makefile')

This allows simple filtering through list comprehensions::

    >>> p = Path('.')
    >>> [child for child in p if child.is_dir()]
    [PosixPath('.hg'), PosixPath('docs'), PosixPath('dist'), PosixPath('__pycache__'), PosixPath('build')]

Simple and recursive globbing is also provided::

    >>> for child in p.glob('**/*.py'): child
    ...
    PosixPath('test_pathlib.py')
    PosixPath('setup.py')
    PosixPath('pathlib.py')
    PosixPath('docs/conf.py')
    PosixPath('build/lib/pathlib.py')


File opening
------------

The ``open()`` method provides a file opening API similar to the builtin ``open()`` function::

    >>> p = Path('setup.py')
    >>> with p.open() as f: f.readline()
    ...
    '#!/usr/bin/env python3\n'

The ``raw_open()`` method, on the other hand, is similar to ``os.open``::

    >>> fd = p.raw_open(os.O_RDONLY)
    >>> os.read(fd, 15)
    b'#!/usr/bin/env '


Filesystem alteration
---------------------

Several common filesystem operations are provided as methods: ``touch()``, ``mkdir()``, ``rename()``, ``replace()``, ``unlink()``, ``rmdir()``, ``chmod()``, ``lchmod()``, ``symlink_to()``. More operations could be provided, for example some of the functionality of the shutil module.


Experimental openat() support
-----------------------------

On compatible POSIX systems, the concrete PosixPath class can take advantage of \*at() functions (`openat()`_ and friends), and manages the bookkeeping of open file descriptors as necessary. Support is enabled by passing the *use_openat* argument to the constructor::

    >>> p = Path(".", use_openat=True)

Then all paths constructed by navigating this path (either by iteration or indexing) will also use the openat() family of functions. The point of using these functions is to avoid race conditions whereby a given directory is silently replaced with another (often a symbolic link to a sensitive system location) between two accesses.

.. _`openat()`: http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html


Copyright
=========

This document has been placed into the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
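For comparison, the sketch below checks several of the pure-path examples from the proposal against the ``pathlib`` module as it was eventually shipped in Python 3.4. The names differ from this draft (``PureNTPath`` became ``PureWindowsPath``, ``parents()`` became a ``parents`` property, and joining by indexing was replaced by the ``/`` operator), so this is a comparison point rather than the draft API; for these particular cases the results match the examples above.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Redundant separators and "." components are dropped.
assert str(PurePosixPath("a///b/c/./d/")) == "a/b/c/d"

# With no argument, the path is the logical current directory.
assert str(PurePosixPath()) == "."

# Several arguments are joined; an anchored component discards what came before.
assert str(PurePosixPath("/etc", "/usr", "bin")) == "/usr/bin"

# For Windows paths, the drive is kept when only a root is supplied...
assert str(PureWindowsPath("c:/foo", "/Windows")) == "c:\\Windows"
# ...but a new drive replaces it.
assert str(PureWindowsPath("c:/foo", "d:")) == "d:"

# Drive and root form a single leading component.
assert PureWindowsPath("c:/setup.py").parts == ("c:\\", "setup.py")

# parents walks up the hierarchy until the anchor is reached.
exe = PureWindowsPath("c:/python33/bin/python.exe")
assert [str(p) for p in exe.parents] == ["c:\\python33\\bin", "c:\\python33", "c:\\"]

# relative_to() computes the relative difference between two paths.
assert PurePosixPath("/usr/bin/python").relative_to("/usr") == PurePosixPath("bin/python")
```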

Hi! On Fri, Oct 05, 2012 at 08:25:34PM +0200, Antoine Pitrou <solipsis@pitrou.net> wrote:
This PEP proposes the inclusion of a third-party module, `pathlib`_, in the standard library.
+1 from me for a sane path handling in the stdlib!
Some attributes are properties and some are methods. Which is which? Why is .root a property but .parents() a method? .owner/.group are properties but .exists() is a method, and so on. .stat() just returns self._stat, but said ._stat is a property!
If I understand it correctly, these should be either \\\\some\\share\\myfile.txt and \\\\some\\share or \\some\share\myfile.txt and \\some\share, no? Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On Fri, Oct 5, 2012 at 9:16 PM, Oleg Broytman <phd@phdru.name> wrote:
Unobvious indeed. Maybe operations that cause OS API calls should have parens? Also, I agree with Paul Moore that the naming in its current state may cause cross-platform bugs. Though I don't understand why not to overload the "/" or "+" operators. Sounds more elegant than square brackets. Just make sure the op fails on anything other than Path objects. I'm +1 on adding such a useful abstraction to Python if and only if it were >= os.path on every front,
Yuval Greenfield

On Fri, Oct 05, 2012 at 09:36:56PM +0200, Yuval Greenfield wrote:
Path concatenation is obviously not a form of division, so it makes little sense to use the division operator for this purpose. I always wonder why the designers of C++ felt that it made sense to perform output by left-bitshifting the output stream by a string: std::cout << "hello, world"; Fortunately, operator overloading in Python is generally limited to cases where the operator's meaning is preserved (with the unfortunate exception of the % operator for strings). -- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868

On 06/10/12 05:53, Andrew McNabb wrote:
Path concatenation is obviously not a form of division, so it makes little sense to use the division operator for this purpose.
But / is not just a division operator. It is also used for:

* alternatives: "tea and/or coffee, breakfast/lunch/dinner"
* italic markup: "some apps use /slashes/ for italics"
* instead of line breaks when quoting poetry
* abbreviations such as n/a b/w c/o and even w/ (not applicable, between, care of, with)
* date separator

Since / is often (but not always) used as a path separator, using it as a path component join operator makes good sense.

BTW, are there any supported platforms where the path separator or alternate path separator is not slash? There used to be Apple Mac OS using colons.

-- Steven
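Part of Steven's question can be answered from the standard library itself: the `ntpath` and `posixpath` modules (which implement `os.path` on Windows and POSIX respectively, and can be imported on any platform) expose each flavour's separators. This is a small sketch, not part of the proposal.

```python
import ntpath
import posixpath

# Windows: the primary separator is a backslash, and "/" is accepted as an alternate.
assert ntpath.sep == "\\" and ntpath.altsep == "/"
# POSIX: "/" is the only separator; there is no alternate.
assert posixpath.sep == "/" and posixpath.altsep is None

# The Windows flavour normalizes the alternate separator to the primary one.
print(ntpath.normpath("c:/Windows/System32"))  # c:\Windows\System32
```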

On Sat, Oct 06, 2012 at 08:41:05AM +1000, Steven D'Aprano wrote:
This is the difference between C++ style operators, where the only thing that matters is what the operator symbol looks like, and Python style operators, where an operator symbol is just syntactic sugar. In Python, the "/" is synonymous with `operator.div` and is defined in terms of the `__div__` special method. This distinction is why I hate operator overloading in C++ but like it in Python. -- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868

Andrew McNabb wrote:
Not sure what you're saying here -- in both languages, operators are no more than syntactic sugar for dispatching to an appropriate method or function. Python just avoids introducing a special syntax for spelling the name of the operator, which is nice, but it's not a huge difference. The same issues of what you *should* use operators for arises in both communities, and it seems to be very much a matter of personal taste. (The use of << for output in C++ has never bothered me, BTW. There are plenty of problems with the way I/O is done in C++, but the use of << is the least of them, IMO...) -- Greg

On Sat, Oct 06, 2012 at 01:54:21PM +1300, Greg Ewing wrote:
To clarify my point: in Python, "/" is not just a symbol--it specifically means "div".
Overriding the div operator requires creating a "__div__" special method, which I think has helped influence personal taste within the Python community. I personally would feel dirty creating a "__div__" method that had absolutely nothing to do with division. Whether or not the sense of personal taste within the Python community is directly attributable to this or not, I believe that overloaded operators in Python tend to be more predictable and consistent than what I have seen in C++. -- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868

On 07/10/12 08:45, Andrew McNabb wrote:
To clarify my point: in Python, "/" is not just a symbol--it specifically means "div".
I think that's wrong. / is a symbol that means whatever the class gives it. It isn't like __init__ or __call__ that have defined language semantics, and there is no rule that says that / means division. I'll grant you that it's a strong convention, but it is just a convention.
Overriding the div operator requires creating a "__div__" special method,
Actually it is __truediv__ in Python 3. __div__ no longer has any meaning or special status. But it's just a name. __add__ doesn't necessarily perform addition, __sub__ doesn't necessarily perform subtraction, and __or__ doesn't necessarily have anything to do with either bitwise or boolean OR. Why should we insist that __*div__ (true, floor or just plain div) must only be used for numeric division when we don't privilege other numeric operators like that? -- Steven
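To make the mechanism concrete: in Python 3, `/` dispatches to `__truediv__` on the left operand, and falls back to the reflected `__rtruediv__` on the right operand when the left one does not handle it; what either method does is entirely up to the class. The sketch below uses a hypothetical `SlashPath` class (not the proposed pathlib API, and handling relative paths only) that accepts strings and other `SlashPath` objects and returns `NotImplemented` otherwise, so unrelated operands raise `TypeError`.

```python
class SlashPath:
    """Hypothetical minimal relative-path type (not the pathlib API)."""

    def __init__(self, *parts):
        # Split every argument on "/" and keep the non-empty components.
        self._parts = tuple(
            seg for part in parts for seg in str(part).split("/") if seg
        )

    def __truediv__(self, other):
        # path / "name" or path / other_path -> new path with components appended.
        if isinstance(other, (str, SlashPath)):
            return SlashPath(*self._parts, other)
        return NotImplemented  # lets Python raise TypeError for e.g. path / 3

    def __rtruediv__(self, other):
        # "name" / path: str defines no __truediv__, so Python falls back here.
        if isinstance(other, str):
            return SlashPath(other, *self._parts)
        return NotImplemented

    def __str__(self):
        return "/".join(self._parts)


print(SlashPath("usr") / "lib" / "python3")  # usr/lib/python3
print("home" / SlashPath("antoine"))         # home/antoine
```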

On Tue, Oct 09, 2012 at 12:03:55AM +1100, Steven D'Aprano wrote:
I'll grant you that the semantics of the __truediv__ method are defined by convention.
__add__ for strings doesn't mean numerical addition, but people find it perfectly natural to speak of "adding two strings," for example. Seeing `string1.__add__(string2)` is readable, as is `operator.add(string1, string2)`. Every other example of operator overloading that I find tasteful is analogous enough to the numerical operators to justify using the name. Since this really is a matter of personal taste, I'll end my participation in this discussion by voicing support for Nick Coghlan's suggestion of a `join` method, whether it's named `join` or `append` or something else. -- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868

http://en.wikipedia.org/wiki/List_of_mathematical_symbols#Symbols The + symbol means addition and union of disjoint sets. A path (including a fs path) is a set of links (for a fs path, a link is a folder name). Using the + symbol has a natural interpretation as concatenation of subpaths (sets) to form a longer path (superset). The / symbol means the quotient of a group. It always returns a subgroup. When I see path1 / path2 I would expect it to return all paths that start by path2 or contain path2, not concatenation. The fact that string paths in Unix use the / to represent concatenation is accidental. That's just how the path is serialized into a string. In fact Windows uses a different separator. I do think a serialized representation of an object makes a good example for its abstract representation. Massimo On Oct 8, 2012, at 11:06 AM, Andrew McNabb wrote:

Massimo DiPierro wrote:
The fact that string paths in Unix use the / to represent concatenation is accidental.
Maybe so, but it can be regarded as a fortuitous accident, since / also happens to be an operator in Python, so it would have mnemonic value to Unix users. The correspondence is not exact for Windows users, but / is similar enough to still have some mnemonic value for them. And all the OSes using other separators seem to have died out. -- Greg

Massimo DiPierro wrote:
A reason *not* to use '+' is that it would violate associativity in some cases, e.g. (path + "foo") + "bar" would not be the same as path + ("foo" + "bar") Using '/', or any other operator not currently defined on strings, would prevent this mistake from occurring. A reason to want an operator is the symmetry of path concatenation. Symmetrical operations deserve a symmetrical syntax, and to achieve that in Python you need either an operator or a stand-alone function. A reason to prefer an operator over a function is associativity. It would be nice to be able to write path1 / path2 / path3 and not have to think about the order in which the operations are being done. If '/' is considered too much of a stretch, how about '&'? It suggests a kind of addition or concatenation, and in fact is used for string concatenation in some other languages. -- Greg
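Greg's associativity argument can be reproduced with a few lines. The `PlusPath` class below is hypothetical (nothing in the draft defines `+` on paths); it assumes `+` appends one component when given a string. Because `"foo" + "bar"` is ordinary string concatenation, regrouping the expression silently changes the result, whereas `/` is not defined between two strings, so the analogous slip raises `TypeError`.

```python
class PlusPath:
    """Hypothetical path type whose + appends one component (not pathlib)."""

    def __init__(self, *parts):
        self.parts = tuple(parts)

    def __add__(self, other):
        if isinstance(other, str):
            return PlusPath(*self.parts, other)
        return NotImplemented

    def __str__(self):
        return "/".join(self.parts)


p = PlusPath("home")
left = (p + "foo") + "bar"   # two components appended
right = p + ("foo" + "bar")  # str + str is evaluated first: one component "foobar"
print(left)   # home/foo/bar
print(right)  # home/foobar

# str defines no "/" operator, so the corresponding slip with "/" fails loudly.
try:
    "foo" / "bar"
except TypeError as exc:
    print("TypeError:", exc)
```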

On 09/10/2012 16:30, Michele Lacchia wrote:
But why not interpret a path as a tuple (not a list, it's immutable) of path segments and have: path + ("foo", "bar") and path + ".tar.gz" behave different (i.e. tuples add segments and strings add to the last segment)? And of course path1 + path2 adds the segments together. Joachim
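A minimal sketch of Joachim's idea, assuming a hypothetical `TuplePath` subclass of `tuple` (not the pathlib API): adding a tuple (or another `TuplePath`) appends whole segments, while adding a plain string extends the last segment, which is how `path + ".tar.gz"` would get its suffix.

```python
class TuplePath(tuple):
    """Hypothetical path stored as a tuple of segments (not the pathlib API).

    path + ("a", "b")  -> appends segments
    path + other_path  -> appends segments (a TuplePath is a tuple)
    path + ".tar.gz"   -> extends the last segment
    """

    def __new__(cls, *segments):
        return super().__new__(cls, segments)

    def __add__(self, other):
        if isinstance(other, str):
            if not self:
                raise ValueError("cannot extend the last segment of an empty path")
            # A plain string is glued onto the final segment.
            return TuplePath(*self[:-1], self[-1] + other)
        if isinstance(other, tuple):
            # A tuple (or another TuplePath) contributes whole segments.
            return TuplePath(*self, *other)
        return NotImplemented

    def __str__(self):
        return "/".join(self)


base = TuplePath("home", "joachim")
print(base + ("src", "project"))            # home/joachim/src/project
print(base + TuplePath("src") + ".tar.gz")  # home/joachim/src.tar.gz
```

Being a tuple, such a path is immutable and hashable for free, which fits the immutability requirement of the PEP; the cost is that `+` now has two meanings depending on the type of its right operand.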

On Mon, 8 Oct 2012 10:06:17 -0600 Andrew McNabb <amcnabb@mcnabbs.org> wrote:
The join() method already exists in the current PEP, but it's less convenient, syntactically, than either '[]' or '/'. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Tue, Oct 9, 2012 at 12:10 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Right. My objections boil down to: 1. The case has not been adequately made that a second way to do it is needed. Therefore, the initial version should just include the method API. 2. Using "join" as the method name is a bad idea for the same reason that using "+" as the operator syntax would be a bad idea: it can cause erroneous output instead of an exception if a string is passed where a Path object is expected. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
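Nick's second objection is easy to reproduce with plain strings. If `configdir` is accidentally a `str` rather than a path object, `configdir.join(...)` still succeeds because `str.join` exists; it just interleaves `configdir` between the characters of its argument. The variable name and the `joinpath` spelling below are only illustrations: the point is that a method name `str` lacks turns the same slip into an immediate `AttributeError`.

```python
configdir = "/home/user/.config"  # a str where a path object was expected

# str.join iterates over its argument ("m", "y", "p", ...) and puts configdir
# between the pieces, so no exception is raised; the result is just wrong.
result = configdir.join("myprogram")
assert result != "/home/user/.config/myprogram"
print(result[:25])

# A method name that str does not have makes the mistake fail fast.
try:
    configdir.joinpath("myprogram")
except AttributeError as exc:
    print("AttributeError:", exc)
```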

On Tue, 9 Oct 2012 00:19:03 +0530 Nick Coghlan <ncoghlan@gmail.com> wrote:
But you really want a short method name, otherwise it's better to have a dedicated operator. joinpath() definitely doesn't cut it, IMO. (perhaps that's the same reason I am reluctant to use str.format() :-)) By the way, I also thought of using __call__, but for some reason I think it tastes a bit bad ("too clever"?).
Admitted, although I think the potential for confusion is smaller than with "+" (I can't really articulate why, it's just that I fear one much less than the other :-)). Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Mon, Oct 8, 2012 at 11:56 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Maybe you're overreacting? The current notation for this operation is os.path.join(p, q) which is even longer than p.pathjoin(q). To me the latter is fine.
__call__ overloading is often overused. Please don't go there. It is really hard to figure out what some (semi-)obscure operation means if it uses __call__ overloading.
Personally I fear '+' much more -- to me, + can be used to add an extension without adding a new directory level. If we *have* to overload an operator, I'd prefer p/q over p[q] any day. -- --Guido van Rossum (python.org/~guido)

On 8 October 2012 20:14, Stefan Krah <stefan@bytereef.org> wrote:
On the basis that we want standard libraries to be non-contentious issues: is it not obvious that "+", "/" and "[]" *cannot* be the right choices as they're contentious? I would argue that a lot of this argument is “pointless” because there is no right answer. For example, I prefer indexing out of the lot, but since a lot of people really dislike it I'm not going to bother vouching for it. I think we should argue more along the lines of:

# Possibility for accidental validity if configdir is a string
configdir.join("myprogram")
# A bit long
# There's argument here, but I don't find them intuitive or nice
configdir.subpath("myprogram")
configdir.superpath("myprogram")
# My favorites ('cause my opinion: so there)
configdir.child("myprogram") # Does sorta' imply IO
# What I'm surprised (but half-glad) hasn't been mentioned
configdir.cd("myprogam") # Not a link, just GMail's silly-ness

We already know the semantics for the function; now it's *just a name*.

I was just thinking the same thing. My preference for this at the moment is 'append', notwithstanding the fact that it will be non-mutating. It's a single, short word, it avoids re-stating the datatype, and it resonates with the idea of appending to a sequence of path components.
# My favorites ('cause my opinion: so there)
configdir.child("myprogram") # Does sorta' imply IO
Except that the result isn't always a child (the RHS could be an absolute path, start with "..", etc.)
configdir.cd("myprogam")
Aaaghh... my brain... the lobotomy does nothing... -- Greg

Stefan Krah wrote:
'^' or '@' are used for concatenation in some languages. At least accidental confusion with xor is pretty unlikely.
We'd have to add '@' as a new operator before we could use that. But '^' might have possibilities... if you squint, it looks a bit like a compromise between Unix and Windows path separators. :-) -- Greg

On Tue, Oct 9, 2012 at 12:34 AM, Guido van Rossum <guido@python.org> wrote:
Yes, of all the syntactic shorthands, I also favour "/". However, I'm also a big fan of starting with a minimalist core and growing it. Moving from "os.path.join(a, b, c, d, e)" (or, the way I often write it, "joinpath(a, b, c, d, e)") to "a.joinpath(b, c, d, e)" at least isn't going backwards, and is more obvious in isolation than "a / b / c / d / e". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
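For comparison, the `pathlib` module that eventually shipped in Python 3.4 provides both spellings Nick mentions, so they can be checked side by side. This uses the shipped module, not the draft API under discussion here.

```python
import posixpath
from pathlib import PurePosixPath

a = PurePosixPath("a")
via_method = a.joinpath("b", "c", "d", "e")
via_operator = a / "b" / "c" / "d" / "e"

# Both spellings build the same path, equivalent to os.path.join on POSIX.
assert via_method == via_operator
assert str(via_method) == posixpath.join("a", "b", "c", "d", "e")
print(via_method)  # a/b/c/d/e
```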

Nick Coghlan wrote:
I think we should keep in mind that we're (hopefully) not going to see things like "a / b / c / d / e" in real-life code. Rather we're going to see things like

backuppath = destdir / "archive" / filename + ".bak"

In other words, there should be some clue from the names that paths are involved, from which it should be fairly easy to guess what the "/" means. -- Greg
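Python's operator precedence supports the reading Greg's example relies on: `/` binds more tightly than `+`, so the expression groups as `(destdir / "archive" / filename) + ".bak"`. The sketch below uses the standard `ast` module to confirm the grouping without evaluating anything. It only shows how the expression parses; actually running it would also require path objects to support `+` with a string, which is an assumption here and not something the draft defines.

```python
import ast

# Parse the expression without evaluating it (the names need not exist).
expr = ast.parse('destdir / "archive" / filename + ".bak"', mode="eval").body

# The outermost operation is "+": the suffix is added to the finished path...
assert isinstance(expr, ast.BinOp) and isinstance(expr.op, ast.Add)
assert expr.right.value == ".bak"
# ...whose left operand is the "/" chain, grouped left to right.
assert isinstance(expr.left.op, ast.Div)       # (...) / filename
assert isinstance(expr.left.left.op, ast.Div)  # destdir / "archive"
```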

Antoine Pitrou wrote:
But you really want a short method name, otherwise it's better to have a dedicated operator. joinpath() definitely doesn't cut it, IMO.
I agree, it's far too longwinded. It would clutter your code just as badly as using os.path.join() all over the place does now, but without the option of aliasing it to a shorter name. -- Greg

On 9 October 2012 06:41, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Good point - the fact that it's not possible to alias a method name means that it's important to get the name right if we're to use a method, because we're all stuck with it forever. Because of that, I'm much more reluctant to "just put up with" Path.pathjoin on the basis that it's better than any other option. Are there any libraries that use a method on a path object (or something similar - URL objects, maybe) and if so, what method name did they use? I'd like to see what real code using any proposed method name would look like. As a point of reference, twisted's FilePath class uses "child". Paul

On Tue, Oct 09, 2012 at 08:36:58AM +0100, Paul Moore wrote:
Huh?

py> f = str.join  # "join" is too long and I don't like it
py> f("*", ["spam", "ham", "eggs"])
'spam*ham*eggs'

We should get the name right because we're stuck with it forever due to backwards compatibility, not because you can't alias it. -- Steven
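The same aliasing trick works for a method on a path class. Since the draft's API is not available to run, the sketch below uses `joinpath` from the `pathlib` module as eventually shipped in Python 3.4 as a stand-in; the alias names are arbitrary.

```python
from pathlib import PurePosixPath

# Alias the unbound method, exactly like f = str.join above.
pathjoin = PurePosixPath.joinpath
assert pathjoin(PurePosixPath("docs"), "Makefile") == PurePosixPath("docs/Makefile")

# A bound method can be aliased too, giving a one-argument path factory.
in_docs = PurePosixPath("docs").joinpath
assert in_docs("conf.py") == PurePosixPath("docs/conf.py")
print(in_docs("index.rst"))  # docs/index.rst
```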

On Oct 8, 2012, at 3:47 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I'd prefer 'append', because
path.append("somedir", "file.txt")
+1 In so many ways, I see a path as a list of its components. Because of that, path.append and path.extend, with similar semantics to list.append and list.extend, makes a lot of sense to me. When I think about a path as a list of components rather than as a string, the '+' operator starts to make sense for joins as well. I'm OK with using the '/' for path joining as well, because the parallel with list doesn't fit in this case, although I understand Massimo's objection to it. In very many ways, I like thinking of a path as a list (slicing, append, etc). The fact that list.append doesn't return the new list has always bugged me, but if we were to use append and extend, they should mirror the semantics from list. I'm much more inclined to think of path as a special list than as a special string. Ryan

On Mon, Oct 8, 2012 at 4:47 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
As Nick noted, the problem is that append() conflicts with MutableSequence.append(). If someone subclasses Path and friends to act like a list then it complicates the situation. In my mind the name should be one that is not already in use by strings or sequences. -eric

On 09/10/2012 00:47 Greg Ewing wrote:
As has already been stated by others, paths are immutable so using them like lists is leading to confusion (and list's append() only wants one arg, so extend() might be better in that case). But paths could then be interpreted as tuples of "directory entries" instead. So adding a path to a path would "join" them: pathA + pathB and in order to not always need a path object for pathB one could also write the right argument of __add__ as a tuple of strings: pathA + ("somedir", "file.txt") One could also use "+" for adding to the last segment if it isn't a path object or a tuple: pathA + ".tar.gz" Joachim
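A minimal sketch of Joachim's suggestion, assuming a hypothetical `TuplePath` class (not part of any library) that subclasses `tuple`: adding a tuple joins components, while adding a bare `str` extends the last segment.

```python
class TuplePath(tuple):
    """Hypothetical immutable path modelled as a tuple of components."""

    def __new__(cls, *parts):
        return super().__new__(cls, parts)

    def __add__(self, other):
        if isinstance(other, tuple):
            # A TuplePath or a plain tuple of strings: join the components.
            return TuplePath(*self, *other)
        if isinstance(other, str):
            # A bare string: concatenate onto the last segment
            # (raises IndexError for an empty path).
            return TuplePath(*self[:-1], self[-1] + other)
        return NotImplemented

    def __str__(self):
        return "/".join(self)


pathA = TuplePath("home", "joachim")
print(pathA + ("somedir", "file.txt"))  # home/joachim/somedir/file.txt
print(TuplePath("backup") + ".tar.gz")  # backup.tar.gz
```

Because the class is a tuple, slicing and iteration come for free; the cost is that `+` means two different things depending on the type of the right operand, which is exactly the ambiguity debated in the replies.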

On Oct 9, 2012, at 1:18 AM, Joachim König <him@online.de> wrote:
I like it. As you pointed out, my comparison with list is inappropriate because of path's immutability. So .append() and .extend() probably don't make sense.
One could also use "+" for adding to the last segment if it isn't a path object or a tuple:
pathA + ".tar.gz"
This might be a reasonable way to appease both those who are viewing path as a special tuple and those who are viewing it as a special string. It breaks the parallel with tuple a bit, but it's clear that there are important properties of both strings and tuples that would be nice to preserve. Ryan

On Oct 9, 2012, at 10:11 AM, Eric V. Smith <eric@trueblade.com> wrote:
or pathA + Path("file.txt")

Just like with any tuple, if you wish to add a new part, it must be a tuple (Path) first. I'm not convinced that adding a string to a path should be allowed, but if not then we should probably throw a TypeError if it's not a tuple or Path. That would leave the following method for appending a suffix:

    path[:-1] + Path(path[-1] + '.tar.gz')

That's a lot more verbose than the option to "add a string". Ryan
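For comparison, the stdlib `pathlib` module that eventually shipped with Python 3.4 handles Ryan's `.tar.gz` case without string addition, via `with_name()`. This is a sketch of one possible answer, not the API under discussion in this thread.

```python
from pathlib import PurePosixPath

path = PurePosixPath("/backups/project")

# Build the new final component from the old one, keeping the parent.
archive = path.with_name(path.name + ".tar.gz")
# Equivalent spelling using the "/" join operator that pathlib adopted.
same = path.parent / (path.name + ".tar.gz")

print(archive)  # /backups/project.tar.gz
```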

On 09.10.2012 19:11, Eric V. Smith wrote:
You could of course write: pathA + "/file.txt" because with a separator it's still explicit. But this requires clarification because "/file.txt" could be considered an absolute path. But IMO string addition should be concatenation. YMMV. Joachim

On 06/10/12 09:54, Andrew McNabb wrote:
I'm afraid that it's a distinction that seems meaningless to me. int + int and str + str are not the same, even though the operator symbol looks the same. Likewise int - int and set - set are not the same even though they use the same operator symbol. Similarly for & and | operators. For what it is worth, when I am writing pseudocode on paper, just playing around with ideas, I often use / to join path components: open(path/name) # pseudo-code sort of thing, so I would be much more comfortable writing either of these: path/"name.txt" path+"name.txt" than path["name.txt"] which looks like it ought to be a lookup, not a constructor. -- Steven

On Fri, 5 Oct 2012 23:16:25 +0400 Oleg Broytman <phd@phdru.name> wrote:
parents() returns a generator (hence the list() call in the example above). A generator-returning property sounds a bit too confusing IMHO. ._stat is an implementation detail. stat() and exists() both mirror similar APIs in the os / os.path modules. .name, .ext, .root, .parts just return static, immutable properties of the path, I see no reason for them to be methods.
Ah, right. I'll correct it. Thanks Antoine. -- Software development and contracting: http://pro.pitrou.net

On 5 October 2012 19:25, Antoine Pitrou <solipsis@pitrou.net> wrote:
There is a risk that this is too "cute". However, it's probably better than overloading the '/' operator, and you do need something short.
That's risky. Are you proposing always using '/' regardless of OS? I'd have expected os.sep (so \ on Windows). On the other hand, that would make p['bar\\baz'] mean two different things on Windows and Unix - 2 extra path levels on Windows, only one on Unix (and a filename containing a backslash). It would probably be better to allow tuples as arguments: p['bar', 'baz']
I don't like the way the distinction between "root" and "anchor" works here. Unix users are never going to use "anchor", as "root" is the natural term, and it does exactly the right thing on Unix. So code written on Unix will tend to do the wrong thing on Windows (where generally you'd want to use "anchor" or you'll find yourself switching accidentally to the current drive). It's a rare situation where it would matter, which on the one hand makes it much less worth worrying about, but on the other hand means that when bugs *do* occur, they will be very obscure :-( Also, there is no good terminology in current use here. The only concrete thing I can suggest is that "root" would be better used as the term for what you're calling "anchor" as Windows users would expect the root of "C:\foo\bar\baz" to be "C:\". The term "drive" would be right for "C:" (although some might expect that to mean "C:\" as well, but there's no point wasting two terms on the one concept). It might be more practical to use a new, but explicit, term like "driveroot" for "\". It's the same as root on Unix, and on Windows it's fairly obviously "the root on the current drive". And by using the coined term for the less common option, it might act as a reminder to people that something not entirely portable is going on. But there's no really simple answer - Windows and Unix are just different here.
+1. There's lots of times I have wished os.path had this.
This again suggests to me that "C:\" is more closely allied to the term "root" here. Also, I assume that paths will be comparable, using case sensitivity appropriate to the platform. Presumably a PurePath and a Path are comparable, too. What about a PosixPath and an NTPath? Would you expect them to be comparable or not? But in general, this looks like a pretty good proposal. Having a decent path abstraction in the stdlib would be great. Paul.

Paul Moore wrote:
I actually like using the '/' operator for this. My own path module uses it, and the resulting code is along the lines of: job = Path('c:/orders/38273') table = dbf.Table(job/'ABC12345')
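For reference, the `/` operator Ethan describes is what the stdlib `pathlib` (Python 3.4+) eventually adopted. A sketch using the pure Windows flavour so it runs on any OS; the `dbf.Table` call from his example is omitted.

```python
from pathlib import PureWindowsPath

job = PureWindowsPath("c:/orders/38273")
table_path = job / "ABC12345"

# Forward slashes in the input are rendered as backslashes on output.
print(table_path)  # c:\orders\38273\ABC12345
```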
Mine does; it also accepts `\\` on Windows machines. Personally, I don't care for the index notation Antoine is suggesting. ~Ethan~

On Fri, 5 Oct 2012 20:19:12 +0100 Paul Moore <p.f.moore@gmail.com> wrote:
I think overloading '/' is ugly (dividing paths??). Someone else proposed overloading '+', which would be confusing since we need to be able to combine paths and regular strings, for ease of use. The point of using __getitem__ is that you get an error if you replace the Path object with a regular string by mistake:
If you were to use the '+' operator instead, 'foo' + 'bar' would work but give you the wrong result.
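The interactive session that illustrated this point is missing here. A minimal reconstruction using plain strings (not Antoine's original example): when `p` is accidentally a `str`, `+` silently concatenates, while indexing with a string raises.

```python
p = "foo"  # meant to be a Path object, but is accidentally a str

# With "+" as the join operator the mistake goes unnoticed:
wrong = p + "bar"
print(wrong)  # foobar  (rather than foo/bar)

# With "[]" as the join operator the same mistake fails loudly:
raised = False
try:
    p["bar"]
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

A `/` operator would fail loudly in the same way, since `str` does not support division.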
Both '/' and '\\' are accepted as path separators under Windows. Under Unix, '\\' is a regular character:
It would probably be better to allow tuples as arguments:
p['bar', 'baz']
It already works indeed:
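The examples for the two points above were lost. Using the stdlib `pathlib` that later shipped (where `PureNTPath` became `PureWindowsPath` and joining uses `/` or `joinpath()` rather than `[]`), the same behaviour can be sketched:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Under Windows both "/" and "\" separate components...
nt_parts = PureWindowsPath("foo/bar\\baz").parts
# ...while under POSIX a backslash is an ordinary filename character.
posix_parts = PurePosixPath("foo/bar\\baz").parts

# Several components can be joined in one call, avoiding separator issues.
multi = PurePosixPath("foo").joinpath("bar", "baz")

print(nt_parts)     # ('foo', 'bar', 'baz')
print(posix_parts)  # ('foo', 'bar\\baz')
print(multi)        # foo/bar/baz
```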
Well, I expect .root or .anchor to be used mostly for presentation or debugging purposes. There's nothing really useful to be done with them otherwise, IMHO. Do you know of any use cases?
But then the root of "C:foo" would be "C:", which sounds wrong: "C:" isn't a root at all.
But there's no really simple answer - Windows and Unix are just different here.
Yes, and Unix users are expecting something simpler than what's going on under Windows ;)
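For comparison, the stdlib `pathlib` (3.4+) kept separate `drive`, `root` and `anchor` attributes. The sketch below prints them for the cases debated here, including the drive-relative `C:foo` and a UNC path:

```python
from pathlib import PurePosixPath, PureWindowsPath

examples = [
    PureWindowsPath("C:/foo/bar"),
    PureWindowsPath("C:foo"),                    # drive-relative path
    PureWindowsPath("/foo"),                     # root of the current drive
    PureWindowsPath("//network/share/foo/bar"),  # UNC path
    PurePosixPath("/usr/bin"),
]

for p in examples:
    print(f"{str(p):28} drive={p.drive!r:22} root={p.root!r:6} anchor={p.anchor!r}")
```

In that model `C:foo` has a drive but an empty root, so its anchor is just `C:`, while `C:\foo\bar` has the anchor `C:\` that Paul expected to call "the root".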
Currently, different flavours imply unequal (and unorderable) paths:
However, pure paths and concrete paths of the same flavour can be equal, and ordered:
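The sessions showing this are missing. In the stdlib `pathlib` that shipped, the behaviour is the same and can be sketched as follows; Windows-flavoured comparisons are also case-insensitive, which bears on Paul's question above.

```python
from pathlib import Path, PurePath, PurePosixPath, PureWindowsPath

# Different flavours never compare equal...
cross_equal = PurePosixPath("foo") == PureWindowsPath("foo")

# ...and cannot be ordered against each other.
try:
    PurePosixPath("a") < PureWindowsPath("b")
    cross_orderable = True
except TypeError:
    cross_orderable = False

# A pure path and a concrete path of the native flavour can be equal and ordered.
pure_concrete_equal = PurePath("foo") == Path("foo")
pure_concrete_ordered = PurePath("a") < Path("b")

# Windows-flavoured comparisons are case-insensitive; POSIX ones are not.
nt_case = PureWindowsPath("C:/Foo") == PureWindowsPath("c:/foo")
posix_case = PurePosixPath("/Foo") == PurePosixPath("/foo")

print(cross_equal, cross_orderable)                # False False
print(pure_concrete_equal, pure_concrete_ordered)  # True True
print(nt_case, posix_case)                         # True False
```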
Regards Antoine.

Antoine Pitrou wrote:
But '/' is the normal path separator, so it's not dividing; and it certainly makes more sense than `%` with string interpolations. ;)
I would rather use the `/` and `+` and risk the occasional wrong result. (And yes, I have spent time tracking bugs because of that wrong result when using my own Path module -- and I'd still rather make that trade-off.) ~Ethan~

+1 in general. I'd like to have a library like that among the batteries. I would like to see a note in the PEP on why [] is used instead of / or +, although I agree with that choice. +0 for /, -1 for +. For the method/property decision I'd suggest a (maybe stupid) rule: properties for simple accessors, and methods for operations which require OS calls, with an exception for parents() as a method, since it returns a generator. On Fri, Oct 5, 2012 at 11:06 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
-- Thanks, Andrew Svetlov

Antoine Pitrou wrote:
Well, I expect .root or .anchor to be used mostly for presentation or debugging purposes.
I'm having trouble thinking of *any* use cases, even for presentation or debugging. Maybe they should be dropped altogether until someone comes up with a use case. -- Greg

On Fri, Oct 5, 2012 at 1:55 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I think overloading '/' is ugly (dividing paths??).
Agreed. +1 on the proposed API in this regard. It's pretty easy to grok. I also like that item access here mirrors how paths are treated as sequences/iterables in other parts of the API. It wouldn't surprise me if the join syntax is the most contentious part of the proposal. ;) -eric

Antoine Pitrou writes:
I didn't like this much at first. However, if you think of this as a "collection" (cf. WebDAV), then the bracket notation is the obvious way to do it in Python (FVO "it" == "accessing a member of a collection by name"). I wonder if there is a need to distinguish between a path naming a directory as a collection, and as a file itself? Or can/should this be implicit (wash my mouth out with soap!) in the operation using the Path?
Is it really that obnoxious to write "p + Path('bar')" (where p is a Path)? What about the case "'bar' + p"? Since Python isn't C, you can't express that as "'bar'[p]"!
That's outright ugly, especially from the "collections" point of view (foo/bar/xyzzy is not a member of foo). If you want something that doesn't suffer from the bogosities of os.path, this kind of platform- dependence should be avoided, I think.
Why not interpret the root of "C:foo" to be None? The Windows user can still get "C:" as the drive, and I don't think that will be surprising to them.
Well, Unix users can do things more uniformly. But there's also a lot of complexity going on under the hood. Every file system has a root, of which only one is named "/". I don't know if Python programs ever need that information (I never have :-), but it would be nice to leave room for extension. Similarly, many "file systems" are actually just hierarchically organized database access methods with no physical existence on hardware. I wonder if "mount_point" is sufficiently general to include the roots of real local file systems, remote file systems, Windows drives, and pseudo file systems? An obvious problem is that Windows users would not find that terminology natural.

On 6 October 2012 09:39, Stephen J. Turnbull <turnbull@sk.tsukuba.ac.jp> wrote:
Technically, newer versions of Windows (Vista and later, I think) allow you to mount a drive on a directory rather than a drive letter, just like Unix. Although I'm not sure I've ever seen it done, and I don't know if there are suitable API calls to determine if a directory is a mount point (I guess there must be). An ugly, but viable, approach would be to have drive and mount_point properties, which are synonyms. Paul.

On Sat, 06 Oct 2012 17:39:13 +0900 "Stephen J. Turnbull" <turnbull@sk.tsukuba.ac.jp> wrote:
I don't think there's a need to distinguish. Trying to access /etc/passwd/somefile will simply raise an error on I/O.
The issue I envision is if you write `p + "bar"`, thinking p is a Path, and p is actually a str object. It won't raise, but give you the wrong result.
Well, you do want to be able to convert str paths to Path objects without handling path separator conversion by hand. It's a matter of practicality.
That's a possibility indeed. I'd like to have feedback from more Windows users about your suggestion:
which would also give the following for UNC paths:
PureNTPath('//network/share/foo/bar').root '\\\\network\\share\\'
Another is that finding mount points is I/O, while finding the root is a purely lexical operation. Regards Antoine.

Antoine Pitrou writes:
No, my point is that for me prepending new segments is quite common, though not as common as appending them. The asymmetry of the bracket operator means that there's no easy way to deal with that. On the other hand, `p + Path('foo')` and `Path('foo') + p` (where p is a Path, not a string) both seem reasonable to me. It's true that one could screw up as you suggest, but that requires *two* mistakes, first thinking that p is a Path when it's a string, and then forgetting to convert 'bar' to Path. I don't think that's very likely if you don't allow mixing strings and Paths without explicit conversion.
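As it turned out, the stdlib `pathlib` settled on `/`, which also covers Stephen's prepending case: a `str` on the left-hand side is handled by `__rtruediv__`. A sketch:

```python
from pathlib import PurePosixPath

p = PurePosixPath("src/pkg")

appended = p / "module.py"
prepended = "project" / p  # str on the left: handled by PurePath.__rtruediv__
nested = PurePosixPath("repo") / "project" / p

print(appended)   # src/pkg/module.py
print(prepended)  # project/src/pkg
print(nested)     # repo/project/src/pkg
```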
Sorry, cut too much context. I was referring to the use of path['foo/bar'] where path['foo', 'bar'] will do. Of course overloading the constructor is an obvious thing to do.

Am 06.10.2012 16:49, schrieb Stephen J. Turnbull:
But having to call Path() explicitly every time is not very convenient either; in that case you can also call .join() -- and I bet people would prefer p + Path('foo/bar/baz') (which is probably not correct in all cases) to p + Path('foo') + Path('bar') + Path('baz') just because it's such a pain. On the other hand, when the explicit conversion is not needed, confusion will ensue, as Antoine says. In any case, for me using "+" to join paths is quite ugly. I guess it's because after all, I think of the underlying path as a string, and "+" is hardwired in my brain as string concatenation (at least in Python). Georg

Stephen J. Turnbull wrote:
On the other hand, `p + Path('foo')` and `Path('foo') + p` (where p is a Path, not a string) both seem reasonable to me.
I don't like the idea of using + as the path concatenation operator, because path + ".c" is an obvious way to add an extension or other suffix to a filename, and it ought to work. -- Greg

Antoine Pitrou wrote:
I'm all for eliminating extra '.'s, but shouldn't extra '/'s be an error?
What's the use-case for iterating through all the parent directories? Say I have a .dbf table as PureNTPath(r'c:\orders\12345\abc67890.dbf'), and I export it to .csv in the same folder; how would I transform the above PureNTPath's ext from 'dbf' to 'csv'? ~Ethan~

On Fri, Oct 05, 2012 at 02:38:57PM -0700, Ethan Furman <ethan@stoneleaf.us> wrote:
Why? They aren't errors in the underlying OS.
    for parent in p.parents():
        if parent['.svn'].exists():
            last_seen = parent
            continue
        else:
            print("The topmost directory of the project: %s" % last_seen)
            break

Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.
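A runnable version of Oleg's loop against the stdlib `pathlib` that later shipped, where `parents` is a sequence property rather than a method and joining uses `/`. This sketch initialises `last_seen`, so a start directory without `.svn` returns `None` instead of raising `NameError`; it also checks the start directory itself, which is an assumption about the intended semantics. The name `topmost_svn_dir` is hypothetical.

```python
import tempfile
from pathlib import Path


def topmost_svn_dir(start):
    """Walk upwards from *start* (inclusive) while each directory contains a
    .svn entry; return the last such directory, or None if *start* has none."""
    last_seen = None
    for directory in (start, *start.parents):
        if (directory / ".svn").exists():
            last_seen = directory
        else:
            break
    return last_seen


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "proj" / "src" / ".svn").mkdir(parents=True)
    (root / "proj" / ".svn").mkdir()
    found = topmost_svn_dir(root / "proj" / "src")
    outside = topmost_svn_dir(root)
    print(found == root / "proj")  # True
    print(outside)                 # None
```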

Oleg Broytman wrote:
They are on Windows (no comment on whether or not it qualifies as an OS ;).

    c:\temp>dir \\\\\temp
    The filename, directory name, or volume label syntax is incorrect.

    c:\temp>dir \\temp
    The filename, directory name, or volume label syntax is incorrect.

Although I see it works fine in between path pieces:

    c:\temp\34400>dir \temp\\\34400
    [snip listing]
Cool, thanks. ~Ethan~

On 10/06/2012 12:21 AM, Ethan Furman wrote:
\\ at the start of a path has a special meaning under windows: http://en.wikipedia.org/wiki/UNC_path#Uniform_Naming_Convention

On Sat, 06 Oct 2012 00:47:28 +0200 Mathias Panzenböck <grosser.meister.morti@gmx.net> wrote:
\\ at the start of a path has a special meaning under windows: http://en.wikipedia.org/wiki/UNC_path#Uniform_Naming_Convention
And indeed the API preserves them:

    >>> PurePosixPath('//some/path')
    PurePosixPath('/some/path')
    >>> PureNTPath('//some/path')
    PureNTPath('\\\\some\\path\\')

Regards Antoine.
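In the `pathlib` module as later shipped, POSIX paths also keep exactly two leading slashes (POSIX leaves their meaning implementation-defined), while other redundant slashes are collapsed rather than treated as errors, which bears on Ethan's question above. A sketch:

```python
from pathlib import PurePosixPath, PureWindowsPath

two = PurePosixPath("//some/path")       # two leading slashes are kept
three = PurePosixPath("///some/path")    # three or more collapse to one
inner = PurePosixPath("foo//bar/./baz")  # inner extras and "." collapse
unc_drive = PureWindowsPath("//some/path").drive

print(two)        # //some/path
print(three)      # /some/path
print(inner)      # foo/bar/baz
print(unc_drive)  # \\some\path
```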

On Fri, 05 Oct 2012 14:38:57 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
Something like:
Any suggestion to ease this use case a bit? Regards Antoine.
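Antoine's example is missing here. For comparison, the stdlib `pathlib` that later shipped answers Ethan's question with `with_suffix()`. Note the raw string, which keeps sequences such as `\a` in the Windows path from being read as escape sequences:

```python
from pathlib import PureWindowsPath

table = PureWindowsPath(r"c:\orders\12345\abc67890.dbf")
export = table.with_suffix(".csv")  # same folder, new extension

print(export)  # c:\orders\12345\abc67890.csv
```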

On Sat, 06 Oct 2012 01:27:49 +0100 Richard Oudkerk <shibturn@gmail.com> wrote:
Wouldn't there be some confusion with os.path.basename:
Richard

On Sat, 06 Oct 2012 01:27:49 +0100 Richard Oudkerk <shibturn@gmail.com> wrote:
Wouldn't there be some confusion with os.path.basename:
>>> os.path.basename('a/b/c.ext')
'c.ext'
(sorry for the earlier, unfinished reply) Regards Antoine.

Antoine Pitrou writes:
Not to mention standard Unix usage. GNU basename will allow you to specify a *particular* extension explicitly, which will be stripped if present and otherwise ignored. Eg, "basename a/b/c.ext ext" => "c." (note the period!) and "basename a/b/c ext" => "c". I don't know if that's an extension to POSIX. In any case, it would require basename to be a method rather than a property.
(sorry for the earlier, unfinished reply)
Also there are applications where "basenames" contain periods (eg, wget often creates directories with names like "www.python.org"), and filenames may have multiple extensions, eg, "index.ja.html". I think it's reasonable to define "extension" to mean "the portion after the last period" (if any, maybe including the period), but I think usage of the complementary concept is pretty application-specific.

Stephen J. Turnbull wrote:
I wouldn't worry too much about this; after all, we are trying to replace a primitive system with a more advanced, user-friendly one.
FWIW, my own implementation uses the names

    .path     -> c:\foo\bar     or  \\computer_name\share\dir1\dir2
    .vol      -> c:             or  \\computer_name\share
    .dirs     -> \foo\bar       or  \dir1\dir2
    .filename -> some_file.txt  or  archive.tar.gz
    .basename -> some_file      or  archive
    .ext      -> .txt           or  .tar.gz

~Ethan~

How about making a path object behave like a sequence of pathname components? Then

* You can iterate over it directly instead of needing .parents()
* p[:-1] gives you the dirname
* p[-1] gives you the os.path.basename

-- Greg
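The stdlib `pathlib` eventually exposed this sequence view through the `parts` tuple rather than by making the path object itself a sequence; Greg's `p[:-1]` and `p[-1]` then correspond to `parent` and `name`. A sketch:

```python
from pathlib import PurePosixPath

p = PurePosixPath("/usr/lib/python3/os.py")

parts = p.parts                         # ('/', 'usr', 'lib', 'python3', 'os.py')
dirname = PurePosixPath(*p.parts[:-1])  # Greg's p[:-1]
basename = p.parts[-1]                  # Greg's p[-1]

print(dirname == p.parent, basename == p.name)  # True True
```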

On Sat, Oct 06, 2012 at 05:04:44PM +0900, "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
I think this would be overgeneralization. IMO there is no need to replace parts beyond drive/name/extension. To "replace" root or path components just construct a new Path. Oleg.

On Sat, Oct 06, 2012 at 11:44:02AM -0700, Ethan Furman <ethan@stoneleaf.us> wrote:
Yes. Even if the new path differs from the old by one letter somewhere in a middle component. "Practicality beats purity". We need to see real use cases to decide what is really needed. Oleg.

On Fri, 5 Oct 2012 23:16:55 -0600 Eric Snow <ericsnowcurrently@gmail.com> wrote:
The concrete Path objects' replace() method already maps to os.replace(). Note os.replace() is new in 3.3 and is a portable always-overwriting alternative to os.rename(): http://docs.python.org/dev/library/os.html#os.replace Regards Antoine.
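A small sketch of that overwrite behaviour with the stdlib `pathlib` as it later shipped, using a temporary directory so it is safe to run:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "new.txt")
    dst = Path(tmp, "old.txt")
    src.write_text("new contents")
    dst.write_text("old contents")

    # Like os.replace(): dst is overwritten even though it already exists.
    src.replace(dst)

    content = dst.read_text()
    src_still_exists = src.exists()

print(content, src_still_exists)  # new contents False
```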

On Sat, Oct 06, 2012 at 02:09:24PM +0200, Antoine Pitrou <solipsis@pitrou.net> wrote:
Call it "with":

    newpath = path.with_drive('C:')
    newpath = path.with_name('newname')
    newpath = path.with_ext('.zip')

Oleg.

On Sat, Oct 06, 2012 at 04:26:42PM +0400, Oleg Broytman <phd@phdru.name> wrote:
BTW, I think having these three -- replacing drive, name and extension -- is enough. Oleg.

On Sat, 6 Oct 2012 16:40:49 +0400 Oleg Broytman <phd@phdru.name> wrote:
What is the point of replacing the drive? Replacing the name is already trivial: path.parent()[newname] So we only need to replace the "basename" and the extension (I think I'm ok with the "basename" terminology now :-)). Regards Antoine.

On Sat, Oct 06, 2012 at 02:46:35PM +0200, Antoine Pitrou <solipsis@pitrou.net> wrote:
I'm ok with that. Oleg.

On Sat, 06 Oct 2012 14:55:16 +0200 Georg Brandl <g.brandl@gmx.net> wrote:
Well, "basename" is the name proposed for the "part before the extension". "name" is the full filename. (so path.name == path.basename + path.ext) Regards Antoine.

On Sat, 06 Oct 2012 15:08:27 +0200 Georg Brandl <g.brandl@gmx.net> wrote:
True, but since we already have the name attribute it seems reasonable for basename to mean something other than name :-) Do you have another suggestion? Regards Antoine.

On Sat, Oct 6, 2012 at 3:42 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
It appears "base name" or "base" is the convention for the part before the extension. http://en.wikipedia.org/wiki/Filename Perhaps os.path.basename should be deprecated in favor of a better named function one day. But that's probably for a different thread.

On Sat, Oct 06, 2012 at 03:49:49PM +0200, Yuval Greenfield <ubershmekel@gmail.com> wrote:
Perhaps os.path.basename should be deprecated in favor of a better named function one day. But that's probably for a different thread.
That's certainly for a different Python. os.path.basename cannot be renamed because: 1) it's used in millions of programs; 2) it's in line with GNU tools. Oleg.

Antoine Pitrou wrote:
If we have a method for replacing the extension, I don't think we have a strong need for a name for "all of the last name except the extension", because usually all you want that for is so you can add a different extension (possibly empty). So I propose to avoid the term "basename" altogether, and just have

    path.name           --> all of the last component
    path.ext            --> the extension
    path.with_name(foo) --  replaces all of the last component
    path.with_ext(ext)  --  replaces the extension

Then if you really want to extract the last component without the extension (which I expect to be a rare requirement), you can do

    path.with_ext("").name

-- Greg
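For comparison, the stdlib `pathlib` ended up close to Greg's scheme, though with `suffix` and `with_suffix()` in place of `ext` and `with_ext()`, and it also added `stem` for the part he suggests deriving via `with_ext("")`. A sketch:

```python
from pathlib import PurePosixPath

path = PurePosixPath("/home/greg/report.txt")

print(path.name)                     # report.txt
print(path.suffix)                   # .txt
print(path.with_name("summary.md"))  # /home/greg/summary.md
print(path.with_suffix(".pdf"))      # /home/greg/report.pdf

# Greg's idiom for "last component without the extension"...
print(path.with_suffix("").name)     # report
# ...which pathlib also provides directly:
print(path.stem)                     # report
```

Antoine's `path.parent()[newname]` spelling corresponds to `path.parent / newname` there, which gives the same result as `with_name()`.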

Greg Ewing <greg.ewing@canterbury.ac.nz> writes:
This is based on the false concept that there is one “extension” in a filename. On POSIX filesystems, that's just not true; filenames often have several suffixes in sequence, e.g. ‘foo.tar.gz’ or ‘foo.pg.sql’, and each one conveys meaningful intent by whoever named the file.
+1 on avoiding the term “basename” for anything to do with the concept being discussed here, since it already has a different meaning (“the part of the filename without any leading directory parts”). −1 on entrenching this false concept of “the extension” of a filename.

-- 
Eccles: “I'll get [the job] too, you'll see. I'm wearing a Cambridge tie.”
Greenslade: “What were you doing there?”
Eccles: “Buying a tie.” (The Goon Show, _The Greenslade Story_)

Ben Finney

Ben Finney wrote:
When I talk about "the extension", I mean the last one. The vast majority of the time, that's all you're interested in -- you unwrap one layer of the onion at a time, and leave the rest for the next layer of software up. That's not always true, but it's true often enough that I think it's worth having special APIs for dealing with the last dot-suffix. -- Greg
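The stdlib `pathlib` that later shipped follows Greg's "last suffix" reading for `suffix`, while also exposing all of them through `suffixes` for Ben's multi-suffix case. A sketch:

```python
from pathlib import PurePosixPath

p = PurePosixPath("backups/foo.tar.gz")

print(p.suffix)               # .gz
print(p.suffixes)             # ['.tar', '.gz']
print(p.stem)                 # foo.tar
print(p.with_suffix(".bz2"))  # backups/foo.tar.bz2

# Suffixes are purely lexical, so a dotted directory name also has one.
print(PurePosixPath("www.python.org").suffix)  # .org
```

The relation Antoine states for name/basename/ext holds there as `p.name == p.stem + p.suffix`.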

Antoine Pitrou wrote:
I do not.
What is the point of replacing the drive?
At my work we have identical path structures on several machines, and we sometimes move entire branches from one machine to another. In those instances it is good to be able to change from one drive/mount/share to another.
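Ethan's drive/share move can be sketched on top of the stdlib `pathlib`, which has no `with_drive()`; `with_anchor` below is a hypothetical helper, not a pathlib method:

```python
from pathlib import PureWindowsPath


def with_anchor(path, new_anchor):
    """Hypothetical helper: move *path* onto another drive or share by
    replacing its anchor (drive + root) and keeping everything below it."""
    return PureWindowsPath(new_anchor) / path.relative_to(path.anchor)


table = PureWindowsPath(r"C:\orders\12345\abc67890.dbf")
on_d = with_anchor(table, "D:\\")
on_share = with_anchor(table, "\\\\server\\share\\")

print(on_d)      # D:\orders\12345\abc67890.dbf
print(on_share)  # \\server\share\orders\12345\abc67890.dbf
```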
Replacing the name is already trivial: path.parent()[newname]
Or, if '/' is allowed, path.path/newname. I can see the reasonableness of using indexing (to me, it sorta looks like a window onto the path ;) ), but I prefer other methods when possible (tender wrists -- arthritis sucks) ~Ethan~

Antoine Pitrou writes:
``relative()`` returns a new relative path by stripping the drive and root::
Does this have use cases so common that it deserves a convenience method? I would expect "relative" to require an argument. (Ie, I would expect it to have the semantics of "relative_to".) Or is the issue that you can't count on PureNTPath(p).relative_to('C:\\') to DTRT? Maybe the

On 6 October 2012 11:09, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Agreed.
I would expect "relative" to require an argument. (Ie, I would expect it to have the semantics of "relative_to".)
I agree that's what I thought relative() would be when I first read the name.
It seems to me that if p isn't on drive C:, then the right thing is clearly to raise an exception. No ambiguity there - although Unix users might well write code that doesn't allow for exceptions from the method, just because it's not a possible result on Unix. Having it documented might help raise awareness of the possibility, though. And that's about the best you can hope for. Paul.

On Sat, 6 Oct 2012 11:27:58 +0100 Paul Moore <p.f.moore@gmail.com> wrote:
You are right, relative() could be removed and replaced with the current relative_to() method. I wasn't sure about how these names would feel to a native English speaker.
Indeed:
Actually, it can raise too:
You can't really add '..' components and expect the result to be correct: for example, if '/usr/lib' is a symlink to '/lib', then '/usr/lib/..' is '/', not '/usr'. That's why the resolve() method, which resolves symlinks along the path, is the only one allowed to muck with '..' components. Regards Antoine.

I've said before that I like the general shape of the pathlib API and that's still the case. It's the only OO API I've seen that's semantically clean enough for me to support introducing it as "the" standard path abstraction in the standard library. However, there are still a few rough edges I would like to see smoothed out :) On Sat, Oct 6, 2012 at 5:48 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
The minor problem is that "relative" on its own is slightly unclear about whether the invariant involved is "a == b.subpath(a.relative(b))" or "b == a.subpath(a.relative(b))" By including the extra word, the intended meaning becomes crystal clear: "a == b.subpath(a.relative_to(b))" However, "a relative to b" is the more natural interpretation, so +1 for using "relative" for the semantics of the method based equivalent to the current os.path.relpath(). I agree there's no need for a shorthand for "a.relative(a.root)" As the invariants above suggest, I'm also currently -1 on *any* of the proposed shorthands for "p.subpath(subpath)", *as well as* the use of "join" as the method name (due to the major difference in semantics relative to str.join). All of the shorthands are magical and/or cryptic and save very little typing over the explicitly named method. As already noted in the PEP, you can also shorten it manually by saving the bound method to a local variable. It's important to remember that you can't readily search for syntactic characters or common method names to find out what they mean, and these days that kind of thing should be taken into account when designing an API. "p.subpath('foo', 'bar')" looks like executable pseudocode for creating a new path based on existing one to me, unlike "p / 'foo' / 'bar'", "p['foo', 'bar']", or "p.join('foo', 'bar')". The method semantics are obvious by comparison, since they would be the same as those for ordinary construction: "p.subpath(*args) == type(p)(p, *args)" I'm not 100% sold on "subpath" as an alternative (since ".." entries may mean that the result isn't really a subpath of the original directory at all), but I do like the way it reads in the absence of parent directory references, and I definitely like it better than "join" or "[]" or "/" or "+". This interpretation is also favoured by the fact that the calculation of relative path references is strict by default (i.e. it won't insert ".." 
to make the reference work when the target isn't a subpath)
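Nick's invariants can be checked against the stdlib `pathlib` that later shipped, where the role of the proposed `subpath()` is played by `joinpath()` (or `/`):

```python
from pathlib import PurePosixPath

a = PurePosixPath("/srv/app/config/settings.ini")
b = PurePosixPath("/srv/app")

rel = a.relative_to(b)
print(rel)                   # config/settings.ini
print(b.joinpath(rel) == a)  # True: a == b.subpath(a.relative_to(b))

p = PurePosixPath("/srv")
args = ("app", "config")
print(p.joinpath(*args) == type(p)(p, *args))  # True
```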
This seems too strict for the general case. Configuration files in bundled applications, for example, often contain paths relative to the file (e.g. open up a Visual Studio project file). There are no symlinks involved there. Perhaps a "require_subpath" flag that defaults to True would be appropriate? Passing "require_subpath=False" would then provide explicit permission to add ".." entries as appropriate, and it would be up to the developer to document the "no symlinks!" restriction on their layout. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
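A minimal sketch of Nick's proposed flag, written as a hypothetical free function (`relative_to` with `require_subpath` is not a pathlib API). The non-strict branch borrows the lexical algorithm of `posixpath.relpath`, so it inserts '..' entries and ignores symlinks, which is exactly the trade-off Antoine warns about. (Python 3.12 later added a similar `walk_up` parameter to `PurePath.relative_to()`.)

```python
import posixpath
from pathlib import PurePosixPath


def relative_to(path, base, require_subpath=True):
    """Hypothetical variant of PurePath.relative_to() with Nick's flag.

    With require_subpath=False the result is computed lexically and may
    contain '..' entries; symlinks are *not* taken into account.
    """
    if require_subpath:
        return path.relative_to(base)  # raises ValueError if not a subpath
    return PurePosixPath(posixpath.relpath(str(path), str(base)))


project_file = PurePosixPath("/proj/build/app.vcxproj")
source = PurePosixPath("/proj/src/main.c")

loose = relative_to(source, project_file.parent, require_subpath=False)
print(loose)  # ../src/main.c

try:
    relative_to(source, project_file.parent)
    strict_raised = False
except ValueError:
    strict_raised = True
```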

On 8 October 2012 11:31, Nick Coghlan <ncoghlan@gmail.com> wrote:
Until precisely this point in your email, I'd been completely confused, because I thought that p.subpath(xxx) was some sort of "is xxx a subpath of p" query. It never occurred to me that it was the os.path.join equivalent operation. In fact, I'm not sure where you got it from, as I couldn't find it in either the PEP or in pathlib's documentation. I'm not unhappy with using a method for creating a new path based on an existing one (none of the operator forms seems particularly compelling to me) but I really don't like subpath as a name. I don't dislike p.join(parts) as it links back nicely to os.path.join. I can't honestly see anyone getting confused in practice. But I'm not so convinced that I would want to insist on it.

+1 on a method
-1 on subpath as its name
+0 on join as its name; I'm happy for someone to come up with a better name
-0 on a convenience operator form. Mainly because of "only one way to do it", and because the general controversy over which is the best operator to use suggests that leaving the operator form out altogether, at least in the initial implementation, is the better option.

Paul.

Paul Moore writes:
On 8 October 2012 11:31, Nick Coghlan <ncoghlan@gmail.com> wrote:
I agree with Paul on this. If .join() doesn't work for you, how about .append() for adding new path components at the end, vs. .suffix() for adding an extension to the last component? (I don't claim Paul would agree with this next, but as long as I'm here....) I really think that the main API for paths should be the API for sequences specialized to "sequence of path components", with a subsidiary set of operations for common textual manipulations applied to individual components.

On Mon, Oct 8, 2012 at 4:41 PM, Paul Moore <p.f.moore@gmail.com> wrote:
That's OK, I don't set the bar for my mnemonics *that* high: I use Guido's rule that good names are easy to remember once you know what they mean. Being able to guess precisely just from the name is a nice bonus, but not strictly necessary.
I made it up by using "make subpath" as the reverse of "get relative path". The "is subpath" query could be handled by calling "b.startswith(a)". I'd be fine with "joinpath" as well (that is what path.py uses to avoid the conflict with str.join)
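One caveat on the "is subpath" query: if `startswith` were a plain string-prefix test, it would give false positives, so a component-wise comparison is needed. A sketch with the stdlib `pathlib` that later shipped (which, since Python 3.9, also offers `is_relative_to()` for this):

```python
from pathlib import PurePosixPath

a = PurePosixPath("/usr/lib")
b = PurePosixPath("/usr/lib2/python")
c = PurePosixPath("/usr/lib/python3")

# A plain string-prefix test gives a false positive for b...
string_prefix = str(b).startswith(str(a))
# ...while comparing whole components does not.
b_under_a = b.parts[:len(a.parts)] == a.parts
c_under_a = c.parts[:len(a.parts)] == a.parts

print(string_prefix, b_under_a, c_under_a)  # True False True
```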
I really don't like it because of the semantic conflict with str.join. That semantic conflict is the reason I only do "from os.path import join as joinpath" or else call it as "os.path.join" - I find that using the bare "join" directly is too hard to interpret when reading code. I consider .append() and .extend() unacceptable for the same reason - they're too closely tied to mutating method semantics on sequences.
Right, this is my main point as well. The method form *has* to exist. I am *not* convinced that the cute syntactic shorthands actually *improve* readability - they improve *brevity*, but that's not the same thing. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 08/10/12 21:31, Nick Coghlan wrote:
The use of indexing to join path components:

    # Example from the PEP
    >>> p = PurePosixPath('foo')
    >>> p['bar']
    PurePosixPath('foo/bar')

is an absolute deal breaker for me. I'd rather stick with the status quo than have to deal with something which so clearly shouts "index/key lookup" but does something radically different (join/concatenate components). I would *much* rather use the / or + operator, but I think even better (and less likely to cause arguments about the operator) is an explicit `join` method. After all, we call it "joining path components", so the name is intuitive (at least for English speakers) and simple. I don't believe that there will be confusion with str.join -- we already have an os.path.join function, and I haven't seen any sign of confusion caused by that. [...]
To some degree, that's a failure of the search engine, not of the language. Why can't we type "symbol=+" into the search field and get information about addition? If Google can let you do mathematical calculations in their search field, surely we could search for symbols? But I digress.
"p.subpath('foo', 'bar')" looks like executable pseudocode for creating a new path based on existing one to me,
That notation quite possibly goes beyond unintuitive to downright perverse. You are using a method called "subpath" to generate a *superpath* (deeper, longer path which includes p as a part). http://en.wiktionary.org/wiki/subpath

Given:

    p = /a/b/c
    q = /a/b/c/d/e  # p.subpath(d, e)

p is a subpath of q, not the other way around: q is a path PLUS some subdirectories of that path, i.e. a longer path. It's also a pretty unusual term outside of graph theory: Googling finds fewer than 400,000 references to "subpath". It gets used in graphics applications, some games, and in an extension to mercurial for adding symbolic names to repo URLs. I can't see any sign that it is used in the sense you intend.
unlike "p / 'foo' / 'bar'", "p['foo', 'bar']", or "p.join('foo', 'bar')".
Okay, I'll grant you that we'll probably never get a consensus on operators + versus / but I really don't understand why you think that p.join is unsuitable for a method which joins path components. -- Steven

On Mon, Oct 8, 2012 at 11:53 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Huh? It's a tree structure. A subpath lives inside its parent path, just as subnodes are children of their parent node. Agreed it's not a widely used term though - it's a generalisation of subdirectory to also cover file paths. They're certainly not "super" anything, any more than a subdirectory is really a superdirectory (which is what you appear to be arguing).
"p.join(r)" has exactly the same problem as "p + r": pass in a string to a function expecting a path object and you get data corruption instead of an exception. When you want *different* semantics, then ducktyping is your enemy and it's necessary to take steps to avoid it, including changing method names and avoiding some operators. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 8 October 2012 19:39, Nick Coghlan <ncoghlan@gmail.com> wrote:
Ah, OK. I understand your objection now. I concede that Path.join() is a bad idea based on this. I still don't like subpath() though. And pathjoin() is too likely to be redundant in real code:

    temp_path = Path(tempfile.mkdtemp())
    generated_file = temp_path.pathjoin('data_extract.csv')

I can't think of a better term, though :-( Paul
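The duck-typing hazard Nick raises, and Paul concedes here, can be made concrete: `str` already has a `join` method with unrelated semantics, so a string passed where a path object was expected does not raise. The sketch below contrasts it with today's `pathlib` (Python 3.4+), which ended up spelling the operation as the `/` operator and a `joinpath()` method; those names postdate this thread and are shown only for contrast.

```python
from pathlib import PurePosixPath

base_str = "/tmp"
base_path = PurePosixPath("/tmp")

# str.join inserts base_str *between* the items of its argument;
# a bare string argument is iterated character by character.
wrong = base_str.join("ab")                # 'a/tmpb', no exception
also_wrong = base_str.join(["data.csv"])   # 'data.csv', base silently dropped

# A path object's join operation has path semantics instead.
right = base_path.joinpath("data.csv")     # PurePosixPath('/tmp/data.csv')
same = base_path / "data.csv"              # operator spelling of the same join
```

Neither `str` result is an error, which is exactly the "data corruption instead of an exception" outcome described above.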

On Tue, 9 Oct 2012 00:09:23 +0530 Nick Coghlan <ncoghlan@gmail.com> wrote:
Well, it's a "subpath", except when it isn't:
I have to admit I didn't understand what you meant by "subpath" until you explained that it was another name for "join". I really don't think it's a good name. child() would be a good name, except for the case above where you join with an absolute path. Actually, child() could be a variant of join() which wouldn't allow for absolute arguments. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net
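Antoine's "except when it isn't" case is the absolute-argument rule of join: an absolute component discards everything before it. A small sketch with today's `pathlib` shows that rule, plus one possible reading of the `child()` idea; `child` here is a hypothetical helper, not a pathlib method.

```python
from pathlib import PurePosixPath

base = PurePosixPath("/home/user")

# With join semantics, an absolute argument replaces everything before it,
# so the result is not "under" base at all.
escaped = base.joinpath("/etc/passwd")     # PurePosixPath('/etc/passwd')


def child(parent, *parts):
    """Hypothetical join variant that refuses absolute arguments."""
    for part in parts:
        if PurePosixPath(part).is_absolute():
            raise ValueError("absolute component not allowed: %r" % (part,))
    return parent.joinpath(*parts)


inside = child(base, "docs", "a.txt")      # PurePosixPath('/home/user/docs/a.txt')
```

This only rejects absolute components; it does not, for example, stop `..` from climbing back out of `parent`.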

Nick, I've come to the conclusion that you are right to prefer a named method over an operator for joining paths. But I think you are wrong to name that method "subpath" -- see below. On 09/10/12 05:39, Nick Coghlan wrote:
I believe you mentioned in an earlier email that you invented the term for this discussion. Quote: I made it up by using "make subpath" as the reverse of "get relative path". Unfortunately subpath already has an established meaning, and it is the complete opposite of the sense you intend: paths are trees are graphs, and the graph a->b->c->d is a superpath, not subpath, of a->b->c: a->b->c is strictly contained within a->b->c->d; the reverse is not true. Just as "abcd" is a superstring of "abc", not a substring. Likewise for superset and subset. And likewise for trees (best viewed in a monospaced font):

    a-b-c
     \
      f-g

One can say that the tree a-f-g is a subtree of the whole, but one cannot say that a-f-g-h is a subtree since h is not a part of the first tree.
They're certainly not "super" anything, any more than a subdirectory is really a superdirectory (which is what you appear to be arguing).
Common usage is that "subdirectory" gets used for relative paths: given path /a/b/c/d, we say that "d" is a subdirectory of /a/b/c. I've never come across anyone giving d in absolute terms. Now perhaps I've lived a sheltered life *wink* and people do talk about subdirectories in absolute paths all the time. That's fine. But they don't talk about "subpaths" in the sense you intend, and the sense you intend goes completely against the established sense. The point is, despite the common "sub" prefix, the semantics of "subdirectory" is quite different from the semantics of "substring", "subset", "subtree" and "subpath". -- Steven

Steven D'Aprano wrote:
I think the "sub" in "subdirectory" is more in the sense of "below", rather than "is a part of". Like a submarine is something that travels below the surface of the sea, not something that's part of the sea. -- Greg

Nick Coghlan wrote:
Huh? It's a tree structure. A subpath lives inside its parent path, just as subnodes are children of their parent node.
You're confusing the path, which is a name, with the object that it names. It's called a path because it's the route that you follow from the root to reach the node being named. To reach a subnode of N requires following a *longer* path than you did to reach N. There's no sense in which the *path* to the subnode is "contained" within the path to N -- rather it's the other way around. -- Greg
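Greg's and Steven's point about containment can be checked directly with today's `pathlib`, using Steven's `p` and `q`: the shorter path is an ancestor of the longer one, and Nick's "reverse of get relative path" corresponds to `relative_to()`. A sketch, assuming Python 3.4+:

```python
from pathlib import PurePosixPath

p = PurePosixPath("/a/b/c")
q = PurePosixPath("/a/b/c/d/e")

# p is an ancestor of q: it appears among q's parents ...
p_is_ancestor = p in q.parents            # True
# ... and q is p extended by the relative path d/e.
rel = q.relative_to(p)                    # PurePosixPath('d/e')
rebuilt = p / rel                         # PurePosixPath('/a/b/c/d/e')
# The reverse query fails: the longer path is not contained in the shorter.
q_is_ancestor = q in p.parents            # False
```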

Just to add my 2p's worth. On 05/10/12 19:25, Antoine Pitrou wrote:
In general I like it.
Class hierarchy ---------------
Lovely ASCII art work :) but it does have the n*m problem of such hierarchies. N types of file: file, directory, mount-point, drive, root, etc., and M implementations: Posix, NT, linux, OSX, network, database, etc. I would prefer duck-typing. Add ABCs for all the N types of file and use concrete classes for the actual filesystems. That way there are N+M rather than N*M classes. Although I'm generally against operator overloading, would the // operator be better than the // operator as it is more rarely used and more visually distinctive? Cheers, Mark.

Hello Mark, On Sat, 06 Oct 2012 11:49:35 +0100 Mark Shannon <mark@hotpy.org> wrote:
There is no distinction per "type of file": files, directories, etc. all share the same implementation. So you only have a per-flavour distinction (Posix / NT).
It seems to me that "duck typing" and "ABCs" are mutually exclusive, kind of :)
You mean "would the / operator be better than the [] operator"? I didn't choose / at first because I knew this choice would be quite contentious. However, if there happens to be a strong majority in its favour, why not. Regards Antoine.

Responding late, but I didn't get a chance to get my very strong feelings on this proposal in yesterday. I do not like it. I'll give full disclosure and say that I think our earlier failure to include the path library in the stdlib has been a loss for Python and I'll always hope we can fix that one day. I still hold out hope.

It feels like this proposal is "make it object oriented, because object oriented is good" without any actual justification or obvious problem this solves. The API looks clunky and redundant, and does not appear to actually improve anything over the facilities in the os.path module. This takes a lot of things we can already do with paths and files and remixes them into a not-so-intuitive API for the sake of change, not for the sake of solving a real problem.

As for specific problems I have with the proposal:

Frankly, I think not keeping the / operator for joining is a huge mistake. This is the number one best feature of path and, even though many people don't like it, it makes sense. It makes our most common path operation read very close to the actual representation of what you're creating. This is great.

Not inheriting from str means that we can't directly pass these path objects to existing code that just expects a string, so we have a really hard boundary around the edges of this new API. It does not lend itself well to incrementally transitioning to it from existing code.

The stat operations and other file-facilities tacked on feel out of place, and limited. Why does it make sense to add these facilities to path and not other file operations? Why not give me a read method on paths? Or maybe a copy? Putting lots of file facilities on a path object feels wrong because you can't extend it easily. This is one place that function(thing) works better than thing.function()

Overall, I'm completely -1 on the whole thing.

On Fri, Oct 5, 2012 at 2:25 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
-- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy

On Sat, 6 Oct 2012 12:14:40 -0400 Calvin Spealman <ironfroggy@gmail.com> wrote:
Personally, I cringe every time I have to type `os.path.dirname(os.path.dirname(os.path.dirname(...)))` to go two directories upwards of a given path. Compare with, say:
Really, I don't think os.path is the prettiest or most convenient "battery" in the stdlib.
Ironing out difficulties such as platform-specific case-sensitivity rules or the various path separators is a real problem that is not solved by an os.path-like API, because you can't muck with str and give it the required semantics for a filesystem path. So people end up sprinkling their code with calls to os.path.normpath() and/or os.path.normcase() in the hope that it will appease the Gods of Portability (which will also lose casing information).
As discussed in the PEP, I consider inheriting from str to be a mistake when your intent is to provide different semantics from str. Why should indexing or iterating over a path produce individual characters? Why should Path.split() split over whitespace by default? Why should "c:\\" be considered unequal to "C:\\" under Windows? Why should startswith() work character by character, rather than path component by path component? These are all standard str behaviours that are unhelpful when applied to filesystem paths. As for the transition, you just have to call str() on the path object. Since str() also works on plain str objects (and is a no-op), it seems rather painless to me. (Of course, you are not forced to transition. The PEP doesn't call for deprecation of os.path.)
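Several of the `str` behaviours listed above can be compared side by side. A sketch using today's `pathlib.PurePosixPath` (Python 3.4+), where an ancestor test via `.parents` stands in for a component-wise `startswith`:

```python
from pathlib import PurePosixPath

s = "/usr/lib/python3"
p = PurePosixPath(s)

# Iteration: a str yields characters, a path's .parts are components.
chars = list(s)[:4]                                    # ['/', 'u', 's', 'r']
parts = p.parts                                        # ('/', 'usr', 'lib', 'python3')

# Prefix tests: str.startswith works character by character ...
str_prefix = s.startswith("/usr/li")                   # True
# ... while an ancestor check on paths works component by component.
path_prefix = PurePosixPath("/usr/li") in p.parents    # False
real_parent = PurePosixPath("/usr/lib") in p.parents   # True
```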
There is always room to improve and complete the API without breaking compatibility. To quote the PEP: “More operations could be provided, for example some of the functionality of the shutil module”. The focus of the PEP is not to enumerate every possible file operation, but to propose the semantic and syntactic foundations (such as how to join paths, how to divide them into their individual components, etc.).
But you can still define a function() taking a Path as an argument, if you need to. Similarly, you can define a function() taking a datetime object if the datetime object's API lacks some useful functionality for you. Regards Antoine.

How about something along these lines:

    import os

    class Path(str):
        def __add__(self, other):
            # str.__add__ avoids re-entering this method (self + ... would recurse)
            return Path(str.__add__(self, os.path.sep + other))
        def __getitem__(self, i):
            return self.split(os.path.sep)[i]
        def __setitem__(self, i, v):
            items = self.split(os.path.sep)
            items[i] = v
            return Path(os.path.sep.join(items))
        def append(self, v):
            self += os.path.sep + v  # rebinds the local name only; str is immutable
        @property
        def filename(self):
            return self.split(os.path.sep)[-1]
        @property
        def folder(self):
            items = self.split(os.path.sep)
            return Path(os.path.sep.join(items[:-1]))

    path = Path('/this/is/an/example.png')
    print(isinstance(path, str))  # True
    print(path[-1])               # example.png
    print(path.filename)          # example.png
    print(path.folder)            # /this/is/an

On Oct 6, 2012, at 12:08 PM, Antoine Pitrou wrote:

I was thinking of the API more than the implementation. The point to me is that it would be nice to have something that behaves as a string and as a list at the same time. Here is another possible incomplete implementation.

    import os

    class Path(object):
        def __init__(self, s='/', sep=os.path.sep):
            self.sep = sep
            self.s = s.split(sep)
        def __str__(self):
            return self.sep.join(self.s)
        def __add__(self, other):
            other = str(other)
            if other.startswith(self.sep):
                # an absolute right-hand side replaces the left-hand side
                return Path(other, self.sep)
            return Path(str(self) + self.sep + other, self.sep)
        def __getitem__(self, i):
            return self.s[i]
        def __setitem__(self, i, v):
            self.s[i] = v
        def append(self, v):
            self.s.append(v)
        @property
        def filename(self):
            return self.s[-1]
        @property
        def folder(self):
            return Path(self.sep.join(self.s[:-1]), self.sep)
On Oct 6, 2012, at 12:51 PM, Georg Brandl wrote:

Georg Brandl wrote:
If you inherit from str, you cannot override any of the operations that str already has (i.e. __add__, __getitem__).
Is this a 3.x thing? My 2.x version of Path overrides many of the str methods and works just fine.
And obviously you also can't make it mutable, i.e. __setitem__.
Well, since Paths (both Antoine's and mine) are immutable that's not an issue. ~Ethan~

Georg Brandl wrote:
Which is why I would like to see Path based on str, despite Guido's misgivings. (Yes, I know I'm probably tilting at windmills here...) If Path is string based we get backwards compatibility with all the os and third-party tools that expect and use strings; this would allow a gentle migration to using them, as opposed to the all-or-nothing if Path is a completely new type. This would be especially useful for accessing the functions that haven't been added on to Path yet. If Path is string based some questions evaporate: '+'? It does what str does; iterate? Just like str (we can make named methods for the iterations that we want, such as Path.dirs). If Path is string based we still get to use '/' to combine them together (I think that was the preference from the poll... but that could be wishful thinking on my part. ;) ) Even Path.joinpath would make sense to differentiate from Path.join (which is really str.join). Anyway, my two cents worth.

On Fri, 12 Oct 2012 12:23:46 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
It is not all-or-nothing since you can just call str() and it will work fine with both strings and paths. Regards Antoine.

Antoine Pitrou wrote:
D'oh. You're correct, of course. What I was thinking was along the lines of:

    --> some_table = Path('~/addresses.dbf')
    --> some_table = os.path.expanduser(some_table)

vs

    --> some_table = Path('~/addresses.dbf')
    --> some_table = Path(os.path.expanduser(str(some_table)))

The Path/str sandwich is awkward, as well as verbose. ~Ethan~

On Fri, 12 Oct 2012 13:33:14 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
Hey, nice catch, I need to add an expanduser()-alike to the Path API. Thank you! Antoine.
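For reference, today's `pathlib` did gain `Path.expanduser()`, and since Python 3.6 (PEP 519) the `os.path` functions also accept path objects directly, which removes most of the `Path(...(str(...)))` sandwich Ethan describes. A sketch, assuming a home directory is configured for the current user (for example via `HOME`):

```python
import os
from pathlib import Path

some_table = Path("~/addresses.dbf")

# Method form: no str() round trip needed.
expanded = some_table.expanduser()

# os.path functions accept path objects directly (Python 3.6+),
# although they hand back a plain str.
via_os = os.path.expanduser(some_table)
```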

On Sat, Oct 13, 2012 at 7:00 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
My point about the Path(...(str(...))) sandwich still applies, though, for every function that isn't built in to Path. :)
It's the same situation we were in with the design of the new ipaddress module, and the answer is the same: implicit coercion just creates way too many opportunities for errors to pass silently. We had to create a backwards incompatible version of the language to eliminate the semantic confusion between binary data and text data; we're not going to introduce a similar confusion between arbitrary text strings and objects that actually behave like filesystem paths.

str has a *big* API, and much of it doesn't make any sense in the particular case of path objects. In particular, path objects shouldn't be iterable, because it isn't clear what iteration should mean: it could be path segments, it could be parent paths, or it could be directory contents. It definitely *shouldn't* be individual characters, but that's what we would get if it inherited from strings.

I do like the idea of introducing a "filesystem path" protocol though (and Antoine's already considering that), which would give us the implicit interoperability without the inheritance of an overbroad API.

Something else I've been thinking about is that it still feels wrong to me to be making the Windows vs Posix behavioural decision at the class level. It really feels more like a "decimal.Context" style API would be more appropriate, where there was a PathContext that determined how various operations on paths behaved. The default context would then be determined by the current OS, but you could write:

    with pathlib.PosixContext:
        # "\" is not a directory separator
        # "/" is used in representations
        # Comparison is case sensitive
        # expanduser() uses posix rules

    with pathlib.WindowsContext:
        # "\" and "/" are directory separators
        # "\" is used in representations
        # Comparison is case insensitive

Contexts could be tweaked for desired behaviour (e.g. using "/" in representations on Windows). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, 13 Oct 2012 17:41:29 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
:-/ You could make an argument that the Path classes could have their behaviour tweaked with such a context system, but I really think explicit classes for different path flavours are much better design than some thread-local context hackery. Personally, I consider thread-local contexts to be an anti-pattern. (also, the idea that a POSIX path becomes a Windows path based on which "with" statement it's used inside sounds scary) Regards Antoine.

On 13/10/12 18:41, Nick Coghlan wrote:
Ah, I wondered if anyone else had picked up on that. When I read the PEP, I was concerned about the mental conflict between iteration and indexing of Path objects: given a Path p the sequence p[0] p[1] p[2] ... does something completely different from iterating over p directly. Indexing gives path components; iteration gives children of the path (like os.walk).

-1 on iteration over the children. Instead, use:

    for child in p.walk():
        ...

which has the huge benefit that the walk method can take arguments as needed, such as the args os.walk takes:

    topdown=True, onerror=None, followlinks=False

plus I'd like to see a "filter" argument to filter which children are (or aren't) seen.

+1 on indexing giving path components, although the side effect of this is that you automatically get iteration via the sequence protocol. So be it -- I don't think we should be scared to *choose* an iteration model, just because there are other potential models. Using indexing to get path components is useful, slicing gives you sub paths for free, and if the cost of that is that you can iterate over the path, well, I'm okay with that:

    p = Path('/usr/local/lib/python3.3/glob.py')
    list(p)
    => ['/', 'usr', 'local', 'lib', 'python3.3', 'glob.py']

Works for me. -- Steven

On Sun, 14 Oct 2012 21:48:59 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
p[0] p[1] etc. are just TypeErrors:
So, yes, it's doing "something different", but there is little chance of silent bugs :-)
Judging by its name and signature, walk() would be a recursive operation, while iterating on a path isn't (it only gets you the children).
There is already a .parts property which does exactly that: http://www.python.org/dev/peps/pep-0428/#sequence-like-access The problem with enabling sequence access *on the path object* is that you get confusion with str's own sequencing behaviour, if you happen to pass a str instead of a Path, or the reverse. Which is explained briefly here: http://www.python.org/dev/peps/pep-0428/#no-confusion-with-builtins Regards Antoine.
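The `.parts` property still exists in today's `pathlib`. A sketch of how it covers Steven's example, including slicing to get a prefix path; the rebuilding step with `PurePosixPath(*...)` is an illustration, not a dedicated API:

```python
from pathlib import PurePosixPath

p = PurePosixPath("/usr/local/lib/python3.3/glob.py")

# The .parts tuple gives component-wise sequence access ...
components = p.parts
# ... and slicing it, then rebuilding a path, yields the "sub paths" Steven wanted.
prefix = PurePosixPath(*p.parts[:3])       # PurePosixPath('/usr/local')
```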

Steven D'Aprano wrote:
I actually prefer Steven's interpretation. If we are going to iterate directly on a path object, we should be yielding the pieces of the path object. After all, a path can contain a file name (most of mine do) and what sense does it make to iterate over the children of /usr/home/ethanf/some_table.dbf? ~Ethan~

On Sun, 14 Oct 2012 07:50:06 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
Well, given that:

1. sequence access (including the iterator protocol) to the path's parts is already provided through the ".parts" property
2. it makes little sense to actually iterate over those parts (what operations are you going to do sequentially over '/', then 'home', then 'ethanf', etc.?)

... I think yielding the directory contents is a much more useful alternative when iterating over the path itself. Regards Antoine.
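Today's `pathlib` ended up not making paths iterable at all and naming the directory listing explicitly. A sketch, assuming Python 3.5+ (for `write_text()`), using a temporary directory so it leaves nothing behind:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Populate a throwaway directory with two files and one subdirectory.
    for name in ("b.txt", "a.txt"):
        (root / name).write_text("x")
    (root / "sub").mkdir()

    # Directory contents come from an explicitly named method ...
    children = sorted(child.name for child in root.iterdir())

    # ... and the path object itself is not iterable.
    try:
        iter(root)
        path_is_iterable = True
    except TypeError:
        path_is_iterable = False
```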

On 14/10/12 23:13, Antoine Pitrou wrote:
Well, that's two people so far who have conflated "p.parts" with just p. Perhaps that's because "parts" is so similar to "path". Since we already refer to the bits of a path as "path components", perhaps this bike shed ought to be spelled "p.components". It's longer, but I bet nobody will miss it. -- Steven

On Sun, Oct 14, 2012 at 8:45 AM, Steven D'Aprano <steve@pearwood.info> wrote:
I would prefer to see p.split(). It matches the existing os.path.split() better, and I like the idea of a new library matching the old, to be an easier transition for brains. That said, it also looks too much like str.split().
-- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy

On 12 October 2012 21:33, Ethan Furman <ethan@stoneleaf.us> wrote:
A lot of them might end up inadvertently converting back to a pure string as well, so a better comparison will in many places be:

    some_table = Path('~/addresses.dbf')
    some_table = Path(os.path.expanduser(some_table))

vs

    some_table = Path('~/addresses.dbf')
    some_table = Path(os.path.expanduser(str(some_table)))

which is only five characters different. I would also prefer:

    some_table = Path('~/addresses.dbf')
    some_table = Path(os.path.expanduser(some_table.raw()))

or some other method. It just looks nicer to me in this case. Maybe .str(), .chars() or .text(). Additionally, if this is too painful and too often used, we can always make an auxiliary function:

    some_table = Path('~/addresses.dbf')
    some_table = some_table.str_apply(os.path.expanduser)

where .str_apply takes (func, *args, **kwargs) and you need to wrap the function if it takes the path at a different position. I don't particularly like this option, but it exists.

On 12 October 2012 20:42, Antoine Pitrou <solipsis@pitrou.net> wrote:
I assumed that part of the proposal for including a new Path class was that it would (perhaps eventually rather than immediately) be directly supported by all of the standard Python APIs that expect strings-representing-paths. I apologise if I have missed something but is there some reason why it would be bad for e.g. open() to accept Path instances as they are? I think it's reasonable to require that e.g. os.open() should only accept a str, but standard open()? Oscar

Oscar Benjamin wrote:
I think it's reasonable to require that e.g. os.open() should only accept a str, but standard open()?
Why shouldn't os.open() accept a path object? Especially if we use a protocol such as __strpath__ so that the os module doesn't have to explicitly know about the Path classes. -- Greg
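Greg's `__strpath__` suggestion is close to what was later standardised in PEP 519 (Python 3.6) as `__fspath__`, together with `os.fspath()` and `os.PathLike`. A sketch of a third-party path class opting in; `TablePath` is a hypothetical name used only for illustration:

```python
import os
import posixpath


class TablePath:
    """Hypothetical path-like class; only the protocol hook matters here."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        # Return the str (or bytes) form that OS-level APIs should use.
        return self._raw


p = TablePath("/srv/data")
raw = os.fspath(p)                            # '/srv/data'
joined = posixpath.join(p, "addresses.dbf")   # '/srv/data/addresses.dbf'
```

The `os` module never needs to know about `TablePath`; it only looks for the hook, which is the decoupling Greg describes.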

Massimo DiPierro wrote:
Unfortunately, if you subclass from str, I don't think it will be feasible to make indexing return pathname components, because code that's treating it as a string will be expecting it to index characters. Similarly you can't make + mean path concatenation -- it must remain ordinary string concatenation. -- Greg
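Greg's point can be seen with a minimal `str` subclass. `ComponentStr` is hypothetical and exists only to show the conflict: overriding `__getitem__` changes what indexing means for every caller, while inherited operations such as `+` quietly return a plain `str`.

```python
import posixpath


class ComponentStr(str):
    """Hypothetical str subclass that redefines indexing as path components."""

    def __getitem__(self, index):
        return self.split(posixpath.sep)[index]


p = ComponentStr("/usr/lib")
component = p[1]          # 'usr': code expecting a character gets a component
char = str(p)[1]          # 'u': the plain-str behaviour callers rely on
joined = p + "/python3"   # inherited str.__add__: a plain str, not ComponentStr
```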

On Sat, Oct 6, 2012 at 1:08 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I would never do the first version in the first place. I would just join(my_path, "../..") Note that we really need to get out of the habit of "import os" instead of "from os.path import join, etc..." We are making our code uglier and arbitrarily creating many of your concerns by making the use of os.path harder than it should be.
I agree this stuff is difficult, but I think normalizing is a lot more predictable than lots of platform specific paths (both FS and code paths)
Good points, but I'm not convinced that subclasses from string means you can't change these in your subclass.
These are all standard str behaviours that are unhelpful when applied to filesystem paths.
We agree there.
But then I lose all the helpful path information. Something further down the call chain, path aware, might be able to make use of it.
(Of course, you are not forced to transition. The PEP doesn't call for deprecation of os.path.)
If we are only adding something redundant and intend to leave both forever, it only feels like bloat. We should be shrinking the stdlib, not growing it with redundant APIs.
What I meant is that I can't extend it in third party code without being second class. I can add another library that does file operations os.path or stat() don't provide, and they sit side by side.
-- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy

On 07/10/12 04:08, Antoine Pitrou wrote:
I would cringe too if I did that, because it goes THREE directories up, not two:

    py> path = '/a/b/c/d'
    py> os.path.dirname(os.path.dirname(os.path.dirname(path)))
    '/a'

:)
You know, I don't think I've ever needed to call dirname more than once at a time, but if I was using it a lot:

    parent = os.path.dirname
    parent(parent(parent(p)))

which is not as short as p.parent(3), but it's still pretty clear. -- Steven
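For comparison, a sketch of Steven's example in `os.path` and in today's `pathlib`, where "n levels up" became indexing into `.parents` rather than the `p.parent(3)` spelling discussed here. `posixpath` is used so the results don't depend on the host OS:

```python
import posixpath
from pathlib import PurePosixPath

path = "/a/b/c/d"

# Three nested dirname() calls climb three levels.
up3 = posixpath.dirname(posixpath.dirname(posixpath.dirname(path)))   # '/a'

# pathlib spells "n levels up" as a 0-based index into .parents.
same = PurePosixPath(path).parents[2]                                  # PurePosixPath('/a')

# Joining "../.." only collapses after normpath(), which works lexically
# and can disagree with the real filesystem when symlinks are involved.
raw_join = posixpath.join(path, "../..")                               # '/a/b/c/d/../..'
lexical = posixpath.normpath(raw_join)                                 # '/a/b'
```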

On Sun, 07 Oct 2012 12:41:44 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
Not if d is a file, actually (yes, the formulation was a bit ambiguous). Regards Antoine.

On Sat, Oct 6, 2012 at 12:14 PM, Calvin Spealman <ironfroggy@gmail.com> wrote:
The only reason to have objects for anything is to let people have other implementations that do something else with the same method. I remember one of the advantages to having an object-oriented path API, that I always wanted, is that the actual filesystem doesn't have to be what the paths access. They could be names for web resources, or files within a zip archive, or virtual files on a pretend hard drive in your demo application. That's fantastic to have, imo, and it's something function calls (like you suggest) can't possibly support, because functions aren't extensibly polymorphic. If we don't get this sort of polymorphism of functionality, there's very little point to an object oriented path API. It is syntax sugar for function calls with slightly better type safety (NTPath(...) / UnixPath(...) == TypeError -- I hope.) So I'd assume the reason that these methods exist is to enable polymorphism. As for why your suggested methods don't exist, they are better written as functions because they don't need to be ad-hoc polymorphic; they work just fine as regular functions that call methods on path objects. e.g.

    def read(path):
        return path.open().read()

    def copy(path1, path2):
        path2.open('w').write(read(path1))
        # won't work for very large files, blah blah blah

Whereas the open method cannot work this way, because the path should define how file opening works. (It might return an io.StringIO for example.) And the return value of .open() might not be a real file with a real fd, so you can't implement a stat function in terms of open and f.fileno() and such. And so on. -- Devin
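Devin's argument can be sketched in a few lines: if `read()` and `copy()` only ever call `path.open()`, any object that defines `open()` can stand in for a real path. `MemoryPath` and `_WriteBuffer` below are hypothetical names for an in-memory "mock filesystem"; the helpers follow Devin's shape but use `with` so written data is committed when the file is closed.

```python
import io


class _WriteBuffer(io.StringIO):
    """StringIO that stores its final contents under `name` when closed."""

    def __init__(self, store, name):
        super().__init__()
        self._store = store
        self._name = name

    def close(self):
        if not self.closed:
            self._store[self._name] = self.getvalue()
        super().close()


class MemoryPath:
    """Hypothetical path whose 'filesystem' is a plain dict of name -> text."""

    def __init__(self, store, name):
        self.store = store
        self.name = name

    def open(self, mode="r"):
        if mode == "r":
            return io.StringIO(self.store[self.name])
        if mode == "w":
            return _WriteBuffer(self.store, self.name)
        raise ValueError("unsupported mode: %r" % (mode,))


# Generic helpers: they rely on nothing but path.open().
def read(path):
    with path.open() as f:
        return f.read()


def copy(path1, path2):
    with path2.open("w") as f:
        f.write(read(path1))
```

Because the helpers touch nothing but `open()`, they should also work unchanged with `pathlib.Path`, whose `open()` accepts the same mode strings.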

On Sat, Oct 6, 2012 at 9:44 PM, Calvin Spealman <ironfroggy@gmail.com> wrote:
The PEP needs to better articulate the rationale, but the key points are:

- better abstraction and encapsulation of cross-platform logic so file manipulation algorithms written on Windows are more likely to work correctly on POSIX systems (and vice-versa)
- improved ability to manipulate paths with Windows semantics on a POSIX system (and vice-versa)
- better support for creation of "mock" filesystem APIs
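The second point, manipulating paths with Windows semantics on a POSIX system, is what today's pure path classes provide. A sketch, assuming Python 3.4+; none of these calls touch the filesystem, so they behave the same on any OS:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows-flavoured paths can be built and compared on any OS.
w1 = PureWindowsPath("C:/Users/Me/report.txt")
w2 = PureWindowsPath("c:\\users\\me\\REPORT.TXT")

same_windows = w1 == w2      # case-insensitive, either separator accepted
rendered = str(w1)           # rendered with the backslash separator
drive = w1.drive             # 'C:'

# The POSIX flavour keeps POSIX rules: case matters.
same_posix = PurePosixPath("/Users/Me") == PurePosixPath("/users/me")
```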
It trades readability (and discoverability) for brevity. Not good.
It's the exact design philosophy as was used in the creation of the new ipaddress module: the objects in ipaddress must still be converted to a string or integer before they can be passed to other operations (such as the socket module APIs). Strings and integers remain the data interchange formats here as well (although far more focused on strings in the path case).
Indeed, I'm personally much happier with the "pure" path classes than I am with the ones that can do filesystem manipulation. Having both "p.open(mode)" and "open(str(p), mode)" seems strange. OTOH, I can see the attraction in being able to better fake filesystem access through the method API, so I'm willing to go along with it.
Overall, I'm completely -1 on the whole thing.
I find this very hard to square with your enthusiastic support for path.py. Like ipaddr, which needed to clean up its semantic model before it could be included in the standard library (as ipaddress), we need a clean cross-platform semantic model for path objects before a convenience API can be added for manipulating them. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Oct 8, 2012 at 1:59 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Frankly, for 99% of file path work, anything I do on one "just works" on the other, and complicating things with these POSIX versus NT path types just seems to be a whole lot of early complication for a few edge cases most people never see. Simplest example is requiring the backslash separator on NT when it handles forward slash, just like POSIX, just fine, and has for a long, long time.
I admit the mock FS intrigues me
I thought it had all three. In these situations, where my and another's perception of a systems strengths and weaknesses are opposite, I don't really know how to make a good response. :-/
I somewhat dislike this because I loved path.py so much and this proposal seems to actively avoid exactly the aspects of path.py that I enjoyed the most (like the / joining).
Cheers, Nick.
path.py was in the wild, and is still in use. Why do we find ourselves debating new libraries like this as PEPs? We need to let them play out, see what sticks. If someone wants to make this library and stick it on PyPI, I'm not stopping them. I'm encouraging it. Let's see how it plays out. If it works out well, it deserves a PEP. In two or three years. -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy

On Tue, Oct 9, 2012 at 3:02 AM, Calvin Spealman <ironfroggy@gmail.com>wrote:
I agree. This discussion has been framed unfairly. The only things that should appear in this PEP are the guidelines Guido mentioned earlier in the discussion along with some use cases. So python is chartering a path object module, and we should let whichever module is the best on pypi eventually get into the std-lib. Yuval Greenfield

Yuval Greenfield <ubershmekel@...> writes:
On Tue, Oct 9, 2012 at 3:02 AM, Calvin Spealman
<ironfroggy@gmail.com> wrote:
path.py was in the wild, and is still in use. Why do we find ourselves
debating new libraries like this as PEPs? We need to let them play out, see what sticks. If someone wants to make this library and stick it on PyPI, I'm not stopping them. I'm encouraging it. Let's see how it plays out. If it works out well, it deserves a PEP. In two or three years.
I agree,
This discussion has been framed unfairly.
path.py (or a similar API) has already been rejected as PEP 355. I see no need to go through this again, at least not in this discussion thread. If you want to re-discuss PEP 355, please open a separate thread. Regards Antoine.

On Tue, Oct 9, 2012 at 4:33 PM, Yuval Greenfield <ubershmekel@gmail.com> wrote:
So python is chartering a path object module, and we should let whichever module is the best on pypi eventually get into the std-lib.
No, the module has to at least have a nodding acquaintance with good software design principles, avoid introducing too many ways to do the same thing, and various other concerns many authors of modules on PyPI often don't care about. That's *why* path.py got rejected in the first place. Just as ipaddress is not the same as ipaddr due to those additional concerns, so will whatever path abstraction makes it into the standard library take those concerns into account. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Oct 5, 2012 at 11:25 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Thanks for getting this started! I haven't read the whole PEP or the whole thread, but I like many of the principles, such as not deriving from existing built-in types (str or tuple), immutability, explicitly caring about OS differences, and distinguishing between pure and impure (I/O-using) operations. (Though admittedly I'm not super-keen on the specific term "pure".) I can't say I'm thrilled about overloading p[s], but I can't get too excited about p/s either; p+s makes more sense but that would beg the question of how to append an extension to a path (transforming e.g. 'foo/bar' to 'foo/bar.py' by appending '.py'). At the same time I'm not in the camp that says you can't use / because it's not division. But rather than diving right into the syntax, I would like to focus on some use cases. (Some of this may already be in the PEP, my apologies.) Some things I care about (based on path manipulations I remember I've written at some point or another):

- Distinguishing absolute paths from relative paths; this affects joining behavior as for os.path.join().
- Various normal forms that can be used for comparing paths for equality; there should be a pure normalization as well as an impure one (like os.path.realpath()).
- An API that encourages Unix lovers to write code that is most likely also to make sense on Windows.
- An API that encourages Windows lovers to write code that is most likely also to make sense on Unix.
- Integration with fnmatch (pure) and glob (impure).
- In addition to stat(), some simple derived operations like getmtime(), getsize(), islink().
- Easy checks and manipulations (applying to the basename) like "ends with .pyc", "starts with foo", "ends with .tar.gz", "replace .pyc extension with .py", "remove trailing ~", "append .tmp", "remove leading @", and so on.
- While it's nice to be able to ask for "the extension" it would be nice if the checks above would not be hardcoded to use "." as a separator; and it would be nice if the extension-parsing code could deal with multiple extensions and wasn't confused by names starting or ending with a dot.
- Matching on patterns on directory names (e.g. "does not contain a segment named .hg").
- A matching notation based on glob/fnmatch syntax instead of regular expressions.

PS. Another occasional use for "posix" style paths I have found is manipulating the path portion of a URL. There are some posix-like features, e.g. the interpretation of trailing / as "directory", the requirement of leading / as root, the interpretation of "." and "..", and the notion of relative paths (although path joining is different). It would be nice if the "pure" posix path class could be reused for this purpose, or if a related class with a subset or superset of the same methods existed. This may influence the basic design somewhat in showing the need for custom subclasses etc. -- --Guido van Rossum (python.org/~guido)
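For comparison with the wish list above, here is a minimal sketch of how several of the basename checks can be expressed with the pathlib API that was eventually released (Python 3.4+), not the 2012 draft under discussion. `suffix`, `suffixes`, `with_suffix()`, `with_name()` and `parent` are real pathlib members; the remaining checks simply fall back to `str` methods on `name`. The file names are made up for illustration.

```python
from pathlib import PurePosixPath

archive = PurePosixPath("build/pathlib.tar.gz")

# "ends with .tar.gz": suffixes lists every dot-separated suffix of the name
is_tarball = archive.suffixes[-2:] == [".tar", ".gz"]

# "replace .pyc extension with .py": with_suffix swaps only the last suffix
source = PurePosixPath("pkg/module.pyc").with_suffix(".py")

# "append .tmp": build a new name rather than replacing the suffix
temp = archive.with_name(archive.name + ".tmp")

# "remove trailing ~" and "remove leading @": plain str methods on .name
backup = PurePosixPath("notes.txt~")
restored = backup.with_name(backup.name.rstrip("~"))
handle = PurePosixPath("@config.ini")
plain = handle.with_name(handle.name.lstrip("@"))

# A leading dot does not start a suffix
dotfile_suffix = PurePosixPath(".bashrc").suffix

# Guido's PS: the path part of a URL can reuse the pure POSIX class
url_dir = PurePosixPath("/docs/library/pathlib.html").parent
```

Because `with_suffix()` only touches the last suffix, turning `pathlib.tar.gz` into `pathlib.zip` needs `with_name()` instead.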

On Sat, 6 Oct 2012 10:44:37 -0700 Guido van Rossum <guido@python.org> wrote:
The proposed API does function like os.path.join() in that respect: when joining a relative path to an absolute path, the relative path is simply discarded:
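A minimal sketch of that joining rule, written against the pathlib that eventually shipped (Python 3.4+) rather than the draft API. `posixpath` is the module `os.path` refers to on POSIX systems; it is imported directly so the comparison behaves the same on any platform.

```python
import posixpath
from pathlib import PurePosixPath

relative = PurePosixPath("myproj")

# Joining an absolute path discards the relative path on the left...
joined = relative / "/lib"
# ...just as os.path.join() does
legacy = posixpath.join("myproj", "/lib")

# A relative right-hand side is simply appended
appended = relative / "lib"
```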
Impure normalization is done with the resolve() method:
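Antoine's example resolved /etc/ssl/certs on his own machine. As a self-contained stand-in, the sketch below builds a similar symlink in a temporary directory and calls `Path.resolve()` from the released pathlib. It assumes a POSIX system, since creating symlinks on Windows may need extra privileges; the directory names are arbitrary.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    target = root / "pki" / "tls" / "certs"
    target.mkdir(parents=True)
    link = root / "certs"
    os.symlink(target, link, target_is_directory=True)

    # Pure operations keep ".." and the symlink exactly as written
    spelled = root / "pki" / ".." / "certs"
    # resolve() consults the filesystem: ".." is collapsed, symlinks followed
    resolved = spelled.resolve()
    same_dir = resolved == target.resolve()
```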
(/etc/ssl/certs being a symlink to /etc/pki/tls/certs on my system) Pure comparison already obeys case-sensitivity rules as well as the different path separators:
Note the case information isn't lost either:
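In the module as released, the draft's `PureNTPath` became `PureWindowsPath`. A sketch of the comparison rules using it; it runs on any host OS because pure classes never touch the filesystem.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows flavour: case-insensitive, and "/" and "\" are both separators
win_equal = PureWindowsPath("C:/Windows/System32") == PureWindowsPath(r"c:\windows\system32")

# POSIX flavour: case-sensitive
posix_equal = PurePosixPath("/etc/Hosts") == PurePosixPath("/etc/hosts")

# Equality ignores case, but the original spelling is kept for display
shown = str(PureWindowsPath("C:/Windows/System32"))
```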
I agree on these goals, that's why I'm trying to avoid system-specific methods. For example is_reserved() is also defined under Unix, it just always returns False:
- Integration with fnmatch (pure) and glob (impure).
This is provided indeed, with the match() and glob() methods respectively.
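A short sketch of the pure/impure split with the released API: `PurePath.match()` only inspects the string (matching from the right, fnmatch-style), while `Path.glob()` lists what exists on disk. The temporary tree is invented for the example.

```python
import tempfile
from pathlib import Path, PurePosixPath

# match() is pure: it only inspects the path string, matching from the right
is_py = PurePosixPath("src/pkg/mod.py").match("*.py")
in_pkg = PurePosixPath("src/pkg/mod.py").match("pkg/*.py")

# glob() is impure: it lists what actually exists on disk
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg").mkdir()
    for name in ("a.py", "pkg/b.py", "pkg/c.txt"):
        (root / name).write_text("")
    top_level = sorted(p.name for p in root.glob("*.py"))
    recursive = sorted(p.name for p in root.glob("**/*.py"))
```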
- In addition to stat(), some simple derived operations like getmtime(), getsize(), islink().
The PEP proposes properties mimicking the stat object attributes:
And methods to query the file type:
Perhaps the properties / methods mix isn't very consistent.
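For reference, the pathlib that was eventually released settled this mix in favour of methods: size and mtime are read from the `os.stat_result` returned by `stat()`, and the type queries are methods such as `is_file()`, `is_dir()` and `is_symlink()`. A minimal sketch against that later API, using an arbitrary temporary file:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / "data.bin"
    f.write_bytes(b"x" * 128)

    st = f.stat()              # one os.stat() call, returns os.stat_result
    size = st.st_size          # instead of a size property
    mtime = st.st_mtime        # instead of an mtime property
    kinds = (f.is_file(), f.is_dir(), f.is_symlink())
```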
I'll try to reconcile this with Ben Finney's suffix / suffixes proposal.
- Matching on patterns on directory names (e.g. "does not contain a segment named .hg").
Sequence-like access on the parts property provides this:
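A sketch with the released API, where `parts` is a tuple of segments; the per-segment check uses `fnmatch` to echo the wish for glob-style rather than regex matching. The path is invented.

```python
import fnmatch
from pathlib import PurePosixPath

p = PurePosixPath("/home/user/repo/.hg/store/data")

# parts is a tuple of segments, so membership tests are straightforward
inside_hg = ".hg" in p.parts
first_two = p.parts[:2]

# glob-style matching per segment rather than a regular expression
hidden = [part for part in p.parts if fnmatch.fnmatchcase(part, ".*")]
```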
Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Sun, Oct 7, 2012 at 10:37 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I would warn about caching these results on the path object. I can easily imagine cases where I want to repeatedly call stat() because I'm waiting for a file to change (e.g. tail -f does something like this). I would prefer to have a stat() method that always calls os.stat(), and no caching of the results; the user can cache the stat() return value. (Maybe we can add is_file() etc. as methods on stat() results now that they are no longer just tuples?)
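A sketch of that design using the released API, where `Path.stat()` calls `os.stat()` every time and the caller keeps the previous result. `wait_for_change` is a hypothetical helper, not part of pathlib, and treating a change in size or `st_mtime_ns` as "the file changed" is an assumption.

```python
import tempfile
import time
from pathlib import Path


def wait_for_change(path, last, interval=0.05, timeout=5.0):
    """Hypothetical helper: poll path.stat() until size or mtime differ from `last`.

    Each iteration calls stat() afresh, so nothing is cached on the Path;
    the caller holds on to the previous os.stat_result instead.
    Returns the new stat result, or None if the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        current = path.stat()
        if (current.st_size, current.st_mtime_ns) != (last.st_size, last.st_mtime_ns):
            return current
        time.sleep(interval)
    return None


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "app.log"
    log.write_text("first line\n")
    before = log.stat()
    with log.open("a") as fh:
        fh.write("second line\n")
    after = wait_for_change(log, before, timeout=1.0)
```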
Sounds cool. I will try to refrain from bikeshedding much more on this proposal; I'd rather focus on reactors and futures... -- --Guido van Rossum (python.org/~guido)

On Sun, Oct 7, 2012 at 7:37 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
What's the use case for this behavior? I'd much rather have joining an absolute path to a relative one fail and reveal the potential bug.... >>> os.unlink(Path('myproj') / Path('/lib')) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: absolute path can't be appended to a relative path

On Sun, 7 Oct 2012 23:15:38 +0200 Yuval Greenfield <ubershmekel@gmail.com> wrote:
In all honesty I followed os.path.join's behaviour here. I agree a ValueError (not TypeError) would be sensible too. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

Am 07.10.2012 23:42, schrieb Antoine Pitrou:
Please no -- this is a very important use case (for os.path.join, at least): resolving a path from config/user/command line that can be given either absolute or relative to a certain directory. Right now it's as simple as join(default, path), and I'd prefer to keep this. There is no bug here, it's working as designed. Georg
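Georg's use case carries over unchanged to the `/` operator of the released pathlib. A minimal sketch; `DEFAULT_DIR` and `config_path` are hypothetical names for illustration.

```python
from pathlib import PurePosixPath

DEFAULT_DIR = PurePosixPath("/etc/myapp")   # hypothetical default location


def config_path(user_value):
    """Accept a config path that is either absolute or relative to DEFAULT_DIR."""
    # An absolute user_value replaces DEFAULT_DIR entirely, mirroring
    # os.path.join(default, path); a relative one is appended to it.
    return DEFAULT_DIR / user_value


relative_case = config_path("local.ini")
absolute_case = config_path("/home/me/myapp.ini")
```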

On Sun, 7 Oct 2012 22:43:02 +0100 Arnaud Delobelle <arnodel@gmail.com> wrote:
I don't know. How does os.path deal with it? Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

Antoine Pitrou wrote:
Not all that well, apparently. From the docs for os.path: os.path.normcase(path) Normalize the case of a pathname. On Unix and Mac OS X, this returns the path unchanged; on case-insensitive filesystems, it converts the path to lowercase. On Windows, it also converts forward slashes to backward slashes. This is partially self-contradictory, since many MacOSX filesystems are actually case-insensitive; it depends on the particular filesystem concerned. Worse, different parts of the same path can have different case sensitivities. Also, with network file systems, not all paths are necessarily case-insensitive on Windows. So there's really no certain way to compare pure paths for equality. Basing it on which OS is running your code is no more than a guess. -- Greg
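The platform-based guess Greg describes is easy to observe by calling both `os.path` implementations directly. This sketch assumes only the stdlib `ntpath` and `posixpath` modules, which can be imported on any platform.

```python
import ntpath
import posixpath

# ntpath is os.path on Windows and posixpath is os.path on POSIX (macOS included);
# both can be imported on any platform, which makes the difference easy to see.
on_windows = ntpath.normcase("C:/Users/Greg/Notes.TXT")
on_posix = posixpath.normcase("/Users/Greg/Notes.TXT")
```

On macOS `os.path` is `posixpath`, so `normcase()` leaves the case alone even on a case-insensitive volume, which is exactly the mismatch described above.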

On Mon, 08 Oct 2012 11:55:26 +1300 Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
That's true, but considering paths case-insensitive under Windows and case-sensitive under (non-OS X) Unix is still a very good approximation that seems to satisfy most everyone.
So there's really no certain way to compare pure paths for equality. Basing it on which OS is running your code is no more than a guess.
I wonder how well other file-dealing tools cope under OS X, especially those that are portable and not OS X-specific. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Mon, Oct 08, 2012 at 12:00:22PM +0200, Ronald Oussoren <ronaldoussoren@mac.com> wrote:
Or CIFS filesystems mounted on Linux? Case-sensitivity is a file-system property, not an operating system one.
But there is no API to ask what type of filesystem a path belongs to. So guessing by OS name is the only heuristic we can do. Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On 8 Oct, 2012, at 13:07, Oleg Broytman <phd@phdru.name> wrote:
I guess so, as neither statvs, statvfs, nor pathconf seem to be able to tell if a filesystem is case insensitive. The alternative would be to have a list of case insensitive filesystems and use that when comparing impure path objects. That would be fairly expensive though, as you'd have to check for every element of the path if that element is on a case insensitive filesystem. Ronald

On Mon, Oct 08, 2012 at 03:59:18PM +0200, Ronald Oussoren <ronaldoussoren@mac.com> wrote:
If a filesystem mounted to w32 is exported from a server by CIFS/SMB protocol -- is it case sensitive? What if said server is Linux? What if said filesystem was actually imported to Linux from a Novell server by NetWare Core Protocol? It's not a fictional situation -- I do it at oper.med.ru; the server is Linux that mounts two filesystems, CIFS and NCP, and re-exports them via Samba. Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On Tue, Oct 9, 2012 at 1:28 AM, Oleg Broytman <phd@phdru.name> wrote:
And I thought I was weird in using sshfs and Samba together to "bounce" drive access without having to set up SMB passwords for lots of systems... Would it be safer to simply assume that everything's case sensitive until you actually do a filesystem call (a stat or something)? That is, every Pure function works as though the FS is case sensitive? ChrisA

On 8 October 2012 11:28, Oleg Broytman <phd@phdru.name> wrote:
Actually, after just thinking of a few corner cases (and in this case seeing some real-world scenarios), it is easy to infer that it is impossible to establish for certain whether a filesystem, or worse, a given directory, is case-sensitive. So, regardless of general passive assumptions, I think Python should include a way to actively verify the filesystem case sensitivity. Something along the lines of "assert_case_sensitiveness(<path>)" that would check for a filename in the given path, and try to retrieve it inverting some capitalization. If a suitable filename were not found in the given directory, it could raise an error, or try to make an active test by writing there (this behavior should be controlled by keyword parameters). So, whenever one needs to know about case sensitiveness, there would be one obvious way in place to know for sure, even at the cost of some extra system resources. js -><-
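A sketch of the "active test by writing there" variant, using only stdlib calls. `is_case_sensitive` is a hypothetical name, not a proposed or existing API; it creates a mixed-case temporary file and checks whether the swapped-case spelling refers to an existing file.

```python
import os
import tempfile
from pathlib import Path


def is_case_sensitive(directory):
    """Hypothetical probe in the spirit of the proposal above.

    Creates a mixed-case temporary file in `directory`, then checks whether
    the same name with its case swapped refers to an existing file. Needs
    write access, and the answer only holds for that directory's filesystem.
    """
    fd, name = tempfile.mkstemp(prefix="CaseProbe", dir=directory)
    os.close(fd)
    probe = Path(name)
    try:
        swapped = probe.with_name(probe.name.swapcase())
        # If the swapped spelling does not exist, names differing only in case are distinct
        return not swapped.exists()
    finally:
        probe.unlink()


with tempfile.TemporaryDirectory() as tmp:
    sensitive = is_case_sensitive(tmp)
    leftovers = list(Path(tmp).iterdir())
```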

On 8 Oct, 2012, at 16:28, Oleg Broytman <phd@phdru.name> wrote:
Even more fun :-). CIFS/SMB from Windows to Linux or OSX behaves like a case-preserving filesystem on the systems I tested. Likewise a NFS filesystem exported from Linux to OSX behaves like a case sensitive filesystem if the Linux filesystem is case sensitive. All in all the best we seem to be able to do is use the OS as a heuristic, most Unix filesystems are case sensitive while Windows and OSX filesystems are case preserving. Ronald

Ronald Oussoren writes:
We can do better than that heuristic. All of the POSIX systems I know publish mtab by default. The mount utility by default will report the types of filesystems. While a path module should not depend on such information, I suppose[1], there ought to be a way to ask for it. Of course this is still a heuristic (at least some Mac filesystems can be configured to be case sensitive rather than case-preserving, and I don't think this information is available in mtab), but it's far more accurate than using only the OS. Footnotes: [1] Requires a system call or subprocess execution, and since mounts can be dynamically changed, doing it once at module initialization is not good enough.
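A sketch of the mtab idea: given the text of /etc/mtab (or /proc/mounts on Linux), pick the longest mount point that contains a path and report its filesystem type. `fs_type_for` and the sample table are hypothetical. Real mtab files escape spaces in mount points as `\040`, which this sketch ignores, and the result is still only a heuristic for case sensitivity.

```python
from pathlib import PurePosixPath


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount point containing `path`.

    `mounts_text` uses the mtab layout: device mountpoint fstype options ...
    """
    path = PurePosixPath(path)
    best, best_type = None, None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point = PurePosixPath(fields[1])
        # The longest mount point that equals or encloses `path` wins
        if mount_point == path or mount_point in path.parents:
            if best is None or len(mount_point.parts) > len(best.parts):
                best, best_type = mount_point, fields[2]
    return best_type


SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
//server/share /mnt/share cifs rw,relatime 0 0
"""
```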

Ronald Oussoren wrote:
neither statvs, statvfs, nor pathconf seem to be able to tell if a filesystem is case insensitive.
Even if they could, you wouldn't be entirely out of the woods, because different parts of the same path can be on different file systems... But how important is all this anyway? I'm trying to think of occasions when I've wanted to compare two entire paths for equality, and I can't think of *any*. -- Greg

On 10 October 2012 09:16, Ronald Oussoren <ronaldoussoren@mac.com> wrote:
Mercurial had to consider this issue when dealing with repositories built on Unix and being used on Windows. Specifically, it needed to know, if the repository contains files README and ReadMe, could it safely write both of these files without one overwriting the other. Actually, something as simple as an unzip utility could hit the same issue (it's just that it's not as critical to be careful with unzip as with a DVCS system... :-)) I don't know how Mercurial fixed the problem in the end - I believe the in-repo format encodes filenames to preserve case even on case insensitive systems, and I *think* it detects case insensitive filesystems for writing by writing a test file and reading it back in a different case. But that may have changed. Paul

Greg Ewing wrote:
Well, while I haven't had to compare the /entire/ path, I have had to compare (and sort) the filename portion. And since the SMB share uses lower-case, and our legacy FoxPro code writes upper-case, and files get copied from SMB to the local Windows drive, having the case-insensitive compare option in Path makes my life much easier. ~Ethan~
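A sketch of Ethan's case with the released API: `PureWindowsPath` gives case-insensitive equality of bare file names regardless of the host OS, and `str.casefold` works as a sort key. The file names are invented.

```python
from pathlib import PureWindowsPath

smb_names = ["invoice_0042.pdf", "README.txt"]
foxpro_names = ["INVOICE_0042.PDF", "Readme.TXT"]

# PureWindowsPath equality ignores case, independent of the host OS
matches = [PureWindowsPath(a) == PureWindowsPath(b) for a, b in zip(smb_names, foxpro_names)]

# For sorting, a casefolded key gives a case-insensitive (and stable) order
ordered = sorted(smb_names + foxpro_names, key=str.casefold)
```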

I was hesitant to put mine on PyPI because there's already a slew of others, but for the sake of discussion here it is [1]. Mine is str based, has no actual I/O components, and can easily be used in normal os, shutil, etc., calls. Example usage:

    job = '12345'
    home = Path('c:/orders')/job
    work = Path('c:/work/')
    for pdf in glob(work/'*.pdf'):
        dash = pdf.filename.index('-')
        dest = home/'reports'/job + pdf.filename[dash:]
        shutil.copy(pdf, dest)

Assuming I haven't typo'ed anything, the above code takes all the pdf files, removes the standard (and useless to me) header info before the '-' in the filename, then copies it over to its final resting place. If I understand Antoine's Path, the code would look something like:

    job = '12345'
    home = Path('c:/orders/')[job]
    work = Path('c:/work/')
    for child in work:
        if child.ext != '.pdf':
            continue
        name = child.filename
        dash = name.index('-')
        dest = home['reports'][job + name[dash:]]
        shutil.copy(str(child), str(dest))

My biggest objections are the extra str calls, and indexing just doesn't look like path concatenation.

~Ethan~

[1] http://pypi.python.org/pypi/strpath

P.S. Oh, very nice ascii-art!

On Sat, 06 Oct 2012 13:19:54 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
You could actually write `for child in work.glob('*.pdf')` (non-recursive) or `for child in work.glob('**/*.pdf')` (recursive). Regards Antoine. -- Software development and contracting: http://pro.pitrou.net
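Putting Antoine's suggestion together with Ethan's example, a sketch against the released pathlib (Python 3.6+, where `shutil.copy` accepts Path objects, so no `str()` calls are needed). `file_reports` and the temporary directory layout are hypothetical stand-ins for Ethan's c:/orders and c:/work.

```python
import shutil
import tempfile
from pathlib import Path


def file_reports(work, home, job):
    """Copy each PDF in `work` to home/reports, renaming
    '<header>-<rest>.pdf' to '<job>-<rest>.pdf'."""
    dest_dir = home / "reports"
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for pdf in work.glob("*.pdf"):           # non-recursive, as Antoine suggests
        dash = pdf.name.index("-")
        dest = dest_dir / (job + pdf.name[dash:])
        shutil.copy(pdf, dest)               # Path objects accepted directly
        copied.append(dest.name)
    return sorted(copied)


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "work"
    work.mkdir()
    (work / "HDR01-summary.pdf").write_bytes(b"%PDF")
    (work / "notes.txt").write_text("not a pdf")
    job = "12345"
    result = file_reports(work, Path(tmp) / "orders" / job, job)
```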

On 05.10.12 21:25, Antoine Pitrou wrote:
PS: You can all admire my ASCII-art skills.
PurePosixPath and PureNTPath look closer to Path than to PurePath.
The ``parent()`` method returns an ancestor of the path::
p[:-n] is shorter and looks neater than p.parent(n). Possibly the ``parent()`` method is unnecessary?
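In the released pathlib the draft's `parent()` method became a `parent` property plus an indexable `parents` sequence, so `p.parents[n - 1]` plays roughly the role of the draft's `p.parent(n)`. A sketch, with Serhiy's slicing idea expressed through `parts`:

```python
from pathlib import PurePosixPath

p = PurePosixPath("/home/antoine/pathlib/setup.py")

one_up = p.parent          # the draft's p.parent()
two_up = p.parents[1]      # roughly the draft's p.parent(2)

# Serhiy's p[:-n] idea, spelled with the parts tuple
also_two_up = PurePosixPath(*p.parts[:-2])
```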

Am 05.10.2012 20:25, schrieb Antoine Pitrou:
I already gave you my +1 on #python-dev. I've some additional ideas that I'd like to suggest for pathlib.

* Jason Orendorff's path module has some methods that are quite useful for shell- and find-like scripts. I especially like files(pattern=None), dirs(pattern=None) and their recursive counterparts walkfiles() and walkdirs(). They make code like "recursively remove all pyc files" easy to write: for pyc in path.walkfiles('*.pyc'): pyc.remove()
* I'd like to see a convenient method to format sizes in SI units (for example 1.2 MB, 5 GB) and non-SI units (MiB, GiB, aka human readable, powers of 2). I've some code that would be useful for the task.
* Web applications often need to know the mimetype of a file. How about a mimetype property that returns the mimetype according to the extension?
* Symlink and directory traversal attacks are a constant threat. I'd like to see a pathlib object that restricts itself and all its offspring to a directory. Perhaps this can be implemented as a proxy object around a pathlib object?
* While we are working on pathlib I'd like to improve os.listdir() in two ways. The os.listdir() function currently returns a list of file names. This can consume lots of memory for a directory with hundreds of thousands of files. How about I implement an iterator version that returns some additional information, too? On Linux and most BSDs you can get the file type (d_type, e.g. file, directory, symlink) for free.
* Implement "if filename in directory" with os.path.exists().

Christian
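Most of these ideas can be approximated with today's standard library. A sketch using only real stdlib calls: `Path.rglob()` for the walkfiles('*.pyc') loop, `os.scandir()` (Python 3.5+) as an iterator-based listing whose `DirEntry.is_dir()` can use d_type, and `mimetypes.guess_type()` for extension-based MIME types. The traversal guard `confined()` is a hypothetical helper, not a pathlib feature, built on `resolve()` and `relative_to()`.

```python
import mimetypes
import os
import tempfile
from pathlib import Path


def remove_pyc(root):
    """Recursively delete *.pyc files under root; returns how many were removed."""
    victims = list(root.rglob("*.pyc"))   # collect first, then delete
    for pyc in victims:
        pyc.unlink()
    return len(victims)


def list_dir_types(directory):
    """Iterator-based listing; DirEntry.is_dir() may avoid an extra stat()."""
    with os.scandir(directory) as entries:
        return sorted((e.name, e.is_dir()) for e in entries)


def confined(root, user_path):
    """Hypothetical guard: resolve user_path under root and refuse escapes."""
    root = root.resolve()
    candidate = (root / user_path).resolve()
    candidate.relative_to(root)           # raises ValueError if outside root
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg").mkdir()
    (root / "a.pyc").write_bytes(b"")
    (root / "pkg" / "b.pyc").write_bytes(b"")
    (root / "pkg" / "b.py").write_text("")
    n_removed = remove_pyc(root)
    listing = list_dir_types(root)
    inside = confined(root, "pkg/b.py").name
    try:
        confined(root, "../etc/passwd")
        escaped = True
    except ValueError:
        escaped = False

mime, _encoding = mimetypes.guess_type("report.pdf")
```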

On 07/10/12 09:41, Christian Heimes wrote:
Ouch! My source code!!! *grin*
So do I. http://pypi.python.org/pypi/byteformat Although it's only listed as an "alpha" package, that's just me being conservative about allowing changes to the API. The code is actually fairly mature. If there is interest in having this in the standard library, I am more than happy to target 3.4 and commit to maintaining it. -- Steven
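For illustration only, a sketch of the kind of size formatting Christian describes. `format_size` is a hypothetical helper and is not the byteformat package's API; it uses SI prefixes with base 1000 by default and IEC prefixes with base 1024 when `binary=True`.

```python
def format_size(num_bytes, binary=False):
    """Hypothetical helper: 1200000 -> '1.2 MB' (SI) or 5 * 1024**3 -> '5.0 GiB'."""
    base = 1024 if binary else 1000
    units = ["B", "KiB", "MiB", "GiB", "TiB"] if binary else ["B", "kB", "MB", "GB", "TB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the number below the base,
        # or at the largest unit available
        if value < base or unit == units[-1]:
            return f"{value:.0f} {unit}" if unit == "B" else f"{value:.1f} {unit}"
        value /= base
```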

Antoine Pitrou <solipsis@pitrou.net> writes:
The term “extension” is a barnacle from mainframe filesystems where a filename is necessarily divided into exactly two parts, the name and the extension. It doesn't really apply to POSIX filesystems.

On filesystems where the user has always been free to have any number of parts in a filename, the closest concept is better referred to by the term “suffix”::

    >>> p.suffix
    '.py'

It may be useful to add an API method to query the *sequence* of suffixes of a filename::

    >>> p = Path('/home/antoine/pathlib.tar.gz')
    >>> p.name
    'pathlib.tar.gz'
    >>> p.suffix
    '.gz'
    >>> p.suffixes
    ['.tar', '.gz']

Thanks for keeping this proposal active, Antoine.

-- 
“In any great organization it is far, far safer to be wrong with the majority than to be right alone.” (John Kenneth Galbraith, 1989-07-28)
Ben Finney

Antoine Pitrou <solipsis@...> writes:
PS: You can all admire my ASCII-art skills.
but you got the direction of the "is a" arrows wrong. see http://en.wikipedia.org/wiki/Class_diagram#Generalization renaud

I would like to see some backwards compatibility here. ;) In other words, add method names where reasonable (such as .child or .children instead of or along with built-in iteration) so that this new Path beast can be backported to the 2.x line. I'm happy to take that task on if Antoine has better uses of his time. What this would allow is a nice shiny toy for the 2.x series, plus an easier migration to 3.x when the time comes. While I am very excited about the 3.x branch, and will use it whenever I can, some projects still have to be 2.x because of other dependencies. If the new Path doesn't have conflicting method or dunder names it would be possible to have a str-based 2.x version that otherwise acted remarkably like the non-str based 3.x version -- especially if the __strpath__ concept takes hold and Path objects can be passed around the os and os.path modules the way strings are now. ~Ethan~

Hi! On Fri, Oct 05, 2012 at 08:25:34PM +0200, Antoine Pitrou <solipsis@pitrou.net> wrote:
This PEP proposes the inclusion of a third-party module, `pathlib`_, in the standard library.
+1 from me for a sane path handling in the stdlib!
Some attributes are properties and some are methods. Which is which? Why is .root a property but .parents() a method? .owner/.group are properties but .exists() is a method, and so on. .stat() just returns self._stat, but said ._stat is a property!
If I understand it correctly these should be either \\\\some\\share\\myfile.txt and \\\\some\\share or \\some\share\myfile.txt and \\some\share, no? Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On Fri, Oct 5, 2012 at 9:16 PM, Oleg Broytman <phd@phdru.name> wrote:
Unobvious indeed. Maybe operations that cause OS api calls should have parens? Also, I agree with Paul Moore that the naming at its current state may cause cross-platform bugs. Though I don't understand why not to overload the "/" or "+" operators. Sounds more elegant than square brackets. Just make sure the op fails on anything other than Path objects. I'm +1 on adding such a useful abstraction to python if and only if it were
>= os.path on every front,
Yuval Greenfield

On Fri, Oct 05, 2012 at 09:36:56PM +0200, Yuval Greenfield wrote:
Path concatenation is obviously not a form of division, so it makes little sense to use the division operator for this purpose. I always wonder why the designers of C++ felt that it made sense to perform output by left-bitshifting the output stream by a string: std::cout << "hello, world"; Fortunately, operator overloading in Python is generally limited to cases where the operator's meaning is preserved (with the unfortunate exception of the % operator for strings). -- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868

On 06/10/12 05:53, Andrew McNabb wrote:
Path concatenation is obviously not a form of division, so it makes little sense to use the division operator for this purpose.
But / is not just a division operator. It is also used for:

* alternatives: "tea and/or coffee, breakfast/lunch/dinner"
* italic markup: "some apps use /slashes/ for italics"
* instead of line breaks when quoting poetry
* abbreviations such as n/a b/w c/o and even w/ (not applicable, between, care of, with)
* date separator

Since / is often (but not always) used as a path separator, using it as a path component join operator makes good sense.

BTW, are there any supported platforms where the path separator or alternate path separator is not a slash? There used to be Apple Mac OS, which used colons. -- Steven

On Sat, Oct 06, 2012 at 08:41:05AM +1000, Steven D'Aprano wrote:
This is the difference between C++ style operators, where the only thing that matters is what the operator symbol looks like, and Python style operators, where an operator symbol is just syntactic sugar. In Python, the "/" is synonymous with `operator.div` and is defined in terms of the `__div__` special method. This distinction is why I hate operator overloading in C++ but like it in Python. -- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868

Andrew McNabb wrote:
Not sure what you're saying here -- in both languages, operators are no more than syntactic sugar for dispatching to an appropriate method or function. Python just avoids introducing a special syntax for spelling the name of the operator, which is nice, but it's not a huge difference. The same issues of what you *should* use operators for arises in both communities, and it seems to be very much a matter of personal taste. (The use of << for output in C++ has never bothered me, BTW. There are plenty of problems with the way I/O is done in C++, but the use of << is the least of them, IMO...) -- Greg

On Sat, Oct 06, 2012 at 01:54:21PM +1300, Greg Ewing wrote:
To clarify my point: in Python, "/" is not just a symbol--it specifically means "div".
Overriding the div operator requires creating a "__div__" special method, which I think has helped influence personal taste within the Python community. I personally would feel dirty creating a "__div__" method that had absolutely nothing to do with division. Whether or not the sense of personal taste within the Python community is directly attributable to this or not, I believe that overloaded operators in Python tend to be more predictable and consistent than what I have seen in C++. -- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868

On 07/10/12 08:45, Andrew McNabb wrote:
To clarify my point: in Python, "/" is not just a symbol--it specifically means "div".
I think that's wrong. / is a symbol that means whatever the class gives it. It isn't like __init__ or __call__ that have defined language semantics, and there is no rule that says that / means division. I'll grant you that it's a strong convention, but it is just a convention.
Overriding the div operator requires creating a "__div__" special method,
Actually it is __truediv__ in Python 3. __div__ no longer has any meaning or special status. But it's just a name. __add__ doesn't necessarily perform addition, __sub__ doesn't necessarily perform subtraction, and __or__ doesn't necessarily have anything to do with either bitwise or boolean OR. Why should we insist that __*div__ (true, floor or just plain div) must only be used for numeric division when we don't privilege other numeric operators like that? -- Steven
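Steven's point can be shown directly: `/` simply dispatches to `__truediv__` (or the reflected `__rtruediv__`), whatever the class makes that mean. `Segments` below is a hypothetical toy class, not pathlib.

```python
import operator


class Segments:
    """Hypothetical minimal path-like class: "/" means whatever __truediv__ says."""

    def __init__(self, *parts):
        self.parts = tuple(parts)

    def __truediv__(self, other):
        # Joining, not division: Python only dispatches to this method
        if isinstance(other, Segments):
            return Segments(*self.parts, *other.parts)
        if isinstance(other, str):
            return Segments(*self.parts, other)
        return NotImplemented

    def __rtruediv__(self, other):
        # Lets "str / Segments" work too; "str / str" is untouched
        if isinstance(other, str):
            return Segments(other, *self.parts)
        return NotImplemented

    def __repr__(self):
        return "Segments(%s)" % ", ".join(map(repr, self.parts))


p = Segments("usr") / "lib" / "python3"
q = "home" / Segments("me")
via_operator = operator.truediv(Segments("a"), "b")   # same method, called by name

try:
    "usr" / "lib"
    str_division_defined = True
except TypeError:
    str_division_defined = False
```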

On Tue, Oct 09, 2012 at 12:03:55AM +1100, Steven D'Aprano wrote:
I'll grant you that the semantics of the __truediv__ method are defined by convention.
__add__ for strings doesn't mean numerical addition, but people find it perfectly natural to speak of "adding two strings," for example. Seeing `string1.__add__(string2)` is readable, as is `operator.add(string1, string2)`. Every other example of operator overloading that I find tasteful is analogous enough to the numerical operators to retain the name. Since this really is a matter of personal taste, I'll end my participation in this discussion by voicing support for Nick Coghlan's suggestion of a `join` method, whether it's named `join` or `append` or something else. -- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868

http://en.wikipedia.org/wiki/List_of_mathematical_symbols#Symbols The + symbol means addition and union of disjoint sets. A path (including a fs path) is a set of links (for a fs path, a link is a folder name). Using the + symbol has a natural interpretation as concatenation of subpaths (sets) to form a longer path (superset). The / symbol means the quotient of a group. It always returns a subgroup. When I see path1 / path2 I would expect it to return all paths that start by path2 or contain path2, not concatenation. The fact that string paths in Unix use the / to represent concatenation is accidental. That's just how the path is serialized into a string. In fact Windows uses a different separator. I do think a serialized representation of an object makes a good example for its abstract representation. Massimo On Oct 8, 2012, at 11:06 AM, Andrew McNabb wrote:

Massimo DiPierro wrote:
The fact that string paths in Unix use the / to represent concatenation is accidental.
Maybe so, but it can be regarded as a fortuitous accident, since / also happens to be an operator in Python, so it would have mnemonic value to Unix users. The correspondence is not exact for Windows users, but / is similar enough to still have some mnemonic value for them. And all the OSes using other separators seem to have died out. -- Greg

Massimo DiPierro wrote:
A reason *not* to use '+' is that it would violate associativity in some cases, e.g. (path + "foo") + "bar" would not be the same as path + ("foo" + "bar") Using '/', or any other operator not currently defined on strings, would prevent this mistake from occurring. A reason to want an operator is the symmetry of path concatenation. Symmetrical operations deserve a symmetrical syntax, and to achieve that in Python you need either an operator or a stand-alone function. A reason to prefer an operator over a function is associativity. It would be nice to be able to write path1 / path2 / path3 and not have to think about the order in which the operations are being done. If '/' is considered too much of a stretch, how about '&'? It suggests a kind of addition or concatenation, and in fact is used for string concatenation in some other languages. -- Greg
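Greg's associativity argument can be demonstrated with a hypothetical subclass that gives `+` segment-joining semantics (pathlib itself defines no `+` on paths). Regrouping silently changes the result with `+`, while the same regrouping with `/` fails loudly because `str / str` is undefined.

```python
from pathlib import PurePosixPath


class PlusPath(PurePosixPath):
    """Hypothetical subclass where "+" joins a path segment (not a pathlib feature)."""

    def __add__(self, segment):
        return type(self)(self, segment)


base = PlusPath("/srv")

left = (base + "foo") + "bar"     # /srv/foo/bar
right = base + ("foo" + "bar")    # /srv/foobar: str + str ran first

# With "/", the same regrouping raises instead of silently giving another path
regroup_fails = False
try:
    PurePosixPath("/srv") / ("foo" / "bar")
except TypeError:
    regroup_fails = True
```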

On 09/10/2012 16:30, Michele Lacchia wrote:
But why not interpret a path as a tuple (not a list, it's immutable) of path segments and have: path + ("foo", "bar") and path + ".tar.gz" behave different (i.e. tuples add segments and strings add to the last segment)? And of course path1 + path2 adds the segments together. Joachim
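A minimal sketch of Joachim's idea, assuming a hypothetical `TuplePath` tuple subclass (nothing like it exists in pathlib): adding a tuple appends segments, adding a string extends the last segment.

```python
class TuplePath(tuple):
    """Hypothetical immutable tuple of path segments.

    path + ("a", "b")   -> adds segments
    path + ".tar.gz"    -> appends to the last segment
    path + other_path   -> concatenates segments
    """

    def __new__(cls, *segments):
        return super().__new__(cls, segments)

    def __add__(self, other):
        if isinstance(other, str):
            if not self:
                return TuplePath(other)
            return TuplePath(*self[:-1], self[-1] + other)
        if isinstance(other, tuple):          # includes other TuplePaths
            return TuplePath(*self, *other)
        return NotImplemented

    def __str__(self):
        return "/".join(self)


home = TuplePath("home", "joachim")
with_segments = home + ("src", "pathlib")
with_suffix = home + ".tar.gz"
combined = home + TuplePath("notes")
```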

On Mon, 8 Oct 2012 10:06:17 -0600 Andrew McNabb <amcnabb@mcnabbs.org> wrote:
The join() method already exists in the current PEP, but it's less convenient, syntactically, than either '[]' or '/'. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Tue, Oct 9, 2012 at 12:10 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Right. My objections boil down to: 1. The case has not been adequately made that a second way to do it is needed. Therefore, the initial version should just include the method API. 2. Using "join" as the method name is a bad idea for the same reason that using "+" as the operator syntax would be a bad idea: it can cause erroneous output instead of an exception if a string is passed where a Path object is expected. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Oct 8, 2012 at 11:49 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
It took me a while before I realized that 'abc'.join('def') already has a meaning (returning 'dabceabcf'). But yes, this makes it a poor choice for a Path method. -- --Guido van Rossum (python.org/~guido)
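Nick's and Guido's concern in concrete form: `str.join` accepts any iterable, so a string passed where a path was expected produces garbage silently. `joinpath()` (the method name pathlib eventually shipped with) fails loudly on a `str`. `ssl_dir` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# str.join treats its argument as an iterable and puts the receiver between items
surprise = "abc".join("def")
also_wrong = "etc".join("ssl")


def ssl_dir(configdir):
    # Written for path objects
    return configdir.joinpath("ssl")


ok = ssl_dir(PurePosixPath("etc"))

try:
    ssl_dir("etc")          # a str has no joinpath(), so this fails loudly
    failed_loudly = False
except AttributeError:
    failed_loudly = True
```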

On Tue, 9 Oct 2012 00:19:03 +0530 Nick Coghlan <ncoghlan@gmail.com> wrote:
But you really want a short method name, otherwise it's better to have a dedicated operator. joinpath() definitely doesn't cut it, IMO. (perhaps that's the same reason I am reluctant to use str.format() :-)) By the way, I also thought of using __call__, but for some reason I think it tastes a bit bad ("too clever"?).
Admitted, although I think the potential for confusion is smaller than with "+" (I can't really articulate why, it's just that I fear one much less than the other :-)). Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Mon, Oct 8, 2012 at 11:56 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Maybe you're overreacting? The current notation for this operation is os.path.join(p, q) which is even longer than p.pathjoin(q). To me the latter is fine.
__call__ overloading is often overused. Please don't go there. It is really hard to figure out what some (semi-)obscure operation means if it uses __call__ overloading.
Personally I fear '+' much more -- to me, + can be used to add an extension without adding a new directory level. If we *have* to overload an operator, I'd prefer p/q over p[q] any day. -- --Guido van Rossum (python.org/~guido)

On 8 October 2012 20:14, Stefan Krah <stefan@bytereef.org> wrote:
On the basis that we want standard libraries to be non-contentious issues: is it not obvious that "+", "/" and "[]" *cannot* be the right choices as they're contentious? I would argue that a lot of this argument is “pointless” because there is no right answer. For example, I prefer indexing out of the lot, but since a lot of people really dislike it I'm not going to bother vouching for it. I think we should argue more along the lines of:

    # Possibility for accidental validity if configdir is a string
    configdir.join("myprogram")
    # A bit long
    # There's argument here, but I don't find them intuitive or nice
    configdir.subpath("myprogram")
    configdir.superpath("myprogram")
    # My favorites ('cause my opinion: so there)
    configdir.child("myprogram")  # Does sorta' imply IO
    # What I'm surprised (but half-glad) hasn't been mentioned
    configdir.cd("myprogram")

We already know the semantics for the function; now it's *just a name*.

I was just thinking the same thing. My preference for this at the moment is 'append', notwithstanding the fact that it will be non-mutating. It's a single, short word, it avoids re-stating the datatype, and it resonates with the idea of appending to a sequence of path components.
    # My favorites ('cause my opinion: so there)
    configdir.child("myprogram")  # Does sorta' imply IO
Except that the result isn't always a child (the RHS could be an absolute path, start with "..", etc.)
    configdir.cd("myprogram")
Aaaghh... my brain... the lobotomy does nothing... -- Greg

Stefan Krah wrote:
'^' or '@' are used for concatenation in some languages. At least accidental confusion with xor is pretty unlikely.
We'd have to add '@' as a new operator before we could use that. But '^' might have possibilities... if you squint, it looks a bit like a compromise between Unix and Windows path separators. :-) -- Greg

On Tue, Oct 9, 2012 at 12:34 AM, Guido van Rossum <guido@python.org> wrote:
Yes, of all the syntactic shorthands, I also favour "/". However, I'm also a big fan of starting with a minimalist core and growing it. Moving from "os.path.join(a, b, c, d, e)" (or, the way I often write it, "joinpath(a, b, c, d, e)") to "a.joinpath(b, c, d, e)" at least isn't going backwards, and is more obvious in isolation than "a / b / c / d / e". Cheers, Nick.
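For reference, the pathlib that eventually shipped in Python 3.4 provides both spellings Nick compares: `joinpath()` and `/`. The sketch below uses that later API, not the draft under discussion. It uses `PurePosixPath` and `posixpath` (the POSIX implementation behind `os.path`) so the output is the same on every OS:

```python
import posixpath
from pathlib import PurePosixPath

a = PurePosixPath("a")
via_method = a.joinpath("b", "c", "d", "e")             # one call, many components
via_operator = a / "b" / "c" / "d" / "e"                # one "/" per component
via_function = posixpath.join("a", "b", "c", "d", "e")  # the os.path way, on str
print(via_method, via_operator, via_function)  # a/b/c/d/e a/b/c/d/e a/b/c/d/e
```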

Nick Coghlan wrote:
I think we should keep in mind that we're (hopefully) not going to see things like "a / b / c / d / e" in real-life code. Rather we're going to see things like

    backuppath = destdir / "archive" / filename + ".bak"

In other words, there should be some clue from the names that paths are involved, from which it should be fairly easy to guess what the "/" means. -- Greg
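Greg's example relies on operator precedence. Because `/` binds tighter than `+`, the suffix is added to the already-joined path rather than to `filename`. You can check this with the standard `ast` module, which parses the expression without needing any path class:

```python
import ast

# Parse Greg's expression without evaluating it.
expr = ast.parse('destdir / "archive" / filename + ".bak"', mode="eval").body

# The top-level operation is the addition ...
print(type(expr.op).__name__)       # Add
# ... and its left operand is the whole "/"-joined path.
print(type(expr.left.op).__name__)  # Div
```

So the line only works if `Path + str` means "append to the last component". In the pathlib that later shipped, `PurePath` defines no `+` at all, so the equivalent there would be `destdir / "archive" / (filename + ".bak")`.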

Antoine Pitrou wrote:
But you really want a short method name, otherwise it's better to have a dedicated operator. joinpath() definitely doesn't cut it, IMO.
I agree, it's far too longwinded. It would clutter your code just as badly as using os.path.join() all over the place does now, but without the option of aliasing it to a shorter name. -- Greg

On 9 October 2012 06:41, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Good point - the fact that it's not possible to alias a method name means that it's important to get the name right if we're to use a method, because we're all stuck with it forever. Because of that, I'm much more reluctant to "just put up with" Path.pathjoin on the basis that it's better than any other option. Are there any libraries that use a method on a path object (or something similar - URL objects, maybe) and if so, what method name did they use? I'd like to see what real code using any proposed method name would look like. As a point of reference, Twisted's FilePath class uses "child". Paul

On Tue, Oct 09, 2012 at 08:36:58AM +0100, Paul Moore wrote:
Huh?

    py> f = str.join  # "join" is too long and I don't like it
    py> f("*", ["spam", "ham", "eggs"])
    'spam*ham*eggs'

We should get the name right because we're stuck with it forever due to backwards compatibility, not because you can't alias it. -- Steven
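Steven's point holds for path methods too. This sketch uses `joinpath()` from the pathlib that later shipped in Python 3.4, standing in for whatever name the thread settles on:

```python
from pathlib import PurePosixPath

# Looking the method up on the class gives a plain function
# that takes the path as its first argument.
join = PurePosixPath.joinpath
p = join(PurePosixPath("/etc"), "init.d", "networking")
print(p)  # /etc/init.d/networking

# Aliasing a bound method works too, with the base path fixed.
in_etc = PurePosixPath("/etc").joinpath
print(in_etc("passwd"))  # /etc/passwd
```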

On Oct 8, 2012, at 3:47 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I'd prefer 'append', because
path.append("somedir", "file.txt")
+1 In so many ways, I see a path as a list of its components. Because of that, path.append and path.extend, with similar semantics to list.append and list.extend, make a lot of sense to me. When I think about a path as a list of components rather than as a string, the '+' operator starts to make sense for joins as well. I'm OK with using the '/' for path joining as well, because the parallel with list doesn't fit in this case, although I understand Massimo's objection to it. In very many ways, I like thinking of a path as a list (slicing, append, etc). The fact that list.append doesn't return the new list has always bugged me, but if we were to use append and extend, they should mirror the semantics from list. I'm much more inclined to think of path as a special list than as a special string. Ryan

On Mon, Oct 8, 2012 at 4:47 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
As Nick noted, the problem is that append() conflicts with MutableSequence.append(). If someone subclasses Path and friends to act like a list then it complicates the situation. In my mind the name should be one that is not already in use by strings or sequences. -eric

On 09/10/2012 00:47 Greg Ewing wrote:
As has already been stated by others, paths are immutable so using them like lists is leading to confusion (and list's append() only wants one arg, so extend() might be better in that case). But paths could then be interpreted as tuples of "directory entries" instead. So adding a path to a path would "join" them: pathA + pathB and in order to not always need a path object for pathB one could also write the right argument of __add__ as a tuple of strings: pathA + ("somedir", "file.txt") One could also use "+" for adding to the last segment if it isn't a path object or a tuple: pathA + ".tar.gz" Joachim
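Joachim's proposal gives `+` two meanings, depending on the type of the right operand. The sketch below implements just those two rules in a hypothetical `TuplePath` class. It is not part of pathlib or of any proposal's actual API, and it displays components joined with `/`:

```python
class TuplePath(tuple):
    """Hypothetical immutable path stored as a tuple of components."""

    def __new__(cls, *parts):
        return super().__new__(cls, parts)

    def __add__(self, other):
        if isinstance(other, tuple):
            # tuple (or TuplePath) on the right: append whole components
            return TuplePath(*self, *other)
        if isinstance(other, str):
            # str on the right: glue it onto the last component
            if not self:
                raise ValueError("cannot add a suffix to an empty path")
            return TuplePath(*self[:-1], self[-1] + other)
        return NotImplemented

    def __str__(self):
        return "/".join(self)


p = TuplePath("home", "ryan")
print(p + ("somedir", "file.txt"))   # home/ryan/somedir/file.txt
print(p + ("archive",) + ".tar.gz")  # home/ryan/archive.tar.gz
```

Under these rules, `p + "/file.txt"` would just glue `"/file.txt"` onto the last component. It displays like a join but leaves the component count unchanged, which is the ambiguity raised in the follow-up about `pathA + "/file.txt"`.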

On Oct 9, 2012, at 1:18 AM, Joachim König <him@online.de> wrote:
I like it. As you pointed out, my comparison with list is inappropriate because of path's immutability. So .append() and .extend() probably don't make sense.
One could also use "+" for adding to the last segment if it isn't a path object or a tuple:
pathA + ".tar.gz"
This might be a reasonable way to appease both those who are viewing path as a special tuple and those who are viewing it as a special string. It breaks the parallel with tuple a bit, but it's clear that there are important properties of both strings and tuples that would be nice to preserve. Ryan

On Oct 9, 2012, at 10:11 AM, Eric V. Smith <eric@trueblade.com> wrote:
or

    pathA + Path("file.txt")

Just like with any tuple, if you wish to add a new part, it must be a tuple (Path) first. I'm not convinced that adding a string to a path should be allowed, but if not then we should probably throw a TypeError if it's not a tuple or Path. That would leave the following method for appending a suffix:

    path[:-1] + Path(path[-1] + '.tar.gz')

That's a lot more verbose than the option to "add a string". Ryan

On 09.10.2012 19:11, Eric V. Smith wrote:
You could of course write: pathA + "/file.txt" because with a separator it's still explicit. But this requires clarification because "/file.txt" could be considered an absolute path. But IMO string addition should be concatenation. YMMV. Joachim

On 06/10/12 09:54, Andrew McNabb wrote:
I'm afraid that it's a distinction that seems meaningless to me. int + int and str + str are not the same, even though the operator symbol looks the same. Likewise int - int and set - set are not the same even though they use the same operator symbol. Similarly for & and | operators. For what it is worth, when I am writing pseudocode on paper, just playing around with ideas, I often use / to join path components: open(path/name) # pseudo-code sort of thing, so I would be much more comfortable writing either of these: path/"name.txt" path+"name.txt" than path["name.txt"] which looks like it ought to be a lookup, not a constructor. -- Steven

On Fri, 5 Oct 2012 23:16:25 +0400 Oleg Broytman <phd@phdru.name> wrote:
parents() returns a generator (hence the list() call in the example above). A generator-returning property sounds a bit too confusing IMHO. ._stat is an implementation detail. stat() and exists() both mirror similar APIs in the os / os.path modules. .name, .ext, .root, .parts just return static, immutable properties of the path, I see no reason for them to be methods.
Ah, right. I'll correct it. Thanks Antoine.

On 5 October 2012 19:25, Antoine Pitrou <solipsis@pitrou.net> wrote:
There is a risk that this is too "cute". However, it's probably better than overloading the '/' operator, and you do need something short.
That's risky. Are you proposing always using '/' regardless of OS? I'd have expected os.sep (so \ on Windows). On the other hand, that would make p['bar\\baz'] mean two different things on Windows and Unix - 2 extra path levels on Windows, only one on Unix (and a filename containing a backslash). It would probably be better to allow tuples as arguments: p['bar', 'baz']
I don't like the way the distinction between "root" and "anchor" works here. Unix users are never going to use "anchor", as "root" is the natural term, and it does exactly the right thing on Unix. So code written on Unix will tend to do the wrong thing on Windows (where generally you'd want to use "anchor" or you'll find yourself switching accidentally to the current drive). It's a rare situation where it would matter, which on the one hand makes it much less worth worrying about, but on the other hand means that when bugs *do* occur, they will be very obscure :-( Also, there is no good terminology in current use here. The only concrete thing I can suggest is that "root" would be better used as the term for what you're calling "anchor" as Windows users would expect the root of "C:\foo\bar\baz" to be "C:\". The term "drive" would be right for "C:" (although some might expect that to mean "C:\" as well, but there's no point wasting two terms on the one concept). It might be more practical to use a new, but explicit, term like "driveroot" for "\". It's the same as root on Unix, and on Windows it's fairly obviously "the root on the current drive". And by using the coined term for the less common option, it might act as a reminder to people that something not entirely portable is going on. But there's no really simple answer - Windows and Unix are just different here.
+1. There's lots of times I have wished os.path had this.
This again suggests to me that "C:\" is more closely allied to the term "root" here. Also, I assume that paths will be comparable, using case sensitivity appropriate to the platform. Presumably a PurePath and a Path are comparable, too. What about a PosixPath and an NTPath? Would you expect them to be comparable or not? But in general, this looks like a pretty good proposal. Having a decent path abstraction in the stdlib would be great. Paul.

Paul Moore wrote:
I actually like using the '/' operator for this. My own path module uses it, and the resulting code is along the lines of: job = Path('c:/orders/38273') table = dbf.Table(job/'ABC12345')
Mine does; it also accepts `\\` on Windows machines. Personally, I don't care for the index notation Antoine is suggesting. ~Ethan~

On Fri, 5 Oct 2012 20:19:12 +0100 Paul Moore <p.f.moore@gmail.com> wrote:
I think overloading '/' is ugly (dividing paths??). Someone else proposed overloading '+', which would be confusing since we need to be able to combine paths and regular strings, for ease of use. The point of using __getitem__ is that you get an error if you replace the Path object with a regular string by mistake:
If you were to use the '+' operator instead, 'foo' + 'bar' would work but give you the wrong result.
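You can check Antoine's argument with a plain string standing in for a mistaken Path. Indexing and `/` fail loudly on `str`, while `+` quietly produces a wrong result. A minimal sketch:

```python
p = "foo"  # meant to be a Path object, but is a plain str by mistake
caught = []

# Indexing a str with a str raises TypeError, so the mistake surfaces at once.
try:
    p["bar"]
except TypeError:
    caught.append("indexing")

# So does "/", since str defines no division.
try:
    p / "bar"
except TypeError:
    caught.append("division")

# "+" silently concatenates instead: no separator and no error.
joined = p + "bar"
print(caught, joined)  # ['indexing', 'division'] foobar
```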
Both '/' and '\\' are accepted as path separators under Windows. Under Unix, '\\' is a regular character:
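The same separator behaviour carried over to the pathlib that later shipped, where the thread's `PureNTPath` became `PureWindowsPath`. A sketch using that later API:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows flavour: both "/" and "\\" separate components.
win_slash = PureWindowsPath("bar/baz").parts
win_backslash = PureWindowsPath("bar\\baz").parts
# POSIX flavour: "\\" is an ordinary filename character.
posix_backslash = PurePosixPath("bar\\baz").parts
print(win_slash, win_backslash)  # ('bar', 'baz') ('bar', 'baz')
print(posix_backslash)           # ('bar\\baz',)
```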
It would probably be better to allow tuples as arguments:
p['bar', 'baz']
It already works indeed:
Well, I expect .root or .anchor to be used mostly for presentation or debugging purposes. There's nothing really useful to be done with them otherwise, IMHO. Do you know of any use cases?
But then the root of "C:foo" would be "C:", which sounds wrong: "C:" isn't a root at all.
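The pathlib that eventually shipped resolved this by keeping three separate properties: `drive`, `root`, and `anchor` (always `drive + root`). Here is a sketch against that later API (`PureWindowsPath` corresponds to the thread's `PureNTPath`), covering the cases Paul and Antoine mention:

```python
from pathlib import PurePosixPath, PureWindowsPath

examples = {
    "posix": PurePosixPath("/etc/passwd"),
    "windows": PureWindowsPath("C:/foo/bar/baz"),
    "drive-rel": PureWindowsPath("C:foo"),              # relative to C:'s cwd
    "unc": PureWindowsPath("//network/share/foo/bar"),  # UNC share
}
for label, p in examples.items():
    # anchor is always drive + root; "C:foo" has a drive but no root
    print(f"{label:10} drive={p.drive!r} root={p.root!r} anchor={p.anchor!r}")
```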
But there's no really simple answer - Windows and Unix are just different here.
Yes, and Unix users are expecting something simpler than what's going on under Windows ;)
Currently, different flavours imply unequal (and unorderable) paths:
However, pure paths and concrete paths of the same flavour can be equal, and ordered:
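The shipped pathlib kept these rules. It also answered Paul's question about case sensitivity: Windows-flavoured paths compare case-insensitively. A sketch against that later API:

```python
import os
from pathlib import PurePosixPath, PureWindowsPath

# Different flavours never compare equal ...
cross_equal = PurePosixPath("foo") == PureWindowsPath("foo")
# ... and refuse to be ordered.
try:
    PurePosixPath("foo") < PureWindowsPath("foo")
    cross_orderable = True
except TypeError:
    cross_orderable = False

# Within a flavour, Windows paths compare case-insensitively; POSIX paths don't.
win_case = PureWindowsPath("C:/Foo") == PureWindowsPath("c:/foo")
posix_case = PurePosixPath("/Foo") == PurePosixPath("/foo")
print(cross_equal, cross_orderable, win_case, posix_case)  # False False True False

# A pure and a concrete path of the same flavour can be equal
# (PosixPath can only be instantiated on a POSIX system).
if os.name == "posix":
    from pathlib import PosixPath
    print(PurePosixPath("/usr") == PosixPath("/usr"))  # True
```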
Regards Antoine.

Antoine Pitrou wrote:
But '/' is the normal path separator, so it's not dividing; and it certainly makes more sense than `%` with string interpolations. ;)
I would rather use the `/` and `+` and risk the occasional wrong result. (And yes, I have spent time tracking bugs because of that wrong result when using my own Path module -- and I'd still rather make that trade-off.) ~Ethan~

+1 in general. I'd like to have a library like that in the batteries. I would like to see a note in the PEP on why [] is used instead of / or +, even though I agree with that choice. +0 for /, -1 for +. For the method/property decision I'd suggest a (maybe stupid) rule: properties for simple accessors and methods for operations which require OS calls, with an exception for parents() as a method which returns a generator. On Fri, Oct 5, 2012 at 11:06 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
-- Thanks, Andrew Svetlov

Antoine Pitrou wrote:
Well, I expect .root or .anchor to be used mostly for presentation or debugging purposes.
I'm having trouble thinking of *any* use cases, even for presentation or debugging. Maybe they should be dropped altogether until someone comes up with a use case. -- Greg

On Fri, Oct 5, 2012 at 1:55 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I think overloading '/' is ugly (dividing paths??).
Agreed. +1 on the proposed API in this regard. It's pretty easy to grok. I also like that item access here mirrors how paths are treated as sequences/iterables in other parts of the API. It wouldn't surprise me if the join syntax is the most contentious part of the proposal. ;) -eric

Antoine Pitrou writes:
I didn't like this much at first. However, if you think of this as a "collection" (cf. WebDAV), then the bracket notation is the obvious way to do it in Python (FVO "it" == "accessing a member of a collection by name"). I wonder if there is a need to distinguish between a path naming a directory as a collection, and as a file itself? Or can/should this be implicit (wash my mouth out with soap!) in the operation using the Path?
Is it really that obnoxious to write "p + Path('bar')" (where p is a Path)? What about the case "'bar' + p"? Since Python isn't C, you can't express that as "'bar'[p]"!
That's outright ugly, especially from the "collections" point of view (foo/bar/xyzzy is not a member of foo). If you want something that doesn't suffer from the bogosities of os.path, this kind of platform-dependence should be avoided, I think.
Why not interpret the root of "C:foo" to be None? The Windows user can still get "C:" as the drive, and I don't think that will be surprising to them.
Well, Unix users can do things more uniformly. But there's also a lot of complexity going on under the hood. Every file system has a root, of which only one is named "/". I don't know if Python programs ever need that information (I never have :-), but it would be nice to leave room for extension. Similarly, many "file systems" are actually just hierarchically organized database access methods with no physical existence on hardware. I wonder if "mount_point" is sufficiently general to include the roots of real local file systems, remote file systems, Windows drives, and pseudo file systems? An obvious problem is that Windows users would not find that terminology natural.

On 6 October 2012 09:39, Stephen J. Turnbull <turnbull@sk.tsukuba.ac.jp> wrote:
Technically, newer versions of Windows (Vista and later, I think) allow you to mount a drive on a directory rather than a drive letter, just like Unix. Although I'm not sure I've ever seen it done, and I don't know if there are suitable API calls to determine if a directory is a mount point (I guess there must be). An ugly, but viable, approach would be to have drive and mount_point properties, which are synonyms. Paul.

On Sat, 06 Oct 2012 17:39:13 +0900 "Stephen J. Turnbull" <turnbull@sk.tsukuba.ac.jp> wrote:
I don't think there's a need to distinguish. Trying to access /etc/passwd/somefile will simply raise an error on I/O.
The issue I envision is if you write `p + "bar"`, thinking p is a Path, and p is actually a str object. It won't raise, but give you the wrong result.
Well, you do want to be able to convert str paths to Path objects without handling path separator conversion by hand. It's a matter of practicality.
That's a possibility indeed. I'd like to have feedback from more Windows users about your suggestion:
which would also give the following for UNC paths:
    >>> PureNTPath('//network/share/foo/bar').root
    '\\\\network\\share\\'
Another is that finding mount points is I/O, while finding the root is a purely lexical operation. Regards Antoine.

Antoine Pitrou writes:
No, my point is that for me prepending new segments is quite common, though not as common as appending them. The asymmetry of the bracket operator means that there's no easy way to deal with that. On the other hand, `p + Path('foo')` and `Path('foo') + p` (where p is a Path, not a string) both seem reasonable to me. It's true that one could screw up as you suggest, but that requires *two* mistakes, first thinking that p is a Path when it's a string, and then forgetting to convert 'bar' to Path. I don't think that's very likely if you don't allow mixing strings and Paths without explicit conversion.
Sorry, cut too much context. I was referring to the use of path['foo/bar'] where path['foo', 'bar'] will do. Of course overloading the constructor is an obvious thing to do.

On 06.10.2012 16:49, Stephen J. Turnbull wrote:
But having to call Path() explicitly every time is not very convenient either; in that case you can also call .join() -- and I bet people would prefer p + Path('foo/bar/baz') (which is probably not correct in all cases) to p + Path('foo') + Path('bar') + Path('baz') just because it's such a pain. On the other hand, when the explicit conversion is not needed, confusion will ensue, as Antoine says. In any case, for me using "+" to join paths is quite ugly. I guess it's because after all, I think of the underlying path as a string, and "+" is hardwired in my brain as string concatenation (at least in Python). Georg

Stephen J. Turnbull wrote:
On the other hand, `p + Path('foo')` and `Path('foo') + p` (where p is a Path, not a string) both seem reasonable to me.
I don't like the idea of using + as the path concatenation operator, because path + ".c" is an obvious way to add an extension or other suffix to a filename, and it ought to work. -- Greg

Antoine Pitrou wrote:
I'm all for eliminating extra '.'s, but shouldn't extra '/'s be an error?
What's the use-case for iterating through all the parent directories? Say I have a .dbf table as PureNTPath('c:\orders\12345\abc67890.dbf'), and I export it to .csv in the same folder; how would I transform the above PureNTPath's ext from 'dbf' to 'csv'? ~Ethan~

On Fri, Oct 05, 2012 at 02:38:57PM -0700, Ethan Furman <ethan@stoneleaf.us> wrote:
Why? They aren't errors in the underlying OS.
    last_seen = None
    for parent in p.parents():
        if parent['.svn'].exists():
            last_seen = parent
            continue
        else:
            print("The topmost directory of the project: %s" % last_seen)
            break

Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.
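Oleg's loop translates directly to the pathlib that later shipped, where `parents` became a sequence property rather than a generator method. Below is a runnable sketch. The helper `topmost_marked_dir()` and the temporary directory layout are hypothetical. Unlike the loop above, the helper also checks the starting directory itself, and it returns `None` if no marked directory is found:

```python
import tempfile
from pathlib import Path


def topmost_marked_dir(start, marker=".svn"):
    """Walk upwards from `start` and return the highest directory in the
    unbroken chain of directories containing `marker` (None if `start`
    itself lacks it). Hypothetical helper, not a pathlib method."""
    last_seen = None
    # Path.parents excludes the path itself, so check `start` first.
    for directory in (start, *start.parents):
        if not (directory / marker).exists():
            break
        last_seen = directory
    return last_seen


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    pkg = project / "src" / "pkg"
    pkg.mkdir(parents=True)
    # Mark every directory of the "working copy", but not the temp dir itself.
    for d in (project, project / "src", pkg):
        (d / ".svn").mkdir()
    found = topmost_marked_dir(pkg)
    print(found == project)  # True
```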

Oleg Broytman wrote:
They are on Windows (no comment on whether or not it qualifies as an OS ;).

    c:\temp>dir \\\\\temp
    The filename, directory name, or volume label syntax is incorrect.

    c:\temp>dir \\temp
    The filename, directory name, or volume label syntax is incorrect.

Although I see it works fine in between path pieces:

    c:\temp\34400>dir \temp\\\34400
    [snip listing]
Cool, thanks. ~Ethan~

On 10/06/2012 12:21 AM, Ethan Furman wrote:
\\ at the start of a path has a special meaning under windows: http://en.wikipedia.org/wiki/UNC_path#Uniform_Naming_Convention

On Sat, 06 Oct 2012 00:47:28 +0200 Mathias Panzenböck <grosser.meister.morti@gmx.net> wrote:
\\ at the start of a path has a special meaning under windows: http://en.wikipedia.org/wiki/UNC_path#Uniform_Naming_Convention
And indeed the API preserves them:

    >>> PurePosixPath('//some/path')
    PurePosixPath('/some/path')
    >>> PureNTPath('//some/path')
    PureNTPath('\\\\some\\path\\')

Regards Antoine.
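The pathlib that later shipped differs from the draft output above on POSIX. Spurious slashes and `.` components are collapsed, but `..` is kept. Exactly two leading slashes are also preserved, because POSIX leaves the meaning of a leading `//` implementation-defined; three or more collapse to one. A sketch against that later API:

```python
from pathlib import PurePosixPath

cases = {
    "foo//bar": "foo/bar",         # spurious slash collapsed
    "foo/./bar": "foo/bar",        # single dot removed
    "foo/../bar": "foo/../bar",    # '..' kept: it may cross a symlink
    "//etc/hosts": "//etc/hosts",  # exactly two leading slashes kept
    "///etc/hosts": "/etc/hosts",  # three or more collapse to one
}
for raw, expected in cases.items():
    print(f"{raw!r:16} -> {str(PurePosixPath(raw))!r}")
```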

On Fri, 05 Oct 2012 14:38:57 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
Something like:
Any suggestion to ease this use case a bit? Regards Antoine.
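Ethan's .dbf-to-.csv case eventually got a dedicated method, `with_suffix()`, in the pathlib that shipped in Python 3.4. The sketch below uses that later API. Note the raw string: in an ordinary literal, `\a` and `\123` inside the original path would be treated as escape sequences.

```python
from pathlib import PureWindowsPath

table = PureWindowsPath(r"c:\orders\12345\abc67890.dbf")
export = table.with_suffix(".csv")  # same folder, new extension
print(export)  # c:\orders\12345\abc67890.csv
```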

On Sat, 06 Oct 2012 01:27:49 +0100 Richard Oudkerk <shibturn@gmail.com> wrote:
Wouldn't there be some confusion with os.path.basename:
Richard

On Sat, 06 Oct 2012 01:27:49 +0100 Richard Oudkerk <shibturn@gmail.com> wrote:
Wouldn't there be some confusion with os.path.basename:
    >>> os.path.basename('a/b/c.ext')
    'c.ext'
(sorry for the earlier, unfinished reply) Regards Antoine.

Antoine Pitrou writes:
Not to mention standard Unix usage. GNU basename will allow you to specify a *particular* extension explicitly, which will be stripped if present and otherwise ignored. Eg, "basename a/b/c.ext ext" => "c." (note the period!) and "basename a/b/c ext" => "c". I don't know if that's an extension to POSIX. In any case, it would require basename to be a method rather than a property.
(sorry for the earlier, unfinished reply)
Also there are applications where "basenames" contain periods (eg, wget often creates directories with names like "www.python.org"), and filenames may have multiple extensions, eg, "index.ja.html". I think it's reasonable to define "extension" to mean "the portion after the last period (if any, maybe including the period)", but I think usage of the complementary concept is pretty application-specific.

Stephen J. Turnbull wrote:
I wouldn't worry too much about this; after all, we are trying to replace a primitive system with a more advanced, user-friendly one.
FWIW, my own implementation uses the names

    .path     -> c:\foo\bar        or  \\computer_name\share\dir1\dir2
    .vol      -> c:                    \\computer_name\share
    .dirs     -> \foo\bar              \dir1\dir2
    .filename -> some_file.txt     or  archive.tar.gz
    .basename -> some_file             archive
    .ext      -> .txt                  .tar.gz

~Ethan~

How about making a path object behave like a sequence of pathname components? Then

* You can iterate over it directly instead of needing .parents()
* p[:-1] gives you the dirname
* p[-1] gives you the os.path.basename

-- Greg
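The shipped pathlib did not make paths themselves sequences. It does expose the components as the `parts` tuple, which supports Greg's slicing idioms directly. A sketch against that later API:

```python
from pathlib import PurePosixPath

p = PurePosixPath("/usr/lib/python3/os.py")
print(p.parts)                       # ('/', 'usr', 'lib', 'python3', 'os.py')
print(p.parts[-1])                   # os.py  (same as p.name)
print(PurePosixPath(*p.parts[:-1]))  # /usr/lib/python3  (same as p.parent)
```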

Ethan Furman writes:
How about a more general subst() method? Indeed, it would need keyword arguments for named components like ext, but I often do things like "mv ~/Maildir/{tmp,new}/42" in the shell. I think it would be useful to be able to replace any component of a path.

On Sat, Oct 06, 2012 at 05:04:44PM +0900, "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
I think this would be overgeneralization. IMO there is no need to replace parts beyond drive/name/extension. To "replace" root or path components just construct a new Path. Oleg.

On Sat, Oct 06, 2012 at 11:44:02AM -0700, Ethan Furman <ethan@stoneleaf.us> wrote:
Yes. Even if the new path differs from the old by one letter somewhere in a middle component. "Practicality beats purity". We need to see real use cases to decide what is really needed. Oleg.

On Fri, 5 Oct 2012 23:16:55 -0600 Eric Snow <ericsnowcurrently@gmail.com> wrote:
The concrete Path objects' replace() method already maps to os.replace(). Note os.replace() is new in 3.3 and is a portable always-overwriting alternative to os.rename(): http://docs.python.org/dev/library/os.html#os.replace Regards Antoine.

On Sat, Oct 06, 2012 at 02:09:24PM +0200, Antoine Pitrou <solipsis@pitrou.net> wrote:
Call it "with":

    newpath = path.with_drive('C:')
    newpath = path.with_name('newname')
    newpath = path.with_ext('.zip')

Oleg.

On Sat, Oct 06, 2012 at 04:26:42PM +0400, Oleg Broytman <phd@phdru.name> wrote:
BTW, I think having these three -- replacing drive, name and extension -- is enough. Oleg.

On Sat, 6 Oct 2012 16:40:49 +0400 Oleg Broytman <phd@phdru.name> wrote:
What is the point of replacing the drive? Replacing the name is already trivial:

    path.parent()[newname]

So we only need to replace the "basename" and the extension (I think I'm ok with the "basename" terminology now :-)). Regards Antoine.
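In the pathlib that later shipped, `parent` is a property and `/` replaced the proposed indexing, so Antoine's idiom becomes `path.parent / newname`. A dedicated `with_name()` method was added as well. A sketch showing that the two agree:

```python
from pathlib import PurePosixPath

path = PurePosixPath("/home/user/report.txt")
via_parent = path.parent / "summary.txt"    # Antoine's idiom, later spelling
via_method = path.with_name("summary.txt")  # dedicated method
print(via_parent, via_parent == via_method)  # /home/user/summary.txt True
```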

On Sat, Oct 06, 2012 at 02:46:35PM +0200, Antoine Pitrou <solipsis@pitrou.net> wrote:
I'm ok with that. Oleg.

On Sat, 06 Oct 2012 14:55:16 +0200 Georg Brandl <g.brandl@gmx.net> wrote:
Well, "basename" is the name proposed for the "part before the extension". "name" is the full filename. (so path.name == path.basename + path.ext) Regards Antoine.

On Sat, 06 Oct 2012 15:08:27 +0200 Georg Brandl <g.brandl@gmx.net> wrote:
True, but since we already have the name attribute, it seems reasonable for basename to mean something other than name :-) Do you have another suggestion? Regards Antoine.

On Sat, Oct 6, 2012 at 3:42 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
It appears "base name" or "base" is the convention for the part before the extension. http://en.wikipedia.org/wiki/Filename Perhaps os.path.basename should be deprecated in favor of a better named function one day. But that's probably for a different thread.

On Sat, Oct 06, 2012 at 03:49:49PM +0200, Yuval Greenfield <ubershmekel@gmail.com> wrote:
Perhaps os.path.basename should be deprecated in favor of a better named function one day. But that's probably for a different thread.
That's certainly for a different Python. os.path.basename cannot be renamed because: 1) it's used in millions of programs; 2) it's in line with GNU tools. Oleg.

Antoine Pitrou wrote:
If we have a method for replacing the extension, I don't think we have a strong need for a name for "all of the last name except the extension", because usually all you want that for is so you can add a different extension (possibly empty). So I propose to avoid the term "basename" altogether, and just have

    path.name           --> all of the last component
    path.ext            --> the extension
    path.with_name(foo) -- replaces all of the last component
    path.with_ext(ext)  -- replaces the extension

Then if you really want to extract the last component without the extension (which I expect to be a rare requirement), you can do

    path.with_ext("").name

-- Greg

Greg Ewing <greg.ewing@canterbury.ac.nz> writes:
This is based on the false concept that there is one “extension” in a filename. On POSIX filesystems, that's just not true; filenames often have several suffixes in sequence, e.g. ‘foo.tar.gz’ or ‘foo.pg.sql’, and each one conveys meaningful intent by whoever named the file.
+1 on avoiding the term “basename” for anything to do with the concept being discussed here, since it already has a different meaning (“the part of the filename without any leading directory parts”). −1 on entrenching this false concept of “the extension” of a filename. -- Ben Finney

Ben Finney wrote:
When I talk about "the extension", I mean the last one. The vast majority of the time, that's all you're interested in -- you unwrap one layer of the onion at a time, and leave the rest for the next layer of software up. That's not always true, but it's true often enough that I think it's worth having special APIs for dealing with the last dot-suffix. -- Greg
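The pathlib that eventually shipped accommodates both views in this exchange:

* `suffix` is the last dot-suffix only (Greg's "the extension").
* `suffixes` lists all of them (Ben's point).
* The name-minus-last-suffix concept is called `stem` rather than "basename".

A sketch against that later API:

```python
from pathlib import PurePosixPath

p = PurePosixPath("backups/foo.tar.gz")
print(p.suffix)                     # .gz
print(p.suffixes)                   # ['.tar', '.gz']
print(p.stem)                       # foo.tar
print(p.name == p.stem + p.suffix)  # True
print(p.with_suffix("").name)       # foo.tar  (Greg's with_ext("").name idiom)
```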

Antoine Pitrou wrote:
I do not.
What is the point of replacing the drive?
At my work we have identical path structures on several machines, and we sometimes move entire branches from one machine to another. In those instances it is good to be able to change from one drive/mount/share to another.
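Ethan's drive-moving use case can be expressed on top of the `anchor` and `parts` attributes of the pathlib that later shipped. The helper below, `with_anchor()`, is hypothetical and not a pathlib method. It assumes the path is anchored and that the whole anchor (drive or share plus root) should be replaced:

```python
from pathlib import PureWindowsPath


def with_anchor(path, new_anchor):
    """Hypothetical helper (not a pathlib method): re-root `path` under `new_anchor`."""
    if not path.anchor:
        raise ValueError(f"{path} is relative; there is no anchor to replace")
    # For an anchored path, parts[0] is the anchor and parts[1:] is the rest.
    return type(path)(new_anchor, *path.parts[1:])


src = PureWindowsPath(r"C:\projects\alpha\data.dbf")
print(with_anchor(src, "D:\\"))             # D:\projects\alpha\data.dbf
print(with_anchor(src, "//backup/share/"))  # \\backup\share\projects\alpha\data.dbf
```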
Replacing the name is already trivial: path.parent()[newname]
Or, if '/' is allowed, path.path/newname. I can see the reasonableness of using indexing (to me, it sorta looks like a window onto the path ;) ), but I prefer other methods when possible (tender wrists -- arthritis sucks) ~Ethan~

Antoine Pitrou writes:
``relative()`` returns a new relative path by stripping the drive and root::
Does this have use cases so common that it deserves a convenience method? I would expect "relative" to require an argument. (Ie, I would expect it to have the semantics of "relative_to".) Or is the issue that you can't count on PureNTPath(p).relative_to('C:\\') to DTRT? Maybe the

On 6 October 2012 11:09, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Agreed.
I would expect "relative" to require an argument. (Ie, I would expect it to have the semantics of "relative_to".)
I agree that's what I thought relative() would be when I first read the name.
It seems to me that if p isn't on drive C:, then the right thing is clearly to raise an exception. No ambiguity there - although Unix users might well write code that doesn't allow for exceptions from the method, just because it's not a possible result on Unix. Having it documented might help raise awareness of the possibility, though. And that's about the best you can hope for. Paul.

On Sat, 6 Oct 2012 11:27:58 +0100 Paul Moore <p.f.moore@gmail.com> wrote:
You are right, relative() could be removed and replaced with the current relative_to() method. I wasn't sure about how these names would feel to a native English speaker.
Indeed:
Actually, it can raise too:
You can't really add '..' components and expect the result to be correct: for example, if '/usr/lib' is a symlink to '/lib', then '/usr/lib/..' is '/', not '/usr'. That's why the resolve() method, which resolves symlinks along the path, is the only one allowed to muck with '..' components. Regards Antoine.
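The shipped pathlib kept `relative_to()` strict in the way Paul and Antoine describe. It raises `ValueError` rather than guessing across drives or inserting `..` components (Python 3.12 later added an opt-in `walk_up=True` parameter for the latter). A sketch against that later API:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Same drive: the anchor is stripped and the remainder returned.
rel = PureWindowsPath("C:/foo/bar").relative_to("C:/")
print(rel)  # foo\bar

failures = []
for path, base in [(PureWindowsPath("D:/foo"), "C:/"),   # different drive
                   (PurePosixPath("/usr/lib"), "/etc")]:  # not under base
    try:
        path.relative_to(base)
    except ValueError:
        # Raised instead of guessing a drive or inserting ".." components.
        failures.append(str(path))
print(failures)
```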

I've said before that I like the general shape of the pathlib API and that's still the case. It's the only OO API I've seen that's semantically clean enough for me to support introducing it as "the" standard path abstraction in the standard library. However, there are still a few rough edges I would like to see smoothed out :) On Sat, Oct 6, 2012 at 5:48 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
The minor problem is that "relative" on its own is slightly unclear about whether the invariant involved is "a == b.subpath(a.relative(b))" or "b == a.subpath(a.relative(b))". By including the extra word, the intended meaning becomes crystal clear: "a == b.subpath(a.relative_to(b))". However, "a relative to b" is the more natural interpretation, so +1 for using "relative" for the semantics of the method-based equivalent of the current os.path.relpath(). I agree there's no need for a shorthand for "a.relative(a.root)".
As the invariants above suggest, I'm also currently -1 on *any* of the proposed shorthands for "p.subpath(subpath)", *as well as* the use of "join" as the method name (due to the major difference in semantics relative to str.join). All of the shorthands are magical and/or cryptic and save very little typing over the explicitly named method. As already noted in the PEP, you can also shorten it manually by saving the bound method to a local variable.
It's important to remember that you can't readily search for syntactic characters or common method names to find out what they mean, and these days that kind of thing should be taken into account when designing an API. "p.subpath('foo', 'bar')" looks like executable pseudocode for creating a new path based on an existing one to me, unlike "p / 'foo' / 'bar'", "p['foo', 'bar']", or "p.join('foo', 'bar')". The method semantics are obvious by comparison, since they would be the same as those for ordinary construction: "p.subpath(*args) == type(p)(p, *args)".
I'm not 100% sold on "subpath" as an alternative (since ".." entries may mean that the result isn't really a subpath of the original directory at all), but I do like the way it reads in the absence of parent directory references, and I definitely like it better than "join" or "[]" or "/" or "+". This interpretation is also favoured by the fact that the calculation of relative path references is strict by default (i.e. it won't insert ".."
to make the reference work when the target isn't a subpath)
This seems too strict for the general case. Configuration files in bundled applications, for example, often contain paths relative to the file (e.g. open up a Visual Studio project file). There are no symlinks involved there. Perhaps a "require_subpath" flag that defaults to True would be appropriate? Passing "require_subpath=False" would then provide explicit permission to add ".." entries as appropriate, and it would be up to the developer to document the "no symlinks!" restriction on their layout. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
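Nick's suggested flag can be sketched on top of posixpath.relpath(). The function name `relative_to` and the `require_subpath` parameter are hypothetical; they come from this discussion, not from a released API. The sketch is purely lexical, so the "no symlinks!" caveat applies whenever the flag is turned off:

```python
import posixpath


def relative_to(path, start, require_subpath=True):
    """Hypothetical sketch: relative path from `start` to `path` (POSIX rules).

    With require_subpath=True (the default), refuse to emit ".." entries and
    raise ValueError if `path` is not under `start`. With False, allow them;
    the result is then only correct if no symlinks are involved.
    """
    rel = posixpath.relpath(path, start)
    if require_subpath and (rel == ".." or rel.startswith("../")):
        raise ValueError("%r is not under %r" % (path, start))
    return rel


# The invariant from the thread: joining start with the result gives path back.
assert posixpath.join("/a", relative_to("/a/b/c", "/a")) == "/a/b/c"

# Explicitly opting in to ".." entries, e.g. for paths stored in a project file.
print(relative_to("/proj/data/x.csv", "/proj/build", require_subpath=False))  # ../data/x.csv
```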

On 8 October 2012 11:31, Nick Coghlan <ncoghlan@gmail.com> wrote:
Until precisely this point in your email, I'd been completely confused, because I thought that p.subpath(xxx) was some sort of "is xxx a subpath of p" query. It never occurred to me that it was the os.path.join equivalent operation. In fact, I'm not sure where you got it from, as I couldn't find it in either the PEP or in pathlib's documentation.
I'm not unhappy with using a method for creating a new path based on an existing one (none of the operator forms seems particularly compelling to me), but I really don't like subpath as a name. I don't dislike p.join(parts), as it links back nicely to os.path.join. I can't honestly see anyone getting confused in practice. But I'm not so convinced that I would want to insist on it.
+1 on a method
-1 on subpath as its name
+0 on join as its name (I'm happy for someone to come up with a better name)
-0 on a convenience operator form. Mainly because "only one way to do it" and the general controversy over which is the best operator to use suggest that leaving the operator form out altogether, at least in the initial implementation, is the better option.
Paul.

Paul Moore writes:
On 8 October 2012 11:31, Nick Coghlan <ncoghlan@gmail.com> wrote:
I agree with Paul on this. If .join() doesn't work for you, how about .append() for adding new path components at the end, vs. .suffix() for adding an extension to the last component? (I don't claim Paul would agree with this next, but as long as I'm here....) I really think that the main API for paths should be the API for sequences specialized to "sequence of path components", with a subsidiary set of operations for common textual manipulations applied to individual components.

On Mon, Oct 8, 2012 at 4:41 PM, Paul Moore <p.f.moore@gmail.com> wrote:
That's OK, I don't set the bar for my mnemonics *that* high: I use Guido's rule that good names are easy to remember once you know what they mean. Being able to guess precisely just from the name is a nice bonus, but not strictly necessary.
I made it up by using "make subpath" as the reverse of "get relative path". The "is subpath" query could be handled by calling "b.startswith(a)". I'd be fine with "joinpath" as well (that is what path.py uses to avoid the conflict with str.join)
I really don't like it because of the semantic conflict with str.join. That semantic conflict is the reason I only do "from os.path import join as joinpath" or else call it as "os.path.join" - I find that using the bare "join" directly is too hard to interpret when reading code. I consider .append() and .extend() unacceptable for the same reason - they're too closely tied to mutating method semantics on sequences.
Right, this is my main point as well. The method form *has* to exist. I am *not* convinced that the cute syntactic shorthands actually *improve* readability - they improve *brevity*, but that's not the same thing. Cheers, Nick.

On 08/10/12 21:31, Nick Coghlan wrote:
The use of indexing to join path components:
    # Example from the PEP
    >>> p = PurePosixPath('foo')
    >>> p['bar']
    PurePosixPath('foo/bar')
is an absolute deal breaker for me. I'd rather stick with the status quo than have to deal with something which so clearly shouts "index/key lookup" but does something radically different (join/concatenate components). I would *much* rather use the / or + operator, but I think even better (and less likely to cause arguments about the operator) is an explicit `join` method. After all, we call it "joining path components", so the name is intuitive (at least for English speakers) and simple. I don't believe that there will be confusion with str.join -- we already have an os.path.join method, and I haven't seen any sign of confusion caused by that. [...]
To some degree, that's a failure of the search engine, not of the language. Why can't we type "symbol=+" into the search field and get information about addition? If Google can let you do mathematical calculations in their search field, surely we could search for symbols? But I digress.
"p.subpath('foo', 'bar')" looks like executable pseudocode for creating a new path based on existing one to me,
That notation quite possibly goes beyond unintuitive to downright perverse. You are using a method called "subpath" to generate a *superpath* (deeper, longer path which includes p as a part). http://en.wiktionary.org/wiki/subpath Given:
    p = /a/b/c
    q = /a/b/c/d/e  # p.subpath(d, e)
p is a subpath of q, not the other way around: q is a path PLUS some subdirectories of that path, i.e. a longer path. It's also a pretty unusual term outside of graph theory: Googling finds fewer than 400,000 references to "subpath". It gets used in graphics applications, some games, and in an extension to mercurial for adding symbolic names to repo URLs. I can't see any sign that it is used in the sense you intend.
unlike "p / 'foo' / 'bar'", "p['foo', 'bar']", or "p.join('foo', 'bar')".
Okay, I'll grant you that we'll probably never get a consensus on operators + versus / but I really don't understand why you think that p.join is unsuitable for a method which joins path components. -- Steven

On Mon, Oct 8, 2012 at 11:53 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Huh? It's a tree structure. A subpath lives inside its parent path, just as subnodes are children of their parent node. Agreed it's not a widely used term though - it's a generalisation of subdirectory to also cover file paths. They're certainly not "super" anything, any more than a subdirectory is really a superdirectory (which is what you appear to be arguing).
"p.join(r)" has exactly the same problem as "p + r": pass in a string to a function expecting a path object and you get data corruption instead of an exception. When you want *different* semantics, then ducktyping is your enemy and it's necessary to take steps to avoid it, including changing method names and avoiding some operators. Cheers, Nick.

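Nick's data-corruption point about "p.join(r)" and "p + r" is easy to reproduce with plain strings. In this sketch, `add_child()` is a hypothetical helper written for path objects that have a component-joining `join()` method. Handed a str by mistake, it silently produces garbage instead of raising:

```python
def add_child(base, name):
    # Hypothetical helper, intended for path objects whose join() appends a
    # path component. A str also has a join() method, so duck typing lets a
    # str through without any error.
    return base.join(name)


corrupted = add_child("/usr", "lib")  # str.join puts "/usr" between the characters of "lib"
concatenated = "/usr" + "lib"         # the "p + r" spelling: no separator inserted

print(corrupted)     # l/usri/usrb
print(concatenated)  # /usrlib
```
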
On 8 October 2012 19:39, Nick Coghlan <ncoghlan@gmail.com> wrote:
Ah, OK. I understand your objection now. I concede that Path.join() is a bad idea based on this. I still don't like subpath() though. And pathjoin() is too likely to be redundant in real code:
    temp_path = Path(tempfile.mkdtemp())
    generated_file = temp_path.pathjoin('data_extract.csv')
I can't think of a better term, though :-( Paul

On Tue, 9 Oct 2012 00:09:23 +0530 Nick Coghlan <ncoghlan@gmail.com> wrote:
Well, it's a "subpath", except when it isn't:
I have to admit I didn't understand what you meant by "subpath" until you explained that it was another name for "join". I really don't think it's a good name. child() would be a good name, except for the case above where you join with an absolute path. Actually, child() could be a variant of join() which wouldn't allow for absolute arguments. Regards Antoine.
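The child() idea can be sketched with posixpath. posixpath.join() silently discards everything before an absolute argument, which is the "it's a subpath, except when it isn't" case. The hypothetical child() below refuses absolute arguments instead:

```python
import posixpath


def child(base, *names):
    """Hypothetical child(): join like posixpath.join(), but reject absolute
    arguments instead of letting them silently replace `base`."""
    for name in names:
        if posixpath.isabs(name):
            raise ValueError("child() got an absolute path: %r" % (name,))
    return posixpath.join(base, *names)


print(posixpath.join("/home/user", "/etc/passwd"))  # /etc/passwd (base is dropped)
print(child("/home/user", "docs", "notes.txt"))     # /home/user/docs/notes.txt
```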

Nick, I've come to the conclusion that you are right to prefer a named method over an operator for joining paths. But I think you are wrong to name that method "subpath" -- see below. On 09/10/12 05:39, Nick Coghlan wrote:
I believe you mentioned in an earlier email that you invented the term for this discussion. Quote: I made it up by using "make subpath" as the reverse of "get relative path". Unfortunately subpath already has an established meaning, and it is the complete opposite of the sense you intend: paths are trees are graphs, and the graph a->b->c->d is a superpath, not subpath, of a->b->c: a->b->c is strictly contained within a->b->c->d; the reverse is not true. Just as "abcd" is a superstring of "abc", not a substring. Likewise for superset and subset. And likewise for trees (best viewed in a monospaced font):
    a-b-c
     \
      f-g
One can say that the tree a-f-g is a subtree of the whole, but one cannot say that a-f-g-h is a subtree since h is not a part of the first tree.
They're certainly not "super" anything, any more than a subdirectory is really a superdirectory (which is what you appear to be arguing).
Common usage is that "subdirectory" gets used for relative paths: given path /a/b/c/d, we say that "d" is a subdirectory of /a/b/c. I've never come across anyone giving d in absolute terms. Now perhaps I've lived a sheltered life *wink* and people do talk about subdirectories in absolute paths all the time. That's fine. But they don't talk about "subpaths" in the sense you intend, and the sense you intend goes completely against the established sense. The point is, despite the common "sub" prefix, the semantics of "subdirectory" is quite different from the semantics of "substring", "subset", "subtree" and "subpath". -- Steven

Steven D'Aprano wrote:
I think the "sub" in "subdirectory" is more in the sense of "below", rather than "is a part of". Like a submarine is something that travels below the surface of the sea, not something that's part of the sea. -- Greg

Nick Coghlan wrote:
Huh? It's a tree structure. A subpath lives inside its parent path, just as subnodes are children of their parent node.
You're confusing the path, which is a name, with the object that it names. It's called a path because it's the route that you follow from the root to reach the node being named. To reach a subnode of N requires following a *longer* path than you did to reach N. There's no sense in which the *path* to the subnode is "contained" within the path to N -- rather it's the other way around. -- Greg

Just to add my 2p's worth. On 05/10/12 19:25, Antoine Pitrou wrote:
In general I like it.
Class hierarchy ---------------
Lovely ASCII art work :) but it does have the n*m problem of such hierarchies: N types of file (file, directory, mount-point, drive, root, etc.) and M implementations (Posix, NT, Linux, OSX, network, database, etc.). I would prefer duck-typing. Add ABCs for all the N types of file and use concrete classes for the actual filesystems. That way there are N+M rather than N*M classes. Although I'm generally against operator overloading, would the // operator be better than the // operator as it is more rarely used and more visually distinctive? Cheers, Mark.

Hello Mark, On Sat, 06 Oct 2012 11:49:35 +0100 Mark Shannon <mark@hotpy.org> wrote:
There is no distinction per "type of file": files, directories, etc. all share the same implementation. So you only have a per-flavour distinction (Posix / NT).
It seems to me that "duck typing" and "ABCs" are mutually exclusive, kind of :)
You mean "would the / operator be better than the [] operator"? I didn't choose / at first because I knew this choice would be quite contentious. However, if there happens to be a strong majority in its favour, why not. Regards Antoine.

Responding late, but I didn't get a chance to get my very strong feelings on this proposal in yesterday. I do not like it. I'll give full disclosure and say that I think our earlier failure to include the path library in the stdlib has been a loss for Python and I'll always hope we can fix that one day. I still hold out hope.
It feels like this proposal is "make it object oriented, because object oriented is good" without any actual justification or obvious problem this solves. The API looks clunky and redundant, and does not appear to actually improve anything over the facilities in the os.path module. This takes a lot of things we can already do with paths and files and remixes them into a not-so-intuitive API for the sake of change, not for the sake of solving a real problem.
As for specific problems I have with the proposal:
Frankly, I think not keeping the / operator for joining is a huge mistake. This is the number one best feature of path and, even though many people don't like it, it makes sense. It makes our most common path operation read very close to the actual representation of what you're creating. This is great.
Not inheriting from str means that we can't directly pass these path objects to existing code that just expects a string, so we have a really hard boundary around the edges of this new API. It does not lend itself well to incrementally transitioning to it from existing code.
The stat operations and other file facilities tacked on feel out of place, and limited. Why does it make sense to add these facilities to path and not other file operations? Why not give me a read method on paths? Or maybe a copy? Putting lots of file facilities on a path object feels wrong because you can't extend it easily. This is one place that function(thing) works better than thing.function().
Overall, I'm completely -1 on the whole thing.
On Fri, Oct 5, 2012 at 2:25 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
-- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy

On Sat, 6 Oct 2012 12:14:40 -0400 Calvin Spealman <ironfroggy@gmail.com> wrote:
Personally, I cringe every time I have to type `os.path.dirname(os.path.dirname(os.path.dirname(...)))` to go two directories upwards of a given path. Compare with, say:
Really, I don't think os.path is the prettiest or most convenient "battery" in the stdlib.
Ironing out difficulties such as platform-specific case-sensitivity rules or the various path separators is a real problem that is not solved by an os.path-like API, because you can't muck with str and give it the required semantics for a filesystem path. So people end up sprinkling their code with calls to os.path.normpath() and/or os.path.normcase() in the hope that it will appease the Gods of Portability (which will also lose casing information).
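This sketch shows the os.path-style workaround being described. It uses the stdlib's ntpath module, which applies Windows rules on any OS, and the helper name `same_windows_path` is made up for illustration. Case-insensitive comparison only works by normalising, and the normalised form has lost the original casing:

```python
import ntpath


def same_windows_path(a, b):
    """Hypothetical helper: compare two Windows paths case-insensitively,
    treating '/' and '\\' as equivalent separators (lexical only)."""
    return ntpath.normcase(a) == ntpath.normcase(b)


original = "C:\\Users\\Antoine\\Notes.TXT"
other = "c:/users/antoine/notes.txt"

print(original == other)                   # False: plain str comparison
print(same_windows_path(original, other))  # True
print(ntpath.normcase(original))           # c:\users\antoine\notes.txt (casing lost)
```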
As discussed in the PEP, I consider inheriting from str to be a mistake when your intent is to provide different semantics from str. Why should indexing or iterating over a path produce individual characters? Why should Path.split() split over whitespace by default? Why should "c:\\" be considered unequal to "C:\\" under Windows? Why should startswith() work character by character, rather than path component by path component? These are all standard str behaviours that are unhelpful when applied to filesystem paths. As for the transition, you just have to call str() on the path object. Since str() also works on plain str objects (and is a no-op), it seems rather painless to me. (Of course, you are not forced to transition. The PEP doesn't call for deprecation of os.path.)
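The startswith() point is easy to illustrate. A character-wise prefix test accepts '/usr/lib2' as being "inside" '/usr/lib'. The component-wise test sketched here does not; it works on POSIX-style strings and uses made-up helper names (`_components`, `is_under`):

```python
def _components(path):
    """Split a POSIX-style path into components, keeping '/' as the root
    and dropping empty and '.' entries (lexical only; '..' is kept)."""
    root = ["/"] if path.startswith("/") else []
    return root + [part for part in path.split("/") if part not in ("", ".")]


def is_under(path, directory):
    """Component-wise prefix test: is `path` equal to or inside `directory`?"""
    path_parts = _components(path)
    dir_parts = _components(directory)
    return path_parts[: len(dir_parts)] == dir_parts


print("/usr/lib2/x".startswith("/usr/lib"))       # True  (character by character)
print(is_under("/usr/lib2/x", "/usr/lib"))        # False (component by component)
print(is_under("/usr/lib/python3", "/usr/lib/"))  # True
```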
There is always room to improve and complete the API without breaking compatibility. To quote the PEP: “More operations could be provided, for example some of the functionality of the shutil module”. The focus of the PEP is not to enumerate every possible file operation, but to propose the semantic and syntactic foundations (such as how to join paths, how to divide them into their individual components, etc.).
But you can still define a function() taking a Path as an argument, if you need to. Similarly, you can define a function() taking a datetime object if the datetime object's API lacks some useful functionality for you. Regards Antoine.

How about something along these lines:
    import os
    class Path(str):
        def __add__(self, other):
            # call str.__add__ explicitly: "self + ..." here would recurse forever
            return Path(str.__add__(self, os.path.sep + other))
        def __getitem__(self, i):
            return self.split(os.path.sep)[i]
        def __setitem__(self, i, v):
            # str is immutable: this builds a new Path, but "p[i] = v" discards it
            items = self.split(os.path.sep)
            items[i] = v
            return Path(os.path.sep.join(items))
        def append(self, v):
            # "self += ..." would only rebind the local name; return a new Path instead
            return self + v
        @property
        def filename(self):
            return self.split(os.path.sep)[-1]
        @property
        def folder(self):
            items = self.split(os.path.sep)
            return Path(os.path.sep.join(items[:-1]))
    path = Path('/this/is/an/example.png')  # outputs below assume a POSIX os.path.sep
    print(isinstance(path, str))  # True
    print(path[-1])               # example.png
    print(path.filename)          # example.png
    print(path.folder)            # /this/is/an
On Oct 6, 2012, at 12:08 PM, Antoine Pitrou wrote:

I was thinking of the API more than the implementation. The point to me is that it would be nice to have something that behaves as a string and as a list at the same time. Here is another possible incomplete implementation.
    import os
    class Path(object):
        def __init__(self, s='/', sep=os.path.sep):
            self.sep = sep
            self.s = s.split(sep)
        def __str__(self):
            return self.sep.join(self.s)
        def __add__(self, other):
            other = str(other)
            if other.startswith(self.sep):
                # joining an absolute path replaces the current one
                return Path(other, self.sep)
            return Path(str(self).rstrip(self.sep) + self.sep + other, self.sep)
        def __getitem__(self, i):
            return self.s[i]
        def __setitem__(self, i, v):
            self.s[i] = v
        def append(self, v):
            self.s.append(v)
        @property
        def filename(self):
            return self.s[-1]
        @property
        def folder(self):
            return Path(self.sep.join(self.s[:-1]), self.sep)
On Oct 6, 2012, at 12:51 PM, Georg Brandl wrote:

Georg Brandl wrote:
If you inherit from str, you cannot override any of the operations that str already has (i.e. __add__, __getitem__).
Is this a 3.x thing? My 2.x version of Path overrides many of the str methods and works just fine.
And obviously you also can't make it mutable, i.e. __setitem__.
Well, since Paths (both Antoine's and mine) are immutable that's not an issue. ~Ethan~

Georg Brandl wrote:
Which is why I would like to see Path based on str, despite Guido's misgivings. (Yes, I know I'm probably tilting at windmills here...) If Path is string based we get backwards compatibility with all the os and third-party tools that expect and use strings; this would allow a gentle migration to using them, as opposed to the all-or-nothing if Path is a completely new type. This would be especially useful for accessing the functions that haven't been added on to Path yet. If Path is string based some questions evaporate: '+'? It does what str does; iterate? Just like str (we can make named methods for the iterations that we want, such as Path.dirs). If Path is string based we still get to use '/' to combine them together (I think that was the preference from the poll... but that could be wishful thinking on my part. ;) ) Even Path.joinpath would make sense to differentiate from Path.join (which is really str.join). Anyway, my two cents worth.

On Fri, 12 Oct 2012 12:23:46 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
It is not all-or-nothing since you can just call str() and it will work fine with both strings and paths. Regards Antoine.

Antoine Pitrou wrote:
D'oh. You're correct, of course. What I was thinking was along the lines of:
    --> some_table = Path('~/addresses.dbf')
    --> some_table = os.path.expanduser(some_table)
vs
    --> some_table = Path('~/addresses.dbf')
    --> some_table = Path(os.path.expanduser(str(some_table)))
The Path/str sandwich is awkward, as well as verbose. ~Ethan~

On Fri, 12 Oct 2012 13:33:14 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
Hey, nice catch, I need to add an expanduser()-alike to the Path API. Thank you! Antoine.

On Sat, Oct 13, 2012 at 7:00 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
My point about the Path(...(str(...))) sandwich still applies, though, for every function that isn't built in to Path. :)
It's the same situation we were in with the design of the new ipaddress module, and the answer is the same: implicit coercion just creates way too many opportunities for errors to pass silently. We had to create a backwards incompatible version of the language to eliminate the semantic confusion between binary data and text data; we're not going to introduce a similar confusion between arbitrary text strings and objects that actually behave like filesystem paths.
str has a *big* API, and much of it doesn't make any sense in the particular case of path objects. In particular, path objects shouldn't be iterable, because it isn't clear what iteration should mean: it could be path segments, it could be parent paths, or it could be directory contents. It definitely *shouldn't* be individual characters, but that's what we would get if it inherited from strings.
I do like the idea of introducing a "filesystem path" protocol though (and Antoine's already considering that), which would give us the implicit interoperability without the inheritance of an overbroad API.
Something else I've been thinking about is that it still feels wrong to me to be making the Windows vs Posix behavioural decision at the class level. It really feels more like a "decimal.Context" style API would be more appropriate, where there was a PathContext that determined how various operations on paths behaved. The default context would then be determined by the current OS, but you could write:
    with pathlib.PosixContext:
        # "\" is not a directory separator
        # "/" is used in representations
        # Comparison is case sensitive
        # expanduser() uses posix rules
    with pathlib.WindowsContext:
        # "\" and "/" are directory separators
        # "\" is used in representations
        # Comparison is case insensitive
Contexts could be tweaked for desired behaviour (e.g. using "/" in representations on Windows). Cheers, Nick.

On Sat, 13 Oct 2012 17:41:29 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
:-/ You could make an argument that the Path classes could have their behaviour tweaked with such a context system, but I really think explicit classes for different path flavours are much better design than some thread-local context hackery. Personally, I consider thread-local contexts to be an anti-pattern. (also, the idea that a POSIX path becomes a Windows path based on which "with" statement it's used inside sounds scary) Regards Antoine.

On 13/10/12 18:41, Nick Coghlan wrote:
Ah, I wondered if anyone else had picked up on that. When I read the PEP, I was concerned about the mental conflict between iteration and indexing of Path objects: given a Path p, the sequence
    p[0] p[1] p[2] ...
does something completely different from iterating over p directly. Indexing gives path components; iteration gives children of the path (like os.walk).
-1 on iteration over the children. Instead, use:
    for child in p.walk():
        ...
which has the huge benefit that the walk method can take arguments as needed, such as the args os.walk takes:
    topdown=True, onerror=None, followlinks=False
plus I'd like to see a "filter" argument to filter which children are (or aren't) seen.
+1 on indexing giving path components, although the side effect of this is that you automatically get iteration via the sequence protocol. So be it -- I don't think we should be scared to *choose* an iteration model, just because there are other potential models. Using indexing to get path components is useful, slicing gives you sub paths for free, and if the cost of that is that you can iterate over the path, well, I'm okay with that:
    p = Path('/usr/local/lib/python3.3/glob.py')
    list(p)
    => ['/', 'usr', 'local', 'lib', 'python3.3', 'glob.py']
Works for me. -- Steven
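Steven's expected list(p) output, and his "slicing gives you sub paths for free" remark, can be reproduced with a short, purely lexical sketch over plain strings. The helper names are made up here; the PEP itself exposes components through a `.parts` property, as Antoine notes in his reply. POSIX separators are assumed. '..' is deliberately left alone because, per the symlink discussion earlier in the thread, it cannot safely be collapsed textually:

```python
def components(path):
    """Hypothetical helper: split a POSIX path into its components.

    The root is kept as '/', empty and '.' components are dropped, and
    '..' is preserved (collapsing it is only safe after resolving symlinks).
    """
    root = ["/"] if path.startswith("/") else []
    return root + [part for part in path.split("/") if part not in ("", ".")]


def from_components(parts):
    """Inverse of components(): rebuild a path string from a list of parts."""
    if parts and parts[0] == "/":
        return "/" + "/".join(parts[1:])
    return "/".join(parts)


parts = components("/usr/local/lib/python3.3/glob.py")
print(parts)                       # ['/', 'usr', 'local', 'lib', 'python3.3', 'glob.py']
print(from_components(parts[:4]))  # /usr/local/lib (a slice gives a parent path)
print(from_components(parts[3:]))  # lib/python3.3/glob.py
```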

On Sun, 14 Oct 2012 21:48:59 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
p[0] p[1] etc. are just TypeErrors:
So, yes, it's doing "something different", but there is little chance of silent bugs :-)
Judging by its name and signature, walk() would be a recursive operation, while iterating on a path isn't (it only gets you the children).
There is already a .parts property which does exactly that: http://www.python.org/dev/peps/pep-0428/#sequence-like-access The problem with enabling sequence access *on the path object* is that you get confusion with str's own sequencing behaviour, if you happen to pass a str instead of a Path, or the reverse. Which is explained briefly here: http://www.python.org/dev/peps/pep-0428/#no-confusion-with-builtins Regards Antoine.
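Antoine draws a distinction between iterating over a path (immediate children only) and a walk() method (recursive). That distinction maps onto os.listdir() and os.walk(). This sketch uses hypothetical helper names, and its `predicate` argument stands in for the "filter" argument Steven asked for:

```python
import os
import tempfile


def children(directory):
    """Non-recursive: the immediate entries of `directory`, sorted."""
    return sorted(os.path.join(directory, name) for name in os.listdir(directory))


def walk_files(directory, predicate=None):
    """Recursive: every file below `directory`, in a stable order,
    optionally filtered by `predicate(path) -> bool`."""
    for dirpath, dirnames, filenames in os.walk(directory):
        dirnames.sort()  # os.walk honours in-place edits, so this fixes the visit order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            if predicate is None or predicate(full):
                yield full


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt"), os.path.join("sub", "c.py")):
        with open(os.path.join(root, rel), "w") as f:
            f.write("x")
    shallow = [os.path.relpath(p, root) for p in children(root)]
    deep = [os.path.relpath(p, root) for p in walk_files(root)]
    only_txt = [os.path.relpath(p, root)
                for p in walk_files(root, predicate=lambda p: p.endswith(".txt"))]

print(shallow)   # ['a.txt', 'sub']
print(only_txt)  # ['a.txt', 'sub/b.txt'] on POSIX
```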

Steven D'Aprano wrote:
I actually prefer Steven's interpretation. If we are going to iterate directly on a path object, we should be yielding the pieces of the path object. After all, a path can contain a file name (most of mine do) and what sense does it make to iterate over the children of /usr/home/ethanf/some_table.dbf? ~Ethan~

On Sun, 14 Oct 2012 07:50:06 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
Well, given that:
1. sequence access (including the iterator protocol) to the path's parts is already provided through the ".parts" property
2. it makes little sense to actually iterate over those parts (what operations are you going to do sequentially over '/', then 'home', then 'ethanf', etc.?)
... I think yielding the directory contents is a much more useful alternative when iterating over the path itself. Regards Antoine.

On 14/10/12 23:13, Antoine Pitrou wrote:
Well, that's two people so far who have conflated "p.parts" with just p. Perhaps that's because "parts" is so similar to "path". Since we already refer to the bits of a path as "path components", perhaps this bike shed ought to be spelled "p.components". It's longer, but I bet nobody will miss it. -- Steven

On Sun, Oct 14, 2012 at 8:45 AM, Steven D'Aprano <steve@pearwood.info> wrote:
I would prefer to see p.split() It matches the existing os.path.split() better and I like the idea of a new library matching the old, to be an easier transition for brains. That said, it also looks too much like str.split()

On 12 October 2012 21:33, Ethan Furman <ethan@stoneleaf.us> wrote:
A lot of them might end up inadvertently converting back to a pure string as well, so a better comparison will in many places be:
    some_table = Path('~/addresses.dbf')
    some_table = Path(os.path.expanduser(some_table))
vs
    some_table = Path('~/addresses.dbf')
    some_table = Path(os.path.expanduser(str(some_table)))
which is only five characters different. I would also prefer:
    some_table = Path('~/addresses.dbf')
    some_table = Path(os.path.expanduser(some_table.raw()))
or some other method. It just looks nicer to me in this case. Maybe .str(), .chars() or .text(). Additionally, if this is too painful and too often used, we can always make an auxiliary function:
    some_table = Path('~/addresses.dbf')
    some_table = some_table.str_apply(os.path.expanduser)
where .str_apply takes (func, *args, **kwargs), and you need to wrap the function if it takes the path at a different position. I don't particularly like this option, but it exists.
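The auxiliary-function idea does not strictly need to be a method. Below is a free-function sketch. The name `str_apply` comes from the message above; converting the result back with `type(path)` is an assumption, so the sketch only works with path classes whose constructor accepts a str. With a Path class it would be called as `str_apply(Path('~/addresses.dbf'), os.path.expanduser)`:

```python
import posixpath


def str_apply(path, func, *args, **kwargs):
    """Hypothetical helper: call a str-based function on a path object.

    `path` is converted with str(), passed as the first argument to `func`,
    and the result is wrapped back into the same type as `path`. Functions
    that take the path in another position need a small wrapper.
    """
    result = func(str(path), *args, **kwargs)
    return type(path)(result)


# Plain strings work too, since str() and type(path)() are no-ops for them.
print(str_apply("/data/tables/addresses.dbf", posixpath.dirname))  # /data/tables
print(str_apply("data/tables", posixpath.join, "addresses.dbf"))   # data/tables/addresses.dbf
```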

On 12 October 2012 20:42, Antoine Pitrou <solipsis@pitrou.net> wrote:
I assumed that part of the proposal for including a new Path class was that it would (perhaps eventually rather than immediately) be directly supported by all of the standard Python APIs that expect strings-representing-paths. I apologise if I have missed something but is there some reason why it would be bad for e.g. open() to accept Path instances as they are? I think it's reasonable to require that e.g. os.open() should only accept a str, but standard open()? Oscar

Oscar Benjamin wrote:
I think it's reasonable to require that e.g. os.open() should only accept a str, but standard open()?
Why shouldn't os.open() accept a path object? Especially if we use a protocol such as __strpath__ so that the os module doesn't have to explicitly know about the Path classes. -- Greg
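The protocol idea can be sketched in a few lines. `__strpath__` is the hypothetical method name from the message above, and `as_str_path` and `DemoPath` are made-up names. A function like os.open() could call such a helper first, so that it never needs to know about any particular Path class, while unrelated objects are still rejected instead of being silently stringified:

```python
class DemoPath:
    """Stand-in path class implementing the hypothetical __strpath__ protocol."""

    def __init__(self, *parts):
        self._parts = parts

    def __strpath__(self):
        return "/".join(self._parts)


def as_str_path(obj):
    """Return a str for `obj`: either a plain str, or the result of the
    object's __strpath__() method. Anything else raises TypeError."""
    if isinstance(obj, str):
        return obj
    method = getattr(type(obj), "__strpath__", None)
    if method is None:
        raise TypeError("expected str or path object, not %s" % type(obj).__name__)
    result = method(obj)
    if not isinstance(result, str):
        raise TypeError("__strpath__() must return str")
    return result


print(as_str_path("/etc/hosts"))                  # /etc/hosts
print(as_str_path(DemoPath("", "etc", "hosts")))  # /etc/hosts
```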

Massimo DiPierro wrote:
Unfortunately, if you subclass from str, I don't think it will be feasible to make indexing return pathname components, because code that's treating it as a string will be expecting it to index characters. Similarly you can't make + mean path concatenation -- it must remain ordinary string concatenation. -- Greg

On Sat, Oct 6, 2012 at 1:08 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I would never do the first version in the first place. I would just join(my_path, "../..") Note that we really need to get out of the habit of "import os" instead of "from os.path import join, etc..." We are making our code uglier and arbitrarily creating many of your concerns by making the use of os.path harder than it should be.
I agree this stuff is difficult, but I think normalizing is a lot more predictable than lots of platform specific paths (both FS and code paths)
Good points, but I'm not convinced that subclassing str means you can't change these in your subclass.
These are all standard str behaviours that are unhelpful when applied to filesystem paths.
We agree there.
But then I lose all the helpful path information. Something further down the call chain, path-aware, might be able to make use of it.
(Of course, you are not forced to transition. The PEP doesn't call for deprecation of os.path.)
If we are only adding something redundant and intend to leave both forever, it only feels like bloat. We should be shrinking the stdlib, not growing it with redundant APIs.
What I meant is that I can't extend it in third party code without being second class. I can add another library that does file operations os.path or stat() don't provide, and they sit side by side.
-- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy

On 07/10/12 04:08, Antoine Pitrou wrote:
I would cringe too if I did that, because it goes THREE directories up, not two:

    py> path = '/a/b/c/d'
    py> os.path.dirname(os.path.dirname(os.path.dirname(path)))
    '/a'

:)
You know, I don't think I've ever needed to call dirname more than once at a time, but if I was using it a lot:

    parent = os.path.dirname
    parent(parent(parent(p)))

which is not as short as p.parent(3), but it's still pretty clear. -- Steven

On Sun, 07 Oct 2012 12:41:44 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
Not if d is a file, actually (yes, the formulation was a bit ambiguous). Regards Antoine. -- Software development and contracting: http://pro.pitrou.net
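The snippet below compares the approaches discussed above. It calls `posixpath` explicitly so the results are the same on any host OS. `parents` is the sequence that the pathlib module shipped in Python 3.4 provides in place of the proposed `parent(n)` method. Keep in mind that `normpath()` works lexically only, so it can give a wrong answer when a component is a symlink.

```python
import posixpath
from pathlib import PurePosixPath

path = '/a/b/c/d'

# Three nested dirname() calls strip three components.
three_up = posixpath.dirname(posixpath.dirname(posixpath.dirname(path)))

# Shipped pathlib exposes ancestors as a sequence: parents[0] is the parent.
via_parents = PurePosixPath(path).parents[2]

# join() alone leaves '..' in the string; normpath() collapses it lexically.
joined = posixpath.join(path, '../..')
normalized = posixpath.normpath(joined)
```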

On Sat, Oct 6, 2012 at 12:14 PM, Calvin Spealman <ironfroggy@gmail.com> wrote:
The only reason to have objects for anything is to let people have other implementations that do something else with the same method. I remember one of the advantages to having an object-oriented path API, that I always wanted, is that the actual filesystem doesn't have to be what the paths access. They could be names for web resources, or files within a zip archive, or virtual files on a pretend hard drive in your demo application. That's fantastic to have, imo, and it's something function calls (like you suggest) can't possibly support, because functions aren't extensibly polymorphic. If we don't get this sort of polymorphism of functionality, there's very little point to an object oriented path API. It is syntax sugar for function calls with slightly better type safety (NTPath(...) / UnixPath(...) == TypeError -- I hope.) So I'd assume the reason that these methods exist is to enable polymorphism. As for why your suggested methods don't exist, they are better written as functions because they don't need to be ad-hoc polymorphic; they work just fine as regular functions that call methods on path objects, e.g.

    def read(path):
        return path.open().read()

    def copy(path1, path2):
        path2.open('w').write(read(path1))  # won't work for very large files, blah blah blah

Whereas the open method cannot work this way, because the path should define how file opening works. (It might return an io.StringIO for example.) And the return value of .open() might not be a real file with a real fd, so you can't implement a stat function in terms of open and f.fileno() and such. And so on. -- Devin
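Devin's argument can be sketched with a hypothetical in-memory "filesystem". `MemoryPath` and `_CommitOnClose` are invented for this illustration. Only `open()` is polymorphic, and `read()` and `copy()` stay plain functions that work with any object whose `open()` behaves like `pathlib.Path.open()`.

```python
import io


class _CommitOnClose(io.StringIO):
    """Write buffer that saves its contents into the backing dict when closed."""

    def __init__(self, store, name):
        super().__init__()
        self._store = store
        self._name = name

    def close(self):
        if not self.closed:
            self._store[self._name] = self.getvalue()
        super().close()


class MemoryPath:
    """Hypothetical path object whose 'filesystem' is a plain dict."""

    def __init__(self, store, name):
        self.store = store
        self.name = name

    def open(self, mode='r'):
        if 'w' in mode:
            return _CommitOnClose(self.store, self.name)
        return io.StringIO(self.store[self.name])


def read(path):
    with path.open() as f:
        return f.read()


def copy(src, dst):
    # Reads everything into memory, so (as Devin says) not for very large files.
    with dst.open('w') as f:
        f.write(read(src))


store = {'a.txt': 'hello'}
copy(MemoryPath(store, 'a.txt'), MemoryPath(store, 'b.txt'))
```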

On Sat, Oct 6, 2012 at 9:44 PM, Calvin Spealman <ironfroggy@gmail.com> wrote:
The PEP needs to better articulate the rationale, but the key points are:

- better abstraction and encapsulation of cross-platform logic, so file manipulation algorithms written on Windows are more likely to work correctly on POSIX systems (and vice-versa)
- improved ability to manipulate paths with Windows semantics on a POSIX system (and vice-versa)
- better support for creation of "mock" filesystem APIs
It trades readability (and discoverability) for brevity. Not good.
It's the exact design philosophy as was used in the creation of the new ipaddress module: the objects in ipaddress must still be converted to a string or integer before they can be passed to other operations (such as the socket module APIs). Strings and integers remain the data interchange formats here as well (although far more focused on strings in the path case).
Indeed, I'm personally much happier with the "pure" path classes than I am with the ones that can do filesystem manipulation. Having both "p.open(mode)" and "open(str(p), mode)" seems strange. OTOH, I can see the attraction in being able to better fake filesystem access through the method API, so I'm willing to go along with it.
Overall, I'm completely -1 on the whole thing.
I find this very hard to square with your enthusiastic support for path.py. Like ipaddr, which needed to clean up its semantic model before it could be included in the standard library (as ipaddress), we need a clean cross-platform semantic model for path objects before a convenience API can be added for manipulating them. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
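Here is Nick's ipaddress parallel, shown with both modules as they exist in today's stdlib. The rich objects are used for manipulation, and they are converted to strings at the boundary with other APIs. The addresses and paths in the sketch are arbitrary examples.

```python
import ipaddress
from pathlib import PurePosixPath

# Rich objects for manipulation...
net = ipaddress.ip_network('192.0.2.0/24')
host = net.network_address + 5
report = PurePosixPath('/srv/data') / 'reports' / 'q3.csv'

# ...plain strings at the boundary with other APIs (sockets, subprocess, logs).
host_text = str(host)
report_text = str(report)
```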

On Mon, Oct 8, 2012 at 1:59 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Frankly, for 99% of file path work, anything I do on one "just works" on the other, and complicating things with these POSIX versus NT path types just seems to be a whole lot of early complication for a few edge cases most people never see. Simplest example is requiring the backslash separator on NT when it handles forward slash, just like POSIX, just fine, and has for a long, long time.
I admit the mock FS intrigues me
I thought it had all three. In these situations, where my and another's perception of a system's strengths and weaknesses are opposite, I don't really know how to make a good response. :-/
I somewhat dislike this because I loved path.py so much and this proposal seems to actively avoid exactly the aspects of path.py that I enjoyed the most (like the / joining).
Cheers, Nick.
path.py was in the wild, and is still in use. Why do we find ourselves debating new libraries like this as PEPs? We need to let them play out, see what sticks. If someone wants to make this library and stick it on PyPI, I'm not stopping them. I'm encouraging it. Let's see how it plays out. If it works out well, it deserves a PEP. In two or three years.

On Tue, Oct 9, 2012 at 3:02 AM, Calvin Spealman <ironfroggy@gmail.com>wrote:
I agree, This discussion has been framed unfairly. The only things that should appear in this PEP are the guidelines Guido mentioned earlier in the discussion along with some use cases. So python is chartering a path object module, and we should let whichever module is the best on pypi eventually get into the std-lib. Yuval Greenfield

Yuval Greenfield <ubershmekel@...> writes:
On Tue, Oct 9, 2012 at 3:02 AM, Calvin Spealman
<ironfroggy@gmail.com> wrote:
path.py was in the wild, and is still in use. Why do we find ourselves
debating new libraries like this as PEPs? We need to let them play out, see what sticks. If someone wants to make this library and stick it on PyPI, I'm not stopping them. I'm encouraging it. Let's see how it plays out. If it works out well, it deserves a PEP. In two or three years.
I agree,
This discussion has been framed unfairly.
path.py (or a similar API) has already been rejected as PEP 355. I see no need to go through this again, at least not in this discussion thread. If you want to re-discuss PEP 355, please open a separate thread. Regards Antoine.

On Tue, Oct 9, 2012 at 4:33 PM, Yuval Greenfield <ubershmekel@gmail.com> wrote:
So python is chartering a path object module, and we should let whichever module is the best on pypi eventually get into the std-lib.
No, the module has to at least have a nodding acquaintance with good software design principles, avoid introducing too many ways to do the same thing, and address various other concerns many authors of modules on PyPI often don't care about. That's *why* path.py got rejected in the first place. Just as ipaddress is not the same as ipaddr due to those additional concerns, so will whatever path abstraction makes it into the standard library take those concerns into account. Cheers, Nick.

On Fri, Oct 5, 2012 at 11:25 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Thanks for getting this started! I haven't read the whole PEP or the whole thread, but I like many of the principles, such as not deriving from existing built-in types (str or tuple), immutability, explicitly caring about OS differences, and distinguishing between pure and impure (I/O-using) operations. (Though admittedly I'm not super-keen on the specific term "pure".) I can't say I'm thrilled about overloading p[s], but I can't get too excited about p/s either; p+s makes more sense but that would beg the question of how to append an extension to a path (transforming e.g. 'foo/bar' to 'foo/bar.py' by appending '.py'). At the same time I'm not in the camp that says you can't use / because it's not division. But rather than diving right into the syntax, I would like to focus on some use cases. (Some of this may already be in the PEP, my apologies.) Some things I care about (based on path manipulations I remember I've written at some point or another):

- Distinguishing absolute paths from relative paths; this affects joining behavior as for os.path.join().
- Various normal forms that can be used for comparing paths for equality; there should be a pure normalization as well as an impure one (like os.path.realpath()).
- An API that encourages Unix lovers to write code that is most likely also to make sense on Windows.
- An API that encourages Windows lovers to write code that is most likely also to make sense on Unix.
- Integration with fnmatch (pure) and glob (impure).
- In addition to stat(), some simple derived operations like getmtime(), getsize(), islink().
- Easy checks and manipulations (applying to the basename) like "ends with .pyc", "starts with foo", "ends with .tar.gz", "replace .pyc extension with .py", "remove trailing ~", "append .tmp", "remove leading @", and so on.
- While it's nice to be able to ask for "the extension", it would be nice if the checks above would not be hardcoded to use "." as a separator; and it would be nice if the extension-parsing code could deal with multiple extensions and wasn't confused by names starting or ending with a dot.
- Matching on patterns on directory names (e.g. "does not contain a segment named .hg").
- A matching notation based on glob/fnmatch syntax instead of regular expressions.

PS. Another occasional use for "posix" style paths I have found is manipulating the path portion of a URL. There are some posix-like features, e.g. the interpretation of trailing / as "directory", the requirement of leading / as root, the interpretation of "." and "..", and the notion of relative paths (although path joining is different). It would be nice if the "pure" posix path class could be reused for this purpose, or if a related class with a subset or superset of the same methods existed. This may influence the basic design somewhat in showing the need for custom subclasses etc. -- --Guido van Rossum (python.org/~guido)
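Several basename manipulations on this list map directly onto methods of the pathlib module that shipped in Python 3.4: `with_suffix()`, `with_name()` and `name`. The sketch uses `PurePosixPath` so it behaves the same on every OS. All file names and the URL are made-up examples. For the PS, it also shows one difference when URL paths are treated as POSIX paths: pathlib drops the trailing slash.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

p = PurePosixPath('pkg/module.pyc')
source = p.with_suffix('.py')             # replace .pyc with .py
tmp_copy = p.with_name(p.name + '.tmp')   # append .tmp
backup = PurePosixPath('notes.txt~')
restored = backup.with_name(backup.name.rstrip('~'))  # remove trailing ~
is_tarball = PurePosixPath('dist/pkg-1.0.tar.gz').name.endswith('.tar.gz')

# The path portion of a URL behaves much like a POSIX path...
url_path = PurePosixPath(urlsplit('http://example.com/docs/guide/index.html').path)
page = url_path.name
section = url_path.parent
# ...except that the trailing slash marking a URL "directory" is not kept.
dir_text = str(PurePosixPath('/docs/guide/'))
```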

On Sat, 6 Oct 2012 10:44:37 -0700 Guido van Rossum <guido@python.org> wrote:
The proposed API does function like os.path.join() in that respect: when an absolute path is joined onto another path, the preceding part is simply discarded:
Impure normalization is done with the resolve() method:
(/etc/ssl/certs being a symlink to /etc/pki/tls/certs on my system) Pure comparison already obeys case-sensitivity rules as well as the different path separators:
Note the case information isn't lost either:
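The snippet shows pure and impure normalization and pure comparison in the pathlib module as it later shipped. The directory names are arbitrary. The symlink step assumes a POSIX system, since on Windows creating a symlink may need extra privileges. Since Python 3.6, `resolve()` does not require the final components to exist.

```python
import os
import tempfile
from pathlib import Path, PurePosixPath, PureWindowsPath

# Pure normalization is lexical: '//' and '.' collapse, but '..' is kept,
# because 'lib/..' need not equal '.' when 'lib' is a symlink.
lexical = str(PurePosixPath('/usr//local/./lib/../bin'))

# Impure normalization asks the filesystem: resolve() follows symlinks.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp).resolve()
    (base / 'real').mkdir()
    os.symlink(base / 'real', base / 'link')
    resolved = (base / 'link' / 'certs').resolve()
    resolved_matches = resolved == base / 'real' / 'certs'

# Pure comparison follows each flavour's rules for case and separators...
windows_equal = PureWindowsPath('C:/Windows/System32') == PureWindowsPath('c:\\windows\\system32')
posix_equal = PurePosixPath('/etc/Passwd') == PurePosixPath('/etc/passwd')
# ...while the original spelling is preserved for display.
windows_text = str(PureWindowsPath('C:/Windows/System32'))
```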
I agree on these goals, that's why I'm trying to avoid system-specific methods. For example is_reserved() is also defined under Unix, it just always returns False:
- Integration with fnmatch (pure) and glob (impure).
This is provided indeed, with the match() and glob() methods respectively.
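For `match()`, the shipped pathlib uses glob syntax. A relative pattern is matched from the right, one path component per pattern part. An absolute pattern must match the whole path. The example path is made up, and `fnmatch` is included to show the same syntax applied to a bare name.

```python
import fnmatch
from pathlib import PurePosixPath

p = PurePosixPath('/src/pkg/module.py')

any_py = p.match('*.py')
in_pkg = p.match('pkg/*.py')
in_src = p.match('src/*.py')      # 'src' is not the immediate parent
whole = p.match('/src/*/*.py')    # absolute pattern: the whole path must match

name_ok = fnmatch.fnmatchcase(p.name, 'mod*.py')
```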
- In addition to stat(), some simple derived operations like getmtime(), getsize(), islink().
The PEP proposes properties mimicking the stat object attributes:
And methods to query the file type:
Perhaps the properties / methods mix isn't very consistent.
I'll try to reconcile this with Ben Finney's suffix / suffixes proposal.
- Matching on patterns on directory names (e.g. "does not contain a segment named .hg").
Sequence-like access on the parts property provides this:
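In the shipped pathlib, `parts` is a tuple. Guido's "does not contain a segment named .hg" check is therefore a membership test, and slicing the tuple recovers an ancestor. The repository path below is a made-up example.

```python
from pathlib import PurePosixPath

p = PurePosixPath('/home/user/repo/.hg/store/data')
parts = p.parts
inside_hg = '.hg' in p.parts
# Everything before the '.hg' segment is the repository root.
repo_root = PurePosixPath(*p.parts[:p.parts.index('.hg')])
```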
Regards Antoine.

On Sun, Oct 7, 2012 at 10:37 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I would warn about caching these results on the path object. I can easily imagine cases where I want to repeatedly call stat() because I'm waiting for a file to change (e.g. tail -f does something like this). I would prefer to have a stat() method that always calls os.stat(), and no caching of the results; the user can cache the stat() return value. (Maybe we can add is_file() etc. as methods on stat() results now they are no longer just tuples?)
Sounds cool. I will try to refrain from bikeshedding much more on this proposal; I'd rather focus on reactors and futures... --Guido
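The shipped pathlib follows Guido's advice here: `Path.stat()` calls `os.stat()` every time, so polling a growing file sees the change. A saved result can still be inspected with the `stat` module's `S_ISREG()`-style helpers. The sketch writes to a throwaway temporary file, and the file name is arbitrary.

```python
import os
import stat
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / 'app.log'
    log.write_text('first line\n')
    size_before = log.stat().st_size

    with log.open('a') as f:   # simulate a writer appending, as tail -f would observe
        f.write('second line\n')

    st = log.stat()                      # a fresh os.stat() call, not a cached value
    size_after = st.st_size
    regular = stat.S_ISREG(st.st_mode)   # file-type test on a saved stat result
    is_file_flag = log.is_file()
    mtime = os.path.getmtime(log)        # os.path's derived helpers accept paths too
```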

On Sun, Oct 7, 2012 at 7:37 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
What's the use case for this behavior? I'd much rather have joining an absolute path to a relative one fail and reveal the potential bug...

    >>> os.unlink(Path('myproj') / Path('/lib'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: absolute path can't be appended to a relative path

On Sun, 7 Oct 2012 23:15:38 +0200 Yuval Greenfield <ubershmekel@gmail.com> wrote:
In all honesty I followed os.path.join's behaviour here. I agree a ValueError (not TypeError) would be sensible too. Regards Antoine.

Am 07.10.2012 23:42, schrieb Antoine Pitrou:
Please no -- this is a very important use case (for os.path.join, at least): resolving a path from config/user/command line that can be given either absolute or relative to a certain directory. Right now it's as simple as join(default, path), and I'd prefer to keep this. There is no bug here, it's working as designed. Georg
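The sketch contrasts the behaviour the shipped pathlib kept, matching `os.path.join()`, with the stricter variant Yuval asked for. `resolve_setting` and `strict_join` are hypothetical helpers. `strict_join` is not a pathlib API, and it raises the `ValueError` Antoine suggested.

```python
import posixpath
from pathlib import PurePosixPath


def resolve_setting(default_dir, value):
    """Georg's use case: a configured path may be absolute or relative to default_dir.

    As with os.path.join(), an absolute `value` replaces `default_dir` entirely.
    """
    return PurePosixPath(default_dir) / value


def strict_join(base, value):
    """Hypothetical stricter join in the spirit of Yuval's proposal."""
    value = PurePosixPath(value)
    if value.is_absolute():
        raise ValueError(f'cannot append absolute path {value} to {base}')
    return PurePosixPath(base) / value


relative = resolve_setting('/etc/myapp', 'app.ini')
absolute = resolve_setting('/etc/myapp', '/home/me/app.ini')
same_as_os_path = str(absolute) == posixpath.join('/etc/myapp', '/home/me/app.ini')

try:
    strict_join('myproj', '/lib')
    rejected = False
except ValueError:
    rejected = True
```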

On Sun, 7 Oct 2012 22:43:02 +0100 Arnaud Delobelle <arnodel@gmail.com> wrote:
I don't know. How does os.path deal with it? Regards Antoine.

Antoine Pitrou wrote:
Not all that well, apparently. From the docs for os.path: os.path.normcase(path) Normalize the case of a pathname. On Unix and Mac OS X, this returns the path unchanged; on case-insensitive filesystems, it converts the path to lowercase. On Windows, it also converts forward slashes to backward slashes. This is partially self-contradictory, since many MacOSX filesystems are actually case-insensitive; it depends on the particular filesystem concerned. Worse, different parts of the same path can have different case sensitivities. Also, with network file systems, not all paths are necessarily case-insensitive on Windows. So there's really no certain way to compare pure paths for equality. Basing it on which OS is running your code is no more than a guess. -- Greg
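Greg's point can be checked directly by calling the two os.path flavours explicitly. `normcase()` takes its behaviour from the path module, not from the filesystem the path lives on. `posixpath.normcase()` leaves the string alone even on a macOS host with a case-insensitive volume. `ntpath.normcase()` lowercases the path and converts slashes regardless of which disk it is on. The paths are made-up examples.

```python
import ntpath
import posixpath

posix_form = posixpath.normcase('/Users/Greg/Documents')
nt_form = ntpath.normcase('C:/Users/Greg/Documents')
```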

On Mon, 08 Oct 2012 11:55:26 +1300 Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
That's true, but considering paths case-insensitive under Windows and case-sensitive under (non-OS X) Unix is still a very good approximation that seems to satisfy most everyone.
So there's really no certain way to compare pure paths for equality. Basing it on which OS is running your code is no more than a guess.
I wonder how well other file-dealing tools cope under OS X, especially those that are portable and not OS X-specific. Regards Antoine.

On Mon, Oct 08, 2012 at 12:00:22PM +0200, Ronald Oussoren <ronaldoussoren@mac.com> wrote:
Or CIFS filesystems mounted on Linux? Case-sensitivity is a file-system property, not an operating system one.
But there is no API to ask what type of filesystem a path belongs to. So guessing by OS name is the only heuristic we can do. Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On 8 Oct, 2012, at 13:07, Oleg Broytman <phd@phdru.name> wrote:
I guess so, as neither statvs, statvfs, nor pathconf seem to be able to tell if a filesystem is case-insensitive. The alternative would be to have a list of case-insensitive filesystems and use that when comparing impure path objects. That would be fairly expensive though, as you'd have to check for every element of the path if that element is on a case-insensitive filesystem. Ronald

On Mon, Oct 08, 2012 at 03:59:18PM +0200, Ronald Oussoren <ronaldoussoren@mac.com> wrote:
If a filesystem mounted to w32 is exported from a server by CIFS/SMB protocol -- is it case sensitive? What if said server is Linux? What if said filesystem was actually imported to Linux from a Novell server by NetWare Core Protocol? It's not a fictional situation -- I do it at oper.med.ru; the server is Linux that mounts two filesystems, CIFS and NCP, and re-exports them via Samba. Oleg.

On Tue, Oct 9, 2012 at 1:28 AM, Oleg Broytman <phd@phdru.name> wrote:
And I thought I was weird in using sshfs and Samba together to "bounce" drive access without having to set up SMB passwords for lots of systems... Would it be safer to simply assume that everything's case sensitive until you actually do a filesystem call (a stat or something)? That is, every Pure function works as though the FS is case sensitive? ChrisA

On 8 October 2012 11:28, Oleg Broytman <phd@phdru.name> wrote:
Actually, after just thinking of a few corner cases (and in this case, seeing some real-world scenarios), it is easy to infer that it is impossible to establish for certain whether a filesystem, or worse, a given directory, is case-sensitive or not. So, regardless of general passive assumptions, I think Python should include a way to actively verify the filesystem case sensitivity. Something along the lines of "assert_case_sensitiveness(<path>)" that would check for a filename in the given path and try to retrieve it with some of its capitalization inverted. If a suitable filename were not found in the given directory, it could raise an error, or try to make an active test by writing there (this behavior should be controlled by keyword parameters). So, whenever one needs to know about case sensitiveness, there would be one obvious way in place to know for sure, even at the cost of some extra system resources. js -><-

On 8 Oct, 2012, at 16:28, Oleg Broytman <phd@phdru.name> wrote:
Even more fun :-). CIFS/SMB from Windows to Linux or OSX behaves like a case-preserving filesystem on the systems I tested. Likewise a NFS filesystem exported from Linux to OSX behaves like a case sensitive filesystem if the Linux filesystem is case sensitive. All in all the best we seem to be able to do is use the OS as a heuristic, most Unix filesystems are case sensitive while Windows and OSX filesystems are case preserving. Ronald

Ronald Oussoren writes:
We can do better than that heuristic. All of the POSIX systems I know publish mtab by default. The mount utility by default will report the types of filesystems. While a path module should not depend on such information, I suppose[1], there ought to be a way to ask for it. Of course this is still an heuristic (at least some Mac filesystems can be configured to be case sensitive rather than case-preserving, and I don't think this information is available in mtab), but it's far more accurate than using only the OS. Footnotes: [1] Requires a system call or subprocess execution, and since mounts can be dynamically changed, doing it once at module initialization is not good enough.
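This is a sketch of the lookup Stephen describes. It parses mtab-format text (device, mount point, type, options, dump, pass) and picks the longest mount point that lexically contains a path, which also reflects Greg's point that parts of one path can sit on different filesystems. `parse_mounts` and `fs_type_for` are hypothetical helpers, and the sample table is made up. On Linux the live table can be read from /proc/self/mounts. Mapping a filesystem type to case sensitivity would still be a separate heuristic, and symlinks can cross mounts without this lexical check noticing.

```python
import posixpath

SAMPLE_MTAB = """\
/dev/sda1 / ext4 rw,relatime 0 0
//fileserver/share /mnt/share cifs rw,relatime 0 0
/dev/sdb1 /mnt/usb vfat rw 0 0
"""


def parse_mounts(text):
    """Map mount point -> filesystem type from mtab-style lines."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and not fields[0].startswith('#'):
            # mtab encodes spaces in mount points as the octal escape \040.
            mounts[fields[1].replace('\\040', ' ')] = fields[2]
    return mounts


def fs_type_for(path, mounts):
    """Filesystem type of the longest mount point containing `path` (lexically)."""
    path = posixpath.normpath(path)
    best = None
    for mountpoint in mounts:
        prefix = mountpoint.rstrip('/') + '/'
        if path == mountpoint or path.startswith(prefix):
            if best is None or len(mountpoint) > len(best):
                best = mountpoint
    return mounts.get(best)


mounts = parse_mounts(SAMPLE_MTAB)
```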

Ronald Oussoren wrote:
neither statvs, statvfs, nor pathconf seem to be able to tell if a filesystem is case insensitive.
Even if they could, you wouldn't be entirely out of the woods, because different parts of the same path can be on different file systems... But how important is all this anyway? I'm trying to think of occasions when I've wanted to compare two entire paths for equality, and I can't think of *any*. -- Greg

On 10 October 2012 09:16, Ronald Oussoren <ronaldoussoren@mac.com> wrote:
Mercurial had to consider this issue when dealing with repositories built on Unix and being used on Windows. Specifically, it needed to know, if the repository contains files README and ReadMe, could it safely write both of these files without one overwriting the other. Actually, something as simple as an unzip utility could hit the same issue (it's just that it's not as critical to be careful with unzip as with a DVCS system... :-)) I don't know how Mercurial fixed the problem in the end - I believe the in-repo format encodes filenames to preserve case even on case insensitive systems, and I *think* it detects case insensitive filesystems for writing by writing a test file and reading it back in a different case. But that may have changed. Paul
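This is a minimal sketch of the active probe Paul believes Mercurial uses, which is also what Joao proposes above. It writes a scratch file and looks it up under a case-swapped name. `is_case_sensitive` is a hypothetical helper, not an `os` or pathlib API, and it is not Mercurial's code. It answers only for that one directory at that moment and requires write permission there.

```python
import os
import tempfile


def is_case_sensitive(directory):
    """Hypothetical probe: does `directory` distinguish names differing only in case?"""
    with tempfile.NamedTemporaryFile(prefix='CaseProbe', dir=directory) as probe:
        folder, base = os.path.split(probe.name)
        swapped = os.path.join(folder, base.swapcase())
        if not os.path.exists(swapped):
            return True
        # Something answers to the swapped name: if it is our own file, lookups ignore case.
        return not os.path.samefile(probe.name, swapped)


with tempfile.TemporaryDirectory() as tmp:
    result = is_case_sensitive(tmp)
```

Only the file name's case is swapped. The directory part is left untouched, because on a case-sensitive filesystem a swapped directory name would simply not exist.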

Greg Ewing wrote:
Well, while I haven't had to compare the /entire/ path, I have had to compare (and sort) the filename portion. And since the SMB share uses lower-case, and our legacy FoxPro code writes upper-case, and files get copied from SMB to the local Windows drive, having the case-insensitive compare option in Path makes my life much easier. ~Ethan~

I was hesitant to put mine on PyPI because there's already a slew of others, but for the sake of discussion here it is [1]. Mine is str based, has no actual I/O components, and can easily be used in normal os, shutil, etc., calls. Example usage:

    job = '12345'
    home = Path('c:/orders')/job
    work = Path('c:/work/')
    for pdf in glob(work/'*.pdf'):
        dash = pdf.filename.index('-')
        dest = home/'reports'/job + pdf.filename[dash:]
        shutil.copy(pdf, dest)

Assuming I haven't typo'ed anything, the above code takes all the pdf files, removes the standard (and useless to me) header info before the '-' in the filename, then copies each one over to its final resting place. If I understand Antoine's Path, the code would look something like:

    job = '12345'
    home = Path('c:/orders/')[job]
    work = Path('c:/work/')
    for child in work:
        if child.ext != '.pdf':
            continue
        name = child.filename
        dash = name.index('-')
        dest = home['reports'][job + name[dash:]]
        shutil.copy(str(child), str(dest))

My biggest objections are the extra str calls, and indexing just doesn't look like path concatenation. ~Ethan~

[1] http://pypi.python.org/pypi/strpath

P.S. Oh, very nice ascii-art!

On Sat, 06 Oct 2012 13:19:54 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
You could actually write `for child in work.glob('*.pdf')` (non-recursive) or `for child in work.glob('**/*.pdf')` (recursive). Regards Antoine.
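Ethan's task can be sketched with the pathlib module as it later shipped, using `glob()` as Antoine suggests. It also avoids the extra `str()` calls, because `shutil` accepts path objects since Python 3.6. `copy_reports` is a hypothetical helper, and the file names are made up. The demo runs in a temporary directory rather than under `c:/orders`.

```python
import shutil
import tempfile
from pathlib import Path


def copy_reports(work, home, job):
    """Copy each work/*.pdf to home/reports/, replacing the header
    (everything before the first '-') with the job number."""
    dest_dir = Path(home) / 'reports'
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for pdf in sorted(Path(work).glob('*.pdf')):   # '**/*.pdf' would recurse
        dash = pdf.name.index('-')
        dest = dest_dir / (job + pdf.name[dash:])
        shutil.copy(pdf, dest)
        copied.append(dest.name)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / 'work'
    work.mkdir()
    (work / 'HDR0042-invoice.pdf').write_text('pdf bytes')
    (work / 'notes.txt').write_text('ignored')
    home = Path(tmp) / 'orders' / '12345'
    copied = copy_reports(work, home, '12345')
    copied_text = (home / 'reports' / '12345-invoice.pdf').read_text()
```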

On 05.10.12 21:25, Antoine Pitrou wrote:
PS: You can all admire my ASCII-art skills.
PurePosixPath and PureNTPath looks closer to Path than to PurePath.
The ``parent()`` method returns an ancestor of the path::
p[:-n] is shorter and looks neater than p.parent(n). Possibly the ``parent()`` method is unnecessary?

Am 05.10.2012 20:25, schrieb Antoine Pitrou:
I already gave you my +1 on #python-dev. I've some additional ideas that I'd like to suggest for pathlib.

* Jason Orendorff's path module has some methods that are quite useful for shell- and find-like scripts. I especially like files(pattern=None), dirs(pattern=None) and their recursive counterparts walkfiles() and walkdirs(). They make code like "recursively remove all pyc files" easy to write:

      for pyc in path.walkfiles('*.pyc'):
          pyc.remove()

* I'd like to see a convenient method to format sizes in SI units (for example 1.2 MB, 5 GB) and non-SI units (MiB, GiB, aka human readable, multiples of 2). I've some code that would be useful for the task.
* Web applications often need to know the mimetype of a file. How about a mimetype property that returns the mimetype according to the extension?
* Symlink and directory traversal attacks are a constant threat. I'd like to see a pathlib object that restricts itself and all its offspring to a directory. Perhaps this can be implemented as a proxy object around a pathlib object?
* While we are working on pathlib, I'd like to improve os.listdir() in two ways. The os.listdir() function currently returns a list of file names. This can consume lots of memory for a directory with hundreds of thousands of files. How about I implement an iterator version that returns some additional information, too? On Linux and most BSDs you can get the file type (d_type, e.g. file, directory, symlink) for free.
* Implement "if filename in directory" with os.path.exists().

Christian
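Several of these ideas can be approximated with what the stdlib provides today. `Path.rglob()` covers the recursive pyc cleanup. `os.scandir()` (added in Python 3.5 by PEP 471) is the iterator-with-file-type that the last-but-one item asks for. `mimetypes.guess_type()` maps extensions to types. `remove_pyc`, `iter_entries` and `safe_join` are hypothetical helpers written for this sketch. `safe_join` only checks at call time, so the filesystem can still change between the check and a later use.

```python
import mimetypes
import os
import tempfile
from pathlib import Path


def remove_pyc(root):
    """Recursively delete *.pyc files below root; return how many were removed."""
    victims = list(Path(root).rglob('*.pyc'))   # materialize before deleting
    for pyc in victims:
        pyc.unlink()
    return len(victims)


def iter_entries(directory):
    """Lazily yield (name, is_dir); on Linux/BSD is_dir() usually comes from d_type."""
    with os.scandir(directory) as it:
        for entry in it:
            yield entry.name, entry.is_dir(follow_symlinks=False)


def safe_join(base, user_path):
    """Resolve user_path below base and refuse anything that escapes it."""
    base = Path(base).resolve()
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):   # Python 3.9+
        raise ValueError(f'{user_path!r} escapes {base}')
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'pkg' / '__pycache__').mkdir(parents=True)
    (root / 'pkg' / 'mod.py').write_text('')
    (root / 'pkg' / '__pycache__' / 'mod.cpython-312.pyc').write_bytes(b'')
    (root / 'old.pyc').write_bytes(b'')
    removed = remove_pyc(root)
    entries = sorted(iter_entries(root / 'pkg'))
    inside = safe_join(root, 'pkg/mod.py') == (root / 'pkg' / 'mod.py').resolve()
    try:
        safe_join(root, '../etc/passwd')
        blocked = False
    except ValueError:
        blocked = True

pdf_type, _encoding = mimetypes.guess_type('report.pdf')
```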

On 07/10/12 09:41, Christian Heimes wrote:
Ouch! My source code!!! *grin*
So do I. http://pypi.python.org/pypi/byteformat Although it's only listed as an "alpha" package, that's just me being conservative about allowing changes to the API. The code is actually fairly mature. If there is interest in having this in the standard library, I am more than happy to target 3.4 and commit to maintaining it. -- Steven
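For comparison, here is a minimal sketch of the size formatting Christian describes. It uses SI prefixes (powers of 1000) by default and IEC prefixes (powers of 1024) on request. `format_size` is a hypothetical helper written for this illustration. It is not byteformat's API, which is not shown in this thread.

```python
def format_size(num_bytes, binary=False):
    """Hypothetical helper: format a byte count with SI or IEC prefixes."""
    base = 1024 if binary else 1000
    units = ['KiB', 'MiB', 'GiB', 'TiB', 'PiB'] if binary else ['kB', 'MB', 'GB', 'TB', 'PB']
    if abs(num_bytes) < base:
        return f'{num_bytes} B'
    size = float(num_bytes)
    for unit in units:
        size /= base
        if abs(size) < base or unit == units[-1]:
            return f'{size:.1f} {unit}'
```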

Antoine Pitrou <solipsis@pitrou.net> writes:
The term “extension” is a barnacle from mainframe filesystems where a filename is necessarily divided into exactly two parts, the name and the extension. It doesn't really apply to POSIX filesystems. On filesystems where the user has always been free to have any number of parts in a filename, the closest concept is better referred to by the term “suffix”::

    >>> p.suffix
    '.py'

It may be useful to add an API method to query the *sequence* of suffixes of a filename::

    >>> p = Path('/home/antoine/pathlib.tar.gz')
    >>> p.name
    'pathlib.tar.gz'
    >>> p.suffix
    '.gz'
    >>> p.suffixes
    ['.tar', '.gz']

Thanks for keeping this proposal active, Antoine.

-- 
 \ “In any great organization it is far, far safer to be wrong |
  `\ with the majority than to be right alone.” —John Kenneth |
_o__) Galbraith, 1989-07-28 |
Ben Finney
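Ben's proposed names, `suffix` and `suffixes`, are the ones used in the pathlib module that shipped in Python 3.4, which also has `stem`. The sketch reuses Ben's example path. It also shows how a name starting with a dot is treated, which was one of Guido's concerns.

```python
from pathlib import PurePosixPath

p = PurePosixPath('/home/antoine/pathlib.tar.gz')
name = p.name
suffix = p.suffix
suffixes = p.suffixes
stem = p.stem

# A leading dot marks a hidden file, not a suffix.
dotfile_suffix = PurePosixPath('/home/antoine/.bashrc').suffix
```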

Antoine Pitrou <solipsis@...> writes:
PS: You can all admire my ASCII-art skills.
but you got the direction of the "is a" arrows wrong. see http://en.wikipedia.org/wiki/Class_diagram#Generalization renaud

I would like to see some backwards compatibility here. ;) In other words, add method names where reasonable (such as .child or .children instead of or along with built-in iteration) so that this new Path beast can be backported to the 2.x line. I'm happy to take that task on if Antoine has better uses of his time. What this would allow is a nice shiny toy for the 2.x series, plus an easier migration to 3.x when the time comes. While I am very excited about the 3.x branch, and will use it whenever I can, some projects still have to be 2.x because of other dependencies. If the new Path doesn't have conflicting method or dunder names it would be possible to have a str-based 2.x version that otherwise acted remarkably like the non-str based 3.x version -- especially if the __strpath__ concept takes hold and Path objects can be passed around the os and os.path modules the way strings are now. ~Ethan~
participants (40)

- Andrew McNabb
- Andrew Svetlov
- Antoine Pitrou
- Arnaud Delobelle
- Ben Finney
- Calvin Spealman
- Chris Angelico
- Christian Heimes
- Devin Jeanpierre
- Eric Snow
- Eric V. Smith
- Ethan Furman
- Georg Brandl
- Greg Ewing
- Guido van Rossum
- Joachim König
- Joao S. O. Bueno
- Joshua Landau
- Mark Shannon
- Massimo Di Pierro
- Massimo DiPierro
- Mathias Panzenböck
- Michele Lacchia
- Mike Graham
- Mike Meyer
- MRAB
- Nick Coghlan
- Oleg Broytman
- Oscar Benjamin
- Paul Moore
- Richard Oudkerk
- rndblnch
- Ronald Oussoren
- Ryan D Hiebert
- Serhiy Storchaka
- Stefan Krah
- Stephen J. Turnbull
- Stephen J. Turnbull
- Steven D'Aprano
- Yuval Greenfield