[Python-ideas] Working with Path objects: p-strings?

Sven R. Kunze srkunze at mail.de
Thu Mar 31 10:43:07 EDT 2016


On 31.03.2016 13:31, Koos Zevenhoven wrote:
> On Thu, Mar 31, 2016 at 12:46 PM, Sven R. Kunze <srkunze at mail.de> wrote:
>> I am not sure if I can make a good suggestion here because I am still
>> trapped at the point where I need to sort out whether a path is more like a dict or
>> another structured datatype, or if it is more a monolithic object. That will
>> be the next blog post topic for me.
> While discussing, pondering and experimenting with this, I have formed
> a quite clear view on how I think we should move forward with this.
> I'm working on a proposal that attempts to not be in conflict with any
> of the goals. I hope it is not in conflict with your present thoughts
> about structured vs. monolithic objects either :)

Not at all. The following post is just a reflection on this whole 
discussion, meant to make it clear to myself, and hopefully to others, 
where several of the thoughts came from and why this whole topic started 
in the first place. Basically, it is a structured collection of thoughts, 
mainly mine:


http://srkunze.blogspot.com/2016/03/what-is-path.html



I pasted the interesting meat below:


*What is a path in the first place?*

Brett Cannon made a good point about why PEP 428 (the PEP that 
introduced pathlib as a stdlib module) deliberately chose not to make 
Path inherit from str. I pondered it for a while and saw that, from his 
perspective, a path is actually not a string but rather a complex data 
structure consisting of parts with distinct meanings: literally the 
steps (of which a path consists) to a resource. The string which 
represents a path, as most people know it, is just that: a 
representation of a more complex object, just as with a dict or a list. 
Let me make this a bit clearer.

I think we agree on the following: writing down 21 characters in a row 
gives you a string, right? So, what about these 21 characters?

{1: 11, 2: 12, 3: 13}

If you see that in a Python program (and presumably in many other modern 
programming languages), you associate that with a dictionary, mapping, 
hash, etc. So, these 21 characters are a mere representation of a 
complex object with very rich functionality.
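
To make the analogy concrete, here is a small sketch (my own illustration, using the standard library's `ast.literal_eval`): the 21 characters are a plain `str` until they are parsed, after which they are a `dict` with all of its functionality, and the dict's `repr()` gives back exactly the same 21 characters.

```python
import ast

# The 21 characters from above: so far, just a string.
text = "{1: 11, 2: 12, 3: 13}"
print(len(text))  # 21

# Parsing the string yields the structured object it represents.
mapping = ast.literal_eval(text)
print(type(mapping).__name__)  # dict
print(mapping[2])              # 12

# And back again: the dict's repr is the very same representation.
print(repr(mapping) == text)   # True
```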

The following paragraphs summarize what makes the discussion about 
paths and strings so hard. Depending on whom you ask, there are 
different interpretations of what a path actually is.

*Paths as complex objects and strings as their representation*

Let's put this analogy to work with paths. If you come from a language 
that treats strings as file paths (like Python), you can imagine and 
categorize the facilities of pathlib like so:

  * pure path - operating on the path string
  * concrete path - operating on files corresponding to the given path
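
A minimal sketch of that split, assuming Python 3 and using only documented pathlib classes (the example paths are made up): a pure path is an object rather than a `str`, its string form is just one representation of it, and pure operations work without touching the disk, whereas a concrete `Path` can ask the file system about itself.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# A pure path is an object, not a str; its string form is one representation.
p = PurePosixPath("/home/me/on/your/ssd.conf")
print(isinstance(p, str))  # False
print(str(p))              # /home/me/on/your/ssd.conf
print(p.parts)             # ('/', 'home', 'me', 'on', 'your', 'ssd.conf')

# Pure operations never touch the disk, so Windows paths can be handled on
# any OS; the Windows flavour compares paths case-insensitively.
print(PureWindowsPath("C:/Users/Me") == PureWindowsPath(r"c:\users\me"))  # True

# Concrete paths add operations that consult the actual file system.
print(Path(".").exists())  # True: "." is the current working directory
```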

The classic "extract the file extension" problem is solved easily with 
the pure path methods. Writing to a file is also easily done with 
concrete path operations. So, it seems paths are pretty complex objects 
with some internal structure and a lot of functionality.
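
Both operations from the paragraph above, sketched with pathlib (the file names are illustrative; a temporary directory stands in for a real location so the snippet can run anywhere):

```python
import tempfile
from pathlib import Path, PurePath

# "Extract the file extension": a pure-path operation on the string alone.
archive = PurePath("backup/2016-03-31.tar.gz")
print(archive.suffix)    # .gz
print(archive.suffixes)  # ['.tar', '.gz']
print(archive.stem)      # 2016-03-31.tar

# "Write to a file": a concrete-path operation that touches the disk.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "notes.txt"
    target.write_text("hello\n", encoding="utf-8")
    content = target.read_text(encoding="utf-8")

print(content, end="")   # hello
```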

*Paths as monolithic objects for addressing resources*

The previous interpretation is not the only one. Despite all the fine 
functionality for extracting file extensions, concatenating parts into 
a larger path, etc., building a path is not an end in itself. Once you 
have a path, it addresses a resource on a machine. When you use it to 
read or write that resource, you actually don't care whether the path 
consists of parts or not. To you, it's a monolithic structure, an address.
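
Seen this way, code that reads or writes a resource only needs the whole address. A sketch of that view, assuming Python 3.6+ (where `open()` accepts path objects via `os.PathLike`); `read_resource` is a hypothetical helper that never inspects the parts of its argument:

```python
import os
import tempfile
from pathlib import Path

def read_resource(address):
    """Read a text resource, treating the path purely as an opaque address.

    Hypothetical helper: `address` may be a str or any os.PathLike object;
    the function never looks at its parts, it just hands it to open().
    """
    with open(address, encoding="utf-8") as f:
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "config.ini"
    p.write_text("debug = true\n", encoding="utf-8")
    # The same address works whether spelled as a Path or as a plain string.
    from_path = read_resource(p)
    from_str = read_resource(os.fspath(p))

print(from_path == from_str)  # True
```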

But, you might say, each part of a path represents a directory in 
hierarchical file systems. Sure, that is true for many file systems but 
not for all. Moreover, how often do you really care about the underlying 
directory structure? It needs to be there to make things work, of 
course. When it's there, you mostly don't care. How often do you need to 
create a subtree in a directory just to create a single config 
file? I encounter this once in a while and, to be honest: it sucks.

```touch /home/me/on/your/ssd.conf``` will fail if the directory 
"on/your/" has not been created by somebody before me.

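pathlib behaves the same way. A minimal sketch (a temporary directory replaces `/home/me` so it runs anywhere): `Path.touch()` raises `FileNotFoundError` while the intermediate directories are missing, and the usual workaround is to create the whole subtree first with `mkdir(parents=True, exist_ok=True)`.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    conf = Path(tmp) / "on" / "your" / "ssd.conf"

    # Like `touch`, creating the file fails while "on/your/" is missing.
    try:
        conf.touch()
        failed_first = False
    except FileNotFoundError:
        print("parent directories do not exist yet")
        failed_first = True

    # The usual workaround: create the whole subtree first, then the file.
    conf.parent.mkdir(parents=True, exist_ok=True)
    conf.touch()
    created = conf.is_file()

print(failed_first, created)  # True True
```
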
As a Web developer in particular, I find it quite hard to understand 
what purpose this restriction serves. Within a Web application, the 
hierarchy of URLs is an emergent property, not a prerequisite.

Users of git are used to not committing directories at all. Why? 
Because it's unnecessary: the directory structure, again, emerges 
from the file names themselves (aka from the content).
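
To illustrate the "emergent" structure: given only a list of file paths, the directories follow from the names themselves. `implied_directories` is a hypothetical helper written for this example, not a git or pathlib API.

```python
from pathlib import PurePosixPath

# File paths as a repository might list them; git does not track
# empty directories, only files.
files = [
    "README.md",
    "src/app/main.py",
    "src/app/util.py",
    "docs/index.rst",
]

def implied_directories(names):
    """Hypothetical helper: every directory implied by a list of file paths."""
    dirs = set()
    for name in names:
        # Each ancestor of a file, except the relative root ".", is a directory.
        for parent in PurePosixPath(name).parents:
            if str(parent) != ".":
                dirs.add(str(parent))
    return sorted(dirs)

print(implied_directories(files))  # ['docs', 'src', 'src/app']
```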

That said, it's rather cumbersome to attribute semantics to the parts of 
a string that happen to be separated by "/" or "\". At least to me, a 
path is made of one piece.

*What about security then?*

One can further argue that Web development and git repositories are 
different here. There is a clear boundary to where a path can lead. A URL 
path cannot address a foreign resource on another domain. Paths in a git 
repository are contained within the repository root.

See the common theme? There is a container from where the path of a 
resource cannot escape.

If you have a complete file system available at your fingertips, a lot 
of harm can be done when malicious user input is concatenated, unchecked, 
as a subpath: the input is supposed to address a resource within a 
container but is misused to gain access to the complete file system.

I cannot say if the container pattern would work for everybody but it's 
definitely worth exploring as there are some prominent working examples 
out there.
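
One way to sketch the container pattern with pathlib, assuming Python 3.6+ (where `resolve()` works on paths that do not exist yet): join the untrusted input onto a root directory, resolve `..` and symlinks, and reject anything that ends up outside the root. `resolve_inside` is a hypothetical helper, and this is an illustration rather than a complete defence; for instance, it does not guard against the file system changing between the check and the later use.

```python
import tempfile
from pathlib import Path

def resolve_inside(root, user_path):
    """Hypothetical helper: join untrusted `user_path` onto `root`, refusing
    any result that escapes `root` once `..` and symlinks are resolved."""
    base = Path(root).resolve()
    candidate = (base / user_path).resolve()
    try:
        # relative_to() raises ValueError if candidate is not under base.
        candidate.relative_to(base)
    except ValueError:
        raise PermissionError(f"{user_path!r} escapes {base}") from None
    return candidate

with tempfile.TemporaryDirectory() as tmp:
    inside = resolve_inside(tmp, "reports/q1.txt")
    print(inside.name)  # q1.txt
    try:
        resolve_inside(tmp, "../../etc/passwd")
        blocked = False
    except PermissionError as exc:
        print("blocked:", exc)
        blocked = True
```

An absolute input such as `"/etc/passwd"` is rejected the same way, because joining an absolute path onto `base` replaces `base` entirely.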

*Conclusion*

I really like pathlib since it solves many frequently asked questions in 
the right way once and for all.

But I don't like using it as an argument against inheriting paths from 
strings on the grounds that paths have internal structure whereas strings 
do not. At least to me, paths do not. That, on the other hand, does not 
necessarily mean that having Path inherit from str is a good idea, but it 
makes it no worse an idea than it was before.


That's it.

Best,
Sven