On 31.03.2016 13:31, Koos Zevenhoven wrote:
On Thu, Mar 31, 2016 at 12:46 PM, Sven R. Kunze <srkunze@mail.de> wrote:
I am not sure if I can make a good suggestion here because I am still
trapped at the point where I need to sort out whether a path is more like a
dict or another structured datatype, or whether it is more of a monolithic
object. That will be the next blog post topic for me.
While discussing, pondering and experimenting with this, I have formed
a quite clear view on how I think we should move forward with this.
I'm working on a proposal that attempts not to be in conflict with any
of the goals. I hope it is not in conflict with your present thoughts
about structured vs. monolithic objects either :)

Not at all. The following post is just a reflection of this whole discussion, to make it clear to myself and hopefully to others where several thoughts emerged from and why this whole topic started in the first place. Basically, it's a structured collection of thoughts, mainly mine:


http://srkunze.blogspot.com/2016/03/what-is-path.html



I pasted the interesting meat below:


What is a path in the first place?

Brett Cannon made a good point about why PEP 428 (the one which introduced pathlib as a stdlib module) deliberately chose not to make Path inherit from str. I pondered over it for a while and saw that, from his perspective, a path is actually not a string but rather a complex data structure consisting of parts with distinct meanings: literally the steps (of which a path consists) to a resource. The string which represents a path, as most people know it, is just that: a representation of a more complex object, just like a dict or a list. Let me make this a bit clearer.

I think we agree on the following: 21 characters written down in a row form a string, right? So, what about these 21 characters?

{1: 11, 2: 12, 3: 13}

If you see that in a Python program (and presumably in many other modern programming languages), you associate it with a dictionary, mapping, hash, etc. So, these 21 characters are a mere representation of a complex object with very rich functionality.
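A quick illustration of the same point in Python: the 21 characters are exactly what `repr()` produces for the dict, and `ast.literal_eval` turns them back into an equal object. The variable names are only for illustration.

```python
import ast

# The object: a mapping with rich behaviour (lookup, update, iteration, ...).
mapping = {1: 11, 2: 12, 3: 13}

# Its representation: nothing more than a string of characters.
text = repr(mapping)
print(text)       # {1: 11, 2: 12, 3: 13}
print(len(text))  # 21

# Parsing the representation gives back an equal object.
restored = ast.literal_eval(text)
print(restored == mapping)  # True
```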

The following paragraphs summarize what makes the discussion about paths and strings so hard. Depending on whom you ask, there are different interpretations of what a path actually is.

Paths as complex objects and strings as their representation

Let's put this analogy to work with paths. If you come from a language that treats strings as file paths (like Python), you can map the facilities of pathlib onto this picture.
The classic "extract the file extension" problem is solved easily with the pure path methods. Writing to a file is just as easy with the concrete path operations. So it seems paths are pretty complex objects with some internal structure and a lot of functionality.
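A minimal sketch of the two categories, using only standard `pathlib` calls; the file names are made up. Pure paths (here `PurePosixPath`) only look at the structure of the string, while concrete paths (`Path`) operate on the real file system, so the second half writes into a temporary directory.

```python
import tempfile
from pathlib import Path, PurePosixPath

# Pure path: structure only, no file system access at all.
p = PurePosixPath("/home/me/projects/report.tar.gz")
print(p.suffix)    # .gz
print(p.suffixes)  # ['.tar', '.gz']
print(p.stem)      # report.tar
print(p.parts)     # ('/', 'home', 'me', 'projects', 'report.tar.gz')
print(str(p))      # /home/me/projects/report.tar.gz  (the plain string form)

# Concrete path: actually reads and writes files.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "notes.txt"
    target.write_text("hello\n")
    content = target.read_text()

print(repr(content))  # 'hello\n'
```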

Paths as monolithic object for addressing resources

The previous interpretation is not the only one. Despite all the fine functionality for extracting file extensions, concatenating parts into a larger path, etc., building a path is not an end in itself. Once you have a path, it addresses a resource on a machine. When you use it to read or write that resource, you actually don't care whether the path consists of parts or not. To you, it's a monolithic structure, an address.

But, you might say, each part of a path represents a directory in hierarchical file systems. Sure, that is true for many file systems, but not for all. Moreover, how often do you really care about the underlying directory structure? It needs to be there to make things work, of course. When it's there, you mostly don't care. How often do you need to create a subtree in a directory in order to create a single config file? I encounter this once in a while and, to be honest: it sucks.

```touch /home/me/on/your/ssd.conf``` will fail if the directory "on/your/" has not been created by somebody before me.
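The same restriction shows up in Python's `pathlib`. A minimal sketch, using a temporary directory as a stand-in for `/home/me`: `Path.touch()` fails just like `touch`, and the usual workaround is to create the whole subtree first with `mkdir(parents=True, exist_ok=True)`.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as home:
    conf = Path(home) / "on" / "your" / "ssd.conf"

    # Like `touch`, Path.touch() does not create missing parent directories.
    failed = False
    try:
        conf.touch()
    except FileNotFoundError:
        failed = True
        print("missing parent directories")

    # Workaround: create the subtree first, then the single file we wanted.
    conf.parent.mkdir(parents=True, exist_ok=True)
    conf.touch()
    created = conf.is_file()
    print(created)  # True
```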

As a Web developer, I find it especially hard to understand what purpose this restriction serves. Within a Web application, the hierarchy of URLs is an emergent property, not a prerequisite.

Users of git are accustomed to not committing directories. Why? Because it's unnecessary: the directory structure, again, emerges from the file names themselves (i.e. from the content).
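To illustrate the idea that the hierarchy emerges from the file names, here is a small sketch that derives the implied directories from a flat list of paths. The file list is hypothetical, and this only mimics the idea; it is not how git stores trees internally.

```python
from pathlib import PurePosixPath

# Hypothetical list of tracked files: just names, no directory entries.
files = [
    "README.md",
    "src/app/main.py",
    "src/app/util.py",
    "docs/index.md",
]

# Every directory is implied by some file name; nothing is stored separately.
directories = set()
for name in files:
    for parent in PurePosixPath(name).parents:
        if str(parent) != ".":  # skip the implicit "current directory"
            directories.add(str(parent))

print(sorted(directories))  # ['docs', 'src', 'src/app']
```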

That said, it's rather cumbersome to attribute semantics to the parts of a string that happen to be separated by "/" or "\". At least to me, a path is made of one piece.

What about security then?

One can further argue that Web development and git repositories are different here. There is a clear boundary to where a path can lead. A URL path cannot address a foreign resource on another domain. git file paths are contained within the repository root.

See the common theme? There is a container from where the path of a resource cannot escape.

If you have a complete file system at your fingertips, a lot of harm can be done when malicious user input is blindly concatenated as a subpath: it is meant to address a resource within a container but is misused to gain access to the complete file system.
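A minimal sketch of how naive concatenation goes wrong with `pathlib`; the base directory and the inputs are hypothetical. Pure paths keep ".." untouched, so `posixpath.normpath` is used to show where the result really points. Note also that joining with an absolute path silently discards the base.

```python
import posixpath
from pathlib import PurePosixPath

base = PurePosixPath("/srv/app/uploads")

# Untrusted input that is supposed to name a file *inside* base.
inputs = ["avatar.png", "../../../etc/passwd", "/etc/passwd"]

results = {}
for user_input in inputs:
    joined = base / user_input
    # Collapse ".." lexically to see the effective target.
    results[user_input] = posixpath.normpath(str(joined))
    print(user_input, "->", results[user_input])

# avatar.png -> /srv/app/uploads/avatar.png
# ../../../etc/passwd -> /etc/passwd
# /etc/passwd -> /etc/passwd
```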

I cannot say whether the container pattern would work for everybody, but it's definitely worth exploring, as there are some prominent working examples out there.
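One way the container pattern could look in Python, as a rough sketch: `Container` is a hypothetical class, not part of pathlib. It resolves every user-supplied path against a root and refuses anything that ends up outside it. It is not a complete defence (for example, a symlink created after the check is not caught), but it shows the idea of a boundary the path cannot escape.

```python
import tempfile
from pathlib import Path


class Container:
    """Hypothetical sketch: only hand out paths that stay inside a root."""

    def __init__(self, root):
        self.root = Path(root).resolve()

    def path(self, relative):
        # Resolve ".." and existing symlinks first, then check containment.
        candidate = (self.root / relative).resolve()
        if candidate != self.root and self.root not in candidate.parents:
            raise ValueError(f"{relative!r} escapes the container")
        return candidate


with tempfile.TemporaryDirectory() as tmp:
    box = Container(tmp)

    inside = box.path("on/your/ssd.conf")
    inside_rel = inside.relative_to(box.root).as_posix()
    print(inside_rel)  # on/your/ssd.conf

    rejected = []
    for bad in ["../../etc/passwd", "/etc/passwd"]:
        try:
            box.path(bad)
        except ValueError as exc:
            rejected.append(bad)
            print(exc)
```

Resolving before checking matters: a purely lexical prefix check on the joined string would accept "../" tricks and absolute inputs that resolution exposes.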

Conclusion

I really like pathlib since it solves many frequently asked questions in the right way once and for all.

But I don't like it being used as an argument against inheriting paths from strings on the grounds that paths have internal structure, in contrast to strings. At least to me, they do not. That, on the other hand, does not necessarily mean that inheriting Path from str is a good idea, but it makes it no worse an idea than it was before.


That's it.

Best,
Sven