[Python-ideas] Working with Path objects: p-strings?
Sven R. Kunze
srkunze at mail.de
Thu Mar 31 10:43:07 EDT 2016
On 31.03.2016 13:31, Koos Zevenhoven wrote:
> On Thu, Mar 31, 2016 at 12:46 PM, Sven R. Kunze <srkunze at mail.de> wrote:
>> I am not sure if I can make a good suggestion here because I am still
>> trapped at the point where I need to sort out if a path is more like a dict or
>> another structured datatype, or if it is more a monolithic object. That will
>> be the next blog post topic for me.
> While discussing, pondering and experimenting with this, I have formed
> a quite clear view on how I think we should move forward with this.
> I'm working on a proposal that attempts to not be in conflict with any
> of the goals. I hope it is not in conflict with your present thoughts
> about structured/vs monolithic objects either :)
Not at all. The following post is just a reflection of this whole
discussion to make it clear to myself and hopefully to others about
where several thoughts emerged from and why this whole topic started in
the first place. Basically, a structured collection of thoughts, mainly
mine:
http://srkunze.blogspot.com/2016/03/what-is-path.html
I pasted the interesting meat below:
*What is a path in the first place?*
Brett Cannon made a good point of why PEP 428 (that's the one which
introduced pathlib as a stdlib module) deliberately chose Path not to
inherit from string. I pondered over it for a while and saw that from
his perspective a path is actually not a string but rather a complex
data-structure consisting of parts with distinct meanings: literally the
steps (of which a path consists) to a resource. The string which
represents a path as most people know it is just that: a representation
of a more complex object, just like a dict or a list. Let me make this a
bit clearer.
I think we agree on the following: writing down 21 characters in a row
is a string, right? So, what about these 21 characters?
{1: 11, 2: 12, 3: 13}
If you see that in a Python program (and presumably in many other modern
programming languages), you associate that with a dictionary, mapping,
hash, etc. So, these 21 characters are a mere representation of a
complex object with a very rich functionality.
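As a small illustration of that analogy (a sketch only; the variable names are made up for this example), the same 21 characters stay an ordinary string until they are parsed, and only then does the dictionary behaviour become available:

```python
import ast

text = "{1: 11, 2: 12, 3: 13}"        # just 21 characters in a row
mapping = ast.literal_eval(text)       # the object those characters represent

print(len(text))                       # length of the string representation
print(mapping[2])                      # a dict lookup, not possible on the string
print(repr(mapping) == text)           # the string is what repr() gives back
```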
The following paragraphs summarize what makes the discussion about
paths and strings so hard. Depending on whom you ask, there are different
interpretations of what a path actually is.
*Paths as complex objects and strings as their representation*
Let's put this analogy to work with paths. If you come from a language
that treats strings as file paths (like Python), you can imagine and
categorize the facilities of pathlib like so:
* pure path - operating on the path string
* concrete path - operating on files corresponding to the given path
The classic "extract the file extension" issue is done easily with the
pure path methods. Writing to a file is also easily done with concrete
path operations. So, it seems paths are pretty complex objects with some
internal structure and a lot of functionality.
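A minimal sketch of the two categories using pathlib follows. The file names are invented for illustration, and a temporary directory stands in for a real location:

```python
import tempfile
from pathlib import Path, PurePosixPath

# Pure path: only manipulates the path string, never touches the file system.
pure = PurePosixPath("/home/me/settings.tar.gz")
extension = pure.suffix            # the last extension only
all_extensions = pure.suffixes     # every extension, in order
parent_dir = pure.parent           # everything before the final component

# Concrete path: performs I/O on the file the path points to.
with tempfile.TemporaryDirectory() as tmp:
    concrete = Path(tmp) / "settings.conf"
    concrete.write_text("answer = 42\n")
    content = concrete.read_text()
```

The pure operations work on any platform and on paths that do not exist; the concrete ones need a real file system behind them.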
*Paths as monolithic objects for addressing resources*
The previous interpretation is not the only one. Despite all the fine
functionality of extracting file extensions, concatenating parts to a
larger path, etc., building a path is not an end in itself. When you have
a path, it addresses a resource on a machine. When doing so for reading
or writing that resource, you actually don't care about whether the path
consists of parts or not. To you, it's a monolithic structure, an address.
But, you might say, each part of a path represents a directory in
hierarchical file systems. Sure, that is true for many file systems but
not for all. Moreover, how often do you really care about the underlying
directory structure? It needs to be there to make things work, of
course. When it's there, you mostly don't care. How often do you need to
create a subtree in a directory in order to create a single config
file? I encounter this once in a while and to be honest: it sucks.
```touch /home/me/on/your/ssd.conf``` will fail if the directory
"on/your/" has not been created by somebody before me.
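The same situation can be reproduced with pathlib. This is only a sketch: it uses a temporary directory instead of /home/me so that it can run anywhere, and it shows the usual workaround of creating the missing subtree first:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "on" / "your" / "ssd.conf"

    try:
        target.touch()                 # fails like the touch command above
        failed = False
    except FileNotFoundError:
        failed = True

    # Create the missing directories explicitly, then the file itself.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.touch()
    created = target.is_file()
```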
Especially for me, as a Web developer, it's quite hard to understand
what purpose this restriction serves. Within a Web application the
hierarchy of URLs is an emergent property not a prerequisite.
Users of git are accustomed to not committing directories. Why?
Because it's unnecessary and the directory structure again emerges
from the file names themselves (aka from the content).
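To illustrate the "emergent" view, here is a sketch with an invented list of tracked file names. It is not how git is implemented; it only shows that the set of directories can be derived from the file names alone:

```python
from pathlib import PurePosixPath

# Hypothetical file names, as a repository might track them.
files = ["README.md", "src/app/main.py", "src/app/util.py", "docs/index.rst"]

directories = set()
for name in files:
    for parent in PurePosixPath(name).parents:
        if str(parent) != ".":         # skip the implicit top-level directory
            directories.add(str(parent))

print(sorted(directories))
```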
This said, it's rather cumbersome to attribute semantics to the parts of
a string that happen to be separated by "/" or "\". At least to me, a
path is made of one piece.
*What about security then?*
One can further argue that Web development and git repositories are
different here. There is a clear boundary where a path can lead. A URL
path cannot address a foreign resource on another domain. git file paths
are contained within the repository root.
See the common theme? There is a container from where the path of a
resource cannot escape.
If you have a complete file system available at your fingertips, a lot
of harm can be done when malicious user input is concatenated unchecked
as a subpath: it is meant to address a resource within a container but
is misused to gain access to the complete file system.
I cannot say if the container pattern would work for everybody but it's
definitely worth exploring as there are some prominent working examples
out there.
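One possible way to sketch the container idea with pathlib is shown below. The helper name resolve_inside is hypothetical, the example assumes Python 3.6 or later (where resolve() accepts paths that do not exist yet), and it is not meant as a complete security measure:

```python
from pathlib import Path


def resolve_inside(root, user_input):
    """Hypothetical helper: join user_input onto root, refusing to leave root."""
    root = Path(root).resolve()
    candidate = (root / user_input).resolve()   # collapses ".." and symlinks
    try:
        candidate.relative_to(root)             # raises ValueError if outside root
    except ValueError:
        raise ValueError(f"{user_input!r} escapes the container {root}") from None
    return candidate


# Usage: a path below the root is accepted, an escaping one is rejected.
print(resolve_inside(".", "uploads/report.txt"))
try:
    resolve_inside(".", "../../etc/passwd")
except ValueError as exc:
    print("rejected:", exc)
```

Resolving before the check matters: comparing the raw strings would let "uploads/../../etc/passwd" slip through.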
*Conclusion*
I really like pathlib since it solves many frequently asked questions in
the right way once and for all.
But I don't like using it as an argument against inheriting paths from
strings by saying paths have internal structure in contrast to strings. At
least to me, they do not. That, on the other hand, does not necessarily
mean inheriting path from string is a good idea, but it makes it no worse
an idea than it was before.
That's it.
Best,
Sven