[Python-bugs-list] [ python-Bugs-749261 ] os.path.split does not handle . & .. properly

SourceForge.net noreply@sourceforge.net
Fri, 13 Jun 2003 11:43:46 -0700


Bugs item #749261, was opened at 2003-06-04 18:03
Message generated for change (Comment added) made by csiemens
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=749261&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Curtis Siemens (csiemens)
Assigned to: Nobody/Anonymous (nobody)
Summary: os.path.split does not handle . & .. properly

Initial Comment:
The os.path.split() & posixpath.split() functions in my
opinion do not handle '.' & '..' at the end of a path
properly which causes os.path.dirname() &
os.path.basename() to also return the wrong result
because they are directly based on os.path.split().

I'll demonstrate the Unix Python case (the Windows
ntpath.py case is just a close parallel variation).

Example:
>python
Python 2.1.1
>>> posixpath.split('.')
('', '.')
>>> posixpath.split('..')
('', '..')

Yet:
>>> posixpath.split('./')
('.', '')
>>> posixpath.split('../')
('..', '')
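The same cases can be checked on a current Python 3 interpreter. This sketch calls posixpath directly, so the result does not depend on the host platform. The paths are the examples above plus two 'dir/' variants.

```python
import posixpath

# Paths ending in '.' or '..' versus the same paths with a trailing slash.
paths = ['.', '..', './', '../', 'dir/.', 'dir/..']

# posixpath.split() only looks at the last '/', so a final '.' or '..'
# comes back as the tail, exactly like an ordinary file name would.
results = {p: posixpath.split(p) for p in paths}

for p, parts in results.items():
    print(f'{p!r:10} -> {parts}')
```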

Now '.' really represents './', and '..' really represents '../'.
Since the split() function simply uses a string split on '/' to
find directories, it goofs up on this one case.  The '.' and
'..' are like the slash character in the sense that they all
only refer to directories.
The '.' & '..' can never be files in Unix or Windows, so I
think that the split() function should treat paths like:
    .
    ..
    dir/.
    dir/..
    /dir1/dir2/.
    /dir1/dir2/..
as not having a file portion, just as if:
    ./
    ../
    dir/./
    dir/../
    /dir1/dir2/./
    /dir1/dir2/../
respectively were given instead.

The fix in posixpath.py for this is just to put a little
path-processing code at the beginning of the split() function
that looks for the following cases:
    if p in ['.','..'] or p[-2:] == '/.' or p[-3:] == '/..':
        p = p+'/'
And then go into all the regular split() code.
The fix in ntpath.py is very similar.
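A minimal sketch of the proposed change, written as a wrapper rather than as a patch to posixpath.py. The name split_dot_aware is hypothetical. It applies the same test as the snippet above (str.endswith is equivalent to the slice comparisons) and then defers to the unmodified posixpath.split().

```python
import posixpath

def split_dot_aware(p):
    """Hypothetical split() with the proposed change: a trailing '.' or '..'
    component is treated as a directory, so the tail comes back empty."""
    if p in ('.', '..') or p.endswith('/.') or p.endswith('/..'):
        p = p + '/'
    return posixpath.split(p)

for p in ['.', '..', 'dir/.', 'dir/..', '/dir1/dir2/..', '/dir1/dir2/file']:
    print(f'{p!r:18} -> {split_dot_aware(p)}')
```

Note that the head keeps the '.' or '..' itself (for example, 'dir/..' splits into ('dir/..', '')), which is what the later termination discussion turns on.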

----------------------------------------------------------------------

>Comment By: Curtis Siemens (csiemens)
Date: 2003-06-13 11:43

Message:
Logged In: YES 
user_id=794244

Ok, I like the statement,
  "split shortens the path whenever it contains more than one
   component"
I can go with that definition of os.path.split()
because that's consistent for all paths, absolute or relative,
and given that definition I'll agree that split is about
components.

Ok, onto dirname/basename which are really the source of my
concern.  I looked at the Python documentation for basename()
and I think that it points out a problem that has been
tolerated.
It states:
    Note that the result of this function is different from the
    Unix basename program; where basename for '/foo/bar/'
    returns 'bar', the basename() function returns an empty
    string ('').
You state that the final component of a path should be
returned by basename() regardless of whether it is a file or
a directory.
I can get behind that, but then I think that statement supports
the Unix basename function implementation where /foo/bar/
has 'bar' (or 'bar/') returned for basename because /foo/bar
and /foo/bar/ are the same path, and to me 'bar' or 'bar/' is
the same single component since the trailing slash (and only
the trailing slash(es) case) is redundant.  Am I way off on
this?
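For comparison, a hedged sketch of the Unix basename(1) behavior described in the quoted documentation: redundant trailing slashes are dropped before the last component is taken. The helper name unix_style_basename is hypothetical, and the sketch ignores basename's optional suffix argument.

```python
import posixpath

def unix_style_basename(p):
    """Hypothetical helper approximating the basename(1) utility: trailing
    slashes are treated as redundant, so '/foo/bar/' yields 'bar'."""
    stripped = p.rstrip('/')
    if not stripped:
        # '' stays '', while a path made only of slashes names the root.
        return '/' if p else ''
    return posixpath.basename(stripped)

print(repr(posixpath.basename('/foo/bar/')))   # os.path behavior: ''
print(repr(unix_style_basename('/foo/bar/')))  # basename(1) behavior: 'bar'
```

Stripping slashes rather than calling normpath() is deliberate: normpath() would also collapse 'bar/..' and change which entry the path names.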

----------------------------------------------------------------------

Comment By: Jeff Epler (jepler)
Date: 2003-06-13 05:03

Message:
Logged In: YES 
user_id=2772

OK-- so my statement of the "important property" of split
was only correct in the case of a non-absolute path.

The important point is that split shortens the path whenever
it contains more than one component.  You propose that of
the values given by repeated splits of "/foo/.."  or
"foo/..", you'll never see the one-component return "foo" or
"/foo".  Why do you believe that in the loop
    while 1:
        p = os.path.split(p)[0]
that p should never have one of those values?  To me this seems
obviously incorrect.

You didn't respond to my point that os.path.split is about
components, not about whether those components name
directories.  For instance, because "/usr/local/bin" names a
directory on my system, shouldn't
os.path.split("/usr/local/bin") -> ('/usr/local/bin', '') if
your test really is about whether the final component names
a directory?  To me this seems obviously incorrect.

Let me also address your claim that because of this split
behavior, basename and dirname behave improperly.  This is
also wrong.  In "/tmp/.." and "/usr/local/bin", the first
names an entry ".." in the directory "/tmp", and the second
names an entry "bin" in the directory "/usr/local", just
like "/bin/sh" names an entry "sh" in the directory "/bin".
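This point can be checked directly: in CPython's posixpath, dirname() and basename() return the same two halves that split() does. A short sketch using the three example paths above plus the '/abc/xyz/' case discussed elsewhere in this thread:

```python
import posixpath

paths = ['/tmp/..', '/usr/local/bin', '/bin/sh', '/abc/xyz/']

parts = {}
for p in paths:
    head, tail = posixpath.split(p)
    # dirname() is the head of split() and basename() is the tail.
    assert posixpath.dirname(p) == head
    assert posixpath.basename(p) == tail
    parts[p] = (head, tail)
    print(f'{p!r:18} dirname={head!r:14} basename={tail!r}')
```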

I strongly believe this bug should be marked closed,
resolution: invalid.

----------------------------------------------------------------------

Comment By: Curtis Siemens (csiemens)
Date: 2003-06-12 15:59

Message:
Logged In: YES 
user_id=794244

Ok, I see your points, but I have two points of my own.

Point 1:
Your loop 'while path != "": path = os.path.split(path)[0]'
won't stop with an absolute path because it will get down
to '/' and spin forever.
OK, so you can modify it to be:
  while path != "" and path != '/': path = os.path.split(path)[0]
But this too will spin if you start with an absolute path that has
two or more slashes at the front, like '//dir1/dir2' or
'///dir1/dir2'.
OK, you can fix that up too by doing something like:
   old_path = ''
   while path != old_path:
       old_path = path
       path = os.path.split(path)[0]
But that final loop will also work with my new os.path.split
proposal - which makes me question your assertion that
split should have the 'terminate loop' property.
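The fixed-point loop above can be made runnable as follows. This sketch records every value the path takes. That shows where the simpler loops stall ('//' for a path with two leading slashes, because posixpath keeps an all-slash head intact). It also shows that the one-component values '/foo' and 'foo' do appear on the way down. The function name heads_until_fixed_point is hypothetical.

```python
import posixpath

def heads_until_fixed_point(path, split=posixpath.split):
    """Repeat path = split(path)[0] until the value stops changing and
    return every value seen. With posixpath.split this terminates, because
    each head is a prefix of the path it came from, so the path either
    stops changing or gets strictly shorter."""
    values = [path]
    while True:
        head = split(path)[0]
        if head == path:
            return values
        path = head
        values.append(path)

for start in ['//dir1/dir2', '/foo/..', 'foo/..']:
    print(start, '->', heads_until_fixed_point(start))
```

The split parameter lets the same loop be run against a modified split(), which is the comparison being drawn here.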

Point 2:
You may be right about os.path.split's intended job.
So maybe the change shouldn't be made to os.path.split(),
but rather os.path.dirname() & os.path.basename() should
be changed to not simply return the 1st and 2nd
components of split(), but rather try to be as "smart" as
possible: dirname's intention is to return the directory
portion, and basename's intention is to return the (end)
filename portion - if possible.  With paths like /abc/xyz
you have no idea if xyz is a file or dir, so the default should
be 'file'.  Currently /abc/xyz/ knows that xyz is a dir and
returns /abc/xyz for the dirname and '' for the basename.
My point is that basename/dirname are already "smart"
and do not just return the last component whether it is a file
or a directory; otherwise they would return /abc for the dirname
and xyz/ for the basename.
So given the current behavior of dirname/basename, they
should be smart in ALL "we can tell it's a directory" cases
such as:
  .
  ..
  dir/.
  dir/..
  /dir1/dir2/.
  /dir1/dir2/..
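A sketch of Point 2, assuming split() itself is left alone and only dirname()/basename() become "smart" about a trailing '.' or '..'. The names smart_dirname and smart_basename are hypothetical, not proposed os.path API.

```python
import posixpath

def _smart_split(p):
    """Like posixpath.split(), but a final '.' or '..' component (which can
    only name a directory) is kept in the directory portion."""
    head, tail = posixpath.split(p)
    if tail in ('.', '..'):
        return p, ''
    return head, tail

def smart_dirname(p):
    return _smart_split(p)[0]

def smart_basename(p):
    return _smart_split(p)[1]

for p in ['/abc/xyz', '/abc/xyz/', '.', 'dir/..', '/dir1/dir2/.']:
    print(f'{p!r:15} dirname={smart_dirname(p)!r:16} basename={smart_basename(p)!r}')
```

Returning the whole path as the directory portion mirrors how '/abc/xyz/' is already handled, where dirname() is the path without its trailing slash.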

So do I have a good Point #1, and more importantly do I have
a good Point #2 - and if I do I could change this bug's title
to be os.path.dirname/basename related.

Curtis Siemens

----------------------------------------------------------------------

Comment By: Jeff Epler (jepler)
Date: 2003-06-08 08:34

Message:
Logged In: YES 
user_id=2772

I don't believe this behavior is a bug.  os.path.split's task is to split the last component of a path from the other components, regardless of whether any of the components actually names a directory.

Another property of os.path.split is that eventually this loop will terminate:
    while path != "": path = os.path.split(path)[0]
With your proposed change, this would not be true for paths that initially contain a "." or ".." component (since os.path.split("..") -> ('..', '')).
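This termination argument can be checked with a bounded version of the loop, where the step cap stands in for "never terminates". split_dot_aware and loop_reaches_empty are hypothetical names, the former standing in for the proposed change. Only relative paths are used, because for absolute paths the loop stops at '/' rather than '' (as discussed in the follow-up comments).

```python
import posixpath

def split_dot_aware(p):
    """Hypothetical split() with the proposed '.'/'..' change applied."""
    if p in ('.', '..') or p.endswith('/.') or p.endswith('/..'):
        p = p + '/'
    return posixpath.split(p)

def loop_reaches_empty(path, split, max_steps=100):
    """Run `while path != "": path = split(path)[0]` for at most max_steps
    iterations and return True if path became "" within that budget."""
    for _ in range(max_steps):
        if path == "":
            return True
        path = split(path)[0]
    return path == ""

for start in ['foo/..', '..', 'a/b']:
    print(start,
          loop_reaches_empty(start, posixpath.split),
          loop_reaches_empty(start, split_dot_aware))
```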

----------------------------------------------------------------------
