[Python-bugs-list] [ python-Bugs-749261 ] os.path.split does not handle . & .. properly

SourceForge.net noreply@sourceforge.net
Thu, 12 Jun 2003 15:59:59 -0700


Bugs item #749261, was opened at 2003-06-04 18:03
Message generated for change (Comment added) made by csiemens
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=749261&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Curtis Siemens (csiemens)
Assigned to: Nobody/Anonymous (nobody)
Summary: os.path.split does not handle . & .. properly

Initial Comment:
The os.path.split() & posixpath.split() functions in my
opinion do not handle '.' & '..' at the end of a path
properly, which causes os.path.dirname() &
os.path.basename() to also return the wrong result,
because they are directly based on os.path.split().

I'll demonstrate the Unix Python case (the Windows
ntpath.py case is just a close parallel variation).

Example:
>python
Python 2.1.1
>>> posixpath.split('.')
('', '.')
>>> posixpath.split('..')
('', '..')

Yet:
>>> posixpath.split('./')
('.', '')
>>> posixpath.split('../')
('..', '')
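The same four calls can be replayed on a current Python 3 interpreter as a quick sanity check. This is only a sketch: the results in the comments are what posixpath.split returns today, not necessarily what Python 2.1.1 printed.

```python
import posixpath

# Bare '.' and '..' end up in the tail (basename) slot...
print(posixpath.split('.'))    # ('', '.')
print(posixpath.split('..'))   # ('', '..')

# ...but with a trailing slash they end up in the head (dirname) slot.
print(posixpath.split('./'))   # ('.', '')
print(posixpath.split('../'))  # ('..', '')
```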

Now '.' really represents './', and '..' really
represents '../'. Since the split() function simply
uses a string split on '/' to find directories, it
goofs up on this one case. '.' and '..' are like the
slash character in the sense that they all only refer
to directories. '.' & '..' can never be files in Unix
or Windows, so I think that the split() function
should treat paths like:
    .
    ..
    dir/.
    dir/..
    /dir1/dir2/.
    /dir1/dir2/..
as not having a file portion, just as if:
    ./
    ../
    dir/./
    dir/../
    /dir1/dir2/./
    /dir1/dir2/../
respectively were given instead.
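To make the comparison concrete, here is a small sketch that runs today's posixpath.split over each bare path from the list and its slash-terminated twin. The `pairs` list is just the two lists above zipped together; nothing here is proposed API.

```python
import posixpath

# Each bare path from the report, paired with its slash-terminated form.
pairs = [
    ('.', './'),
    ('..', '../'),
    ('dir/.', 'dir/./'),
    ('dir/..', 'dir/../'),
    ('/dir1/dir2/.', '/dir1/dir2/./'),
    ('/dir1/dir2/..', '/dir1/dir2/../'),
]

for bare, slashed in pairs:
    # Current behavior: the bare form reports '.' or '..' as the file
    # portion, while the slashed form reports an empty file portion.
    print(bare, posixpath.split(bare), '|', slashed, posixpath.split(slashed))
```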

The fix in posixpath.py for this is just to put a
little path processing code at the beginning of the
split() function that looks for the following cases:
    if p in ['.', '..'] or p[-2:] == '/.' or p[-3:] == '/..':
        p = p + '/'
And then go into all the regular split() code.
The fix in ntpath.py is very similar.
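As a rough illustration, the proposed check can be wrapped around the existing posixpath.split without touching the stdlib. `split_dir_aware` is a hypothetical name used only for this sketch; it covers the POSIX case only, and `endswith` is used as an equivalent of the `p[-2:]` / `p[-3:]` slices.

```python
import posixpath

def split_dir_aware(p):
    """Hypothetical sketch of the proposed split(), not stdlib behavior.

    A trailing '.' or '..' component can only name a directory, so a '/'
    is appended before delegating to the regular posixpath.split().
    """
    if p in ('.', '..') or p.endswith('/.') or p.endswith('/..'):
        p = p + '/'
    return posixpath.split(p)

print(split_dir_aware('dir/..'))    # ('dir/..', '')
print(split_dir_aware('dir/file'))  # ('dir', 'file')
```

Paths that do not end in '.' or '..' fall straight through to the unchanged split() code.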

----------------------------------------------------------------------

>Comment By: Curtis Siemens (csiemens)
Date: 2003-06-12 15:59

Message:
Logged In: YES 
user_id=794244

Ok, I see your points, but I have 2 points.

Point 1:
Your loop 'while path != "": path = os.path.split(path)[0]'
won't stop with an absolute path, because it will get down
to '/' and spin forever.
OK, so you can modify it to be:
    while path != '' and path != '/': path = os.path.split(path)[0]
But this too will spin if you start with an absolute path that
has two or more leading slashes, like '//dir1/dir2' or
'///dir1/dir2'.
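The spin happens because root paths are fixed points of split()[0]. The sketch below checks this and runs a bounded version of the '/'-guarded loop; `guarded_loop_ends` and its `max_steps` cap are illustrative additions so the demo cannot hang.

```python
import posixpath

# Roots are fixed points: splitting them returns the same head again.
print(posixpath.split('/'))   # ('/', '')
print(posixpath.split('//'))  # ('//', '')

def guarded_loop_ends(path, max_steps=100):
    """Bounded version of:
    while path != '' and path != '/': path = os.path.split(path)[0]
    Returns True if the loop would stop, False if it is still spinning."""
    for _ in range(max_steps):
        if path in ('', '/'):
            return True
        path = posixpath.split(path)[0]
    return False

print(guarded_loop_ends('/dir1/dir2'))   # True: stops at '/'
print(guarded_loop_ends('//dir1/dir2'))  # False: stuck at '//'
```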
OK, you can fix that up too by doing something like:
    old_path = ''
    while path != old_path:
        old_path = path
        path = os.path.split(path)[0]
But that final loop will also work with my proposed
os.path.split, which makes me wonder whether split really
needs the 'terminate loop' property you describe.
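A sketch of the old_path loop as a function, run once with the stock posixpath.split and once with a hypothetical wrapper modeling the proposal. Both `heads_until_fixed` and `split_dir_aware` are illustrative names, not stdlib functions.

```python
import posixpath

def heads_until_fixed(path, split=posixpath.split):
    """Collect successive heads, stopping once split() stops changing path.
    Stopping at a fixed point covers '', '/', '//' and similar roots."""
    heads = [path]
    while True:
        new_path = split(path)[0]
        if new_path == path:
            return heads
        path = new_path
        heads.append(path)

def split_dir_aware(p):
    # Hypothetical model of the proposed split(), not stdlib behavior.
    if p in ('.', '..') or p.endswith(('/.', '/..')):
        p += '/'
    return posixpath.split(p)

print(heads_until_fixed('//dir1/dir2'))              # ['//dir1/dir2', '//dir1', '//']
print(heads_until_fixed('a/b'))                      # ['a/b', 'a', '']
print(heads_until_fixed('a/../b', split_dir_aware))  # ['a/../b', 'a/..']
```

With the proposed split the loop still terminates; it just stops at 'a/..' instead of ''.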

Point 2:
You may be right about os.path.split's intended job.
So maybe the change shouldn't be made to os.path.split();
rather, os.path.dirname() & os.path.basename() should be
changed to not simply return the 1st and 2nd components of
split(), but to be as "smart" as possible: dirname's
intention is to return the directory portion, and
basename's intention is to return the (end) filename
portion, if possible. With paths like /abc/xyz you have no
idea whether xyz is a file or a dir, so the default should
be 'file'. Currently, given /abc/xyz/, they know that xyz is
a dir and return /abc/xyz for the dirname and '' for the
basename. My point is that basename/dirname are already
"smart" and don't just return the last component whether it
is a file or a directory; otherwise they would return /abc
for the dirname and xyz/ for the basename.
So given the current behavior of dirname/basename, they
should be smart in ALL "we can tell it's a directory" cases,
such as:
  .
  ..
  dir/.
  dir/..
  /dir1/dir2/.
  /dir1/dir2/..
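Point 2 could be sketched by leaving split() alone and putting the directory check into dirname/basename only. `smart_split`, `smart_dirname` and `smart_basename` are hypothetical helpers for illustration, not an existing or proposed stdlib API.

```python
import posixpath

def smart_split(p):
    # Hypothetical helper: a trailing '.' or '..' can only name a directory.
    if p in ('.', '..') or p.endswith(('/.', '/..')):
        p += '/'
    return posixpath.split(p)

def smart_dirname(p):
    """Hypothetical 'smart' dirname: the directory portion of p."""
    return smart_split(p)[0]

def smart_basename(p):
    """Hypothetical 'smart' basename: the file portion of p, '' for dirs."""
    return smart_split(p)[1]

# Existing stdlib behavior for a trailing slash:
print(posixpath.dirname('/abc/xyz/'), repr(posixpath.basename('/abc/xyz/')))  # /abc/xyz ''
# The same treatment extended to a trailing '..':
print(smart_dirname('/dir1/dir2/..'), repr(smart_basename('/dir1/dir2/..')))  # /dir1/dir2/.. ''
```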

So do I have a good Point #1, and, more importantly, do I
have a good Point #2? If I do, I could change this bug's
title to be os.path.dirname/basename related.

Curtis Siemens

----------------------------------------------------------------------

Comment By: Jeff Epler (jepler)
Date: 2003-06-08 08:34

Message:
Logged In: YES 
user_id=2772

I don't believe this behavior is a bug.  os.path.split's task is to split the last component of a path from the other components, regardless of whether any of the components actually names a directory.

Another property of os.path.split is that eventually this loop will terminate:
    while path != "": path = os.path.split(path)[0]
With your proposed change, this would not be true for paths that initially contain a "." or ".." component (since os.path.split("..") -> ('..', '')).
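A bounded sketch of this loop shows the difference for a relative path containing '..'. The `split_dir_aware` wrapper is a hypothetical model of the proposal, and `reaches_empty` caps the iterations so the demo cannot hang.

```python
import posixpath

def split_dir_aware(p):
    # Hypothetical model of the proposed split(), not stdlib behavior.
    if p in ('.', '..') or p.endswith(('/.', '/..')):
        p += '/'
    return posixpath.split(p)

def reaches_empty(path, split, max_steps=100):
    """Run `while path != "": path = split(path)[0]` for at most max_steps."""
    for _ in range(max_steps):
        if path == "":
            return True
        path = split(path)[0]
    return False

print(reaches_empty('a/../b', posixpath.split))  # True: 'a/..' -> 'a' -> ''
print(reaches_empty('a/../b', split_dir_aware))  # False: stuck at 'a/..'
```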

----------------------------------------------------------------------
