[ python-Bugs-1707768 ] os.path.normpath changes path (chops off trailing slash)

SourceForge.net noreply at sourceforge.net
Sun May 6 06:15:45 CEST 2007


Bugs item #1707768, was opened at 2007-04-26 01:44
Message generated for change (Comment added) made by siemer
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1707768&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Robert Siemer (siemer)
Assigned to: Nobody/Anonymous (nobody)
Summary: os.path.normpath changes path (chops off trailing slash)

Initial Comment:
Hello everybody!

>>> os.path.normpath('/etc/passwd')
'/etc/passwd'


I don't know any environment at all where

a) '/etc/passwd/'
b) '/etc/passwd'

are treated the same. It clearly does not hold for the path part of HTTP URLs (this is left as an exercise for the reader).

It does not hold on (e.g.) Linux either:
an open() on path a) fails with ENOTDIR, while it succeeds on path b).

(assuming /etc/passwd is a file)
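
A minimal sketch of the difference, assuming Python 3 on a POSIX system such as Linux. The helper name open_errno is made up for illustration, and a temporary regular file stands in for /etc/passwd:

```python
import errno
import os
import tempfile


def open_errno(path):
    """Try to open path for reading; return None on success, else the errno."""
    try:
        with open(path, 'rb'):
            return None
    except OSError as exc:
        return exc.errno


# A temporary regular file stands in for /etc/passwd (an assumption of this sketch).
fd, filename = tempfile.mkstemp()
os.close(fd)
try:
    without_slash = open_errno(filename)       # case b): plain file name
    with_slash = open_errno(filename + '/')    # case a): same name plus '/'
finally:
    os.remove(filename)

print(without_slash is None, with_slash == errno.ENOTDIR)
```

On Linux the second call is expected to fail with ENOTDIR, because the trailing slash asks the kernel to treat the last component as a directory.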

This is definitely not a documentation bug, as "normpath" should normalize a path and not fuck it up.


Robert

----------------------------------------------------------------------

>Comment By: Robert Siemer (siemer)
Date: 2007-05-06 06:15

Message:
Logged In: YES 
user_id=150699
Originator: YES

1) I (submitter) didn't specify what I expected to see:

os.path.normpath('/etc/passwd/') --> '/etc/passwd/'

So, I agree with the latest consensus, but definitely not with the
"/etc/passwd/." version...


2) I can't draw any explicit normalization rules from the excerpts of the
POSIX standard posted by iszegedi. Saying that "dir/" should be treated as
"dir/." doesn't mean that the latter is the normalized form of the former.
If anything, I read it as implying that the first one is the customary form,
which merely needs interpretation.

And I think everybody agrees that, being the same or not, "dir/." is
unusual.

3) I don't know what this line in the proposal is good for:
path = path.rstrip()

It removes significant whitespace from the path, which must be avoided.
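
A small sketch of why the whitespace matters, assuming Python 3 on a POSIX filesystem, where a trailing space is a legal part of a file name. The name 'notes ' is invented for the example:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    name = 'notes '                      # the trailing space is part of the name
    original = os.path.join(base, name)
    with open(original, 'w'):
        pass                             # create an empty file with that name

    stripped = original.rstrip()         # what the proposed line would do
    original_exists = os.path.exists(original)
    stripped_exists = os.path.exists(stripped)

print(original_exists, stripped_exists)
```

After rstrip() the path names a different, non-existent file, so a normpath() that strips whitespace would silently point callers at the wrong name.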

----------------------------------------------------------------------

Comment By: Istvan Szegedi (iszegedi)
Date: 2007-05-01 20:05

Message:
Logged In: YES 
user_id=1772412
Originator: NO

I must admit that josm's comments make sense: in fact, I quickly tried out
how the mkdir and rmdir commands behave in a bash shell, and they do the same:

# mkdir hello
# rmdir hello/. 
Invalid argument

whereas
# rmdir hello/

works fine. I also wrote a small C program using the mkdir() and rmdir()
functions, and they behave exactly the same as mkdir/rmdir from bash (well,
no real surprise).

My suggestion for fixing the original issue was based on the POSIX standard,
and apparently the Linux commands are not fully POSIX compliant either...
Or do I misunderstand the quotes from the standard?  Anyway, it is pretty
easy to modify my fix to be in line with the Linux commands and C functions:
everything can stay the same, apart from the last line where I added "/.",
which should be only "/".  So the entire function could look like this:

-- clip --


def normpath(path):
    """Normalize path, eliminating double slashes, etc."""
    if path == '':
        return '.'
    initial_slashes = path.startswith('/')
    # The next two lines were added by iszegedi
    path = path.rstrip()
    trailing_slash = path.endswith('/')
    # POSIX allows one or two initial slashes, but treats three or more
    # as single slash.
    if (initial_slashes and
        path.startswith('//') and not path.startswith('///')):
        initial_slashes = 2
    comps = path.split('/')
    new_comps = []
    for comp in comps:
        if comp in ('', '.'):
            continue
        if (comp != '..' or (not initial_slashes and not new_comps) or
             (new_comps and new_comps[-1] == '..')):
            new_comps.append(comp)
        elif new_comps:
            new_comps.pop()
    comps = new_comps
    path = '/'.join(comps)
    if initial_slashes:
        path = '/'*initial_slashes + path
    # The next two lines were added by iszegedi
    if trailing_slash:
        path = path + '/'
    return path or '.'


-- clip --

Nevertheless, I would really appreciate some comments from POSIX gurus on
how they see this problem.
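
For comparison, here is a hedged sketch of the same idea written as a wrapper around the existing function, assuming Python 3. The name normpath_keep_slash is hypothetical and not part of any library. It leaves whitespace untouched and does not add a second slash when the result already ends in one; as posted, the function above appears to return '//' for the input '/'.

```python
import posixpath


def normpath_keep_slash(path):
    """Normalize path like posixpath.normpath, but keep one trailing slash.

    Hypothetical helper for illustration only.
    """
    normalized = posixpath.normpath(path)
    # Re-append the slash only if the input had one and the result lacks it.
    if path.endswith('/') and not normalized.endswith('/'):
        normalized += '/'
    return normalized


for sample in ('/etc/passwd/', '/etc/passwd', 'a//b/../', '/', 'dir /'):
    print(repr(sample), '->', repr(normpath_keep_slash(sample)))
```

Wrapping the stock function keeps the '..' and double-slash handling in one place, so only the trailing-slash decision differs.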


----------------------------------------------------------------------

Comment By: jos (josm)
Date: 2007-04-30 08:48

Message:
Logged In: YES 
user_id=1776568
Originator: NO

I think we should be careful in tackling this.
iszegedi's patch seems to work correctly,
but the XBD spec itself has a defect.
http://www.opengroup.org/austin/mailarchives/ag-review/msg01722.html

What do you think of the following behavior?
>>> os.mkdir('dir/')
>>> os.mkdir('dir2/')
>>> os.rmdir(os.path.normpath('dir'))
>>> os.rmdir(os.path.normpath('dir2/'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument: 'dir2/.'
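
The underlying system behavior can be reproduced without any patched normpath(). This is a sketch assuming Python 3 on Linux; the helper rmdir_errno is invented for the example:

```python
import errno
import os
import tempfile


def rmdir_errno(path):
    """Try to remove the directory; return None on success, else the errno."""
    try:
        os.rmdir(path)
        return None
    except OSError as exc:
        return exc.errno


with tempfile.TemporaryDirectory() as base:
    first = os.path.join(base, 'dir')
    second = os.path.join(base, 'dir2')
    os.mkdir(first)
    os.mkdir(second)

    slash_result = rmdir_errno(first + '/')    # trailing slash only
    dot_result = rmdir_errno(second + '/.')    # trailing "/."

print(slash_result is None, dot_result == errno.EINVAL)
```

On Linux, rmdir() rejects a path whose last component is '.', so a normpath() that turns 'dir2/' into 'dir2/.' changes which calls succeed.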




----------------------------------------------------------------------

Comment By: Istvan Szegedi (iszegedi)
Date: 2007-04-28 22:27

Message:
Logged In: YES 
user_id=1772412
Originator: NO


Here is what the POSIX standard says about pathnames:

"Base Definitions volume of IEEE Std 1003.1-2001, Section 3.266,
Pathname.

A character string that is used to identify a file. In the context of IEEE
Std 1003.1-2001, a pathname consists of, at most, {PATH_MAX} bytes,
including the terminating null byte. It has an optional beginning slash,
followed by zero or more filenames separated by slashes. A pathname may
optionally contain one or more trailing slashes. Multiple successive
slashes are considered to be the same as one slash."

And in the details:

"A pathname that contains at least one non-slash character and that ends
with one or more trailing slashes shall be resolved as if a single dot
character ( '.' ) were appended to the pathname."

So if I am not mistaken, according to the POSIX standard the example that
you gave - '/etc/passwd/' - should be normalized to '/etc/passwd/.'.
Indeed, that does not happen.
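
The resolution rule quoted above can be observed directly. A minimal sketch, assuming Python 3 on a POSIX system, with a throwaway directory named 'dir':

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    directory = os.path.join(base, 'dir')
    os.mkdir(directory)

    # Both spellings should resolve to the same directory entry.
    same_for_directory = os.path.samefile(directory + '/', directory + '/.')

print(same_for_directory)
```

This only shows that the two spellings resolve to the same file. It does not by itself say which spelling a normalizer should emit.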

The reason is that the normpath() function in the posixpath.py file uses
split('/') to split the path into components, skips every component that
is empty or '.', and at the end adds slash(es) only to the beginning of
the string.
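
The effect can be traced by hand. This is a simplified sketch of just those steps in Python 3, not the full normpath() logic (it ignores '..' and the double-slash rule):

```python
path = '/etc/passwd/'

comps = path.split('/')
print(comps)      # ['', 'etc', 'passwd', '']

# Empty components and '.' are skipped, so the trailing '' is lost here.
kept = [comp for comp in comps if comp not in ('', '.')]
print(kept)       # ['etc', 'passwd']

# Only the leading slash is put back.
rebuilt = '/' + '/'.join(kept)
print(rebuilt)    # /etc/passwd
```

The trailing slash shows up only as the final empty string in comps, which is filtered out together with the empty strings produced by doubled slashes.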

As a test, I modified the normpath() function in the posixpath.py as
follows:

--- clip ---

def normpath(path):
    """Normalize path, eliminating double slashes, etc."""
    if path == '':
        return '.'
    initial_slashes = path.startswith('/')
    # The next two lines were added by iszegedi
    path = path.rstrip()
    trailing_slash = path.endswith('/')
    # POSIX allows one or two initial slashes, but treats three or more
    # as single slash.
    if (initial_slashes and
        path.startswith('//') and not path.startswith('///')):
        initial_slashes = 2
    comps = path.split('/')
    new_comps = []
    for comp in comps:
        if comp in ('', '.'):
            continue
        if (comp != '..' or (not initial_slashes and not new_comps) or
             (new_comps and new_comps[-1] == '..')):
            new_comps.append(comp)
        elif new_comps:
            new_comps.pop()
    comps = new_comps
    path = '/'.join(comps)
    if initial_slashes:
        path = '/'*initial_slashes + path
    # The next two lines were added by iszegedi
    if trailing_slash:
        path = path + '/.'
    return path or '.'
  
-- clip --

So I added two lines (marked as "added by iszegedi") at the beginning to
remove any trailing whitespace and check whether the path ends with a slash.
Then, at the end of the function, I added another two lines to append '/.'
to the return value if the input path ended with a slash.

This now works fine.

What makes it a bit tricky is that the Python os module imports a different
xxxpath.py module depending on the host operating system. So it imports
different modules for nt, for mac, for os2, for posix, etc.  The solution
above works for posix, but the other modules need to be checked, too.
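
A quick way to check more than one flavor from a single machine, sketched for Python 3, where posixpath and ntpath can both be imported on any platform (the mac and os2 modules mentioned above are not covered here):

```python
import ntpath
import os
import posixpath

# os.path is an alias for whichever module matches the host system.
print(os.path.__name__)

posix_result = posixpath.normpath('dir/sub/')
nt_result = ntpath.normpath('dir\\sub\\')

print(repr(posix_result), repr(nt_result))
```

Both implementations drop the trailing separator, so a change to one would need a matching decision for the other.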


----------------------------------------------------------------------

Comment By: Robert Siemer (siemer)
Date: 2007-04-26 01:47

Message:
Logged In: YES 
user_id=150699
Originator: YES

A bugreport bug:

The example should read os.path.normpath('/etc/passwd/')...

----------------------------------------------------------------------
