confusion regarding os.path.walk()

Thu Feb 21 04:35:42 EST 2002

In article <3C72244A.5704081A at imk.fraunhofer.de>, Joachim Kaeber wrote:

>Hi,

>Andrew Brown wrote:
>> The practical question I was left with was "how do you identify broken
>> symlinks with python?" I'd have liked a script that rm-ed broken and only
>> broken symlinks, and I can't figure out how to test for them.
>> os.path.islink(file) will tell me whether it's a symlink. But is there a
>> call to say what it's supposed to point at? Then I can test whether that
>> exists.

>Maybe os.stat is your friend:

>% ln -s /dev/null a
>% ln -s /dev/xxxx b

 Maybe.  That will let the underlying OS attempt
 to follow the chain of symlinks to its terminus
 and (obviously, as below) raise an exception
 (from the ENOENT, EACCES, ELOOP, or whatever other error
 the system call returns on your particular OS).
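 Here is a minimal sketch of that stat()-based test, written for current Python 3 rather than the 2.1 in the session quoted here. The helper name classify_stat_failure is made up for illustration. A failing stat() only tells you the path can't be followed. A plain nonexistent path also fails with ENOENT, so pair this with os.path.islink() (or os.lstat()) if you need to know that you're looking at a symlink.

```python
import errno
import os


def classify_stat_failure(path):
    """Hypothetical helper: return None if os.stat() can follow path to
    an existing target, otherwise the errno name of the failure."""
    try:
        os.stat(path)  # follows every symlink in the chain
    except OSError as exc:
        # errno.errorcode maps numeric codes to names such as 'ENOENT'
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None
```

 A link to /dev/xxxx, as in the quoted session, gives 'ENOENT'. On Linux, a link that points at itself gives 'ELOOP'.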

 However, you might also want to use os.readlink().  (You'll also
 see it spelled os.path.os.readlink().  That only works because the
 os.path module itself imports os, so os.path.os is just the os
 module again: the same function under a longer name.)  Using
 readlink() you can follow the symlink chain one link at a time.
 You can also use os.lstat() to get inode details about the link
 *rather* than about the target of the link.  Think of stat() as
 following the link chain and returning the results for the real
 target (it dereferences the pointers).  lstat() just returns data
 on the link itself: the inode that stores the link's target path,
 permissions, timestamps, etc.  Note that the ownership and
 permission bits in the lstat() data are mostly ignored by most
 versions of UNIX (including Linux).  Beyond telling you that the
 path is a link, most of the lstat() data is of little use to
 almost all applications.
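 Here is a rough Python 3 sketch of following the chain one link at a time with os.readlink(). The function name link_chain and the max_hops cutoff are my own assumptions, not part of the os module. The cutoff only keeps a looping chain from running forever.

```python
import os


def link_chain(path, max_hops=40):
    """Hypothetical helper: follow a symlink chain hop by hop.

    Returns (hops, target_exists), where hops lists every path visited,
    starting with path itself. max_hops is an arbitrary loop guard.
    """
    hops = [path]
    current = path
    for _ in range(max_hops):
        if not os.path.islink(current):
            # End of the chain: report whether this final path exists.
            return hops, os.path.exists(current)
        target = os.readlink(current)
        # A relative link target is relative to the link's own directory;
        # os.path.join() leaves an absolute target unchanged.
        current = os.path.join(os.path.dirname(current), target)
        hops.append(current)
    return hops, False  # gave up, most likely a loop
```

 For a dangling link, os.lstat() still succeeds because it describes the link itself, while os.stat() raises. That is the stat()/lstat() distinction in practice.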

 It's also helpful to remember that UNIX was well established before
 symlinks were added to it.  So it was vital that they be mostly 
 transparent to "legacy" applications and utilities of the time. Thus
 it makes perfect sense that they made stat(), open(), etc. follow
 the symlinks, and added a new system call, lstat(), to give
 lower-level utilities (especially commands like cp and ls, and
 archivers like cpio, tar, and later pax) the means to distinguish
 between the data (target) and the metadata (symlink).

 Of course you probably still want to run that from within
 a try/except block, since there are many fussy reasons why
 the call might fail.  Following an islink() with a subsequent
 readlink() is inherently a race condition, for example: the
 link can be removed or replaced between the two calls.
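 Here is a sketch of that advice applied to the broken-link test (Python 3, hypothetical function name). Instead of islink() followed by readlink(), it calls readlink() directly and treats the errors as answers. EINVAL means the path isn't a symlink, and ENOENT means it's gone. This narrows the race but doesn't eliminate it, since the link can still change between readlink() and stat(). Anything unexpected is re-raised for the caller's own handler.

```python
import errno
import os


def broken_link_target(path):
    """Hypothetical helper: return the link text if path is a symlink
    whose target can't be reached, otherwise None."""
    try:
        target = os.readlink(path)  # no separate islink() test first
    except OSError as exc:
        if exc.errno in (errno.EINVAL, errno.ENOENT):
            return None  # not a symlink, or already gone
        raise
    try:
        os.stat(path)  # follow the whole chain
    except OSError as exc:
        if exc.errno in (errno.ENOENT, errno.ELOOP, errno.ENOTDIR):
            return target  # dangling or looping link
        raise
    return None
```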

 In general, all filesystem access should be wrapped in an
 exception-catching block if you want your application to be
 gracefully robust in the face of most failures.  Obviously your own
 little utilities can just die and report the problem, but application
 end users don't like to see tracebacks polluting their pretty
 UIs (even their text/curses UIs).
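 Putting the pieces together, here is a rough sketch of the script Andrew asked for: remove broken, and only broken, symlinks under a directory. It uses os.walk(), which took over from os.path.walk() (the latter is gone in Python 3). The function name and the dry_run default are my own choices, so nothing is deleted unless you pass dry_run=False. Problems are reported on stderr rather than as tracebacks.

```python
import errno
import os
import sys


def remove_broken_links(top, dry_run=True):
    """Hypothetical helper: unlink every symlink under top whose target
    can't be reached. Returns the paths removed (or, with dry_run, the
    paths that would be removed)."""
    removed = []
    # os.walk() doesn't descend into symlinked directories by default,
    # but every symlink still appears in either dirnames or filenames.
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            try:
                os.stat(path)  # follows the link chain
                continue       # target exists: leave it alone
            except OSError as exc:
                if exc.errno not in (errno.ENOENT, errno.ELOOP, errno.ENOTDIR):
                    print("skipping %s: %s" % (path, exc), file=sys.stderr)
                    continue
            try:
                if not dry_run:
                    os.unlink(path)
                removed.append(path)
            except OSError as exc:
                print("cannot remove %s: %s" % (path, exc), file=sys.stderr)
    return removed
```

 Run it once with the default dry_run=True to see what it would delete. There is still a small window between islink() and stat(), which is why both the stat() and the unlink() are guarded.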

>Python 2.1 (#1, May 17 2001, 11:31:53)
>[GCC 2.95.3 19991030 (prerelease)] on linux2
>Type "copyright", "credits" or "license" for more information.
>>>> import os
>>>> os.stat("/tmp/a")
>(8630, 8, 6L, 1, 0, 0, 0, 0, 0, 0)
>>>> os.stat("/tmp/b")
>Traceback (most recent call last):
>  File "<stdin>", line 1, in ?
>OSError: [Errno 2] No such file or directory: '/tmp/b'

>HTH