confusion regarding os.path.walk()
Jim Dennis
jimd at vega.starshine.org
Thu Feb 21 04:35:42 EST 2002
In article <3C72244A.5704081A at imk.fraunhofer.de>, Joachim Kaeber wrote:
>Hi,
>Andrew Brown wrote:
>> The practical question I was left with was "how do you identify broken
>> symlinks with python?" I'd have liked a script that rm-ed broken and only
>> broken symlinks, and I can't figure out how to test for them.
>> os.path.islink(file) will tell me whether it's a symlink. But is there a
>> call to say what it's supposed to point at? Then I can test whether that
>> exists.
>Maybe os.stat is your friend:
>% ln -s /dev/null a
>% ln -s /dev/xxxx b
Maybe. That will let the underlying OS attempt
to follow the chain of symlinks to its terminus
and (obviously, as below) raise an exception
(from the ENOENT, EPERM, ELOOP or whatever other error is returned
by the system call on your particular OS).
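A minimal sketch of that stat()-based test, in case it helps. The function
name is_broken_symlink is just something I made up for illustration, and it
treats any OSError from stat() (not only ENOENT) as "broken":

```python
import os

def is_broken_symlink(path):
    """Return True if path is a symlink whose target cannot be stat()ed."""
    # islink() uses lstat(), so it looks at the link itself.
    if not os.path.islink(path):
        return False
    try:
        # stat() asks the OS to follow the whole chain of links.
        os.stat(path)
    except OSError:
        # ENOENT, ELOOP, a permission error, etc.: the chain could not be followed.
        return True
    return False
```

With the two links from the quoted example, is_broken_symlink("/tmp/b")
should come back true and is_broken_symlink("/tmp/a") false.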
However, you might also want to use os.readlink(). (It also shows
up as os.path.os.readlink(), which I think looks odd, but that is
only because the os.path module itself imports os; os.readlink()
is the usual spelling.) Using readlink() you can follow the
symlink chain one link at a time. You can also use os.lstat()
to get inode details about the link *rather* than about the target
of the link. Think of stat() as following the link chain and
returning the results for the real target: it dereferences the
pointers. lstat() just returns data on the link itself (and any inode
that stores the link's target, perms, timestamps, etc). Note that
lstat() data (ownership and permissions) are mostly ignored by
most versions of UNIX (including Linux). So, apart from the file
type, most of the lstat() data is of little use to almost all
applications.
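Here is a rough sketch of walking a chain one hop at a time with readlink().
The name follow_links and the max_hops limit are my own inventions for the
example (the limit is just a guard against link loops), and it only resolves
a link in the final path component, not symlinked directories earlier in
the path:

```python
import os

def follow_links(path, max_hops=40):
    """Return the list of paths visited while resolving path link by link."""
    chain = [path]
    while os.path.islink(path):
        if len(chain) > max_hops:
            # Probably a loop (a -> b -> a); give up instead of spinning forever.
            raise OSError("too many levels of symbolic links: %r" % chain[0])
        target = os.readlink(path)
        # A relative target is interpreted relative to the link's own directory;
        # os.path.join() returns target unchanged if it is already absolute.
        path = os.path.join(os.path.dirname(path), target)
        chain.append(path)
    return chain
```

The last element of the returned list is the first thing in the chain that is
not a symlink; if os.path.lexists() says that path is not there, the chain is
broken at that point.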
It's also helpful to remember that UNIX was well established before
symlinks were added to it. So it was vital that they be mostly
transparent to "legacy" applications and utilities of the time. Thus
it makes perfect sense that they'd have stat(), open(), etc. follow
the symlinks and implement a new system call lstat() to provide
lower-level utilities (especially commands like cp, ls, and archivers
like cpio, tar, and later pax) with the means to discriminate between
the data (target) and metadata (symlink).
Of course you probably still want to make those calls from within
a try/except block, since there are many fussy reasons why
a call might fail (following an islink() with a subsequent
readlink() is inherently a race condition, for example).
In general all access to the fs should be run in an exception
catching block if you want your application to be gracefully
robust in the face of most failures. Obviously your own little
utilities can just die and report the problem, but application
end-users don't like to see tracebacks polluting their pretty
UIs (even their text/curses UIs).
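To tie this back to the original question, here is one way the "rm only the
broken symlinks" script might look, with the filesystem calls wrapped in
try/except. This is only a sketch: remove_broken_symlinks is a hypothetical
name, and it assumes a Python recent enough to have os.walk() rather than the
os.path.walk() from the subject line:

```python
import os

def remove_broken_symlinks(top):
    """Unlink every broken symlink under top; return the paths removed."""
    removed = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Links to directories show up in dirnames, everything else
        # (including dangling links) in filenames, so check both.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # islink() looks at the link; exists() follows it.
                if os.path.islink(path) and not os.path.exists(path):
                    os.unlink(path)
                    removed.append(path)
            except OSError as exc:
                # The link may have vanished or changed between the test
                # and the unlink; report it and carry on.
                print("skipping %s: %s" % (path, exc))
    return removed
```

Something like remove_broken_symlinks("/tmp") would then return the list of
links it deleted, and a failure on one entry does not stop the rest of the run.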
>Python 2.1 (#1, May 17 2001, 11:31:53)
>[GCC 2.95.3 19991030 (prerelease)] on linux2
>Type "copyright", "credits" or "license" for more information.
>>>> import os
>>>> os.stat("/tmp/a")
>(8630, 8, 6L, 1, 0, 0, 0, 0, 0, 0)
>>>> os.stat("/tmp/b")
>Traceback (most recent call last):
> File "<stdin>", line 1, in ?
>OSError: [Errno 2] No such file or directory: '/tmp/b'
>HTH