[Python-ideas] os.listdir iteration support
Giampaolo Rodola'
gnewsg at gmail.com
Fri Nov 23 22:26:40 CET 2007
On 23 Nov, 21:23, "Guido van Rossum" <gu... at python.org> wrote:
> But how many FTP servers are written in Python *and* have directories
> with 20,000 files in them?
>
> --Guido
I honestly don't know.
It is certainly a rather specific use case, but it is one of the most
time-consuming tasks on an FTP server. 20,000 is probably an
exaggerated hypothetical figure, so I ran a simple test with a more
realistic scenario.
On Windows, a very crowded directory is C:\windows\system32. Currently
the C:\windows\system32 of my Windows XP workstation contains 2201
files.
I ran the code below, which is how an FTP server should properly
respond to a "LIST" command issued by a client.
It took 1.70300006866 seconds to complete the first time and
0.266000032425 seconds the second time.
I don't know whether such a specific use case can justify adding
generator support for listdir to the stdlib, but something like
Greg Ewing's opendirs module could have saved a lot of time in this
specific case.
-- Giampaolo
import os, stat, time
from tarfile import filemode
try:
    import pwd, grp
except ImportError:
    pwd = grp = None

def format_list(directory):
    """Return a directory listing emulating "/bin/ls -lA" UNIX
    command output.

    This is how output appears to client:

    -rw-rw-rw-   1 owner    group     7045120 Sep 02  3:47 music.mp3
    drwxrwxrwx   1 owner    group           0 Aug 31 18:50 e-books
    -rw-rw-rw-   1 owner    group         380 Sep 02  3:40 module.py
    """
    listing = os.listdir(directory)
    result = []
    for basename in listing:
        file = os.path.join(directory, basename)
        # if the file is a broken symlink, use lstat to get stat for
        # the link
        try:
            stat_result = os.stat(file)
        except (OSError, AttributeError):
            stat_result = os.lstat(file)
        perms = filemode(stat_result.st_mode)  # permissions
        nlinks = stat_result.st_nlink  # number of links to inode
        if not nlinks:  # non-posix system, let's use a bogus value
            nlinks = 1
        if pwd and grp:
            # get user and group name, else just use the raw uid/gid
            try:
                uname = pwd.getpwuid(stat_result.st_uid).pw_name
            except KeyError:
                uname = stat_result.st_uid
            try:
                gname = grp.getgrgid(stat_result.st_gid).gr_name
            except KeyError:
                gname = stat_result.st_gid
        else:
            # on non-posix systems the only option is to use default
            # bogus values for owner and group
            uname = "owner"
            gname = "group"
        size = stat_result.st_size  # file size
        # stat.st_mtime could fail (-1) if file's last modification
        # time is too old, in that case we return local time as last
        # modification time.
        try:
            mtime = time.strftime("%b %d %H:%M",
                                  time.localtime(stat_result.st_mtime))
        except ValueError:
            mtime = time.strftime("%b %d %H:%M")
        # if the file is a symlink, resolve it, e.g. "symlink -> real_file"
        if stat.S_ISLNK(stat_result.st_mode):
            basename = basename + " -> " + os.readlink(file)
        # formatting is matched with proftpd ls output
        result.append("%s %3s %-8s %-8s %8s %s %s\r\n" % (
            perms, nlinks, uname, gname, size, mtime, basename))
    return ''.join(result)

if __name__ == '__main__':
    before = time.time()
    format_list(r'C:\windows\system32')
    print(time.time() - before)
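
For comparison, here is a minimal sketch of what an iterator-based variant of format_list might look like. It is not part of the original message and is not Greg Ewing's module: it assumes a much later Python (3.6 or newer), where os.scandir yields directory entries lazily and stat.filemode replaces tarfile.filemode. The name iter_format_list is hypothetical, and the owner and group columns are hard-coded placeholders to keep the example short.

```python
import os
import stat
import time


def iter_format_list(directory):
    """Yield one "ls -l"-style line per directory entry, lazily.

    Hypothetical sketch: assumes Python 3.6+ (os.scandir as a context
    manager); owner and group are placeholder strings.
    """
    # os.scandir returns an iterator, so entries are read as the caller
    # consumes them instead of being collected into a list up front.
    with os.scandir(directory) as entries:
        for entry in entries:
            try:
                st = entry.stat()  # follows symlinks
            except OSError:
                # broken symlink: fall back to the link's own stat
                st = entry.stat(follow_symlinks=False)
            perms = stat.filemode(st.st_mode)
            nlinks = st.st_nlink or 1  # bogus value if not reported
            try:
                mtime = time.strftime("%b %d %H:%M",
                                      time.localtime(st.st_mtime))
            except (ValueError, OSError, OverflowError):
                mtime = time.strftime("%b %d %H:%M")
            yield "%s %3s %-8s %-8s %8s %s %s\r\n" % (
                perms, nlinks, "owner", "group", st.st_size, mtime,
                entry.name)


if __name__ == "__main__":
    # Each line can be sent to the client as soon as it is produced.
    for line in iter_format_list(os.curdir):
        print(line, end="")
```

With a generator like this, a server could start writing to the data connection after the first entry instead of waiting for the whole listing to be built. Whether that reduces total time in practice depends on the platform and is not something this sketch measures.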