[Python-ideas] os.listdir iteration support

Giampaolo Rodola' gnewsg at gmail.com
Fri Nov 23 22:26:40 CET 2007


On 23 Nov, 21:23, "Guido van Rossum" <gu... at python.org> wrote:
> But how many FTP servers are written in Python *and* have directories
> with 20,000 files in them?
>
> --Guido

Honestly, I don't know.
It is certainly a rather specific use case, but listing a directory is
one of the most time-consuming tasks an FTP server performs. 20,000
files is probably an exaggerated hypothetical, so I ran a simple test
with a more realistic scenario.
On Windows, a very crowded directory is C:\windows\system32. On my
Windows XP workstation it currently contains 2201 files.
I ran the code below, which is how an FTP server should properly
respond to a "LIST" command issued by a client.
It took 1.70300006866 seconds to complete the first time and
0.266000032425 seconds the second time.
I don't know whether such a specific use case justifies adding
generator support for listdir to the stdlib, but something like
Greg Ewing's opendirs module could have saved a lot of time in this
specific case.


-- Giampaolo


import os, stat, time
try:
    from stat import filemode  # Python 3.3 and later
except ImportError:
    from tarfile import filemode  # older Pythons
try:
    import pwd, grp
except ImportError:
    pwd = grp = None


def format_list(directory):
    """Return a directory listing emulating "/bin/ls -lA" UNIX
    command output.

    This is how output appears to client:
    -rw-rw-rw-   1 owner   group    7045120 Sep 02  3:47 music.mp3
    drwxrwxrwx   1 owner   group          0 Aug 31 18:50 e-books
    -rw-rw-rw-   1 owner   group        380 Sep 02  3:40 module.py
    """
    listing = os.listdir(directory)

    result = []
    for basename in listing:
        file = os.path.join(directory, basename)

        # if the file is a broken symlink, use lstat to get stat for
        # the link
        try:
            stat_result = os.stat(file)
        except (OSError, AttributeError):
            stat_result = os.lstat(file)

        perms = filemode(stat_result.st_mode)  # permissions

        nlinks = stat_result.st_nlink   # number of links to inode
        if not nlinks:  # non-posix system, let's use a bogus value
            nlinks = 1

        if pwd and grp:
            # get user and group name, else just use the raw uid/gid
            try:
                uname = pwd.getpwuid(stat_result.st_uid).pw_name
            except KeyError:
                uname = stat_result.st_uid
            try:
                gname = grp.getgrgid(stat_result.st_gid).gr_name
            except KeyError:
                gname = stat_result.st_gid
        else:
            # on non-posix systems the only option is to use default
            # bogus values for owner and group
            uname = "owner"
            gname = "group"

        size = stat_result.st_size  # file size

        # st_mtime may be invalid (-1) if the file's last modification
        # time is too old; in that case use the current local time
        # as the last modification time.
        try:
            mtime = time.strftime("%b %d %H:%M",
                                  time.localtime(stat_result.st_mtime))
        except ValueError:
            mtime = time.strftime("%b %d %H:%M")

        # if the file is a symlink, resolve it,
        # e.g. "symlink -> real_file"
        if stat.S_ISLNK(stat_result.st_mode):
            basename = basename + " -> " + os.readlink(file)

        # formatting is matched with proftpd ls output
        result.append("%s %3s %-8s %-8s %8s %s %s\r\n" %(
            perms, nlinks, uname, gname, size, mtime, basename))

    return ''.join(result)

if __name__ == '__main__':
    before = time.time()
    format_list(r'C:\windows\system32')
    print(time.time() - before)
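

For comparison, here is a minimal sketch of what an iterator-based
version of the listing above might look like. It is not the code that
was timed and it is not Greg Ewing's module: it assumes os.scandir(),
which only exists in Python 3.5 and later (3.6 for the "with" form),
and the name iter_list_lines is made up for illustration. To stay
short it always uses the bogus "owner"/"group" values and does not
resolve symlinks.

```python
import os
import stat
import time


def iter_list_lines(directory):
    """Yield one "ls -l"-style line per directory entry, lazily."""
    # os.scandir() returns an iterator, so each entry is formatted as
    # it is read instead of after the whole listing has been built.
    with os.scandir(directory) as entries:
        for entry in entries:
            # follow_symlinks=False gives lstat()-like results for
            # the entry itself rather than for a link's target.
            st = entry.stat(follow_symlinks=False)
            perms = stat.filemode(st.st_mode)
            mtime = time.strftime("%b %d %H:%M",
                                  time.localtime(st.st_mtime))
            # st_nlink can be 0 on non-posix systems; use 1 instead,
            # as in the original function.
            yield "%s %3s %-8s %-8s %8s %s %s\r\n" % (
                perms, st.st_nlink or 1, "owner", "group",
                st.st_size, mtime, entry.name)
```

A server could then send each line to the client as soon as it is
produced, e.g. "for line in iter_list_lines(path): sock.sendall(...)",
rather than waiting for ''.join(result) over the whole directory.
Whether this is actually faster for a 2201-file directory would have
to be measured.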




