[Python-Dev] Caching directory files in import.c

James C. Ahlstrom jim@interet.com
Mon, 12 Nov 2001 09:30:02 -0500


"James C. Ahlstrom" wrote:

> Looking at the code, I saw that I could do an os.listdir(path),
> and record the directory file names into the same dictionary.
> Then it would not be necessary to perform a large number of
> fopen()'s.  The same dictionary lookup is used instead.
> 
> Is this a good idea???

I now have benchmarks based on 2.2a3 which compare the speed of
importing 100 modules from Python ./Lib for the original 2.2a3
versus the new logic that uses os.listdir to maintain a Python
dictionary of directory contents.  Note that this is not related
to importing from zip files.

The bottom line is that imports are 1.3 times faster for the
local drive, and 1.8 to 3.0 times faster for the network drive.

Benchmarks can be confusing.  Importing from the local C: takes about
3 seconds after a re-boot, but repeated imports lower this to about 1 second.
This must be a measure of Windows 2000's ability to cache file system
data.  Moving the "correct" directory, the one where the files really
reside, from the beginning to the end of sys.path increases this only
slightly for the local drive.  I believe the times after re-boot, when
the file cache is empty, are more representative of real Python imports.

When importing from a network drive, things are different.  Times are
quite consistent, and don't show the scatter seen on the local drive
after a re-boot.  They are also much longer, indicating that Windows 2000
with Samba is relatively ineffective at caching network file data.

The new logic using os.listdir shows little change from local drive to
network drive, and doesn't depend on the correct placement of the source
path in sys.path.

Here is the data:

                                   Original                Using os.listdir
                               ---------------------     ---------------------
Local drive, Start of path     3.2, 2.5, 3.2 -> 1.02     2.3, 2.5, 2.3 -> 0.87
Local drive, End of path       2.8, 3.9, 3.0 -> 1.32     Same as above.
Net drive, Start of path       5.7, 5.7, 5.7 -> 5.7      2.1, 2.1, 2.1 -> 1.8
Net drive, End of path         9.4, 9.4, 9.3 -> 9.35     2.1, 2.1, 2.1 -> 1.8

Benchmarks were performed on a Pentium 4 clone, 1.4 GHz, 256 MB of RAM.
The machine was running Windows 2000.
Times are in seconds, and are the time to import about 100 modules from Lib.
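
A timing of this kind could be taken roughly as follows. This is a sketch in present-day Python, not the script used for the numbers in this post; time_imports is a hypothetical helper and the module list is left to the caller:

```python
import importlib
import time

def time_imports(module_names):
    """Return the seconds taken to import every name in module_names."""
    start = time.perf_counter()
    for name in module_names:
        # Modules already in sys.modules return almost immediately,
        # so a meaningful run needs a fresh interpreter.
        importlib.import_module(name)
    return time.perf_counter() - start
```

Because already-imported modules cost almost nothing, each measurement has to start from a new Python process, and an empty operating system file cache needs a re-boot as described here.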

"Local drive" means C:, "Net drive" means network using a Linux/Samba server.
"Start of path" means sys.path had its default value.
"End of path" means the correct Lib directory was moved to the end of sys.path.

Initial times are after a re-boot of the system; the time after "->" is the
time after repeated runs.  Times to import from C: after a re-boot are
highly variable, but are more realistic.
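
For anyone wanting to reproduce the "End of path" case described above, the path can be rearranged along these lines. This is only a sketch; move_to_end is a hypothetical helper, and the Lib directory to pass in depends on the installation:

```python
def move_to_end(search_path, entry):
    """Return a copy of search_path with entry moved to the last position.

    If entry is not present, it is simply appended.
    """
    reordered = [p for p in search_path if p != entry]
    reordered.append(entry)
    return reordered
```

It would be applied before any timed imports, for example with sys.path[:] = move_to_end(sys.path, lib_dir), where lib_dir is the directory that actually holds the modules.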

JimA