[Python-Dev] Builtin open() too slow

Christian Heimes lists at cheimes.de
Sat Mar 12 17:14:27 CET 2011


On 12.03.2011 at 16:13, Lukas Lueg wrote:
> I have a storage engine that stores a lot of files (e.g. > 10,000) in
> one path. Running the code under cProfile, I found that with a total
> CPU time of 1,118 seconds, 121 seconds are spent in 27,013 calls to
> open(). The number of calls is not the problem; however, I find it
> *very* discomforting that Python spends about 2 minutes out of 18
> minutes of CPU time just to get a file handle, after which it can
> spend some other time reading from them.
> 
> Could this be a problem with the way Python 2.7 gets file handles
> from the OS, or is it a problem with large directories themselves?

Your issue is most likely not a Python bug. The open() function in 2.7
is a thin wrapper around fopen(3). You didn't tell us how you profiled
your program, or what your operating system, configuration (file
system), and hardware are. Are you sure you measured two minutes of CPU
time, and not two minutes of runtime mostly spent waiting on I/O?
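
One quick way to tell the two apart, assuming a Unix-like system and
Python 2.7 (the paths list here is hypothetical), is to compare the
process CPU time reported by os.times() against wall-clock time around
a batch of open() calls:

    import os, time

    def bench_open(paths):
        # user + system CPU time consumed by this process so far
        cpu0 = sum(os.times()[:2])
        wall0 = time.time()
        for p in paths:
            f = open(p, 'rb')
            f.close()
        print 'cpu: %.2fs  wall: %.2fs' % (sum(os.times()[:2]) - cpu0,
                                           time.time() - wall0)

If the wall-clock figure dwarfs the CPU figure, the time is going to
the kernel and the disks, not to the interpreter. Note also that
cProfile's default timer measures wall-clock time, so blocking I/O
inside open() shows up in its numbers.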

Recently I've seen a system that sometimes takes more than a minute or
two just to create and remove 16 files in 16 directories on a remote
NFS cluster. 10,000 files in one directory are not a bottleneck if you
use a good file system (XFS, or ext* with a hashed b-tree index). I
have over a million files in one directory because, for all my
performance-relevant operations, it's as fast as 1,000 directories
with 1,000 files each -- sometimes even faster.
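
If you want to check this on your own file system, a minimal sketch
along these lines (the /tmp/manyfiles directory and the file count are
made up for illustration) creates the files once and then times a pass
of plain open() calls over them:

    import os, time

    ROOT = '/tmp/manyfiles'   # hypothetical scratch directory
    N = 10000

    # create N empty files in a single directory
    if not os.path.isdir(ROOT):
        os.mkdir(ROOT)
    for i in xrange(N):
        open(os.path.join(ROOT, '%05d' % i), 'wb').close()

    # time one open()/close() per file
    start = time.time()
    for i in xrange(N):
        open(os.path.join(ROOT, '%05d' % i), 'rb').close()
    print '%d opens in %.2f seconds' % (N, time.time() - start)

Run it twice; on the second run the dentry cache is warm, which is
usually the more realistic number for a long-running storage engine.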

Christian


