No space left on device

Jimmy Retzlaff jimmy at retzlaff.com
Wed Feb 20 17:47:09 EST 2002


Sue said:
>I have a program that generates a LOT of small files in a 
>Windows2000 subdirectory.  It is my understanding (often faulty) 
>that there is no limit to the number of files such a subdirectory can 
>contain.  However, I am getting an error when I hit about 21,843 
>files.

>  File "C:\Python21\Scripts\Tests\ManyFiles.py", line 14, in ?
>    fp = open(fileName, 'wb')
>IOError: [Errno 28] No space left on device: 'c:\\legacystoret\\~916-22850tst.pck'

I just ran your code on my Windows XP machine and it created all 100K
files. It became extremely slow by the end - it started out creating
about 600 files per second and at the end it was down to about 3 files
per second. I'm running Python 2.2 and my partition is formatted with
NTFS. If your partition is formatted with a FAT variant, it may have
more trouble with this.
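
For anyone who wants to repeat this kind of measurement, here is a minimal
sketch, not the original test script (which is not shown in this thread).
It assumes a modern Python 3; the function name create_files, the batch
size, and the file naming pattern are all made up for illustration.

```python
import os
import time

def create_files(directory, count, batch=1000, payload=b"x"):
    """Create `count` small files in one directory.

    Returns a list of (files_created_so_far, files_per_second) tuples,
    one per completed batch, so a slowdown shows up as falling rates.
    The file naming pattern is hypothetical.
    """
    os.makedirs(directory, exist_ok=True)
    rates = []
    start = time.perf_counter()
    for i in range(1, count + 1):
        with open(os.path.join(directory, "%06dtst.pck" % i), "wb") as fp:
            fp.write(payload)
        if i % batch == 0:
            now = time.perf_counter()
            elapsed = now - start
            # Guard against a zero interval on very coarse timers.
            rates.append((i, batch / elapsed if elapsed > 0 else float("inf")))
            start = now
    return rates
```

Calling create_files on a scratch directory with count=100000 and printing
the returned rates should show whether creation slows down as the
directory fills up on a given file system.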

A few years back I had a nightly process that created, processed, and
deleted about 30K files in a directory. It ran on Python 1.5 and later
1.5.2 on an NTFS partition under NT4. I later restructured it to use a
directory tree because directory access can become excruciatingly slow
when you get that many files - I assume filename lookups do a linear
search. If you can come up with a hierarchy for your files, directories
can help immensely. For example, if you are storing files representing
people, you could separate them by the first letter of their last name -
so Giller would go in the "G" directory and Retzlaff in the "R"
directory. If you still have tons of files in each of those directories,
you could further subdivide those - so Giller would go in the "G\I"
directory and so on. Then when you ask the file system for a file, it
doesn't have to search through 100K file names, only about 200 (26 to
find "G", plus 26 to find "I", plus around 100K/26/26, or roughly 148,
to find the file, assuming a relatively even distribution). A similar
approach improved my process's performance significantly, and it might
help you get around whatever problem you're running into.
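
The layout described above could be sketched roughly as follows. This is
an illustration under assumptions, not the code from my nightly process:
it assumes a modern Python 3, and the names sharded_path, open_sharded,
and estimated_lookups are hypothetical. It uses the first two
alphanumeric characters of the file name as directory names, and the
estimate assumes linear lookups and an even spread of names.

```python
import os

def sharded_path(root, name, depth=2):
    """Return root/<c1>/<c2>/name, built from the first `depth`
    alphanumeric characters of `name`, upper-cased."""
    letters = [c.upper() for c in name if c.isalnum()][:depth]
    # Pad very short names so every file lands at the same depth.
    letters += ["_"] * (depth - len(letters))
    return os.path.join(root, *(letters + [name]))

def open_sharded(root, name, mode="wb", depth=2):
    """Open `name` inside its subdirectory, creating directories as needed."""
    path = sharded_path(root, name, depth)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    return open(path, mode)

def estimated_lookups(total_files, fanout=26, depth=2):
    """Rough worst-case name comparisons, assuming linear directory
    search and an even distribution of names across directories."""
    return fanout * depth + total_files / fanout ** depth
```

With this sketch, sharded_path("store", "Giller.pck") gives
store/G/I/Giller.pck (using the platform's path separator), and
estimated_lookups(100000) comes out just under 200. If the names do not
spread evenly over their first characters, some directories will still
end up much larger than others.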

Jimmy



