2018-04-11 2:03 GMT+02:00 Steven D'Aprano <steve@pearwood.info>:
[snip]
 
I shouldn't think that the number of files on disk is very important,
now that they're hidden away in the __pycache__ directory where they can
be ignored by humans. Even venerable old FAT32 has a limit of 65,534
files in a single folder, and 268,435,437 on the entire volume. So
unless the std lib expands to 16000+ modules, the number of files in the
__pycache__ directory ought to be well below that limit.
[snip] 

Hi all,

Just for information for everyone:
(I was a VMS system manager more than a decade ago, and I know that Windows NT (at least its core) was developed by a former VMS engineer. NTFS was created on the basis of the Files-11 (Files-11B) file system, and in both file systems a directory is a tree (in Files-11 it is a B-tree; NTFS may use a different kind of tree, but it is still a tree), holding the file names in alphabetical order. When there are "too many" files, accessing them becomes slower (check, for example, the windows\system32 folder).)

Of course it doesn't matter whether there are a few hundred or 1-2 thousand files; but "too many" does matter.

I did a little measurement (deliberately without wrapping the code in functions, so as not to skew the result):



import os
import time

try:
    os.mkdir('tmp_thousands_of_files')
except FileExistsError:
    pass

# One file in an almost-empty directory.
name1 = 10001

start = time.time()
file_name = 'tmp_thousands_of_files/' + str(name1)
f = open(file_name, 'w')
f.write('aaa')
f.close()
stop = time.time()

file_time = stop - start
print(f'one file time {file_time} \n {start} \n {stop}')

# Fill the directory with ~10,000 files named 10002..19999.
for i in range(10002, 20000):
    file_name = 'tmp_thousands_of_files/' + str(i)
    f = open(file_name, 'w')
    f.write('aaa')
    f.close()

# A name that sorts before every existing file.
name2 = 10000

start = time.time()
file_name = 'tmp_thousands_of_files/' + str(name2)
f = open(file_name, 'w')
f.write('aaa')
f.close()
stop = time.time()

file_time = stop - start
print(f'after 10k, name before {file_time} \n {start} \n {stop}')

# A name that sorts after every existing file.
name3 = 20010

start = time.time()
file_name = 'tmp_thousands_of_files/' + str(name3)
f = open(file_name, 'w')
f.write('aaa')
f.close()
stop = time.time()

file_time = stop - start
print(f'after 10k, name after {file_time} \n {start} \n {stop}')

"""
result

c:\>python several_files_in_one_folder.py
one file time 0.0
1523476699.5144918
1523476699.5144918
after 10k, name before 0.015625953674316406
1523476714.622918
1523476714.6385438
after 10k, name after 0.0
1523476714.6385438
1523476714.6385438
"""


Used: Python 3.6.1, Windows 8.1, SSD drive.
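A caveat on the numbers: on Windows, time.time() advances in ~15.6 ms ticks, which is why the readings above are exactly 0.0 or 0.015625 s. A finer-grained variant of the same experiment might use time.perf_counter and average over repeats; here is a sketch under that assumption (timed_create is just an illustrative helper name, not anything from the original script):

import os
import tempfile
import time

# Sketch: same idea as the script above, but using time.perf_counter
# (much finer resolution than time.time's ~15.6 ms tick on Windows)
# and averaging over several repeats.
def timed_create(directory, name, repeats=5):
    total = 0.0
    for _ in range(repeats):
        path = os.path.join(directory, name)
        start = time.perf_counter()
        with open(path, 'w') as f:
            f.write('aaa')
        total += time.perf_counter() - start
        os.remove(path)  # recreate each round so every run pays the insertion cost
    return total / repeats

with tempfile.TemporaryDirectory() as d:
    # Fill the directory with ~10,000 files, as in the original script.
    for i in range(10002, 20000):
        with open(os.path.join(d, str(i)), 'w') as f:
            f.write('aaa')
    before = timed_create(d, '10000')  # sorts before every existing name
    after = timed_create(d, '99999')   # sorts after every existing name
    print(f'front: {before:.6f}s  back: {after:.6f}s')

Whether front insertion actually comes out slower will of course depend on the file system and OS cache state.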

As you can see, an insertion at the beginning of the tree is much slower than adding at the end. (Yes, I know list insertion is slow as well, but I once saw a VMS directory with 50k files, and the dir command printed 5-10 files, then waited a few seconds before the next 5-10 files... ;-) )
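The sorted-list analogy can be made concrete in a few lines (analogy only, not the real NTFS B-tree; the names mirror the script above):

import bisect

# A sorted list of names stands in for a directory's ordered index.
names = [str(n) for n in range(10002, 20000)]  # sorted, like the files created above

# "10000" sorts before every existing name, so inserting it would
# shift all ~10,000 entries one slot to the right...
i = bisect.bisect_left(names, '10000')
print(i)  # 0 -> front of the index

# ...while "20010" sorts after everything and needs no shifting.
j = bisect.bisect_left(names, '20010')
print(j == len(names))  # True -> end of the index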


BR,
George