Creating a file with $SIZE

Bryan Olson fakeaddress at nowhere.org
Fri Mar 14 08:41:45 EDT 2008


Robert Bossy wrote:
> Bryan Olson wrote:
>> Robert Bossy wrote:
>>>> Robert Bossy wrote:
>>>>> Indeed! Maybe the best choice for chunksize would be the file's buffer
>>>>> size...
>>
>> That bit strikes me as silly.
>>
> The size of the chunk must be as little as possible in order to minimize 
> memory consumption. However below the buffer-size, you'll end up filling 
> the buffer anyway before actually writing on disk.

First, which buffer? The file library's buffer is of trivial size,
a few KB, and if we wanted to save even that we'd use os.open and
have no such buffer at all. The OS may set up a file-specific
buffer, but again those are small, and we could fill our file much
faster with larger writes.
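
A rough sketch of the os.open route, for illustration only. The
helper name write_unbuffered is made up for this example; os.open,
os.write and os.close are the standard calls. Note that os.write may
write fewer bytes than asked, so the sketch loops until done.

```python
import os

def write_unbuffered(path, data):
    """Write `data` (bytes-like) to `path` with no user-space file buffer."""
    # O_BINARY exists only on Windows; fall back to 0 elsewhere.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            # os.write returns the number of bytes actually written,
            # which can be less than len(view).
            written = os.write(fd, view)
            view = view[written:]
    finally:
        os.close(fd)
```

Opening with open(path, "wb", buffering=0) should give a similar
effect while keeping the file-object interface.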

Kernel buffers/pages are dynamically assigned on modern operating
systems. There is no particular buffer size for the file if you mean
the amount of kernel memory holding the written data. Some OSes
do not buffer writes to disk files; the write doesn't return until
the data goes to disk (though they may cache it for future reads).

To fill the file fast, there's a large range of reasonable sizes
for writing, but user-space buffer size - typically around 4K - is
too small. 1 GB is often disastrously large, forcing paging to and
from disk to access the memory. In this thread, Matt Nordhoff used
10 MB; a fine size today, and probably for several years to come.
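
A minimal sketch of filling a file in chunks of that size, assuming
zero bytes are acceptable content. The names create_file and
CHUNK_SIZE are made up for this example, not taken from the code
posted earlier in the thread.

```python
CHUNK_SIZE = 10 * 1024 * 1024  # 10 MB, the size suggested in this thread

def create_file(path, size, chunk_size=CHUNK_SIZE):
    """Create `path` holding exactly `size` zero bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Allocate one chunk up front and reuse it; never more than `size` bytes.
    chunk = memoryview(b"\0" * min(chunk_size, size))
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(remaining, len(chunk))
            f.write(chunk[:n])  # slicing a memoryview does not copy
            remaining -= n
```

Memory use stays at one chunk no matter how large the file is, e.g.
create_file("test.bin", 3 * 1024**3) holds only 10 MB at a time.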

If the OP is writing to a remote disk file to test network
throughput, there's another size limit to consider. Network
file-system protocols do not stream very large writes; the client has to
break a large write into several smaller writes. NFS version 2 had
a limit of 8 KB; version 3 removed the limit by allowing the server
to tell the client the largest size it supports. (Version 4 is now
out, in hundreds of pages of RFC that I hope to avoid reading.)


-- 
--Bryan


