Python v.s. huge files PROBLEM!!
djw
dwelch91 at nospam.attbi.com
Fri Jul 19 10:50:11 EDT 2002
Jose Rivera wrote:
> Hi..
> Scenario:
> OS : WinNT 4.0
> FileName: RESPALDO_MENSUAL_Data.MDF
> Size : 243,386,941,440 bytes
>
> Problem:
> I want to copy this file to another disk. Both disks have 500 GB of
> free space.
>
> Microsoft problem:
> You May Not Be Able to Copy Large Files on Computers That Are
> Running Windows NT 4.0 or Windows 2000 (Q259837)
>
> Workaround suggested by Microsoft:
> Use Backup / Restore utilities
> Result: They didn't work either... using HP OmniBack
>
> Workaround made by us:
> Make a Python program that reads the file and writes it to another file on
> the other disk.
>
> Python Code:
>
> import sys
>
> if len(sys.argv) != 3:
>     print 'Format:'
>     print '\t pyCopy.exe SourceFile EndFile'
> else:
>     fn1=sys.argv[1]
>     fn2=sys.argv[2]
>     f1=open(fn1,'rb')
>     f2=open(fn2,'wb')
>     data=f1.read(1024*1000)
>     while data:
>         f2.write(data)
>         data=f1.read(1024*1000)
>     f1.close()
>     f2.close()
>
> Result:
> IOError: [Errno 22] Invalid argument
>
> Question:
> Is there anything wrong?
If buffered I/O doesn't work, how about non-buffered I/O?
According to MSDN:
Use CreateFile for Non-Buffered File I/O
If your application performs file input or output without using the
intermediate buffering or caching provided by the system, the
application must call CreateFile with the FILE_FLAG_NO_BUFFERING flag
set when opening the file. In this case, your application must pass a
buffer to ReadFile or WriteFile that is correctly aligned for the
device. Note that the alignments changed for some devices with Windows
2000. For more information, see the FILE_FLAG_NO_BUFFERING description
in the CreateFile section and the VirtualAlloc section.
One way to align buffers on integer multiples of the volume sector size
is to use VirtualAlloc to allocate the buffers. This function allocates
memory that is aligned on addresses that are integer multiples of the
operating system's memory page size. Because both memory page and volume
sector sizes are powers of 2, this memory is also aligned on addresses
that are integer multiples of a volume's sector size. Your application
must make sure that it reads and writes in multiples of the actual
sector size of the input or output device. An application can determine
a volume's sector size by calling the GetDiskFreeSpaceEx function.
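The size arithmetic the MSDN text asks for can be sketched as below. This is only an illustration: round_up_to_sector and padding_for are hypothetical helper names, and the sector sizes used are example values rather than anything read from a real volume.

```python
def round_up_to_sector(nbytes, sector_size):
    """Return the smallest multiple of sector_size that is >= nbytes."""
    # The bit trick below is only valid for powers of two, which is what
    # the MSDN text says sector sizes are.
    if sector_size <= 0 or sector_size & (sector_size - 1):
        raise ValueError("sector_size must be a positive power of two")
    return (nbytes + sector_size - 1) & ~(sector_size - 1)


def padding_for(nbytes, sector_size):
    """Bytes that would have to be added to reach a sector boundary."""
    return round_up_to_sector(nbytes, sector_size) - nbytes


# Example values only: 512-byte sectors, a 1000-byte final chunk.
last_chunk_padding = padding_for(1000, 512)
```

The 243,386,941,440-byte file from the question is an exact multiple of both 512 and 4096, so for that particular file the padding would be zero.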
So how about some code that looks vaguely like this (only tested a
little bit, no error checking and no guarantees!)
(Note the use of win32con.FILE_FLAG_NO_BUFFERING):
import sys
import win32file, win32con, win32api

fn1 = sys.argv[1]
fn2 = sys.argv[2]

# Open the source for reading, bypassing the system cache.
f1 = win32file.CreateFile(fn1,
                          win32con.GENERIC_READ,
                          win32con.FILE_SHARE_READ,
                          None,
                          win32con.OPEN_EXISTING,
                          win32con.FILE_FLAG_NO_BUFFERING,
                          0)

# Create (or overwrite) the destination, also unbuffered.
f2 = win32file.CreateFile(fn2,
                          win32con.GENERIC_WRITE,
                          win32con.FILE_SHARE_WRITE,
                          None,
                          win32con.CREATE_ALWAYS,
                          win32con.FILE_FLAG_NO_BUFFERING,
                          0)

# bad assumption of using C: in next line! Need to change...
spc, bps, fc, tc = win32file.GetDiskFreeSpace("c:\\")
bpc = spc * bps  # bytes per cluster; = 4096 on my XP box

# Copy one cluster at a time until a write reports zero bytes.
while 1:
    hr, r1 = win32file.ReadFile(f1, bpc)
    e, bw = win32file.WriteFile(f2, r1)
    if bw == 0:
        break

win32api.CloseHandle(f1)
win32api.CloseHandle(f2)
Worked on my system for a 19 MB file... way smaller than yours, but
I don't have many 243 GB files lying around... in fact this was the
largest file on my hard drive!
I could not figure out how to tell when ReadFile() was complete. The
param that is usually passed back from the Win32 API -
lpNumberOfBytesRead - is not returned by the Python wrapping of the
function (don't know why not). However, the check for bytes written
(returned by win32file.WriteFile()) seems to do the trick.
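One possible alternative, offered as a sketch rather than a tested fix: as far as the pywin32 documentation describes it, win32file.ReadFile() returns the data it read as the second element of its result, so the length of that data can stand in for lpNumberOfBytesRead, and an empty result would mean end of file. The loop below uses plain callables so it can be run anywhere; copy_chunks, read_chunk and write_chunk are hypothetical names, and the win32file wiring in the trailing comment is an assumption that has not been tested here.

```python
import io


def copy_chunks(read_chunk, write_chunk, chunk_size):
    """Copy until read_chunk returns no data; return total bytes copied."""
    total = 0
    while True:
        data = read_chunk(chunk_size)
        if not data:  # empty read: end of input, so nothing is written
            break
        write_chunk(data)
        total += len(data)
    return total


# In-memory stand-ins for the two file handles.
src = io.BytesIO(b"x" * 10000)
dst = io.BytesIO()
copied = copy_chunks(src.read, dst.write, 4096)

# With pywin32 the callables might look like this (assumption, untested):
#   read_chunk  = lambda n: win32file.ReadFile(f1, n)[1]
#   write_chunk = lambda d: win32file.WriteFile(f2, d)
```

Stopping on an empty read avoids issuing a zero-length write just to find out that the input is exhausted.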
Also, note that I used GetDiskFreeSpace(), not the Ex version. I think
there is an error in MS's docs - the Ex version doesn't return the
cluster sizes and such like the non-Ex version.
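To get rid of the hard-coded "c:\\", the root passed to GetDiskFreeSpace() could be derived from the file name itself. A minimal sketch, assuming the paths given on the command line include a drive letter; volume_root and cluster_size are hypothetical helper names, and 8 and 512 are example values that multiply out to the 4096 mentioned above.

```python
import ntpath  # Windows path rules, usable on any platform


def volume_root(path):
    """Return the drive root (for example 'D:\\') of a Windows path."""
    drive, _ = ntpath.splitdrive(path)
    if not drive:
        raise ValueError("path has no drive component: %r" % (path,))
    return drive + "\\"


def cluster_size(sectors_per_cluster, bytes_per_sector):
    """Bytes per cluster, from the first two GetDiskFreeSpace() values."""
    return sectors_per_cluster * bytes_per_sector


root = volume_root(r"D:\backup\RESPALDO_MENSUAL_Data.MDF")
bpc = cluster_size(8, 512)  # example values, not read from a real disk

# Intended use with pywin32 (assumption, untested):
#   spc, bps, fc, tc = win32file.GetDiskFreeSpace(volume_root(fn1))
#   bpc = cluster_size(spc, bps)
```

Source and destination may sit on volumes with different sector sizes, so it is probably worth querying both roots and using a chunk size that is a multiple of each.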
Regards,
Don