write a 20GB file
Dave Angel
davea at ieee.org
Fri May 14 07:04:37 EDT 2010
Jackie Lee wrote:
> Hello there,
>
> I have a 22 GB binary file, and I want to change the values at specific
> positions. Because of the volume of the file, I doubt my code is an
> efficient one:
>
> #! /usr/bin/env python
> #coding=utf-8
> import sys
> import struct
>
> try:
>     f = open(sys.argv[1], 'rb+')
> except (IOError, Exception):
>     print '''usage:
> scriptname segyfilename
> '''
>     sys.exit(1)
>
> #skip EBCDIC header
> try:
>     f.seek(3200)
> except Exception:
>     print 'Oops! your file is broken..'
>
> #read binary header
> binhead = f.read(400)
> ns = struct.unpack('>h', binhead[20:22])[0]
> if ns < 0:
>     print 'file read error'
>     sys.exit(1)
>
> #read trace header
> while True:
>     f.seek(28, 1)
>     f.write(struct.pack('>h', 1))
>     f.seek(212, 1)
>     f.seek(ns*4, 1)
>
> f.close()
>
>
I don't see a question anywhere. So perhaps you just want comments on
your code.
1) How do you plan to test this?
2) Consider doing a lot more checking to see that you have in fact a
file of the right type.
3) Fix indentation - perhaps you've accidentally used a tab in the source.
4) Provide a termination condition for the while True loop, which
currently will (I think) go forever, or perhaps until the disk fills up.
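One way to bound the loop is to derive the trace count from the file size up front. This is a Python 3 sketch, not the OP's code; it assumes the usual SEG-Y layout of a 3200-byte EBCDIC header, a 400-byte binary header, and then traces of a 240-byte header plus ns 4-byte samples. (Note that the quoted script advances 28 + 2 + 212 = 242 bytes per trace header, not 240 — adjust the constants to whatever your format really uses.)

```python
import os
import struct

# Assumed SEG-Y-style layout (not verified against the OP's files):
TEXT_HDR = 3200    # EBCDIC textual header
BIN_HDR = 400      # binary header
TRACE_HDR = 240    # per-trace header

def patch_traces(path, ns, flag=1):
    """Write `flag` into bytes 28-29 of every trace header, stopping at EOF."""
    trace_len = TRACE_HDR + ns * 4
    # Bounded trace count computed from the file size, unlike `while True`:
    n_traces = (os.path.getsize(path) - TEXT_HDR - BIN_HDR) // trace_len
    with open(path, 'rb+') as f:
        f.seek(TEXT_HDR + BIN_HDR)
        for _ in range(n_traces):
            f.seek(28, 1)                   # field offset within the header
            f.write(struct.pack('>h', flag))
            f.seek(trace_len - 30, 1)       # rest of header + all samples
    return n_traces
```

Each iteration advances exactly one trace (28 + 2 + trace_len - 30 bytes), so the loop ends cleanly at the last whole trace instead of writing past EOF.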
5) Depending on the purpose of this file, you should consider making the
changes on a copy, then deleting and renaming. As it stands, if the
program gets aborted part way through, there's no way to know how far it
got. Since it's just clobbering bytes, it would be safe to rerun the
same program again, but many times that's not the case. And this
program clearly isn't finished yet, so perhaps it's not true here either.
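The copy-then-rename idea above can be sketched like this (Python 3; `patch_fn` is a hypothetical callable standing in for whatever edits you make). The temp copy goes in the same directory so the final rename stays on one filesystem and is atomic:

```python
import os
import shutil
import tempfile

def patch_copy(src, patch_fn):
    """Run patch_fn on a temp copy of src, then atomically swap it in.

    If patch_fn fails (or the process dies before the rename), the
    original file is untouched and the run can simply be repeated.
    """
    dir_ = os.path.dirname(os.path.abspath(src))
    fd, tmp = tempfile.mkstemp(dir=dir_)   # same directory => same filesystem
    os.close(fd)
    try:
        shutil.copyfile(src, tmp)
        patch_fn(tmp)                      # all edits go to the copy
    except BaseException:
        os.unlink(tmp)                     # leave the original alone
        raise
    os.replace(tmp, src)                   # atomic rename (Python 3.3+)
```

The trade-off: copying a 22 GB file costs a full read plus write and doubles the disk needed, so it's only worth it when a half-finished original is unacceptable.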
6) I don't see anything inefficient about it. The nature of the problem
is going to be very slow (for small values of ns), but I don't know what
your code could do to speed it up. Perhaps make sure the file is on a
fast drive, and not RAID 5.
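If you do want an alternative to measure against, a memory map is one option worth timing — though for a seek-dominated workload like this it may not beat plain seek/write. A hedged Python 3 sketch, using the same assumed layout as the quoted script (mapping a 22 GB file requires a 64-bit Python):

```python
import mmap
import struct

def patch_traces_mmap(path, ns, flag=1):
    """Set bytes 28-29 of each assumed 240-byte trace header via mmap."""
    trace_len = 240 + ns * 4
    packed = struct.pack('>h', flag)
    with open(path, 'rb+') as f:
        mm = mmap.mmap(f.fileno(), 0)       # map the whole file read/write
        try:
            pos = 3600                      # skip EBCDIC + binary headers
            while pos + trace_len <= len(mm):
                mm[pos + 28:pos + 30] = packed
                pos += trace_len
        finally:
            mm.close()                      # flushes dirty pages
```

With mmap the OS pages in only the 4 KB pages actually touched, so for small ns the I/O pattern ends up similar either way — benchmark before switching.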
DaveA