write a 20GB file
Dave Angel
davea at ieee.org
Fri May 14 07:04:37 EDT 2010
Jackie Lee wrote:
> Hello there,
>
> I have a 22 GB binary file, and I want to change the values at specific
> positions. Because of the volume of the file, I doubt my code is an
> efficient one:
>
> #! /usr/bin/env python
> #coding=utf-8
> import sys
> import struct
>
> try:
>     f = open(sys.argv[1], 'rb+')
> except (IOError, Exception):
>     print '''usage:
> scriptname segyfilename
> '''
>     sys.exit(1)
>
> #skip EBCDIC header
> try:
>     f.seek(3200)
> except Exception:
>     print 'Oops! your file is broken..'
>
> #read binary header
> binhead = f.read(400)
> ns = struct.unpack('>h', binhead[20:22])[0]
> if ns < 0:
>     print 'file read error'
>     sys.exit(1)
>
> #read trace header
> while True:
>     f.seek(28, 1)
>     f.write(struct.pack('>h', 1))
>     f.seek(212, 1)
>     f.seek(ns*4, 1)
>
> f.close()
>
>
I don't see a question anywhere. So perhaps you just want comments on
your code.
1) How do you plan to test this?
2) Consider doing a lot more checking to see that you have in fact a
file of the right type.
3) Fix indentation - perhaps you've accidentally used a tab in the source.
4) Provide a termination condition for the while True loop, which
currently will (I think) go forever, or perhaps until the disk fills up.
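One way to bound the loop is to derive the trace count from the file size up front. This is a Python 3 sketch, not the OP's code; it assumes the usual SEG-Y layout of a 3200-byte EBCDIC header, a 400-byte binary header, and then traces of a 240-byte header plus ns 4-byte samples. (Note that the quoted script advances 28 + 2 + 212 = 242 bytes per trace header, not 240 — adjust the constants to whatever your format really uses.)

```python
import os
import struct

# Assumed SEG-Y-style layout (not verified against the OP's files):
TEXT_HDR = 3200    # EBCDIC textual header
BIN_HDR = 400      # binary header
TRACE_HDR = 240    # per-trace header

def patch_traces(path, ns, flag=1):
    """Write `flag` into bytes 28-29 of every trace header, stopping at EOF."""
    trace_len = TRACE_HDR + ns * 4
    # Bounded trace count computed from the file size, unlike `while True`:
    n_traces = (os.path.getsize(path) - TEXT_HDR - BIN_HDR) // trace_len
    with open(path, 'rb+') as f:
        f.seek(TEXT_HDR + BIN_HDR)
        for _ in range(n_traces):
            f.seek(28, 1)                   # field offset within the header
            f.write(struct.pack('>h', flag))
            f.seek(trace_len - 30, 1)       # rest of header + all samples
    return n_traces
```

Each iteration advances exactly one trace (28 + 2 + trace_len - 30 bytes), so the loop ends cleanly at the last whole trace instead of writing past EOF.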
5) Depending on the purpose of this file, you should consider making the
changes on a copy, then deleting and renaming. As it stands, if the
program gets aborted part way through, there's no way to know how far it
got. Since it's just clobbering bytes, it would be safe to rerun the
same program again, but many times that's not the case. And this
program clearly isn't finished yet, so perhaps it's not true here either.
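The copy-then-rename idea above can be sketched like this (Python 3; `patch_fn` is a hypothetical callable standing in for whatever edits you make). The temp copy goes in the same directory so the final rename stays on one filesystem and is atomic:

```python
import os
import shutil
import tempfile

def patch_copy(src, patch_fn):
    """Run patch_fn on a temp copy of src, then atomically swap it in.

    If patch_fn fails (or the process dies before the rename), the
    original file is untouched and the run can simply be repeated.
    """
    dir_ = os.path.dirname(os.path.abspath(src))
    fd, tmp = tempfile.mkstemp(dir=dir_)   # same directory => same filesystem
    os.close(fd)
    try:
        shutil.copyfile(src, tmp)
        patch_fn(tmp)                      # all edits go to the copy
    except BaseException:
        os.unlink(tmp)                     # leave the original alone
        raise
    os.replace(tmp, src)                   # atomic rename (Python 3.3+)
```

The trade-off: copying a 22 GB file costs a full read plus write and doubles the disk needed, so it's only worth it when a half-finished original is unacceptable.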
6) I don't see anything inefficient about it. The nature of the problem
is going to be very slow (for small values of ns), but I don't know what
your code could do to speed it up. Perhaps make sure the file is on a
fast drive, and not RAID 5.
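If you do want an alternative to measure against, a memory map is one option worth timing — though for a seek-dominated workload like this it may not beat plain seek/write. A hedged Python 3 sketch, using the same assumed layout as the quoted script (mapping a 22 GB file requires a 64-bit Python):

```python
import mmap
import struct

def patch_traces_mmap(path, ns, flag=1):
    """Set bytes 28-29 of each assumed 240-byte trace header via mmap."""
    trace_len = 240 + ns * 4
    packed = struct.pack('>h', flag)
    with open(path, 'rb+') as f:
        mm = mmap.mmap(f.fileno(), 0)       # map the whole file read/write
        try:
            pos = 3600                      # skip EBCDIC + binary headers
            while pos + trace_len <= len(mm):
                mm[pos + 28:pos + 30] = packed
                pos += trace_len
        finally:
            mm.close()                      # flushes dirty pages
```

With mmap the OS pages in only the 4 KB pages actually touched, so for small ns the I/O pattern ends up similar either way — benchmark before switching.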
DaveA