write a 20GB file
Jackie Lee
jackie.space at gmail.com
Fri May 14 07:32:15 EDT 2010
Thx, Dave,
The code works fine. I just don't understand how f.write works. I read that
file.write doesn't actually write to the file until file.close() or
file.flush() is called. So I don't know whether the following version is
more efficient (sorry, I forgot to add a condition to break the loop last
time):
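To see what that buffering means in practice, here is a small sketch in Python 3 (not the Python 2 used in the script). The function name and the temporary file are hypothetical. It writes two bytes and checks what a second handle on the same file sees before and after flush(). Seeking on a buffered file generally flushes pending writes too (Python's io classes do this, and so does C stdio's fseek), so in the patching loop each 2-byte write is most likely handed to the OS at the next seek anyway. The buffer does not batch writes across traces, and most of the cost comes from touching every trace in a 22 GB file.

```python
import os
import tempfile


def show_write_buffering():
    """Return what a second reader sees before and after flush()."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as writer, open(path, "rb") as reader:
            writer.write(b"\x00\x01")  # lands in Python's write buffer first
            before = reader.read()     # typically b"" with default buffering
            writer.flush()             # hand the buffered bytes to the OS
            reader.seek(0)
            after = reader.read()      # the second handle now sees both bytes
        return before, after
    finally:
        os.remove(path)
```

The value of `before` depends on buffer settings, so only `after` is worth relying on.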
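#! /usr/bin/env python
#coding=utf-8
import sys
import struct

try:
    f = open(sys.argv[1], 'rb+')
except (IndexError, IOError):
    # no filename given, or the file cannot be opened
    print('''usage:
scriptname segyfilename
''')
    sys.exit(1)

# skip the 3200-byte EBCDIC textual header
try:
    f.seek(3200)
except Exception:
    print('Oops! your file is broken..')
    sys.exit(1)

# read the 400-byte binary header; bytes 20-21 hold samples per trace (ns)
binhead = f.read(400)
if len(binhead) < 400:
    print('file read error')
    sys.exit(1)
ns = struct.unpack('>h', binhead[20:22])[0]
if ns < 0:
    print('file read error')
    sys.exit(1)

# walk the traces: 240-byte trace header, then ns*4 bytes of samples
while True:
    f.seek(28, 1)
    if len(f.read(2)) < 2:   # end of file (or truncated trace): stop
        break
    f.seek(-2, 1)            # back up over the 2 bytes just read
    f.write(struct.pack('>h', 1))
    f.seek(210, 1)           # rest of the trace header: 28 + 2 + 210 = 240
    f.seek(ns * 4, 1)        # skip the samples (assumed 4 bytes each)
f.close()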
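The loop can also be wrapped in a function so it can be tested on a small file. Below is a sketch in Python 3. Like the script above, it assumes 4-byte samples and a 2-byte big-endian field at byte 28 of each 240-byte trace header. The names `patch_traces` and `make_test_file` are hypothetical. It does not probe for end of file with read(). It takes the file size up front and seeks to absolute offsets, so the loop has an explicit end, and a truncated last trace is reported instead of being silently skipped.

```python
import os
import struct

TEXT_HEADER = 3200   # EBCDIC textual header, skipped
BINARY_HEADER = 400  # binary file header; bytes 20-21 hold ns
TRACE_HEADER = 240   # per-trace header
FIELD_OFFSET = 28    # byte offset of the 2-byte field being overwritten
SAMPLE_BYTES = 4     # assumption: 4-byte samples, matching ns*4 in the script


def patch_traces(path, value=1):
    """Write `value` as a big-endian int16 at FIELD_OFFSET of every trace header.

    Returns the number of traces patched. Raises ValueError if the headers
    look wrong or the last trace is truncated (earlier traces are already
    patched by then).
    """
    with open(path, "r+b") as f:
        f.seek(TEXT_HEADER)
        binhead = f.read(BINARY_HEADER)
        if len(binhead) != BINARY_HEADER:
            raise ValueError("file too short for a binary header")
        ns = struct.unpack(">h", binhead[20:22])[0]
        if ns < 0:
            raise ValueError("negative sample count in binary header")
        trace_len = TRACE_HEADER + ns * SAMPLE_BYTES
        size = os.fstat(f.fileno()).st_size
        packed = struct.pack(">h", value)
        pos = TEXT_HEADER + BINARY_HEADER
        count = 0
        while pos < size:  # explicit termination: stop at end of file
            if pos + trace_len > size:
                raise ValueError("truncated trace at byte %d" % pos)
            f.seek(pos + FIELD_OFFSET)
            f.write(packed)
            pos += trace_len
            count += 1
    return count


def make_test_file(path, ns, ntraces):
    """Build a small all-zero SEG-Y-like file for testing (hypothetical helper)."""
    binhead = bytearray(BINARY_HEADER)
    binhead[20:22] = struct.pack(">h", ns)
    with open(path, "wb") as f:
        f.write(b"\x40" * TEXT_HEADER)  # EBCDIC spaces
        f.write(bytes(binhead))
        f.write(bytes(ntraces * (TRACE_HEADER + ns * SAMPLE_BYTES)))
```

Running it first on a file built by `make_test_file` gives a quick check that the offsets are right before touching the 22 GB original.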
On Fri, May 14, 2010 at 6:04 PM, Dave Angel <davea at ieee.org> wrote:
> Jackie Lee wrote:
>>
>> Hello there,
>>
>> I have a 22 GB binary file and want to change the values at specific
>> positions. Because of the size of the file, I doubt my code is an
>> efficient one:
>>
>> #! /usr/bin/env python
>> #coding=utf-8
>> import sys
>> import struct
>>
>> try:
>>     f=open(sys.argv[1],'rb+')
>> except (IOError,Exception):
>>     print '''usage:
>> scriptname segyfilename
>> '''
>>     sys.exit(1)
>>
>> #skip EBCDIC header
>> try:
>>     f.seek(3200)
>> except Exception:
>>     print 'Oops! your file is broken..'
>>
>> #read binary header
>> binhead = f.read(400)
>> ns = struct.unpack('>h',binhead[20:22])[0]
>> if ns < 0:
>>     print 'file read error'
>>     sys.exit(1)
>>
>> #read trace header
>> while True:
>>     f.seek(28,1)
>>     f.write(struct.pack('>h',1))
>>     f.seek(212,1)
>>     f.seek(ns*4,1)
>>
>> f.close()
>>
>>
>
> I don't see a question anywhere. So perhaps you just want comments on your
> code.
>
> 1) How do you plan to test this?
> 2) Consider doing a lot more checking to see that you have in fact a file of
> the right type.
> 3) Fix indentation - perhaps you've accidentally used a tab in the source.
> 4) Provide a termination condition for the while True loop, which currently
> will (I think) go forever, or perhaps until the disk fills up.
> 5) Depending on the purpose of this file, you should consider making the
> changes on a copy, then deleting and renaming. As it stands, if the program
> gets aborted part way through, there's no way to know how far it got. Since
> it's just clobbering bytes, it would be safe to rerun the same program,
> but often that's not the case. And this program clearly isn't
> finished yet, so perhaps it's not true here either.
> 6) I don't see anything inefficient about it. The nature of the problem is
> going to be very slow (for small values of ns), but I don't know what your
> code could do to speed it up. Perhaps make sure the file is on a fast
> drive, and not RAID 5.
>
> DaveA
>
>
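Dave's points 5 and 6 (work on a copy and rename it afterwards, and speed) could be combined roughly as in the sketch below. It is written in Python 3 and uses `mmap` in place of the seek/write loop. The layout is assumed, as in the script: a 3200-byte text header, a 400-byte binary header with ns at bytes 20-21, 240-byte trace headers, and the 2-byte field at byte 28. The `.tmp` suffix and the function name are hypothetical. Mapping the file may cut per-trace system-call overhead, but whether it is actually faster depends on the OS and the disk. Copying 22 GB first is itself a full read and write, so this trades speed for being able to abort safely. Mapping a 22 GB file also needs a 64-bit Python.

```python
import mmap
import os
import shutil
import struct


def patch_copy_with_mmap(src, value=1, sample_bytes=4):
    """Patch a copy of src through mmap, then atomically replace src with it.

    Layout assumptions: 3200-byte text header, 400-byte binary header with
    ns at bytes 20-21, 240-byte trace headers, field at byte 28 of each.
    """
    tmp = src + ".tmp"          # hypothetical temporary file name
    shutil.copyfile(src, tmp)   # the original stays intact until the rename
    try:
        with open(tmp, "r+b") as f:
            mm = mmap.mmap(f.fileno(), 0)  # map the whole copy read/write
            try:
                ns = struct.unpack_from(">h", mm, 3200 + 20)[0]
                if ns < 0:
                    raise ValueError("negative sample count")
                trace_len = 240 + ns * sample_bytes
                packed = struct.pack(">h", value)
                pos = 3600
                count = 0
                while pos + trace_len <= len(mm):
                    mm[pos + 28:pos + 30] = packed  # overwrite the 2-byte field
                    pos += trace_len
                    count += 1
                if pos != len(mm):
                    raise ValueError("file does not end on a trace boundary")
                mm.flush()              # push the changes to the file
            finally:
                mm.close()
        os.replace(tmp, src)            # atomic rename over the original
    except BaseException:
        os.remove(tmp)                  # abandon the partial copy
        raise
    return count
```

If the run is interrupted or the size check fails, the original file is untouched and only the temporary copy is removed.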
--
Jackie
More information about the Python-list mailing list