how to remove n bytes in a file?
Tim Chase
python.list at tim.thechases.com
Sat Sep 2 10:34:46 EDT 2006
> Suppose we have a very large file, and want to remove 'n' bytes in the
> middle of the file. My thought is:
> 1. read() until we reach the bytes that should be removed, and mark the
> position as 'pos'.
> 2. seek(tell() + n) to skip over those n bytes
> 3. read() until we reach the end of the file, into a variable, say 'a'
> 4. seek(pos) back to 'pos'
> 5. write(a)
> 6. truncate()
>
> If the file is really large, the performance may be a problem.
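Taken literally, the quoted plan looks roughly like the sketch below.
It is only an illustration under some assumptions: Python 3, a binary
file opened for update (mode "r+b", or an in-memory io.BytesIO as
here), step 1 already done so that the position is passed in as pos,
and remove_bytes_naive is a made-up name. Note that step 3 pulls the
whole tail of the file into a single bytes object.

```python
import io


def remove_bytes_naive(f, pos, n):
    """Hypothetical helper: delete n bytes at offset pos, literally
    following the quoted six steps (step 1, finding pos, is assumed done).
    """
    f.seek(pos + n)  # step 2: skip over the n unwanted bytes
    a = f.read()     # step 3: read the ENTIRE rest of the file into memory
    f.seek(pos)      # step 4: go back to 'pos'
    f.write(a)       # step 5: write the tail over the gap
    f.truncate()     # step 6: cut the file at the current position


demo = io.BytesIO(b"0123456789")
remove_bytes_naive(demo, 3, 2)
print(demo.getvalue())  # b'01256789'
```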
The biggest problem I see is step #3: it reads the entire
remainder of the file into memory at once. If you're dealing
with a multi-gigabyte file and you want to delete 5 bytes
beginning 20 bytes into the file, step #3 involves reading
file_size-(20+5) bytes into memory and then spewing them all
back out. A better way might be to read a fixed-size chunk at a
time and write each chunk back to its proper offset ("size"
bytes earlier than where it was read), so memory use never
exceeds one chunk.
def shift(f, offset, size, buffer_size=1024*1024):
    """Deletes a portion of size "size" from file "f", starting at
    offset, and shifts the remainder of the file to fill the gap.
    f must be a seekable binary file opened for update (e.g. "r+b").
    The buffer_size can be tweaked for performance preferences,
    defaulting to 1 megabyte.
    """
    f.seek(offset + size)
    while True:
        buffer = f.read(buffer_size)
        if not buffer:
            break
        f.seek(offset)         # jump back to where this chunk belongs
        f.write(buffer)
        f.seek(size, 1)        # skip ahead to the next unread chunk
        offset += len(buffer)  # the last chunk may be short
    f.truncate(offset)         # cut the file at the end of the moved data

if __name__ == '__main__':
    from io import BytesIO
    offset = ord('p')
    size = 5
    buffer_size = 30
    f = BytesIO(bytes(range(256)))
    print(repr(f.read()))
    print('=' * 50)
    f.seek(0)
    shift(f, offset, size, buffer_size)
    f.seek(0)
    print(repr(f.read()))
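On Unix, Python 3's os.pread() and os.pwrite() take an explicit file
offset, which removes the need for the seek() bookkeeping in the loop
above. Here is a hedged sketch of the same chunked shift using them;
shift_pread is a hypothetical name, and os.pread/os.pwrite are assumed
to be available (they are Unix-only). Keeping the read and write
offsets in two separate variables makes it easy to see that every
chunk moves exactly "size" bytes backward.

```python
import os
import tempfile


def shift_pread(fd, offset, size, buffer_size=1024 * 1024):
    """Hypothetical helper: delete `size` bytes at `offset` from the file
    open on descriptor fd. Assumes Unix (os.pread/os.pwrite) and a
    descriptor opened for reading and writing; no seek() calls needed.
    """
    end = os.fstat(fd).st_size
    if offset >= end or size <= 0:
        return  # nothing to delete; also avoids growing the file below
    read_pos = offset + size  # start of the data that survives
    write_pos = offset        # where that data belongs after the shift
    while True:
        chunk = os.pread(fd, buffer_size, read_pos)
        if not chunk:
            break
        written = os.pwrite(fd, chunk, write_pos)
        # Advance by what was actually written; any unwritten remainder
        # is simply re-read on the next pass.
        read_pos += written
        write_pos += written
    os.ftruncate(fd, write_pos)  # drop the now-duplicated tail


# Usage: delete 5 bytes at offset ord('p'), as in the demo above,
# but on a real temporary file.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, bytes(range(256)))
    shift_pread(fd, ord("p"), 5, buffer_size=30)
    result = os.pread(fd, 1024, 0)
finally:
    os.close(fd)
    os.remove(path)

print(len(result))  # 251
```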
> Is there a clever way to finish? Could mmap() help? Thx
No idea regarding mmap().
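For what it's worth, Python's mmap module does provide
mmap.move(dest, src, count), which copies bytes within a mapping and
copes with overlapping ranges, so one possible approach is the sketch
below. It is a hedged illustration, not something tested against
multi-gigabyte files: remove_bytes_mmap is a made-up name, the file is
assumed to be opened in binary update mode ("r+b"), mmap cannot map an
empty file, and on a 32-bit system a multi-gigabyte file may not fit
in the address space.

```python
import mmap
import os
import tempfile


def remove_bytes_mmap(f, offset, size):
    """Hypothetical helper: delete `size` bytes at `offset` from binary
    file object f ("r+b"). Assumes the file fits in the address space.
    """
    f.flush()  # make sure nothing is sitting in Python's write buffer
    length = os.fstat(f.fileno()).st_size
    if offset >= length or size <= 0:
        return  # also covers empty files, which mmap cannot map
    size = min(size, length - offset)  # never run past end of file
    with mmap.mmap(f.fileno(), length) as mm:
        # move(dest, src, count) copies within the mapping and handles
        # overlapping ranges; the OS pages data in and out as needed.
        mm.move(offset, offset + size, length - offset - size)
        mm.flush()
    # Shrink only after the mapping is closed: Windows, for one, will
    # not truncate a file that is still mapped.
    f.truncate(length - size)


with tempfile.TemporaryFile() as f:
    f.write(bytes(range(256)))
    remove_bytes_mmap(f, ord("p"), 5)
    f.seek(0)
    data = f.read()

print(len(data))  # 251
```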
-tkc
More information about the Python-list mailing list