[Tutor] Simultaneous read and write on file

Alan Gauld alan.gauld at btinternet.com
Tue Jan 19 04:12:06 EST 2016


On 19/01/16 05:41, Anshu Kumar wrote:

> Here is my actual scenario. I have a csv file and it would already be
> present. I need to read and remove some rows based on some logic. I have
> written earlier two separate file opens which I think was nice and clean.

Yes, it looks straightforward. The only possible issue is that
it reads the entire input file into memory before writing the
output, which could become a memory hog for a large file.

> with open(file_path, 'rb') as fr:
>     for row in csv.DictReader(fr):
>         #Skip for those segments which are part of overridden_ids
>         if row['id'] not in overriden_ids:
>             segments[row['id']] = {
>                 'id': row['id'],
>                 'attrib': json.loads(row['attrib']),
>                 'stl': json.loads(row['stl']),
>                 'meta': json.loads(row['meta']),
>             }
> #rewriting files with deduplicated segments
> with open(file_path, 'wb') as fw:
>     writer = csv.UnicodeWriter(fw)
>     writer.writerow(["id", "attrib", "stl", "meta"])
>     for seg in segments.itervalues():
>         writer.writerow([seg['id'], json.dumps(seg["attrib"]),
>                          json.dumps(seg["stl"]), json.dumps(seg["meta"])])
> 
> 
> I have got review comments to improve this block by having just single
> file open and minimum memory usage.

I'd ignore the advice to use a single file. One extra file
handle is insignificant in memory terms, and the extra simplicity
that two handles bring is worth far more.
What I would do is open both files at the start and, instead
of building the segments dictionary, write each row directly to
the output file. That will slash your memory footprint.
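
As a rough sketch of that two-handle approach (not the original
poster's code): because the input and output share one path here, the
sketch assumes the output goes to a temporary file in the same
directory, which then replaces the original. It also assumes Python 3
and a header row in the file. The function name drop_rows and its
parameters are made up for illustration.

```python
import csv
import os
import tempfile


def drop_rows(file_path, overridden_ids):
    """Stream rows from file_path to a temporary file, skipping unwanted ids.

    Hypothetical helper: only one row is held in memory at a time.
    Assumes the CSV has a header row that includes an 'id' column.
    """
    dir_name = os.path.dirname(os.path.abspath(file_path))
    # Keep the temporary file beside the original so the final
    # os.replace() stays on one filesystem.
    with open(file_path, newline='') as fr, \
            tempfile.NamedTemporaryFile('w', newline='', dir=dir_name,
                                        delete=False) as fw:
        reader = csv.DictReader(fr)
        writer = csv.DictWriter(fw, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Skip segments whose id is in overridden_ids.
            if row['id'] not in overridden_ids:
                writer.writerow(row)
    # Both handles are closed now; swap the new file into place.
    os.replace(fw.name, file_path)
```

Note that this copies each field through unchanged, so it neither
round-trips the JSON columns nor collapses repeated ids the way the
segments dictionary did. If duplicates matter, keeping a set of ids
already written would cost far less memory than storing whole rows.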

Contrast that with using a single file:
You need to read a line, check its length, and seek back to
the beginning of the line.
Create the new output string. Check its length.
If it is the same length (miracles happen!), just write the line.
If it is shorter than the original, write the new line,
then write spaces to fill the gap.
If it is longer than the original - oh dear. If you write it you will
overwrite part of your next line. So you need to do a look-ahead to grab
the next line of data before writing.
But now your next new line has to be compared against
data.length - overlap.length, and if it is longer than
that you have to repeat the process.
And if your new line is longer than two old lines it gets even worse.
On top of that, you now have a file that is partially full of new-style
data while the rest is old-style. Anyone trying to read that will get
very confused.
And we haven't even considered what to do about the lines you
want to delete...
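
The "longer than the original" case can be seen in a small experiment.
This is only an illustration, assuming Python 3; the file name and its
two-line contents are made up.

```python
import os
import tempfile

# Build a small two-line file to experiment on (contents are made up).
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w', newline='') as f:
    f.write('aaa\nbbb\n')

with open(path, 'r+', newline='') as f:
    f.readline()        # read the first line, 'aaa\n'
    f.seek(0)           # seek back to the beginning of that line
    f.write('AAAAA\n')  # replacement is two characters longer

with open(path, newline='') as f:
    result = f.read()

# The longer first line has eaten the start of the second one:
# result is 'AAAAA\nb\n'
print(repr(result))
```

The second line has lost two of its three characters, which is exactly
the overlap that the look-ahead bookkeeping would have to protect.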

In short, this is not a situation where + mode is a good idea.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos



