[Tutor] Simultaneous read and write on file

Peter Otten __peter__ at web.de
Tue Jan 19 04:52:20 EST 2016


Anshu Kumar wrote:

> Hello All,
> 
> Thanks so much for your response.
> 
> Here is my actual scenario. I have a CSV file that already exists. I need
> to read it and remove some rows based on some logic. Earlier I wrote this
> with two separate file opens, which I think was nice and clean.
> 
> actual code:
> 
> with open(file_path, 'rb') as fr:
>     for row in csv.DictReader(fr):
>         # skip segments whose ids are in overridden_ids
>         if row['id'] not in overriden_ids:

Oops, typo ("overriden_ids" here vs. "overridden_ids" in the comment); so 
this is probably not your actual code :(

>             segments[row['id']] = {
>                 'id': row['id'],
>                 'attrib': json.loads(row['attrib']),
>                 'stl': json.loads(row['stl']),
>                 'meta': json.loads(row['meta']),
>             }
> # rewrite the file with the deduplicated segments
> with open(file_path, 'wb') as fw:
>     writer = csv.UnicodeWriter(fw)
>     writer.writerow(["id", "attrib", "stl", "meta"])
>     for seg in segments.itervalues():
>         writer.writerow([seg['id'], json.dumps(seg["attrib"]),
>                          json.dumps(seg["stl"]), json.dumps(seg["meta"])])
> 
> 
> I have received review comments asking me to improve this block by using
> just a single file open and minimal memory usage.
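
Taking the single-open part first: since segments already holds every 
surviving row, you could open the file once in "r+b" mode, read and filter, 
then rewind, truncate, and write back through the same handle. A minimal 
sketch, assuming Python 2 (to match your 'rb'/itervalues usage) and plain 
csv.writer rather than UnicodeWriter, which is not part of the stdlib csv 
module (presumably your own helper or the recipe from the csv docs); 
file_path and overridden_ids are taken from your snippet:

import csv

# One open() in "r+b" mode: read everything, then rewrite in place.
with open(file_path, "r+b") as f:
    segments = {}
    for row in csv.DictReader(f):
        if row["id"] not in overridden_ids:
            # keep the raw JSON strings; the loads()/dumps() round
            # trip is only needed if you transform the data
            segments[row["id"]] = row
    f.seek(0)      # rewind to the start of the file
    f.truncate()   # discard the old contents
    writer = csv.writer(f)
    writer.writerow(["id", "attrib", "stl", "meta"])
    for seg in segments.itervalues():
        writer.writerow([seg["id"], seg["attrib"], seg["stl"], seg["meta"]])

That satisfies the single-open request, but memory usage is unchanged, and 
a crash between truncate() and the last writerow() leaves the file 
incomplete. Whether you can do better on memory depends on your answer to 
this: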

Are the duplicate ids stored in overridden_ids or are they implicitly 
removed by overwriting them in

segments[row["id"]] = ...

? If the latter, does it matter whether the last or the first row with a 
given id is kept?
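
If keeping the first row per id turns out to be acceptable, you can cut 
memory down to a set of ids by streaming rows to a temporary file and then 
replacing the original. (With last-row-wins semantics you cannot stream, 
because a later duplicate can invalidate a row you have already written.) 
Another sketch under the same assumptions (Python 2, plain csv.writer, 
names from your snippet):

import csv
import os
import tempfile

# Stream rows to a temp file in the same directory, remembering only
# the ids seen so far. Memory grows with the number of distinct ids,
# not with the file size. This keeps the FIRST row for each id.
seen = set(overridden_ids)
tmp = tempfile.NamedTemporaryFile(
    "wb", dir=os.path.dirname(file_path) or ".", delete=False)
try:
    writer = csv.writer(tmp)
    writer.writerow(["id", "attrib", "stl", "meta"])
    with open(file_path, "rb") as fr:
        for row in csv.DictReader(fr):
            if row["id"] not in seen:
                seen.add(row["id"])
                writer.writerow([row["id"], row["attrib"],
                                 row["stl"], row["meta"]])
except:
    # clean up the half-written temp file, then re-raise
    tmp.close()
    os.remove(tmp.name)
    raise
tmp.close()
os.rename(tmp.name, file_path)  # atomic replace on POSIX

Because the original is only replaced after the temporary file has been 
written completely, an interrupted run leaves your data intact, which is 
also an argument against rewriting in place.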


