[Tutor] Simultaneous read and write on file

Martin A. Brown martin at linux-ip.net
Mon Jan 18 23:41:10 EST 2016


Hi all,

>>>> The + modes are deceptively appealing but they are full of dangers
>>>> for precisely the reasons you have discovered(*).
>
>> Yes and so have I. Maybe twice in 30 years of programming. It's 
>> sometimes necessary but it's much, much harder to get right and 
>> very easy to get wrong, usually with data corruption as a result.

Yes.
Yes.
Yes.
Yes.
Yes.

> I may have done it a little more than that; I agree it is very 
> rare. I may be biased because I was debugging exactly this last 
> week. (Which itself is an argument against mixed read/write with 
> one file - it was all my own code and I spent over a day chasing 
> this because I was looking in the wrong spot).

Oh yes.  Ooof.  Today's decisions are tomorrow's albatross.

Speaking of which, I have an albatross in very good condition and 
I'm looking for a buyer.  Anybody interested in a short-tailed 
albatross with good manners?  Looking for a good home.

>> So for a beginner I would never encourage it. For an experienced 
>> programmer, sure, if there is no other option (and especially if 
>> you have fixed size records where things get easier).
>>
>>> Tip for new players: if you do any .write()s, remember to do a 
>>> .flush() before doing a seek or a read
>>
>> That's exactly my point. There are so many things you have to do 
>> extra when working in mixed mode. Too easy to treat things like 
>> normal mode files and get it wrong. Experts can do it and make it 
>> work, but mostly it's just not needed.
>
> Yes. You're write - for simplicity and reliability two distinct 
> open file instances are much easier.

Yes, he's write [sic].  He writes a bunch!  ;)

[Homonyms mess me up when I'm typing, all sew.]

OK, a bit more seriously, I will add a thought or two.

Modern filesystems are beautiful.  They are fast, reliable and 
efficient.  Application software, e.g. Python, can be hairy (see the 
points of Alan and Cameron earlier in this thread).  Why not take 
advantage of filesystem atomicity, a feature guaranteed to userspace 
by all (?) modern local filesystems?

Options:

  * If disk throughput is not a problem, then there's practically 
    nothing but benefit to reading from the input file, writing to 
    an output file, closing both and renaming the output over the 
    input (effectively squashing the original file):

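      import os

      # Create a sample input file.
      with open('a', 'w') as A:
          A.write('hammy')

      # Read the input and write the processed result to a second file.
      with open('a', 'r') as A, open('b', 'w') as B:
          data = A.read()                 # -- processing handled
          data = data.replace('m', 'p')   #    here, until happy
          B.write(data)

      # Both files are closed here.  On POSIX, rename() replaces 'a'
      # in one step; on Windows use os.replace() (Python 3.3+), since
      # os.rename() fails there if the destination exists.
      os.rename(B.name, A.name)           # -- atomic [0]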

  * Alternative:  If disk throughput is a problem, this is an 
    argument for using a database system where this class of data 
    integrity problem has been solved for the application 
    developer.
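
To make the database option a little more concrete, here is a minimal 
sketch, assuming the standard-library sqlite3 module stands in for "a 
database system".  The table name, column names and the 
replace_letters() helper are made up for illustration.  The point is 
only that the whole update happens inside one transaction, so a reader 
sees either all of the old rows or all of the new ones.

```python
import sqlite3

def replace_letters(conn, old, new):
    """Rewrite every record inside a single transaction (hypothetical helper)."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if it raises.
    with conn:
        rows = conn.execute("SELECT id, body FROM records").fetchall()
        for row_id, body in rows:
            conn.execute("UPDATE records SET body = ? WHERE id = ?",
                         (body.replace(old, new), row_id))

# An in-memory database keeps the sketch self-contained; a real
# application would pass a file path instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, body TEXT)")
with conn:
    conn.execute("INSERT INTO records (body) VALUES (?)", ("hammy",))

replace_letters(conn, "m", "p")
result = [body for (body,) in conn.execute("SELECT body FROM records")]
print(result)
```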

I'd suggest measuring the amount of time it takes to read, rewrite 
and os.rename() the entire file before deciding you need to 
undertake the massive complexity of modifying an existing file in 
situ.
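
One way to take that measurement is sketched below.  The 
timed_rewrite() helper, the '.new' suffix and the sample data are my 
own assumptions for illustration, and I use os.replace() (Python 3.3 
and later) rather than os.rename() so the same code also runs on 
Windows.  Swap in your own file and your own processing step.

```python
import os
import tempfile
import time

def timed_rewrite(path, transform):
    """Read path, write the transformed copy beside it, rename it over
    path, and return the elapsed wall-clock seconds (hypothetical helper)."""
    start = time.perf_counter()
    with open(path) as src:
        data = transform(src.read())
    tmp_path = path + ".new"          # hypothetical naming convention
    with open(tmp_path, "w") as dst:
        dst.write(data)
    os.replace(tmp_path, path)        # rename over the original
    return time.perf_counter() - start

# Build a throwaway sample file so the sketch runs anywhere.
workdir = tempfile.mkdtemp()
sample = os.path.join(workdir, "a")
with open(sample, "w") as f:
    f.write("hammy\n" * 100000)

elapsed = timed_rewrite(sample, lambda s: s.replace("m", "p"))
print("rewrote %d bytes in %.4f seconds" % (os.path.getsize(sample), elapsed))
```

If the number that comes out is small compared to how often you need 
to update the file, the simple approach is probably good enough.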

If you can avoid modifying an existing file in place, don't bother 
with it.  You will likely cause yourself (and maybe even others) a 
headache.

For example, if another reader of the file comes along while you are 
performing your in situ modification magic tricks, they (and you) 
will have no guarantees about what data they will receive.  That 
will be left up to the operating system (i.e. kernel).

So, take control of the data back into your own hands by taking 
advantage of the beauty of the filesystem.
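
A slightly more defensive version of the write-then-rename idea is 
sketched here.  It is not the only way to do it; the atomic_rewrite() 
name is invented for this example, and it assumes Python 3.3 or later 
for os.replace().  The temporary file is created in the same directory 
as the target so the rename never crosses filesystems, and the data is 
flushed and fsync()ed before the rename so a crash should not leave a 
half-written file under the real name.

```python
import os
import tempfile

def atomic_rewrite(path, transform):
    """Replace the contents of path with transform(old contents)
    (hypothetical helper)."""
    with open(path) as src:
        data = transform(src.read())
    # Same directory as the target, so the rename stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())    # push the data to disk before renaming
        os.replace(tmp_path, path)    # readers see the old file or the new one
    except BaseException:
        # Something failed before the rename finished: remove the leftover.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "a")
with open(target, "w") as f:
    f.write("hammy")

atomic_rewrite(target, lambda s: s.replace("m", "p"))
with open(target) as f:
    final = f.read()
print(final)
```

Note that mkstemp() creates the temporary file readable and writable 
only by its owner, so the replaced file may end up with different 
permissions than the original had.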

Filesystem atomicity!

Good luck,

-Martin

 [0] Or about as close to atomic as can conceivably be guaranteed 
     to a userspace application.

-- 
Martin A. Brown
http://linux-ip.net/

