[Tutor] Running Python Scripts at same time

Cameron Simpson cs at cskk.id.au
Sat Jun 27 19:05:39 EDT 2020


On 26Jun2020 11:21, John Weller <john at johnweller.co.uk> wrote:
>I have a Python program which will be running 24/7 (I hope 😊).  It is 
>generating data in a file which I want to clean up overnight.  The way 
>I am looking at doing it is to run a separate program as a Cron job at 
>midnight – will that work?  The alternative is to add it to the loop 
>and check for the time. I have tried researching this but only got even 
>more confused.

Running a separate program is perfectly reasonable.

And crontab is a perfect place for a regular task like this.

The primary issue usually is that you do not want both programmes to be 
using the file at the same time.

Suppose the file were, say, a CSV file to which your long-running 
programme (A) appends data, and that the clean-up programme (B) reads the 
CSV file, tidies some stuff, and rewrites the CSV file. You can imagine 
this sequence:

    - programme B opens the file and reads the data
    - programme B thinks about the data to clean it
    - programme A appends more data to the file
    - programme B rewrites the clean data into the file,
      _overwriting_ the new data programme A just appended

The usual process with a shared external file is to use a lock facility.  
These come in a few forms, and it is essential that both programme A and 
programme B use the same locking system.

One of the easiest and most portable is to make a lock file while you 
work with the file. If your data file is called "foo" you might use a 
lock file called "foo.lock".

On a UNIX-type system (including Linux) you can atomically make such a file 
like this:

    import os
    # ... the rest of your programme; datafilepath names your data file ...
    lockpath = datafilepath + '.lock'
    lockfd = os.open(lockpath, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0)

That is a special mode of the OS "open" call (_not_ Python's default 
"open" builtin) whose parameters have the following meanings:

    - os.O_CREAT: create the file if missing
    - os.O_EXCL: ensure that this call is the one that creates the file; 
      if it already exists the call fails, which Python raises as 
      FileExistsError (a subclass of OSError)
    - os.O_RDWR: open the file for read and write
    - 0: the initial permissions, ensuring that the file is _not_ 
      readable or writable

See "man 2 open" on a UNIX system for the spec.
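A quick way to check the "0: the initial permissions" point is to create a lock file in a scratch directory and look at its mode. This is a sketch: the temporary directory and the "foo.lock" name are just for illustration. Because the umask can only clear permission bits, a requested mode of 0 stays 0.

```python
import os
import stat
import tempfile

# Scratch directory so we do not touch any real data file.
workdir = tempfile.mkdtemp()
lockpath = os.path.join(workdir, "foo.lock")

lockfd = os.open(lockpath, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0)

# Only the permission bits (rwx for user/group/other) are of interest here.
perms = stat.S_IMODE(os.stat(lockpath).st_mode)
print(oct(perms))  # 0o0: nobody has read, write or execute permission

os.close(lockfd)
os.remove(lockpath)
os.rmdir(workdir)
```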

The combination of O_RDWR and 0 permissions means that if the file 
already exists (made by the "other" programme) then it won't have any 
permissions, so a request for read and write access will be refused (for 
an ordinary, non-root user) and the open will fail. The nice thing about 
this is that the initial permissions are _immediate_ when the file is 
created by the OS: there's no tiny window where the file has read/write 
permissions which then get removed; the OS ensures it. This is nice on 
networked file shares (if they are reliable).
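One detail worth knowing: the 0 mode governs later opens of the file, not the call that creates it. The descriptor returned to the creating programme still has the read/write access it asked for, as this sketch shows. Writing the PID into the lock file is only an illustration of using that access; the post does not do it.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
lockpath = os.path.join(workdir, "foo.lock")

lockfd = os.open(lockpath, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0)

# Despite the mode of 0, the creating descriptor is open O_RDWR,
# so writing through it works.
written = os.write(lockfd, str(os.getpid()).encode())
print(written > 0)  # True

os.close(lockfd)
os.remove(lockpath)
os.rmdir(workdir)
```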

Anyway, the upshot of the os.open() call above is that if the lock file 
already exists, the open will fail, and otherwise it will succeed, 
preventing another programme doing the same thing.
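Here is a small sketch of that behaviour, with both attempts made from one script: the first os.open() creates the lock file, and the second (standing in for the "other" programme) is refused with FileExistsError. The scratch directory and the "foo" data file name are assumptions for the example.

```python
import os
import tempfile

# Hypothetical data file path, in a scratch directory.
workdir = tempfile.mkdtemp()
datafilepath = os.path.join(workdir, "foo")
lockpath = datafilepath + ".lock"

flags = os.O_CREAT | os.O_EXCL | os.O_RDWR

# First attempt: no lock file yet, so this creates it and succeeds.
lockfd = os.open(lockpath, flags, 0)

# Second attempt, as the other programme would make it: the file exists.
try:
    os.open(lockpath, flags, 0)
    second_attempt = "obtained"
except FileExistsError:
    second_attempt = "refused"

print(second_attempt)  # refused

# Release: close the descriptor, then remove the lock file.
os.close(lockfd)
os.remove(lockpath)
os.rmdir(workdir)
```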

When finished, close the lockfd and remove the lock file:

    os.close(lockfd)
    os.remove(lockpath)
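If the work between taking and releasing the lock can raise an exception, a stale lock file would be left behind and block the other programme. One way to guard against that, sketched here with a hypothetical held_lock() helper built on contextlib.contextmanager, is to do the release in a finally clause:

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def held_lock(lockpath):
    """Hypothetical helper: take the lock file once, always release it."""
    lockfd = os.open(lockpath, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0)
    try:
        yield
    finally:
        # Runs on normal exit and also if the with-body raises.
        os.close(lockfd)
        os.remove(lockpath)


workdir = tempfile.mkdtemp()
lockpath = os.path.join(workdir, "foo.lock")

try:
    with held_lock(lockpath):
        raise ValueError("clean-up went wrong")
except ValueError:
    pass

print(os.path.exists(lockpath))  # False: the lock was still released
os.rmdir(workdir)
```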

Now, because the whole scenario is that occasionally both programmes want 
the file at the same time, the os.open _will_ fail in that case. So the 
idea is that you repeat it until it succeeds, then do your work:

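    import os
    import time

    lockpath = datafilepath + '.lock'
    while True:
        try:
            lockfd = os.open(lockpath, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0)
        except OSError:
            # the other programme holds the lock: wait and try again
            print("lock not obtained, sleeping")
            time.sleep(1)
        else:
            break
    try:
        pass  # .... work with the data file ...
    finally:
        # release the lock even if the work above raises an exception
        os.close(lockfd)
        os.remove(lockpath)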
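A loop that retries forever will hang if a crashed programme left its lock file behind. A possible refinement, sketched below with hypothetical acquire_lock()/release_lock() helpers, is to give up after a timeout. It catches only FileExistsError so that other failures (say, a missing directory) surface immediately instead of being retried.

```python
import os
import tempfile
import time


def acquire_lock(lockpath, timeout=10.0, poll=0.1):
    """Hypothetical helper: retry the exclusive create until it succeeds,
    raising TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return os.open(lockpath, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0)
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not obtain " + lockpath)
            time.sleep(poll)


def release_lock(lockfd, lockpath):
    os.close(lockfd)
    os.remove(lockpath)


workdir = tempfile.mkdtemp()
lockpath = os.path.join(workdir, "foo.lock")

fd = acquire_lock(lockpath)  # lock is free: obtained at once
try:
    acquire_lock(lockpath, timeout=0.3)  # already held: gives up
    outcome = "obtained"
except TimeoutError:
    outcome = "timed out"
release_lock(fd, lockpath)
os.rmdir(workdir)

print(outcome)  # timed out
```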

Put that logic in both programmes and you should be ok.
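To see the whole scheme working, here is a sketch that replays the CSV scenario with the lock in place. Two threads stand in for programme A (appending rows) and programme B (de-duplicating the file); the O_EXCL create behaves the same way between separate processes. The file names, row counts and helper names are all made up for the example. Because every read-clean-rewrite and every append happens under the lock, none of A's rows are lost.

```python
import os
import tempfile
import threading
import time


def acquire_lock(lockpath, poll=0.01):
    # Retry the exclusive create until the other side releases the lock.
    while True:
        try:
            return os.open(lockpath, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0)
        except FileExistsError:
            time.sleep(poll)


def release_lock(lockfd, lockpath):
    os.close(lockfd)
    os.remove(lockpath)


workdir = tempfile.mkdtemp()
datafilepath = os.path.join(workdir, "data.csv")
lockpath = datafilepath + ".lock"
with open(datafilepath, "w") as f:
    f.write("a,1\na,1\n")


def programme_a():
    # Append 50 new rows, taking the lock for each append.
    for i in range(50):
        fd = acquire_lock(lockpath)
        try:
            with open(datafilepath, "a") as f:
                f.write("row,%d\n" % i)
        finally:
            release_lock(fd, lockpath)


def programme_b():
    # Repeatedly read, clean (drop duplicates) and rewrite, under the lock.
    for _ in range(20):
        fd = acquire_lock(lockpath)
        try:
            with open(datafilepath) as f:
                rows = f.readlines()
            time.sleep(0.001)  # "thinks about the data"
            with open(datafilepath, "w") as f:
                f.writelines(dict.fromkeys(rows))
        finally:
            release_lock(fd, lockpath)


threads = [threading.Thread(target=programme_a),
           threading.Thread(target=programme_b)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with open(datafilepath) as f:
    final_rows = set(f.read().splitlines())
os.remove(datafilepath)
os.rmdir(workdir)

print(len(final_rows))  # 51: "a,1" plus all 50 rows appended by A
```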

You can see a more elaborate version of this logic in my "makelockfile" 
function here:

    https://bitbucket.org/cameron_simpson/css/src/tip/lib/python/cs/fileutils.py#lines-527

(Atlassian are going to nuke that repo soon, alas, because they find 
Mercurial too hard. But until then the link should be good.)

Cheers,
Cameron Simpson <cs at cskk.id.au>
