[Tutor] Pass arguments from bash script to embedded python script

Cameron Simpson cs at cskk.id.au
Wed Oct 23 17:58:59 EDT 2019


On 23Oct2019 15:08, Stephen P. Molnar <s.molnar at sbcglobal.net> wrote:
>I have revised my script to make use of the def function:

There's no "def function". A "def" statement defines a function.  
Anyway, remarks inline below:

>fileList = []
>filesList = []
>
>for files in glob.glob("*.log"):
>    fileName, fileExtension = os.path.splitext(files)
>    fileList.append(fileName)
>    filesList.append(files)

This iterates over a list of filenames. So "files" is a single filename; 
I would not make this a plural. 

Also, "fileList" and "filesList" are so similar that I would expect them 
to cause confusion. And, in fact, they did confuse me later.

>fname = fileList

And here "fileList" is a list of filenames. So "fname" _should_ be 
plural. This may seem like nitpicking, but getting this right is very 
important for readability and therefore debugging. So much so that I 
initially misread what your loop above did.

>for fname in fname:
>    fname

This loop does nothing _inside_ the loop; why bother? However, its 
control action is to iterate over the list "fname", assigning each value 
to...  "fname"!

The end result of this is that after the loop, "fname" is no longer a 
list of filenames, it is now just the _last_ filename.

Again, keeping plurals consistent (say "fnames" for the list and "fname" 
for a single name) would probably have prevented this.
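
To make that concrete, here is a tiny sketch of what reusing the list's 
name as the loop variable does. The names in the list are made up for 
illustration (apart from "14-7", which you mention); they are not read 
from your directory:

```python
# Hypothetical base names, standing in for what your glob loop collected.
fname = ["14-1", "14-2", "14-7"]

for fname in fname:
    pass  # nothing happens here, but "fname" is rebound on every pass

# "fname" is now a plain string: the last element of the original list.
print(type(fname).__name__, fname)
```

The list itself is no longer reachable through "fname" after the loop, 
which is why everything later in the script sees only one name.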

>fname1 = fname+'.log'
>fname2 = fname+'-dG'
>print('fname = ', fname)
>print('fname1 = ',fname)
>print('fname2 = ',fname2)

Ok, preparing a filename (by reassembling the stuff you undid earlier) 
and the associated "-dG" name. And printing them out (which is fine, an 
aid to debugging), though note that the line labelled 'fname1 = ' 
actually prints fname, not fname1.
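
As a small illustration of the splitting and reassembling, using 
"14-7.log" from your message as the example input (nothing here touches 
the filesystem):

```python
import os.path

# Split a filename into its base and its extension.
fileName, fileExtension = os.path.splitext("14-7.log")
print(fileName)       # 14-7
print(fileExtension)  # .log

# Gluing the pieces back together recovers the original name.
print(fileName + fileExtension)  # 14-7.log
```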

>def dG(filesList):
>    data = np.genfromtxt(fname1, usecols=(1), skip_header=27, 
>skip_footer=1, encoding=None)
>    np.savetxt(fname2, data, fmt='%.10g', header=fname)
>    return(data)

The function dG does not use its parameter "filesList". Why do you pass 
it in?

Also, it is a source of bugs to use a parameter with the same name as a 
global because inside the function you might work on the parameter, 
_thinking_ you were working on the global. This is called "shadowing", 
and linters will say something like "the parameter filesList shadows a 
global of the same name" in order to point this out to you.
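
A minimal sketch of shadowing, in case it helps. The function 
"count_files" and the filenames are hypothetical, invented just for this 
example:

```python
filesList = ["a.log", "b.log", "c.log"]  # a global, as in your script

def count_files(filesList):
    # Inside the function, "filesList" is the parameter, not the global.
    filesList = filesList + ["extra.log"]  # rebinds the local name only
    return len(filesList)

n = count_files(["only-one.log"])
print(n)               # 2: the parameter's one name plus "extra.log"
print(len(filesList))  # 3: the global was never touched
```

Someone reading the function body could easily believe it was changing 
the global list; giving the parameter a different name removes the doubt.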

Then within the function you use the _global_ names "fname", "fname1" 
and "fname2". Normally a function should not depend on global variables; 
that is why we pass parameters to it. The whole point is to encapsulate 
its task as a generic method of doing something, _not_ dependent on 
any outside state.

I would have written this function thus:

    def dG(basic_name):
        src_filename = basic_name + '.log'
        dst_filename = basic_name + '-dG'
        data = np.genfromtxt(src_filename, usecols=(1), skip_header=27, skip_footer=1, encoding=None)
        np.savetxt(dst_filename, data, fmt='%.10g', header=basic_name)
        return data

so that it has no dependence on global variables.

>data = dG(filesList)

Again, the function never uses "filesList" - there's no point in passing 
it in. I would have used the revised function above and gone:

    data = dG(fname)

>It seems to work with one little (actually major) problem. The only 
>result saved is for the last file in the list 14-7.log.
>Which s the last file  in the list.

That is because of the earlier for-loop I pointed out, which puts just 
the last filename into fname.

Regardless, your script will only ever process one file because the call 
to dG() is not inside any kind of iteration; it will only be run once.

Consider this:

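    for fname in fileList:
        data = dG(fname)
        print(data)

which calls the function (in this case my revised function) once for 
each name in fileList. Note that this is fileList (the names without 
".log"), not filesList, because my revised dG() appends the ".log" 
itself.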

You could put a print call inside dG() to see which filename it is 
processing, to make things more obvious.

Finally, I recommend avoiding global variables altogether - they are a 
rich source of bugs, particularly when some function quietly uses a 
global. Instead you can put _all_ the code into functions, eg:

    def dG(......):
        ..... as above ...

    def main(argv):
        fileList = []
        filesList = []
        for files in glob.glob("*.log"):
            fileName, fileExtension = os.path.splitext(files)
            fileList.append(fileName)
            filesList.append(files)
        # dG() wants the name without ".log", so iterate over fileList
        for fname in fileList:
            data = dG(fname)
            print(data)

    # call the main function (needs "import sys" at the top of the script)
    main(sys.argv)

By structuring things this way there are no global variables and you 
cannot accidentally use one in dG().
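
Here is a runnable sketch of that layout which does not need numpy or 
your data files. The function "names_for" is a hypothetical stand-in for 
dG(): it only builds the two filenames and reads nothing. The sorted() 
call is my own addition, to make the order predictable:

```python
import glob
import os

def names_for(basic_name):
    # Stand-in for dG(): build the source and destination filenames only.
    return basic_name + '.log', basic_name + '-dG'

def main(argv):
    # argv is accepted but unused here, as in the outline above.
    results = []
    for log_filename in sorted(glob.glob("*.log")):
        basic_name, extension = os.path.splitext(log_filename)
        results.append(names_for(basic_name))
    return results

# Everything lives in functions; this loop only reports what main() found.
for src_filename, dst_filename in main(["script-name"]):
    print(src_filename, "->", dst_filename)
```

Run in a directory containing your .log files, this prints one line per 
file; in a directory with none it prints nothing.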

Cheers,
Cameron Simpson <cs at cskk.id.au>

