convert script awk in python
Loris Bennett
loris.bennett at fu-berlin.de
Thu Mar 25 03:14:33 EDT 2021
"Avi Gross" <avigross at verizon.net> writes:
> Just to be clear, Cameron, I retired very early and thus have had no reason
> to use AWK in a work situation and for a while was not using UNIX-based
> machines. I have no doubt I would have continued using AWK as one part of my
> toolkit for years albeit less often as I found other tools better for some
> situations, let alone the kind I mentioned earlier that are not text-file
> based such as databases.
>
> It is, as noted, a great tool and if you only had one or a few tools like it
> available, it can easily be bent and twisted to do much of what the others
> do as it is more programmable than most. But following that line of
> reasoning, fairly simple python scripts can be written with python -c "..."
> or by pointing to a script.
>
> Anyone have a collection of shell scripts that can be used in pipelines
> where each piece is just a call to python to do something simple?
I'm not doing that, but I am trying to replace a longish bash pipeline
with Python code.
Within Emacs, I often use Org mode[1] to generate data via some bash
commands and then visualise the data with Python. Thus, in a single Org
file I run:
/usr/bin/sacct -u $user -o jobid -X -S $start -E $end -s COMPLETED -n | \
xargs -I {} seff {} | grep 'Efficiency' | sed '$!N;s/\n/ /' | awk '{print $3 " " $9}' | sed 's/%//g'
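For comparison, the text-processing half of that pipeline (grep, sed, awk, sed) can be mirrored step by step in plain Python. This is only a sketch: the function name parse_efficiency is hypothetical, and it assumes, as the awk field numbers $3 and $9 imply, that joining each pair of 'Efficiency' lines yields at least nine whitespace-separated fields.

```python
from typing import Iterable, List, Tuple


def parse_efficiency(lines: Iterable[str]) -> List[Tuple[float, float]]:
    """Return (cpu_eff, mem_eff) pairs from seff-style output lines."""
    # grep 'Efficiency': keep only the lines that mention efficiency
    eff_lines = [line for line in lines if "Efficiency" in line]
    rows = []
    # sed '$!N;s/\n/ /': join consecutive pairs of lines with a space
    # (an unpaired trailing line is dropped here; sed would pass it through)
    for first, second in zip(eff_lines[0::2], eff_lines[1::2]):
        fields = (first + " " + second).split()
        # awk '{print $3 " " $9}': awk counts fields from 1, Python from 0
        # (assumes the joined line has at least nine fields)
        cpu, mem = fields[2], fields[8]
        # sed 's/%//g': remove the percent signs, then convert to numbers
        rows.append((float(cpu.replace("%", "")), float(mem.replace("%", ""))))
    return rows
```

Fed the made-up lines "CPU Efficiency: 96.60% of 00:10:00 core-walltime" and "Memory Efficiency: 99.11% of 2.00 GB", it returns [(96.6, 99.11)], i.e. numbers rather than strings, so no separate Org table step is needed.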
The raw numbers are formatted by Org into a table:
| cpu_eff | mem_eff |
|---------+---------|
| 96.6 | 99.11 |
| 93.43 | 100.0 |
| 91.3 | 100.0 |
| 88.71 | 100.0 |
| 89.79 | 100.0 |
| 84.59 | 100.0 |
| 83.42 | 100.0 |
| 86.09 | 100.0 |
| 92.31 | 100.0 |
| 90.05 | 100.0 |
| 81.98 | 100.0 |
| 90.76 | 100.0 |
| 75.36 | 64.03 |
I then read this into some Python code in the Org file and do something like:
import pandas as pd
# eff_tab: the Org table above, as a list of rows with the header first
df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0])
cpu_data = df.loc[:, "cpu_eff"]
mem_data = df.loc[:, "mem_eff"]
...
# axis: a pair of matplotlib Axes, set up elsewhere
n, bins, patches = axis[0].hist(cpu_data, bins=range(0, 110, 5))
n, bins, patches = axis[1].hist(mem_data, bins=range(0, 110, 5))
which generates nice histograms.
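The bins=range(0, 110, 5) argument gives the edges 0, 5, ..., 105, i.e. 21 bins of width 5. According to the matplotlib documentation, when bins is a sequence every bin except the last is half-open [a, b), and the last also includes its right edge, so a value of 100.0 lands in [100, 105]. A pure-Python sketch of that rule (hist_counts is a hypothetical helper), which could be handy for checking counts in a cron job without drawing anything:

```python
from bisect import bisect_right
from typing import List, Sequence


def hist_counts(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count values per bin: [a, b) for every bin except the last, which is [a, b]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # values outside the outer edges are not counted
        if v == edges[-1]:
            counts[-1] += 1  # the right-most edge belongs to the last bin
        else:
            # bisect_right finds the first edge strictly greater than v
            counts[bisect_right(edges, v) - 1] += 1
    return counts
```

For the cpu_eff column above, hist_counts(cpu_data, list(range(0, 110, 5))) puts five jobs in the [90, 95) bin, and for mem_eff the eleven 100.0 values all fall in the final [100, 105] bin.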
I decided to rewrite the whole thing as a stand-alone Python program so
that I can run it as a cron job. However, as a novice Python programmer
I am finding the translation of the bash part slightly clunky. I am in the
middle of doing this and started with the following:
import subprocess

# user and period (start, end) are set earlier in the script
sacct = subprocess.Popen(["/usr/bin/sacct",
                          "-u", user,
                          "-S", period[0], "-E", period[1],
                          "-o", "jobid", "-X",
                          "-s", "COMPLETED", "-n"],
                         stdout=subprocess.PIPE,
                         )
jobids = []
for line in sacct.stdout:
    jobid = str(line.strip(), 'UTF-8')
    jobids.append(jobid)
sacct.wait()  # reap the finished sacct process

for jobid in jobids:
    # seff takes the job ID as an argument, so nothing is piped into its stdin
    seff = subprocess.Popen(["/usr/bin/seff", jobid],
                            stdout=subprocess.PIPE,
                            )
    seff_output = []
    for line in seff.stdout:
        seff_output.append(str(line.strip(), "UTF-8"))
    seff.wait()
    ...
but compared to the bash pipeline, this all seems a bit laboured.
Does anyone have a better approach?
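One somewhat shorter route than Popen, sketched here under the assumption of Python 3.7 or later (for the capture_output and text arguments), is subprocess.run: it waits for the command, decodes its output and, with check=True, raises CalledProcessError on a non-zero exit status. The helper names run_lines, completed_jobids and seff_reports are hypothetical; the paths and sacct options are the ones from the code above.

```python
import subprocess
from typing import Callable, Dict, List, Sequence

Runner = Callable[[Sequence[str]], List[str]]


def run_lines(cmd: Sequence[str]) -> List[str]:
    """Run cmd (no shell), fail loudly on error, return its non-empty stdout lines."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def completed_jobids(user: str, start: str, end: str,
                     run: Runner = run_lines) -> List[str]:
    """Same as: sacct -u $user -o jobid -X -S $start -E $end -s COMPLETED -n"""
    return run(["/usr/bin/sacct", "-u", user, "-o", "jobid", "-X",
                "-S", start, "-E", end, "-s", "COMPLETED", "-n"])


def seff_reports(jobids: Sequence[str],
                 run: Runner = run_lines) -> Dict[str, List[str]]:
    """Same as: xargs -I {} seff {}, keeping each job's output separately."""
    return {jobid: run(["/usr/bin/seff", jobid]) for jobid in jobids}
```

Because each command is a list, no shell is involved and nothing needs quoting, and since seff gets the job ID as an argument, nothing has to be connected to its stdin. The run parameter exists only so the functions can be tried with a stand-in runner on a machine without Slurm. In a cron job, check=True means a failing sacct raises an exception instead of silently producing an empty histogram.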
Cheers,
Loris
> -----Original Message-----
> From: Cameron Simpson <cs at cskk.id.au>
> Sent: Wednesday, March 24, 2021 6:34 PM
> To: Avi Gross <avigross at verizon.net>
> Cc: python-list at python.org
> Subject: Re: convert script awk in python
>
> On 24Mar2021 12:00, Avi Gross <avigross at verizon.net> wrote:
>>But I wonder how much languages like AWK are still used to make new
>>programs as compared to a time they were really useful.
>
> You mentioned in an adjacent post that you've not used AWK since 2000.
> By contrast, I still use it regularly.
>
> It's great for proof of concept at the command line or in small scripts, and
> as the innards of quite useful scripts. I've a trite "colsum" script which
> does nothing but generate and run a little awk programme to sum a column,
> and routinely type "blah .... | colsum 2" or the like to get a tally.
>
> I totally agree that once you're processing a lot of data from various
> places, or a shell script is making long pipelines or many command
> invocations, and that's a performance issue, it is time to recode.
>
> Cheers,
> Cameron Simpson <cs at cskk.id.au>
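Cameron's colsum generates and runs a small awk programme. As an example of the kind of Python pipeline piece Avi asks about, a rough stand-in might look like the following. The file name colsum.py is hypothetical, and it differs from awk in one respect noted in the code: a non-numeric field is skipped, whereas awk would count it as 0 or as its numeric prefix.

```python
import sys
from typing import Iterable, List


def colsum(lines: Iterable[str], column: int) -> float:
    """Sum the 1-based, whitespace-separated column, like awk '{s += $N} END {print s}'."""
    total = 0.0
    for line in lines:
        fields = line.split()
        if len(fields) >= column:  # short lines contribute nothing, as in awk
            try:
                total += float(fields[column - 1])
            except ValueError:
                pass  # unlike awk, non-numeric fields are skipped
    return total


def main(argv: List[str]) -> None:
    # Usage: blah ... | python3 colsum.py 2
    if len(argv) != 2:
        print("usage: colsum.py COLUMN", file=sys.stderr)
        return
    print(colsum(sys.stdin, int(argv[1])))


if __name__ == "__main__":
    main(sys.argv)
```

Used as "blah ... | python3 colsum.py 2", it prints the total as a float, e.g. 12.0 where awk would print 12.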
Footnotes:
[1] https://orgmode.org/
--
This signature is currently under construction.