convert script awk in python
Peter Otten
__peter__ at web.de
Thu Mar 25 04:51:10 EDT 2021
On 25/03/2021 08:14, Loris Bennett wrote:
> I'm not doing that, but I am trying to replace a longish bash pipeline
> with Python code.
>
> Within Emacs, I often use Org mode[1] to generate data via some bash
> commands and then visualise the data via Python. Thus, in a single Org
> file I run
>
> /usr/bin/sacct -u $user -o jobid -X -S $start -E $end -s COMPLETED -n | \
> xargs -I {} seff {} | grep 'Efficiency' | sed '$!N;s/\n/ /' | awk '{print $3 " " $9}' | sed 's/%//g'
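Interjecting on the text munging for a moment: the grep/sed/awk tail of that
pipeline just pulls two percentages per job out of seff's output, and in
Python that collapses to a single regex. A minimal, untested sketch;
parse_efficiencies is a made-up name, and it assumes seff prints
"CPU Efficiency: NN.NN% ..." and "Memory Efficiency: NN.NN% ..." lines, which
is what your awk fields 3 and 9 suggest:

import re

def parse_efficiencies(seff_text):
    """Return (cpu_eff, mem_eff) as floats from one job's seff output."""
    # Grab every "... Efficiency: NN.NN% ..." percentage; the first is CPU,
    # the second memory, mirroring awk's $3 and $9 after the sed join.
    cpu, mem = re.findall(r"Efficiency:\s*([\d.]+)%", seff_text)[:2]
    return float(cpu), float(mem)

More on the subprocess side further down.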
>
> The raw numbers are formatted by Org into a table
>
> | cpu_eff | mem_eff |
> |---------+---------|
> | 96.6 | 99.11 |
> | 93.43 | 100.0 |
> | 91.3 | 100.0 |
> | 88.71 | 100.0 |
> | 89.79 | 100.0 |
> | 84.59 | 100.0 |
> | 83.42 | 100.0 |
> | 86.09 | 100.0 |
> | 92.31 | 100.0 |
> | 90.05 | 100.0 |
> | 81.98 | 100.0 |
> | 90.76 | 100.0 |
> | 75.36 | 64.03 |
>
> I then read this into some Python code in the Org file and do something like
>
> df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0])
> cpu_data = df.loc[: , "cpu_eff"]
> mem_data = df.loc[: , "mem_eff"]
>
> ...
>
> n, bins, patches = axis[0].hist(cpu_data, bins=range(0, 110, 5))
> n, bins, patches = axis[1].hist(mem_data, bins=range(0, 110, 5))
>
> which generates nice histograms.
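For anyone reading along, that snippet presumably sits in scaffolding roughly
like the following. The two example rows are copied from the table above;
everything else (figure layout, titles, output file name) is guessed and
untested:

import matplotlib.pyplot as plt
import pandas as pd

# First two data rows from the Org table above, just for illustration.
eff_tab = [["cpu_eff", "mem_eff"], [96.6, 99.11], [93.43, 100.0]]
df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0])

fig, axis = plt.subplots(1, 2, figsize=(10, 4))
axis[0].hist(df["cpu_eff"], bins=range(0, 110, 5))
axis[0].set_title("CPU efficiency (%)")
axis[1].hist(df["mem_eff"], bins=range(0, 110, 5))
axis[1].set_title("Memory efficiency (%)")
fig.savefig("efficiency.png")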
>
> I decided to rewrite the whole thing as a stand-alone Python program so
> that I can run it as a cron job. However, as a novice Python programmer
> I am finding translating the bash part slightly clunky. I am in the
> middle of doing this and started with the following:
>
> sacct = subprocess.Popen(["/usr/bin/sacct",
>                           "-u", user,
>                           "-S", period[0], "-E", period[1],
>                           "-o", "jobid", "-X",
>                           "-s", "COMPLETED", "-n"],
>                          stdout=subprocess.PIPE,
>                          )
>
> jobids = []
>
> for line in sacct.stdout:
>     jobid = str(line.strip(), 'UTF-8')
>     jobids.append(jobid)
>
> for jobid in jobids:
>     seff = subprocess.Popen(["/usr/bin/seff", jobid],
>                             stdin=sacct.stdout,
>                             stdout=subprocess.PIPE,
>                             )
The statement above looks odd. If seff can read the jobids from stdin
there should be no need to pass them individually, like:
sacct = ...
seff = Popen(
    ["/usr/bin/seff"], stdin=sacct.stdout, stdout=subprocess.PIPE,
    universal_newlines=True
)
for line in seff.communicate()[0].splitlines():
    ...
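Fleshed out a bit, and still only if seff really does accept job ids on
stdin, the whole thing might look something like this. user and period are
placeholders for the values your script already has, and it is untested:

import re
import subprocess

user = "someuser"                      # placeholder
period = ("2021-03-01", "2021-03-31")  # placeholder (start, end)

sacct = subprocess.Popen(
    ["/usr/bin/sacct", "-u", user, "-S", period[0], "-E", period[1],
     "-o", "jobid", "-X", "-s", "COMPLETED", "-n"],
    stdout=subprocess.PIPE, universal_newlines=True,
)
seff = subprocess.Popen(
    ["/usr/bin/seff"],
    stdin=sacct.stdout, stdout=subprocess.PIPE, universal_newlines=True,
)
sacct.stdout.close()  # let sacct see a broken pipe if seff exits early

# One flat list of percentages, alternating CPU/memory per job
# (assuming seff prints the usual two "... Efficiency: NN.NN% ..." lines).
percentages = re.findall(r"Efficiency:\s*([\d.]+)%", seff.communicate()[0])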
> seff_output = []
> for line in seff.stdout:
>     seff_output.append(str(line.strip(), "UTF-8"))
>
> ...
>
> but compared to the bash pipeline, this all seems a bit laboured.
>
> Does anyone have a better approach?
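If, on the other hand, seff only takes the job id as an argument (which is
what the xargs -I {} in your pipeline suggests), then subprocess.run with
text=True already removes most of the boilerplate: no manual decoding, no
Popen bookkeeping. Another untested sketch, with job_efficiencies a made-up
name:

import re
import subprocess

def job_efficiencies(user, period):
    """Yield (cpu_eff, mem_eff) floats for each completed job of a user."""
    sacct = subprocess.run(
        ["/usr/bin/sacct", "-u", user, "-S", period[0], "-E", period[1],
         "-o", "jobid", "-X", "-s", "COMPLETED", "-n"],
        stdout=subprocess.PIPE, text=True, check=True,
    )
    for jobid in sacct.stdout.split():
        seff = subprocess.run(
            ["/usr/bin/seff", jobid],
            stdout=subprocess.PIPE, text=True, check=True,
        )
        # Same regex idea as above: first match is CPU, second is memory.
        cpu, mem = re.findall(r"Efficiency:\s*([\d.]+)%", seff.stdout)[:2]
        yield float(cpu), float(mem)

The resulting pairs can then go straight into the DataFrame for the
histograms.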
>
> Cheers,
>
> Loris
>
>
>> -----Original Message-----
>> From: Cameron Simpson <cs at cskk.id.au>
>> Sent: Wednesday, March 24, 2021 6:34 PM
>> To: Avi Gross <avigross at verizon.net>
>> Cc: python-list at python.org
>> Subject: Re: convert script awk in python
>>
>> On 24Mar2021 12:00, Avi Gross <avigross at verizon.net> wrote:
>>> But I wonder how much languages like AWK are still used to make new
>>> programs, compared to the time when they were really useful.
>>
>> You mentioned in an adjacent post that you've not used AWK since 2000.
>> By contrast, I still use it regularly.
>>
>> It's great for proof of concept at the command line or in small scripts, and
>> as the innards of quite useful scripts. I've a trite "colsum" script which
>> does nothing but generate and run a little awk programme to sum a column,
>> and routinely type "blah .... | colsum 2" or the like to get a tally.
>>
>> I totally agree that once you're processing a lot of data, or where a
>> shell script is making long pipelines or many command invocations, if
>> that's a performance issue it is time to recode.
>>
>> Cheers,
>> Cameron Simpson <cs at cskk.id.au>
>
> Footnotes:
> [1] https://orgmode.org/
>