convert script awk in python
Loris Bennett
loris.bennett at fu-berlin.de
Thu Mar 25 06:16:37 EDT 2021
Peter Otten <__peter__ at web.de> writes:
> On 25/03/2021 08:14, Loris Bennett wrote:
>
>> I'm not doing that, but I am trying to replace a longish bash pipeline
>> with Python code.
>>
>> Within Emacs, I often use Org mode[1] to generate data via some bash
>> commands and then visualise the data via Python. Thus, in a single Org
>> file I run
>>
>> /usr/bin/sacct -u $user -o jobid -X -S $start -E $end -s COMPLETED -n | \
>> xargs -I {} seff {} | grep 'Efficiency' | sed '$!N;s/\n/ /' | awk '{print $3 " " $9}' | sed 's/%//g'
>>
>> The raw numbers are formatted by Org into a table
>>
>> | cpu_eff | mem_eff |
>> |---------+---------|
>> | 96.6 | 99.11 |
>> | 93.43 | 100.0 |
>> | 91.3 | 100.0 |
>> | 88.71 | 100.0 |
>> | 89.79 | 100.0 |
>> | 84.59 | 100.0 |
>> | 83.42 | 100.0 |
>> | 86.09 | 100.0 |
>> | 92.31 | 100.0 |
>> | 90.05 | 100.0 |
>> | 81.98 | 100.0 |
>> | 90.76 | 100.0 |
>> | 75.36 | 64.03 |
>>
>> I then read this into some Python code in the Org file and do something like
>>
>> df = pd.DataFrame(eff_tab[1:], columns=eff_tab[0])
>> cpu_data = df.loc[: , "cpu_eff"]
>> mem_data = df.loc[: , "mem_eff"]
>>
>> ...
>>
>> n, bins, patches = axis[0].hist(cpu_data, bins=range(0, 110, 5))
>> n, bins, patches = axis[1].hist(mem_data, bins=range(0, 110, 5))
>>
>> which generates nice histograms.
>>
>> I decided to rewrite the whole thing as a stand-alone Python program so
>> that I can run it as a cron job. However, as a novice Python programmer
>> I am finding translating the bash part slightly clunky. I am in the
>> middle of doing this and started with the following:
>>
>> sacct = subprocess.Popen(["/usr/bin/sacct",
>>                           "-u", user,
>>                           "-S", period[0], "-E", period[1],
>>                           "-o", "jobid", "-X",
>>                           "-s", "COMPLETED", "-n"],
>>                          stdout=subprocess.PIPE,
>>                          )
>>
>> jobids = []
>>
>> for line in sacct.stdout:
>>     jobid = str(line.strip(), 'UTF-8')
>>     jobids.append(jobid)
>>
>> for jobid in jobids:
>>     seff = subprocess.Popen(["/usr/bin/seff", jobid],
>>                             stdin=sacct.stdout,
>>                             stdout=subprocess.PIPE,
>>                             )
>
> The statement above looks odd. If seff can read the jobids from stdin
> there should be no need to pass them individually, like:
>
> sacct = ...
> seff = Popen(
>     ["/usr/bin/seff"], stdin=sacct.stdout, stdout=subprocess.PIPE,
>     universal_newlines=True
> )
> for line in seff.communicate()[0].splitlines():
>     ...
Indeed, seff cannot read multiple jobids. That's why I had 'xargs' in the
original bash code. Initially I thought of calling 'xargs' via
Popen, but this seemed very fiddly (I didn't manage to get it working)
and anyway seemed a bit weird to me as it is really just a loop, which I
can implement perfectly well in Python.
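
A minimal sketch of that loop, assuming Python 3.7 or later (for
subprocess.run with capture_output and text). The function names are
made up for illustration, and the parsing assumes that each 'Efficiency'
line of the seff output carries the percentage in its third
whitespace-separated field, as the awk step in the bash pipeline implies.

```python
import subprocess


def parse_efficiencies(seff_text):
    """Return the percentages from the 'Efficiency' lines of one seff report.

    Roughly mirrors grep 'Efficiency' | awk '{print $3}' | sed 's/%//g':
    keep lines containing 'Efficiency', take the third field, drop '%'.
    """
    values = []
    for line in seff_text.splitlines():
        if "Efficiency" not in line:
            continue
        fields = line.split()
        if len(fields) >= 3:
            values.append(float(fields[2].replace("%", "")))
    return values


def completed_job_ids(user, start, end):
    """Run sacct once and return the job IDs as a list of strings."""
    result = subprocess.run(
        ["/usr/bin/sacct", "-u", user, "-S", start, "-E", end,
         "-o", "jobid", "-X", "-s", "COMPLETED", "-n"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def efficiencies(jobids, seff="/usr/bin/seff"):
    """Call seff once per job ID (the role xargs played in the pipeline)."""
    rows = []
    for jobid in jobids:
        result = subprocess.run(
            [seff, jobid], capture_output=True, text=True, check=True,
        )
        rows.append(parse_efficiencies(result.stdout))
    return rows
```

With text=True the output arrives as str, so no explicit decoding from
UTF-8 is needed, and check=True raises an exception if sacct or seff
exits with an error. The rows returned by efficiencies() could then be
passed to pd.DataFrame in place of the Org table.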
Cheers,
Loris
>> seff_output = []
>> for line in seff.stdout:
>>     seff_output.append(str(line.strip(), "UTF-8"))
>>
>> ...
>>
>> but compared to the bash pipeline, this all seems a bit laboured.
>>
>> Does anyone have a better approach?
>>
>> Cheers,
>>
>> Loris
>>
>>
>>> -----Original Message-----
>>> From: Cameron Simpson <cs at cskk.id.au>
>>> Sent: Wednesday, March 24, 2021 6:34 PM
>>> To: Avi Gross <avigross at verizon.net>
>>> Cc: python-list at python.org
>>> Subject: Re: convert script awk in python
>>>
>>> On 24Mar2021 12:00, Avi Gross <avigross at verizon.net> wrote:
>>>> But I wonder how much languages like AWK are still used to make new
>>>> programs as compared to a time they were really useful.
>>>
>>> You mentioned in an adjacent post that you've not used AWK since 2000.
>>> By contrast, I still use it regularly.
>>>
>>> It's great for proof of concept at the command line or in small scripts, and
>>> as the innards of quite useful scripts. I've a trite "colsum" script which
>>> does nothing but generate and run a little awk programme to sum a column,
>>> and routinely type "blah .... | colsum 2" or the like to get a tally.
>>>
>>> I totally agree that once you're processing a lot of data from places or
>>> where a shell script is making long pipelines or many command invocations,
>>> if that's a performance issue it is time to recode.
>>>
>>> Cheers,
>>> Cameron Simpson <cs at cskk.id.au>
>>
>> Footnotes:
>> [1] https://orgmode.org/
>>
>
--
Dr. Loris Bennett (Hr./Mr.)
ZEDAT, Freie Universität Berlin Email loris.bennett at fu-berlin.de