[Tutor] text processing, reformatting
Steven D'Aprano
steve at pearwood.info
Wed Dec 12 16:37:55 CET 2012
On 13/12/12 01:34, lconrad at go2france.com wrote:
>
> From a much larger, messy report file, I extracted these lines:
>
>
> OPM010 HUNT INGR FRI 16/11/12 00:00:00 QRTR
> HTGP PEG OVFL
> 0012 00000 00000
> 0022 00000 00000
> 0089 00000 00000
> 0379 00000 00000
> OPM010 HUNT INGR FRI 16/11/12 00:15:00 QRTR
> HTGP PEG OVFL
> 0012 00000 00000
> 0022 00000 00000
[snip]
> The engineer needs that reformatted into the "log line" the original
> machine should have written anyway, for importing into Excel:
>
> yyyy-mm-dd;hh:mm:ss;<htgp>;<peg>;<ovfl>;
>
> With the Bourne shell I could eventually whack this out as I usually do,
> but as a Python pupil I'd like to learn from you aces how
> Python could do it. :)
Well, it's not entirely clear what the relationship between the "before"
and "after" text should be. It's always useful to give an actual example
so as to avoid misunderstandings.
In the absence of a detailed specification, I will just have to guess,
and then you can complain when I guess wrongly :-)
My guess is that the above report lines should be reformatted into:
2012-11-16;00:00:00;0012;00000;00000;
2012-11-16;00:00:00;0022;00000;00000;
2012-11-16;00:00:00;0089;00000;00000;
2012-11-16;00:00:00;0379;00000;00000;
2012-11-16;00:15:00;0012;00000;00000;
2012-11-16;00:15:00;0022;00000;00000;
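The date in each output line is just the DD/MM/YY field of the preceding OPM010 header rearranged into YYYY-MM-DD. As an alternative to slicing the string by hand, here is a minimal sketch using the standard library's datetime.strptime, which also rejects impossible dates such as 31/02/12. The helper name to_log_line is hypothetical. Note that %y maps two-digit years 69 to 99 onto 1969 to 1999, so this assumes the report only contains years 2000 to 2068.

```python
from datetime import datetime

def to_log_line(date_field, time_field, row):
    """Build one 'yyyy-mm-dd;hh:mm:ss;htgp;peg;ovfl;' line (hypothetical helper).

    date_field is DD/MM/YY as found in the OPM010 header line,
    time_field is hh:mm:ss, and row is a data line such as "0012 00000 00000".
    """
    # strptime validates the date as well as parsing it; %y is a two-digit year
    day = datetime.strptime(date_field, "%d/%m/%y").strftime("%Y-%m-%d")
    htgp, peg, ovfl = row.split()
    # the trailing ";" matches the format the engineer asked for
    return ";".join([day, time_field, htgp, peg, ovfl]) + ";"

print(to_log_line("16/11/12", "00:15:00", "0022 00000 00000"))
# prints 2012-11-16;00:15:00;0022;00000;00000;
```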
Here is a quick and dirty version, with little in the way of error
checking or sophistication, suitable for a throw-away script:
# === cut ===
date, time = "unknown", "unknown"
with open("input.txt") as infile:
    for line in infile:
        line = line.strip()  # remove leading/trailing whitespace
        if not line:
            # skip blank lines
            continue
        if line.startswith("OPM010") and line.endswith("QRTR"):
            # extract the date and time from a header line such as
            # OPM010 HUNT INGR FRI 16/11/12 00:15:00 QRTR
            date, time = line.split()[4:6]
            # convert date from DD/MM/YY to YYYY-MM-DD
            dd, mm, yy = date.split("/")
            date = "20%s-%s-%s" % (yy, mm, dd)
        elif line.split() == ["HTGP", "PEG", "OVFL"]:
            # skip the column-heading line
            continue
        else:
            # output one log line per data row, ending in ";" as requested
            htgp, peg, ovfl = line.split()
            print(";".join([date, time, htgp, peg, ovfl]) + ";")
# === cut ===
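If you later want something a little less throw-away, one option is to move the loop into a generator that accepts any iterable of lines (an open file, a list, an io.StringIO), so it can be tested without a file on disk. This is a sketch, not a drop-in replacement: the name log_lines is invented here, it compares whitespace-split fields rather than exact strings, and it assumes every data row has exactly three fields and that a header line always precedes its rows. It raises ValueError otherwise, instead of printing "unknown".

```python
import io

def log_lines(lines):
    """Yield 'yyyy-mm-dd;hh:mm:ss;htgp;peg;ovfl;' strings from report lines.

    Hypothetical helper: same idea as the script above, but it takes any
    iterable of lines and raises ValueError on lines it does not understand.
    """
    date = time = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        if fields[0] == "OPM010" and fields[-1] == "QRTR":
            # e.g. OPM010 HUNT INGR FRI 16/11/12 00:15:00 QRTR
            dd, mm, yy = fields[4].split("/")
            date, time = "20%s-%s-%s" % (yy, mm, dd), fields[5]
        elif fields == ["HTGP", "PEG", "OVFL"]:
            continue  # column headings
        elif len(fields) == 3:
            if date is None:
                raise ValueError("data row before any OPM010 header: %r" % line)
            yield ";".join([date, time] + fields) + ";"
        else:
            raise ValueError("unexpected line: %r" % line)

sample = io.StringIO(
    "OPM010 HUNT INGR FRI 16/11/12 00:00:00 QRTR\n"
    "HTGP PEG OVFL\n"
    "0012 00000 00000\n"
    "\n"
    "OPM010 HUNT INGR FRI 16/11/12 00:15:00 QRTR\n"
    "HTGP PEG OVFL\n"
    "0022 00000 00000\n"
)
for out in log_lines(sample):
    print(out)
# prints:
# 2012-11-16;00:00:00;0012;00000;00000;
# 2012-11-16;00:15:00;0022;00000;00000;
```

To write the results to a file instead of the screen, iterate over log_lines() applied to the open input file and write each string followed by "\n".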
--
Steven