pattern block expression matching

MRAB python at mrabarnett.plus.com
Sat Jul 21 12:37:00 EDT 2018


On 2018-07-21 15:20, aldi.kraja at gmail.com wrote:
> Hi,
> I have a long text, which tells me which files from a database were downloaded and which ones failed. The pattern is as follows (at the end of this post). Wrote a tiny program, but still is raw. I want to find term "ERROR" and go 5 lines above and get the name with suffix XPT, in this first case DRXIFF_F.XPT, but it changes in other cases to some other name with suffix XPT. Thanks, Aldi
> 
> # reading errors from a file txt
> import re
> with open('nohup.out', 'r') as fh:
>    lines = fh.readlines()
>    for line in lines:
>        m1 = re.search("XPT", line)
>        m2 = re.search('ERROR', line)
>        if m1:
>          print(line)
>        if m2:
>          print(line)
> 
Firstly, you don't need regex for something as simple as checking for
the presence of a string.

Secondly, I think it's 4 lines above, not 5.

'enumerate' comes in useful here:

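with open('nohup.out', 'r') as fh:
    lines = fh.readlines()

for i, line in enumerate(lines):
    # Require i >= 4: a negative index would silently wrap around
    # to the end of the list instead of raising an error.
    if 'ERROR' in line and i >= 4:
        print(line)
        print(lines[i - 4])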
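
If the number of lines between the URL line and the ERROR line is not always the same, a fixed offset will pick the wrong line. Here is a rough sketch of an alternative that remembers the most recent request line and reports its file name when an ERROR follows. It assumes every request block starts with a line of the form "--<timestamp>--  <URL>", as in your sample; the function name failed_downloads and the inline sample text are only for illustration.

```python
import posixpath
from urllib.parse import urlsplit


def failed_downloads(lines):
    """Return the file names of request blocks that ended in an ERROR line."""
    failed = []
    last_name = None  # file name from the most recent request line
    for line in lines:
        line = line.strip()
        if line.startswith('--') and '://' in line:
            # Request line: "--<timestamp>--  <URL>"; the URL is the last field.
            url = line.split()[-1]
            last_name = posixpath.basename(urlsplit(url).path)
        elif 'ERROR' in line and last_name is not None:
            failed.append(last_name)
            last_name = None  # report each request at most once
    return failed


# Sample text in the same layout as the log in the original post.
sample = """\
--2018-07-14 21:26:45--  https://wwwn.cdc.gov/Nchs/Nhanes/2009-2010/DRXIFF_F.XPT
Resolving wwwn.cdc.gov (wwwn.cdc.gov)... 198.246.102.39
HTTP request sent, awaiting response... 404 Not Found
2018-07-14 21:26:46 ERROR 404: Not Found.

--2018-07-14 21:26:46--  https://wwwn.cdc.gov/Nchs/Nhanes/2009-2010/DRXTOT_F.XPT
HTTP request sent, awaiting response... 404 Not Found
2018-07-14 21:26:46 ERROR 404: Not Found.

--2018-07-14 21:26:47--  https://wwwn.cdc.gov/Nchs/Nhanes/1999-2000/DSII.XPT
HTTP request sent, awaiting response... 200 OK
"""

print(failed_downloads(sample.splitlines()))
# prints ['DRXIFF_F.XPT', 'DRXTOT_F.XPT']
```

For the real log you would pass the open file object (or the list from readlines()) instead of sample.splitlines().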

> 
> --2018-07-14 21:26:45--  https://wwwn.cdc.gov/Nchs/Nhanes/2009-2010/DRXIFF_F.XPT
> Resolving wwwn.cdc.gov (wwwn.cdc.gov)... 198.246.102.39
> Connecting to wwwn.cdc.gov (wwwn.cdc.gov)|198.246.102.39|:443... connected.
> HTTP request sent, awaiting response... 404 Not Found
> 2018-07-14 21:26:46 ERROR 404: Not Found.
> 
> --2018-07-14 21:26:46--  https://wwwn.cdc.gov/Nchs/Nhanes/2009-2010/DRXTOT_F.XPT
> Resolving wwwn.cdc.gov (wwwn.cdc.gov)... 198.246.102.39
> Connecting to wwwn.cdc.gov (wwwn.cdc.gov)|198.246.102.39|:443... connected.
> HTTP request sent, awaiting response... 404 Not Found
> 2018-07-14 21:26:46 ERROR 404: Not Found.
> 
> --2018-07-14 21:26:46--  https://wwwn.cdc.gov/Nchs/Nhanes/2009-2010/DRXFMT_F.XPT
> Resolving wwwn.cdc.gov (wwwn.cdc.gov)... 198.246.102.39
> Connecting to wwwn.cdc.gov (wwwn.cdc.gov)|198.246.102.39|:443... connected.
> HTTP request sent, awaiting response... 404 Not Found
> 2018-07-14 21:26:46 ERROR 404: Not Found.
> 
> --2018-07-14 21:26:46--  https://wwwn.cdc.gov/Nchs/Nhanes/2009-2010/DSQ1_F.XPT
> Resolving wwwn.cdc.gov (wwwn.cdc.gov)... 198.246.102.39
> Connecting to wwwn.cdc.gov (wwwn.cdc.gov)|198.246.102.39|:443... connected.
> HTTP request sent, awaiting response... 404 Not Found
> 2018-07-14 21:26:47 ERROR 404: Not Found.
> 
> --2018-07-14 21:26:47--  https://wwwn.cdc.gov/Nchs/Nhanes/1999-2000/DSII.XPT
> Resolving wwwn.cdc.gov (wwwn.cdc.gov)... 198.246.102.39
> Connecting to wwwn.cdc.gov (wwwn.cdc.gov)|198.246.102.39|:443... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 56060880 (53M) [application/octet-stream]
> Saving to: ‘DSII.XPT’
> 


