extract text from log file using re

Peter Otten __peter__ at web.de
Fri Sep 14 11:40:24 CEST 2007


Fabian Braennstroem wrote:

> I would like to delete a region on a log file which has this
> kind of structure:
> 
> 
> #------flutest------------------------------------------------------------
>    498 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
> 8.3956e-04 3.8560e-03 4.8384e-02 11:40:01  499
>    499 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
> 8.3956e-04 3.8560e-03 4.8384e-02 11:40:01  499
> reversed flow in 1 faces on pressure-outlet 35.
> 
> Writing
> "/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.cas"...
>  5429199 mixed cells, zone 29, binary.
> 11187656 mixed interior faces, zone 30, binary.
>    20004 triangular wall faces, zone 31, binary.
>     1104 mixed velocity-inlet faces, zone 32, binary.
>   133638 triangular wall faces, zone 33, binary.
>    14529 triangular wall faces, zone 34, binary.
>     1350 mixed pressure-outlet faces, zone 35, binary.
>    11714 mixed wall faces, zone 36, binary.
>  1232141 nodes, binary.
>  1232141 node flags, binary.
> Done.
> 
> 
> Writing
> "/home/gcae504/SCR1/Solververgleich/Klimakruemmer_AK/CAD/Daimler/fluent-0500.dat"...
> Done.
> 
>    500 1.0049e-03 2.4630e-04 9.8395e-05 1.4865e-04
> 8.3913e-04 3.8545e-03 1.3315e-01 11:14:10  500
> 
>  reversed flow in 2 faces on pressure-outlet 35.
>    501 1.0086e-03 2.4608e-04 9.8589e-05 1.4908e-04
> 8.3956e-04 3.8560e-03 4.8384e-02 11:40:01  499
> 
> #------------------------------------------------------------------
> 
> I have a small script, which removes lines starting with
> '(re)versed', '(i)teration' and '(t)urbulent'  and put the
> rest into an array:
> 
> # -- plot residuals ----------------------------------------
>       import re
> filename="flutest"
> reversed_flow=re.compile('^\ re')
> turbulent_viscosity_ratio=re.compile('^\ tu')
> iteration=re.compile('^\ \ i')
> 
> begin_of_res=re.compile('>\ \ \ i')
> end_of_res=re.compile('^\ ad')

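The following regular expressions contain stray backslashes that change
their meaning: \W matches any non-word character and \D any non-digit,
so neither pattern tests for the literal word: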

> begin_of_writing=re.compile('^\Writing')
> end_of_writing=re.compile('^\Done')
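
To see the effect, here is a minimal sketch that compiles the two quoted
patterns and tries them on a few strings. The test strings and the
fixed_begin/fixed_end names are made up for illustration; raw strings are
used only so that the pattern text reaches re unchanged.

```python
import re

# Same pattern text as in the quoted script.
begin_of_writing = re.compile(r'^\Writing')  # parsed as "\W" + "riting"
end_of_writing = re.compile(r'^\Done')       # parsed as "\D" + "one"

# "\W" needs a non-word character, so the real marker line is missed...
missed = begin_of_writing.match("Writing")
# ...while a line starting with a space followed by "riting" is accepted.
odd_hit = begin_of_writing.match(" riting")

# "\D" accepts any non-digit, so "Done." matches, but so does "gone".
lucky_hit = end_of_writing.match("Done.")
false_hit = end_of_writing.match("gone")

# Without the backslashes the patterns test for the literal words.
fixed_begin = re.compile(r'^Writing')
fixed_end = re.compile(r'^Done')
```

So the first pattern never fires on the "Writing" line, and the second
one works only by accident.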

But I don't think you need regular expressions at all. 
Also, it's better to iterate over the file just once; that way 
you don't need to remember the positions of the regions to be skipped. 
Here's a simplified demo:

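def skip_region(items, start, end):
    """Yield items, omitting each region from start(item) through end(item)."""
    items = iter(items)
    while True:
        # Pass lines through until the start of a region is found.
        for line in items:
            if start(line):
                break
            yield line
        else:
            # Input exhausted outside a region: done.
            break
        # Discard lines up to and including the end of the region.
        for line in items:
            if end(line):
                break
        else:
            # Input exhausted inside an unterminated region: done.
            break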

def begin(line): 
    return line.strip() == "Writing"

def end(line): 
    return line.strip() == "Done."

# --- begin demo setup (remove to test with real data) ---
def open(filename):
    try:
        from StringIO import StringIO  # Python 2
    except ImportError:
        from io import StringIO  # Python 3
    return StringIO("""\
iteration # to be ignored
alpha
beta
    reversed # to be ignored
Writing
to
be
ignored
Done.
gamma
delta

""")
# --- end demo setup ---

if __name__ == "__main__":
    filename = "flutest"
    for line in skip_region(open(filename), begin, end):
        line = line.strip()
        if line and not line.startswith(("reversed", "iteration")):
            print(line)

skip_region() takes a file (or any iterable) and two functions
that test for the begin/end of the region to be skipped. The lines
matching begin and end are dropped along with everything between them.
You can nest skip_region() calls if you have regions with different 
start/end conditions.
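
As a rough sketch of the nesting, assuming a second kind of region
delimited by "BEGIN DEBUG" / "END DEBUG" (hypothetical markers invented
for this example, as are the debug_begin/debug_end helpers and the sample
lines), the inner call removes the Writing...Done. blocks and the outer
call removes the debug blocks from what is left:

```python
def skip_region(items, start, end):
    # Same generator as in the demo, repeated so this runs on its own.
    items = iter(items)
    while True:
        for line in items:
            if start(line):
                break
            yield line
        else:
            break
        for line in items:
            if end(line):
                break
        else:
            break

def begin(line):
    return line.strip() == "Writing"

def end(line):
    return line.strip() == "Done."

# Hypothetical second region type, for illustration only.
def debug_begin(line):
    return line.strip() == "BEGIN DEBUG"

def debug_end(line):
    return line.strip() == "END DEBUG"

lines = [
    "alpha",
    "Writing",
    "to be ignored",
    "Done.",
    "beta",
    "BEGIN DEBUG",
    "also ignored",
    "END DEBUG",
    "gamma",
]

# The inner generator feeds the outer one; the input is still read once.
result = list(skip_region(skip_region(lines, begin, end),
                          debug_begin, debug_end))
```

Since both layers are generators, nothing is buffered; each line passes
through both filters as it is read.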

Peter


