Text Processing

Tue Dec 20 15:03:21 EST 2011

Tue, 20 Dec 2011 11:17:15 -0800 (PST)
Yigit Turgut wrote:

> Hi all,
> 
> I have a text file containing such data ;
> 
>         A                B                C
> -------------------------------------------------------
> -2.0100e-01    8.000e-02    8.000e-05
> -2.0000e-01    0.000e+00   4.800e-04
> -1.9900e-01    4.000e-02    1.600e-04
> 
> But I only need Section B, and I need to change the notation to ;
> 
> 8.000e-02 = 0.08
> 0.000e+00 = 0.00
> 4.000e-02 = 0.04
> 
> Text file is approximately 10MB in size. I looked around to see if
> there is a quick and dirty workaround but there are lots of modules,
> lots of options.. I am confused.
> 
> Which module is most suitable for this task ?

You could try to do it yourself.

You'd need to know what separates the data. Tab characters? Spaces?

Example:

Input file
----------

        A                B                C
-------------------------------------------------------
-2.0100e-01    8.000e-02    8.000e-05
-2.0000e-01    0.000e+00    4.800e-04
-1.9900e-01    4.000e-02    1.600e-04

Python code
-----------

# Column separator: four spaces, as in the input file above
SEP = "    "

b_values = []

# Open file
with open('test1.plt', 'r') as f:

    # skip as many header lines as needed (here: titles and dashes)
    f.readline()
    f.readline()

    for line in f:
        if not line.strip():
            continue                            # ignore blank lines
        #start = line.find("\t") + 1           # seek Tab
        start = line.find(SEP) + len(SEP)       # seek 4 spaces
        #end = line.find("\t", start)
        end = line.find(SEP, start)
        b_values.append(float(line[start:end].strip()))

print(b_values)
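
Note that b_values holds floats, so printing the list shows 0.0 rather than
the 0.00 asked for. A minimal sketch of the formatting step, assuming two
decimals are always wanted (the sample values and the name "formatted" are
only for illustration):

```python
# Hypothetical sample: the values extracted from column B of the example input
b_values = [0.08, 0.0, 0.04]

# "%.2f" renders each float in fixed-point notation with two decimals
formatted = ["%.2f" % value for value in b_values]

# One value per line, ready to be written to an output file if needed
print("\n".join(formatted))
```

Two decimals would round away anything smaller than 0.005, so the precision
may need adjusting to the real data.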

It gets trickier if the number of spaces is not constant. In that case I
would try regular expressions. Perhaps a regexp would be more efficient in
any case.
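
For what it's worth, here is a rough sketch of the regular expression
variant. It assumes every data row holds exactly three numbers in scientific
notation, separated by any mix of spaces and tabs. The names NUMBER, ROW and
extract_b are made up for the example, and the sample lines stand in for the
real file.

```python
import re

# A number in scientific notation, such as -2.0100e-01 or 8.000e-02
NUMBER = r"[-+]?\d+\.\d+[eE][-+]?\d+"

# Three numbers separated by runs of whitespace; only the second is captured
ROW = re.compile(r"^\s*%s\s+(%s)\s+%s\s*$" % (NUMBER, NUMBER, NUMBER))


def extract_b(lines):
    """Return column B as floats, ignoring lines that are not data rows."""
    values = []
    for line in lines:
        match = ROW.match(line)
        if match:
            values.append(float(match.group(1)))
    return values


# Sample input with an irregular number of spaces and one tab (assumed data)
sample = [
    "        A                B                C\n",
    "-------------------------------------------------------\n",
    "-2.0100e-01    8.000e-02    8.000e-05\n",
    "-2.0000e-01    0.000e+00   4.800e-04\n",
    "-1.9900e-01\t4.000e-02  1.600e-04\n",
]

print(extract_b(sample))
```

Since header and separator lines simply fail to match, there is no need to
count how many lines to skip. For a real file, the open file object can be
passed in place of the sample list, as in extract_b(f). If the columns are
only ever separated by whitespace, line.split()[1] on the data rows would
probably do the same job without a regexp.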

-- 
Jérôme