Stripping non-numbers from a file parse without nested lists?

daku9999 at gmail.com
Wed Apr 1 16:20:14 EDT 2009


On Apr 1, 8:10 am, jay logan <dear.jay.lo... at gmail.com> wrote:
> On Apr 1, 11:05 am, jay logan <dear.jay.lo... at gmail.com> wrote:
>
>
>
> > On Apr 1, 2:35 am, daku9... at gmail.com wrote:
>
> > > On Mar 31, 6:47 pm, "Rhodri James" <rho... at wildebst.demon.co.uk>
> > > wrote:
>
> > > > What you're doing (pace error checking) seems fine for the data
> > > > structures that you're using.  I'm not entirely clear what your usage
> > > > pattern for "dip" and "dir" is once you've got them, so I can't say
> > > > whether there's a more appropriate shape for them.  I am a bit curious
> > > > though as to why a nested list is non-ideal?
>
> > > > ...
> > > >      if "/" in word and "dip" not in word:
> > > >         dip_n_dir.append(word.split("/", 1))
>
> > > > is marginally shorter, and has the virtue of making it harder to use
> > > > unrelated dip and dir values together.
>
> > > > --
> > > > Rhodri James *-* Wildebeeste Herder to the Masses
>
> > > Rhodri,
>
> > > Thanks.  That works better than what I had before and I learned a new
> > > method of parsing what I was looking for.
>
> > > Now I'm on to jumping a set number of lines from a given positive
> > > search match:
>
> > > ...(lines of garbage)...
> > > 5656      (or some other value I want, but don't explicitly know)
> > > ...(18 lines of garbage)...
> > > search object
> > > ...(lines of garbage)...
>
> > > I've tried:
>
> > > def read_poles(filename):
> > >   index = 0
> > >   fh = None
> > >   try:
> > >       fh = open(filename, "r")
> > >       lines=fh.readlines()
> > >       while True:
>
> > >           if "search object" in lines[index]:
> > >               poles = int(lines[index-18])
> > >               print(poles)
>
> > >           index +=1
>
> > >   except(IndexError): pass
>
> > >   finally:
> > >       if fh is not None: # close file
> > >           fh.close()
>
> > > ------------------
>
> > > Which half works.  If it's not found, IndexError is caught and passed
> > > (avoids quitting on lines[index out of range]).  The print(poles)
> > > properly displays the value I am looking for (_always_ 18 lines before
> > > the search object).
>
> > > However, since it is assigned using the index variable, the value of
> > > poles doesn't keep (poles is always zero when referenced outside of
> > > the read_poles function).  I'm assuming because I'm pointing to a
> > > certain position of an object and once index moves on, it no longer
> > > points to anything valid.  My python book suggested using
> > > copy.deepcopy, but that didn't get around the fact I am calling it on
> > > (index-18).
>
> > > Any experience jumping back (or forward) a set number of lines once a
> > > search object is found?  This is the only way I can think of doing it
> > > and it clearly has some problems.
>
> > > Reading the file line by line using for line in blah works for finding
> > > the search object, but I can't see a way of going back the 18 lines to
> > > grab what I need.
>
> > > Thanks for the help!  I'm slowly getting this mangled mess of a file
> > > into something automated (hand investigating the several thousand
> > > files I need to do would be unpleasant).
>
> > # You could try using a deque holding 18 lines and search using
> > # that deque.
> > # This is untested, but here's a try (>=Python 3.0)
> > from collections import deque
> > import itertools as it
> > import sys
>
> > def read_poles(filename):
> >     with open(filename) as f:
> >         line_iter = iter(f)
> >         d = deque(it.islice(line_iter,18), maxlen=19)  # d[0] is 18 lines back
>
> >         for line in line_iter:
> >             d.append(line)
>
> >             if 'search object' in line:
> >                 poles = int(d[0])
> >                 print(poles)
> >                 return poles
> >         else:
> >             print('No poles found in', filename, file=sys.stderr)
>
> Notice that I returned the "pole" from the function so you could catch
> the return value as follows:
> pole = read_poles(filename)
>
> if pole is None:
>     pass  # no poles found
> else:
>     print('Function returned this pole:', pole)
>
> If you need a list of poles, then return a list:
>
> def read_poles(filename):
>     all_poles = []
>     with open(filename) as f:
>         line_iter = iter(f)
>         d = deque(it.islice(line_iter,18), maxlen=19)  # d[0] is 18 lines back
>
>         for line in line_iter:
>             d.append(line)
>
>             if 'search object' in line:
>                 all_poles.append(int(d[0]))
>     return all_poles
>
> ...
> poles = read_poles(filename)
>
> if poles:
>     print('Here are the poles:\n', '\n'.join(map(str,poles)))
> else:
>     print('There were no poles found in', filename)


I think I found an easier (if possibly uglier) way of doing it:

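for filename in files.split():
    fh = None  # reset so a failed open() doesn't reuse the previous handle
    try:
        fh = open(filename.replace("/", "\\"), "r")
        lines = fh.readlines()
    except IOError as err:
        print(filename, err)
    else:
        # only parse when the file was actually read
        print(read_poles4(lines))
    finally:
        if fh is not None:
            fh.close()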

... which opens my file (always < 10 megs) into the list lines

def read_poles4(lines):
    try:
        poles = lines[(lines.index("Poles Plotted\n") - 18)].rstrip()
        return poles
    except ValueError as err:
        return err

...

Seems like the simpler solution, at least for small files where I can
hold the entire thing in memory.
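
A slightly more defensive sketch of the same in-memory lookup, in case it is
useful. The names find_poles, marker and offset are made up for illustration,
and it assumes the marker line matches exactly, as in read_poles4 above. It
returns None instead of the exception object, and it refuses to use a negative
index, which would otherwise silently wrap around to the end of the list:

```python
def find_poles(lines, marker="Poles Plotted\n", offset=18):
    """Return the stripped line `offset` lines before `marker`, or None.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    try:
        i = lines.index(marker)  # first exact match of the marker line
    except ValueError:
        return None  # marker not present in this file
    if i < offset:
        return None  # lines[i - offset] would wrap to the end of the list
    return lines[i - offset].rstrip()


# Made-up sample: the value, 17 filler lines, then the marker at index 18.
sample = ["5656\n"] + ["garbage\n"] * 17 + ["Poles Plotted\n", "more\n"]
print(find_poles(sample))
```

The caller can then test "if result is None" rather than checking whether it
got an exception back.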

Thanks!


