[Tutor] Read from large text file, parse, find string, print string + line number to second text file.

Dave Angel davea at davea.name
Fri Feb 1 21:36:43 CET 2013


On 02/01/2013 03:09 PM, Scurvy Scott wrote:
> Hey all how're things?
>
> I'm hoping for some guidance on a problem I'm trying to work through.
> I know this has been previously covered on this list but I'm hoping it
> won't bother you guys to run through it again.
>
> My basic program I'm attempting to create is like this..
>
> I want to read from a large, very large file.
> I want to find a certain string
> if it finds the string I would like to select the first 15-20
> characters pre and proceeding the string and then output that new
> string to a new file along with the line the string was located on
> within the file.

Why not just use grep?

>
> It seems fairly straight forward but I'm wondering if y'all can point
> me to a direction that would help me accomplish this..
>
> Firstly I know I can read a file and search for the string with (a
> portion of this code was found on stackoverflow and is not mine and
> some of it is my own)
>

First, you probably want to do something to quit when you get your first
match.  If you do want to continue finding matches, then you'd have to
move that open() on the new file out of the loop.  As written, mode 'w'
truncates the file on every match, so only the last match is kept.

The linenum is easy, using enumerate.
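
One detail worth noting: enumerate counts from 0 unless told otherwise,
while editors and grep count lines from 1.  A minimal sketch, using an
in-memory stand-in for the file (the sample text is made up for
illustration):

```python
import io

# Stand-in for the open file; any iterable of lines behaves the same way.
inF = io.StringIO(u"first line\nsecond line\nthird line\n")

numbered = []
for linenum, line in enumerate(inF, 1):   # second argument: start counting at 1
    numbered.append((linenum, line.rstrip("\n")))

print(numbered)
```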

> with open('largeFile', 'r') as inF:
>      for line in inF:

        for linenum, line in enumerate(inF):


>          myString = "The String"

This should be moved to a location before the loop;  it's a waste 
reassigning it every time through the loop.

>          if 'myString' in line:
>              f = open(thenewfile', 'w')
>              f.write(myString)
>              f.close()

                break     #quit upon first match
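
Putting those pieces together, a rough sketch might look like the
following.  The function name write_matches, the first_only flag and the
"linenum: line" output format are my own assumptions, not anything from
your code.  Note that it tests the variable itself rather than the
literal 'myString' in quotes, and it opens the output file once, before
the loop.

```python
def write_matches(in_path, out_path, needle, first_only=True):
    """Write 'linenum: line' for each line containing needle; return the count."""
    count = 0
    # Both files are opened once, so earlier matches are not overwritten.
    with open(in_path, "r") as inF, open(out_path, "w") as outF:
        for linenum, line in enumerate(inF, 1):
            if needle in line:
                outF.write("%d: %s\n" % (linenum, line.rstrip("\n")))
                count += 1
                if first_only:
                    break     # quit upon first match
    return count
```

Something like write_matches('largeFile', 'thenewfile', 'The String')
would stop at the first hit; passing first_only=False keeps going.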

>
> I guess what I'm looking for then is tips on A)My stated goal of also
> writing the 15-20 characters before and after myString to the new file
> and
> B)finding the line number and writing that to the file as well.
>
> Any information you can give me or pointers would be awesome, thanks in advance.
>
> I'm on Ubuntu 12.10 running LXDE and working with Python 2.7
>

About giving the 15 characters before and after the match:

Is it sufficient to truncate that spec at the line boundaries?  What I
mean is that if the match occurs at column 10, do you really need the
last 5 characters of the previous line?  Likewise, if it occurs near
the end of the line, do you need some from the next line(s)?
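
If truncating at the line boundaries is acceptable, the per-line parsing
can be a small helper along these lines.  This is only a sketch: the name
context_in_line and the width parameter are made up, and it reports just
the first occurrence in the line.

```python
def context_in_line(line, needle, width=15):
    """Return needle plus up to `width` characters either side, or None."""
    pos = line.find(needle)
    if pos == -1:
        return None
    start = max(0, pos - width)          # don't run off the front of the line
    end = pos + len(needle) + width      # slicing clamps this at the line end
    return line[start:end]
```

You'd probably want to strip the trailing newline from the line before
calling it, otherwise the newline can end up inside the context.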


If you never need to show more than the current line, then you can parse 
the line (write a separate function).  If you have to go 15 characters 
earlier in the file, then consider using file.seek

http://docs.python.org/2/library/stdtypes.html?highlight=seek#file.seek

The catch to that is that it messes up the position in the file, so if 
you do want multiple matches, you'll need to use file.tell to save and 
restore the location to continue reading lines.
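
Here is a hedged sketch of that save-and-restore dance.  The name
contexts_with_seek is invented, and I'm assuming a few things: the file
is opened in binary mode (so needle must be a byte string and offsets are
byte offsets), only the first match on each line is reported, and the
loop uses readline() rather than "for line in f", because iterating over
the file does not mix well with tell().

```python
def contexts_with_seek(path, needle, width=15):
    """Yield (linenum, context); context may spill into neighbouring lines."""
    with open(path, "rb") as f:
        linenum = 0
        while True:
            line_start = f.tell()        # offset where this line begins
            line = f.readline()
            if not line:                 # empty result means end of file
                break
            linenum += 1
            col = line.find(needle)
            if col == -1:
                continue
            resume = f.tell()            # save: start of the next line
            match_pos = line_start + col
            start = max(0, match_pos - width)
            f.seek(start)
            context = f.read(match_pos - start + len(needle) + width)
            f.seek(resume)               # restore, so reading continues normally
            yield linenum, context
```

The context can contain newline characters when it crosses a line
boundary, so you may want to replace them before writing it out.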

Lots of other options, but it all depends on what you REALLY want.



-- 
DaveA