[Tutor] Tab delimited question
Martin A. Brown
martin at linux-ip.net
Mon Dec 13 21:43:52 CET 2010
: I'm searching line by line for certain tags and then printing the
: tag followed by the word immediately following the tag.
What you are describing is an awful lot like 'grep'. But, of
course, many different sorts of file searching resemble grep.
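For comparison, a grep-like version of "print the tag and the word after it" can be written with Python's re module. This is a swapped-in technique, not what the reply below recommends. The pattern and the name grep_tag are hypothetical, and the sketch assumes the tag is the whole word "key" followed by any whitespace.

```python
import re

# Hypothetical pattern: the whole word "key", then whitespace (tab or
# space), then the next run of non-whitespace characters.
TAG_AND_WORD = re.compile(r"\b(key)\s+(\S+)")

def grep_tag(lines):
    """Yield 'tag word' for each line that contains the tag."""
    for line in lines:
        match = TAG_AND_WORD.search(line)
        if match:
            yield f"{match.group(1)} {match.group(2)}"

sample = ["this is a key test123 noise noise noise\n",
          "no tag on this line\n"]
print(list(grep_tag(sample)))  # ['key test123']
```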
: So for example, suppose I had the following line of text in a file:
: "this is a key test123 noise noise noise noise noise"
: In this example, I would want to print "key test123" to a new
: file. The rest of the words I would not want.
: Here is my code so far:
: def test(infile, outfile):
:     for line in infile:
:         tagIndex = line.find("key")
:         start = tagIndex + 4
:         stop = line[start:].find("\t") -1
:         if tagIndex != -1:
:             print("start is: ", start)
:             print("stop is: ", stop)
:             print("spliced word is ", line[start: stop])
Your problem is that you are calculating the value for 'stop' from a
substring of 'line' (and then subtracting 1), when you should be
adding the value of 'start'. Replace the line above that assigns to
the 'stop' variable with the following:
  stop = line[start:].find("\t") + start
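Here is a minimal runnable sketch of the original loop with that one-line fix applied. It writes to an output file object instead of printing. Like the original, it assumes 'key' is followed by exactly one tab (hence the + 4) and that another tab ends the word. If no tab follows, find() returns -1 and the slice comes out empty. The name find_tagged_word is hypothetical, and the sample line uses tabs rather than the spaces shown in the question.

```python
import io

def find_tagged_word(infile, outfile):
    for line in infile:
        tagIndex = line.find("key")
        if tagIndex != -1:
            start = tagIndex + 4                    # skip "key" plus one tab
            # Index of the next tab within line[start:], shifted back
            # into an index within 'line' by adding 'start'.
            stop = line[start:].find("\t") + start
            outfile.write("key " + line[start:stop] + "\n")

out = io.StringIO()
find_tagged_word(io.StringIO("this\tis\ta\tkey\ttest123\tnoise\tnoise\n"), out)
print(out.getvalue(), end="")   # key test123
```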
: My question is the following: What is wrong w/ the variable
: 'stop'? The index it gives me when I print out 'stop' is not even
: close to the right number. Furthermore, when I try to print out
: just the word following the tag w/ the form: line[start: stop],
: it prints nothing (it seems b/c my stop variable is incorrect).
Now, think about why this is happening....
You are calculating 'stop' based on a substring of 'line'. You use
the 'start' offset to create a substring, in which you then search
for a tab. Then, you subtract 1 and try to use the result as an
offset into the original string 'line'. Also, you are slicing
incorrectly (well, that's just the issue of subtracting 1 when you
shouldn't be: the end index of a slice is already exclusive), a not
uncommon slicing problem (see this post for more detail).
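The offset arithmetic is easy to check directly. The sketch below uses a made-up tab-delimited line. It shows three things: an index found in line[start:] is relative to 'start', adding 'start' back gives the same answer as str.find with its optional start argument, and subtracting 1 either chops a character or, applied to the relative index as in the original, produces an empty slice.

```python
line = "this\tis\ta\tkey\ttest123\tnoise\n"
start = line.find("key") + 4           # first character after "key\t"

rel = line[start:].find("\t")          # index within the substring only
stop = rel + start                     # converted to an index within 'line'

print(rel, stop)                        # 7 21
print(stop == line.find("\t", start))   # True: find() accepts a start offset
print(repr(line[start:stop]))           # 'test123'
print(repr(line[start:stop - 1]))       # 'test12' (slice end is exclusive)
print(repr(line[start:rel - 1]))        # '' (the original bug: end < start)
```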
Finally, I have to wonder why you are doing so much of the work
yourself, when ....
: I would greatly appreciate any help you have. This is a much
: simplified example from the script I'm actually writing, but I
: need to figure out a way to eliminate the noise after the key and
: the word immediately following it are found.
I realize that your question was not about this, but from your
example, it seems that you don't know about the 'csv' module. It's
convenient, simple, easy to use and quite robust. This should help
you. I don't know much about your data format, nor why you are
searching, but let's assume that you wish to match 'key' as the
entire contents of a field. If that's the case, something like this
should work:
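  import csv
  import sys
  def test(infile, outfile, sought):
      tsv = csv.reader(infile, delimiter='\t')  # split each line on tabs
      for row in tsv:
          if sought in row:  # 'sought' must equal an entire field
              outfile.write('\t'.join(row) + '\n')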
Now, how would you call this function?
  if __name__ == '__main__':
      test(sys.stdin, sys.stdout, sys.argv[1])  # argv[0] is the script name
And, suppose you were at a command line, how would you call that?
  python tabbed-reader.py < "$MYFILE" 'key'
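Because the function takes file-like objects, you can also exercise it without any files by using io.StringIO. In the sketch below, the function is renamed filter_rows (a hypothetical name) and the sample data is made up. The "monkey" row shows that the match is against whole fields, not substrings.

```python
import csv
import io

def filter_rows(infile, outfile, sought):
    """Copy every tab-delimited row that has a field equal to 'sought'."""
    tsv = csv.reader(infile, delimiter="\t")
    for row in tsv:
        if sought in row:            # whole-field match, not substring
            outfile.write("\t".join(row) + "\n")

data = io.StringIO("this\tis\ta\tkey\ttest123\tnoise\n"
                   "no\ttag\there\n"
                   "monkey\tbusiness\n")
out = io.StringIO()
filter_rows(data, out, "key")
print(repr(out.getvalue()))   # 'this\tis\ta\tkey\ttest123\tnoise\n'
```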
OK, so the above function called 'test' is probably not quite what
you had wanted, but you should be able to adapt it pretty readily.
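One possible adaptation, closer to the original goal of writing "key test123", is sketched below. It writes only the tag and the field immediately after it. The name tag_and_next is hypothetical. The sketch uses only the first occurrence of the tag in each row and skips rows where the tag is the last field.

```python
import csv
import io

def tag_and_next(infile, outfile, sought):
    """Write 'sought' and the field right after it, for each matching row.

    Hypothetical adaptation: only the first occurrence of 'sought' in a
    row is used, and a tag in the last field (nothing after it) is skipped.
    """
    for row in csv.reader(infile, delimiter="\t"):
        if sought in row:
            i = row.index(sought)
            if i + 1 < len(row):
                outfile.write(sought + " " + row[i + 1] + "\n")

out = io.StringIO()
tag_and_next(io.StringIO("this\tis\ta\tkey\ttest123\tnoise\nonly\tkey\n"),
             out, "key")
print(repr(out.getvalue()))   # 'key test123\n'
```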
Martin A. Brown