[Tutor] How to match strange characters
J. Van Brimmer
jerry.vb at gmail.com
Mon Sep 8 08:46:37 CEST 2008
I have a legacy program at work that outputs a text file with this header:
ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ»
º Radio Source Precession Program º
º by John B. Doe º
º 31 August 1992 º
ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍŒ
Enter Date for Precession as (MM-DD-YYYY) or C/R for 05-28-2004 > 05-28-2004
Enter the Catalog Name or C/R for CATALOG.SRC >
The Julian Date is = 2453153.5
0022+002 5.6564 +0.2713 00:22:37.54 00:16:16.65
0106+013 17.2117 +1.6052 01:08:50.80 01:36:18.58
.
.
.
much more regular integer data lines to the end of the section.
One section is created each time the program is run, and each section
begins with one of these headers. Every run appends its section to the
end of the file, so each new header follows the last data line of the
previous section.
I am trying to write a Python script to strip these headers (the first
five lines of each section) from the file. The name of this legacy
program is PRECESS. Every time we run PRECESS the header is repeated,
so it appears throughout the file, not just at the top.
Here's my code so far:
(code)
import re

def main():
    # sepoct08.txt is the PRECESS output file
    f = open('/home/jerry/sepoct08.txt', 'r')
    for line in f:
        if re.search('ÉÍÍÍ', line):
            print line
        elif re.search('> ..-..-....', line):  # this line prints out
            print line
        elif re.search('Catalog', line):  # this line prints out
            print line
        elif re.search('Julian', line):  # this line prints out
            print line
        print "Hi there!"  # I print out this just so I know my script is looping
    f.close()

if __name__ == "__main__":
    main()
(/code)
Here's the output from my code:
(output)
Hi there!
Hi there!
Hi there!
Hi there!
Hi there!
Enter Date for Precession as (MM-DD-YYYY) or C/R for 05-28-2004 > 05-28-2004
Hi there!
Enter the Catalog Name or C/R for CATALOG.SRC >
Hi there!
The Julian Date is = 2453153.5
Hi there!
Hi there!
.
.
.
. end of file
(/output)
As you can see, I can print out the three lines after the strange header
lines, but not the strange character lines. How can I match on those
strange characters? What are they?
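One plausible explanation, offered as a sketch rather than a confirmed
diagnosis: a DOS-era program such as PRECESS may draw its banner with
code page 437 bytes, where 0xC9, 0xCD, 0xBB, 0xBA, 0xC8 and 0xBC are
double-line box-drawing characters. Read as Latin-1, the first five of
those bytes display as É, Í, », º and È. The code page is an assumption
(nothing above states it), and HEADER_BYTES and describe are hypothetical
names. The example uses Python 3 syntax.

```python
import unicodedata

# Assumption: these are the byte values behind the "strange" characters,
# i.e. PRECESS wrote its banner in DOS code page 437.
HEADER_BYTES = b"\xc9\xcd\xbb\xba\xc8\xbc"


def describe(raw, encoding):
    """Decode raw with encoding and return (character, Unicode name) pairs."""
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in raw.decode(encoding)]


# Compare how the same bytes are interpreted under two encodings.
# ascii() keeps the printout safe on terminals that cannot show the glyphs.
for encoding in ("latin-1", "cp437"):
    print(encoding)
    for ch, name in describe(HEADER_BYTES, encoding):
        print("  ", ascii(ch), name)
```

If the cp437 names all come out as box-drawing characters, the file is
probably using that code page, and matching on the raw byte values
should be more reliable than pasting the displayed characters into a
pattern.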
I'm just trying to figure out how to print out each line of the header
first; later I will modify the code to process those lines as needed.
My problem is the strange characters in the top part of the header: my
re patterns don't match them. How can I match on them, so I can delete
those lines? I can't do it by line number because they aren't
recognized.
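One way to sidestep the character-set question is to read the file as
raw bytes and drop every line whose first non-blank byte is one of the
border bytes. The following is a minimal Python 3 sketch, assuming the
box lines start with byte 0xC9, 0xBA or 0xC8 (an assumption about the
file, not something confirmed above). The names strip_box_lines and
strip_file are hypothetical.

```python
# Assumption: box lines begin with one of these code page 437 bytes:
# 0xC9 (top-left corner), 0xBA (vertical bar), 0xC8 (bottom-left corner).
BOX_LINE_STARTS = (b"\xc9", b"\xba", b"\xc8")


def strip_box_lines(lines):
    """Yield only the byte lines that do not start with a box-drawing byte."""
    for line in lines:
        # lstrip() tolerates leading spaces before the border character.
        if not line.lstrip().startswith(BOX_LINE_STARTS):
            yield line


def strip_file(src_path, dst_path):
    """Copy src_path to dst_path, leaving out the box-drawing lines."""
    # Binary mode avoids any decoding, so the bytes are compared as written.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.writelines(strip_box_lines(src))
```

A call such as strip_file('/home/jerry/sepoct08.txt', 'cleaned.txt'),
where 'cleaned.txt' is a made-up output name, would remove only the five
boxed lines of each header. The prompt lines and the Julian Date line
would still need their own tests.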
The original PRECESS code cannot be modified. So, rather than rewriting
the PRECESS program, I thought it would be easier to modify its output
as needed. I'm pretty sure PRECESS is written in C.
Sorry for the long post; I tried to include only the relevant
information. Please fire away with questions and comments.
TIA,
Jerry