[Tutor] How to match strange characters

Mon Sep 8 08:46:37 CEST 2008

I have a legacy program at work that outputs a text file with this header:

ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ» 

º Radio Source Precession Program º
º by John B. Doe º
º 31 August 1992 º
ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍŒ
Enter Date for Precession as (MM-DD-YYYY) or C/R for 05-28-2004 > 05-28-2004
Enter the Catalog Name or C/R for CATALOG.SRC >
The Julian Date is = 2453153.5
0022+002 5.6564 +0.2713 00:22:37.54 00:16:16.65
0106+013 17.2117 +1.6052 01:08:50.80 01:36:18.58
.
.
.
much more regular integer data lines to the end of the section.

One section is created each time the program is run, and each section 
begins with one of these headers. Every new section is appended to the 
end of the file, so each new header follows the last data line of the 
previous section.

I am trying to write a Python script to strip these headers (the first 
five lines of each section) from the file. The legacy program is named 
PRECESS. Every time we run PRECESS the header is written again, so it 
appears throughout the file, not just at the top.

Here's my code so far:

(code)
import re

def main():
    # sepoct08.txt is the PRECESS output file
    f = open('/home/jerry/sepoct08.txt', 'r')
    for line in f:
        if re.search('ÉÍÍÍ', line):
            print line
        elif re.search('> ..-..-....', line): # this line prints out
            print line
        elif re.search('Catalog', line): # this line prints out
            print line
        elif re.search('Julian', line): # this line prints out
            print line
        print "Hi there!" # I print out this just so I know my script is looping

    f.close()

if __name__ == "__main__":
    main()
(/code)

Here's the output from my code:

(output)
Hi there!
Hi there!
Hi there!
Hi there!
Hi there!
Enter Date for Precession as (MM-DD-YYYY) or C/R for 05-28-2004 > 05-28-2004

Hi there!
Enter the Catalog Name or C/R for CATALOG.SRC >

Hi there!
The Julian Date is = 2453153.5

Hi there!
Hi there!
.
.
.
. end of file
(/output)

As you can see, I can print out the three lines after the strange header 
lines, but not the strange character lines. How can I match on those 
strange characters? What are they?
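
One possible answer to "What are they?", offered as an assumption rather 
than something the post confirms: PRECESS may be a DOS-era program that 
draws its box with code page 437 line-drawing bytes, and the viewer is 
showing those same bytes as Latin-1 letters. The minimal Python 3 sketch 
below recovers the bytes behind the displayed characters and reinterprets 
them as CP437. The variable names are illustrative only.

```python
# Assumption: the header is displayed as Latin-1, but PRECESS actually
# wrote DOS code page 437 box-drawing bytes.
displayed = "ÉÍ»ºÈ"                  # characters as they appear in the header
raw = displayed.encode("latin-1")    # recover the single bytes behind them
boxes = raw.decode("cp437")          # reinterpret those bytes as CP437

print(raw.hex())   # c9cdbbbac8
print(boxes)       # ╔═╗║╚
```

If that assumption holds, the "strange characters" are just the corners 
and edges of a double-line box.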

I'm just trying to figure out how to print out each line from the header 
first, then later I will modify the code to process those lines as 
needed. My problem is those strange characters in the top part of the 
header. The re module doesn't recognize them. How can I match on them, 
so I can delete those lines? I can't do it by line number because they 
aren't recognized.
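
A hedged sketch of one way to match and drop those lines, assuming the 
file holds single bytes 0xC9, 0xBA and 0xC8 at the start of the box 
lines (the CP437 values for the characters shown as É, º and È). If the 
script itself is saved as UTF-8, a literal such as 'ÉÍÍÍ' becomes 
two-byte sequences and would never equal those single bytes, which might 
explain the failed match. Matching on raw bytes avoids the question of 
encodings entirely. The helper name strip_box_lines and the sample data 
are made up for illustration; this is Python 3, not the Python 2 used 
above.

```python
import re

# Assumption: box lines start with one of the CP437 bytes
# 0xC9 (top-left corner), 0xBA (vertical bar) or 0xC8 (bottom-left corner).
BOX_START = re.compile(rb"^[\xc9\xba\xc8]")

def strip_box_lines(lines):
    """Return only the lines that do not start with a box-drawing byte."""
    return [line for line in lines if not BOX_START.match(line)]

# Hypothetical sample standing in for a few lines of PRECESS output.
sample = [
    b"\xc9\xcd\xcd\xcd\xbb\n",
    b"\xba Radio Source Precession Program \xba\n",
    b"\xc8\xcd\xcd\xcd\xbc\n",
    b"The Julian Date is = 2453153.5\n",
]
kept = strip_box_lines(sample)
print(kept)
```

To try this on the real file, the lines would need to be read in binary 
mode, for example with open(path, 'rb'), so that they arrive as bytes. 
The prompt lines ("Enter Date", "Enter the Catalog Name") already match 
with ordinary patterns and could be filtered the same way.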

The original PRECESS code cannot be modified. So, rather than rewriting 
the PRECESS program, I thought it would be easier to modify its output 
as needed. I'm pretty sure PRECESS is written in C.

Sorry for the long post; I tried to include only the relevant 
information. Please fire away with questions and comments.

TIA,
Jerry