[Tutor] How to match strange characters
Kent Johnson
kent37 at tds.net
Mon Sep 8 12:57:13 CEST 2008
On Mon, Sep 8, 2008 at 2:46 AM, J. Van Brimmer <jerry.vb at gmail.com> wrote:
> I have a legacy program at work that outputs a text file with this header:
>
> ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ»
> º Radio Source Precession Program º
> º by John B. Doe º
> º 31 August 1992 º
> ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍŒ
> Enter Date for Precession as (MM-DD-YYYY) or C/R for 05-28-2004 > 05-28-2004
> Enter the Catalog Name or C/R for CATALOG.SRC >
> The Julian Date is = 2453153.5
> 0022+002 5.6564 +0.2713 00:22:37.54 00:16:16.65
> 0106+013 17.2117 +1.6052 01:08:50.80 01:36:18.58
> .
> I am trying to write a Python script to strip this header (the first five
> lines) from the file.
> As you can see, I can print out the three lines after the strange header
> lines, but not the strange character lines. How can I match on those strange
> characters? What are they?
The strange characters seem to be box drawing characters from DOS
codepage 437. See
http://www.microsoft.com/globaldev/reference/oem/437.htm
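To see why the box shows up as accented letters, here is a small sketch that decodes the same bytes two ways. It assumes Python 3, and it assumes the file really contains the bytes 0xC9, 0xCD, 0xBB, 0xBA and 0xC8, which is only inferred from how the header displays in the message above. The variable names are made up for illustration.

```python
# Assumed bytes: the corner and edge bytes of the header box, inferred
# from how the header displays in the quoted message.
raw = b'\xc9\xcd\xbb\xba\xc8'

# Read as Latin-1, the bytes become the accented letters seen in the post.
as_latin1 = raw.decode('latin-1')

# Read as DOS code page 437, the same bytes become box-drawing characters.
as_cp437 = raw.decode('cp437')

print(as_latin1)
print(as_cp437)
```

If the second line prints a tidy box corner and edges, the file is very likely cp437 text being viewed with a Latin-1 style encoding.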
My guess is that the characters in your program are not actually the
same as the characters in the file because they use different
encodings. Try using the hex values for the characters:
    if re.search('\xc9\xcd\xcd\xcd', line):
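A fuller sketch of the same idea, under some assumptions: it is written for Python 3, it works on bytes so that no encoding guess is involved, and it treats any line starting with 0xC9, 0xC8 or 0xBA as part of the box. The names BOX_LINE, strip_box_header and sample are hypothetical, and the sample data is a shortened stand-in for the real file, not a copy of it.

```python
import re

# Assumed code page 437 bytes: 0xC9 top-left corner, 0xC8 bottom-left
# corner, 0xBA vertical bar. A header line starts with one of these.
BOX_LINE = re.compile(rb'^[\xc9\xc8\xba]')

def strip_box_header(data):
    """Return the lines of `data` (bytes) that do not start with a box byte."""
    return [line for line in data.splitlines() if not BOX_LINE.match(line)]

# Shortened, made-up stand-in for the program output.
sample = (
    b'\xc9\xcd\xcd\xcd\xbb\n'
    b'\xba Radio Source Precession Program \xba\n'
    b'\xc8\xcd\xcd\xcd\xbc\n'
    b'The Julian Date is = 2453153.5\n'
)

kept = strip_box_header(sample)
print(kept)
```

For a real file, opening it in binary mode ('rb') and passing the contents to the function should behave the same way, since the pattern and the data are both bytes.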
Kent