[Tutor] How to match strange characters

Kent Johnson kent37 at tds.net
Mon Sep 8 12:57:13 CEST 2008


On Mon, Sep 8, 2008 at 2:46 AM, J. Van Brimmer <jerry.vb at gmail.com> wrote:
> I have a legacy program at work that outputs a text file with this header:
>
> ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ»
> º Radio Source Precession Program º
> º by John B. Doe º
> º 31 August 1992 º
> ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍŒ
> Enter Date for Precession as (MM-DD-YYYY) or C/R for 05-28-2004 > 05-28-2004
> Enter the Catalog Name or C/R for CATALOG.SRC >
> The Julian Date is = 2453153.5
> 0022+002 5.6564 +0.2713 00:22:37.54 00:16:16.65
> 0106+013 17.2117 +1.6052 01:08:50.80 01:36:18.58
> .
> I am trying to write a Python script to strip this header (the first five
> lines) from the file.

> As you can see, I can print out the three lines after the strange header
> lines, but not the strange character lines. How can I match on those strange
> characters? What are they?

The strange characters seem to be box drawing characters from DOS
codepage 437. See
http://www.microsoft.com/globaldev/reference/oem/437.htm
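Here is a quick way to check this, sketched for Python 3. The byte values are my assumption about what the legacy program writes, taken from the hex codes of the characters quoted above. Decode the same bytes as cp437 and they become box drawing characters. Decode them as Latin-1 and they become the É, Í, », º and È seen in your mail:

```python
# Header bytes, assumed from the hex codes of the quoted characters.
raw = b"\xc9\xcd\xbb\xba\xc8"

# Read as DOS codepage 437 they are box drawing characters...
as_cp437 = raw.decode("cp437")     # '╔═╗║╚'
# ...but read as Latin-1 the same bytes are accented letters.
as_latin1 = raw.decode("latin-1")  # 'ÉÍ»ºÈ'
```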

My guess is that the characters in your program are not actually the
same bytes as the characters in the file, because your script and the
file use different encodings. Try using the hex values for the characters:
if re.search('\xc9\xcd\xcd\xcd', line):
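Here is a fuller sketch of the same idea. It assumes Python 3 and a file opened in binary mode (open(..., "rb")), so that a bytes pattern written with hex escapes matches the raw cp437 bytes no matter how the script itself is encoded. The HEADER_LINE pattern, the strip_header helper and the sample data are hypothetical and only for illustration:

```python
import io
import re

# Hex escapes for the cp437 box bytes: 0xC9 = top-left corner,
# 0xC8 = bottom-left corner, 0xCD = double horizontal bar, 0xBA = double
# vertical bar. A line is treated as header if it starts (after optional
# whitespace) with a border run or a vertical bar.
HEADER_LINE = re.compile(rb"^\s*(?:[\xc9\xc8]\xcd\xcd\xcd|\xba)")


def strip_header(stream):
    """Return the lines of a binary stream that are not box-drawn header lines."""
    return [line for line in stream if not HEADER_LINE.match(line)]


# Hypothetical sample standing in for the real output file.
sample = io.BytesIO(
    b"\xc9\xcd\xcd\xcd\xcd\xbb\r\n"
    b"\xba Radio Source Precession Program \xba\r\n"
    b"\xc8\xcd\xcd\xcd\xcd\xbc\r\n"
    b"The Julian Date is = 2453153.5\r\n"
)

for line in strip_header(sample):
    print(line.decode("cp437").rstrip())  # The Julian Date is = 2453153.5
```

With a real file you would pass the open binary file object instead of the BytesIO sample. Since the header is always the first five lines, simply skipping five lines would also work. Matching the box characters keeps working if the header ever changes length.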

Kent

