Looking for Form Feeds
sjmachin at lexicon.net
Tue Jan 25 01:37:39 CET 2005
Greg Lindstrom wrote:
> I have a file generated by an HP-9000 running Unix containing form feeds
> signified by ^M^L. I am trying to scan for the linefeed to signal
> certain processing to be performed but can not get the regex to "see"
> it. Suppose I read my input line into a variable named "input"
> The following does not seem to work...
> input = input_file.readline()
You are shadowing a builtin.
> if re.match('\f', input): print 'Found a formfeed!'
> else: print 'No linefeed!'
formfeed == not not linefeed????
> I also tried to create a ^M^L (typed in as <ctrl>Q M <ctrlQ> L) but it
> gives me a syntax error when I try to run the program (re does not like
> the control characters, I guess). Is it possible for me to pull out
> formfeeds in a straightforward manner?
For a start, resolve your confusion between formfeed and linefeed.
Formfeed makes your printer skip to the top of a new page (form),
without changing the column position. FF, '\f', ctrl-L, 0x0C.
Linefeed makes the printer skip to a new line, without changing the
column position. LF, '\n', ctrl-J, 0x0A.
There is also carriage return, which makes your typewriter return to
column 1, without moving to the next line. CR, '\r', ctrl-M, 0x0D.
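
If you want to double-check those codes rather than trust anyone's memory, Python will tell you. A minimal sketch; the names CONTROL_CHARS and ctrl_key are made up for illustration, and the ctrl-key letter is derived by adding 0x40 to the code point:

```python
# Control characters discussed above: name -> (escape, code point).
CONTROL_CHARS = {
    "FF": ("\f", 0x0C),  # formfeed, ctrl-L
    "LF": ("\n", 0x0A),  # linefeed, ctrl-J
    "CR": ("\r", 0x0D),  # carriage return, ctrl-M
}

def ctrl_key(ch):
    """Return the letter typed with ctrl to produce the control character ch."""
    return chr(ord(ch) + 0x40)

for name, (ch, code) in CONTROL_CHARS.items():
    # Check that the escape sequence really has the stated code point.
    assert ord(ch) == code
    print("%s 0x%02X ctrl-%s" % (name, code, ctrl_key(ch)))
```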
Now you can probably guess why the writer of your report file is
emitting "\r\f". What we can't guess for you is where in your file
these "\r\f" occurrences are in relation to the newlines (i.e. '\n')
which Python is interpreting as line breaks. As others have pointed
out, (1) re.match works on the start of the string and (2) you probably
don't need to use re anyway. The solution may be as simple as:
    if input_line[:2] == "\r\f":
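
To make both points concrete, here is a rough sketch. The sample lines are invented, since we don't know how the real file is laid out, and starts_new_page is a hypothetical helper name, not anything from the original program:

```python
import re

PAGE_BREAK = "\r\f"  # CR + FF, the ^M^L sequence from the report file

def starts_new_page(line):
    """True if the line begins with the CR+FF page-break sequence."""
    return line[:2] == PAGE_BREAK

# Hypothetical sample lines; the real file layout may differ.
sample_lines = ["Page 1 header\n", "some data\n", "\r\fPage 2 header\n"]

# Indexes of the lines that start a new page.
page_starts = [i for i, line in enumerate(sample_lines) if starts_new_page(line)]
print(page_starts)

# re.match only looks at the start of the string; re.search scans all of it.
buried = "total: 42\r\f"
print(re.match("\f", buried))                # None: the FF is not at position 0
print(re.search("\f", buried) is not None)   # True: found further along
```

One caveat if you run this on a current Python: a file opened in the default text mode has a lone '\r' translated to '\n' on reading, so the "\r\f" would never show up. Opening with newline='\n' (or in binary mode) should keep the '\r' intact.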
BTW, have you checked that there are no other control characters
embedded in the file, e.g. ESC (introducing an escape sequence), SI/SO
(change character set), BEL * 100 (Hey, Fred, the printout's finished),
HT, VT, BS (yeah, probably lots of that, but I mean BackSpace)?
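
A quick way to find out is to count every control character in the file. A minimal sketch, assuming the whole file fits in memory as one string; count_control_chars and the EXPECTED set are illustrative choices, not a standard API:

```python
from collections import Counter

# Control characters expected in an ordinary text report (an assumption).
EXPECTED = set("\n\r\f")

def count_control_chars(text, expected=EXPECTED):
    """Count unexpected characters below 0x20, plus DEL (0x7F), in text."""
    counts = Counter()
    for ch in text:
        if (ord(ch) < 0x20 or ord(ch) == 0x7F) and ch not in expected:
            counts[ord(ch)] += 1
    return counts

# Invented sample: one ESC, two BS, one HT, then the expected LF, CR, FF.
sample = "abc\x1b[1mbold\x08\x08\tdone\n\r\f"
for code, n in sorted(count_control_chars(sample).items()):
    print("0x%02X x %d" % (code, n))
```

Anything that turns up besides the ones you expect is worth a look before you write the rest of the parser.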