newbie raw text question
Ian Sparks
Ian.Sparks at
Tue Feb 4 09:44:26 EST 2003
Thanks for the reply, Dennis. Your breakdown of the meaning of the RTF codes is pretty much spot on. However, I'm still not "getting it". You say:
> What escaped characters? The \ is a tag introducer (for lack of a
> better word) and is part of the actual data. "\rtf1" is NOT <cr>tf1.
So here's a simple command-line test:
>>> print "\rtf1"
>>> print r"\rtf1"
Looks to me like \rtf1 *is* <cr>tf1 unless you define the string as a raw string, in which case it can contain the "\" character.
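The difference between the two literals can be made visible with len() and repr(). This is only a small sketch, written with Python 3's print() rather than the print statement used above; the names "escaped" and "raw" are made up for illustration.

```python
# "\r" in an ordinary literal is an escape sequence: one carriage return.
escaped = "\rtf1"
# In a raw literal the backslash stays a backslash.
raw = r"\rtf1"

print(len(escaped), repr(escaped))  # 4 '\rtf1'   (CR, t, f, 1)
print(len(raw), repr(raw))          # 5 '\\rtf1'  (backslash, r, t, f, 1)
```

The "r" prefix changes how the source code is read, not what kind of object results: both values are ordinary strings.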
This is all very well for strings you define at the command line, but what if a variable "x" contains "\rtf1" (NOT a raw string)? How can you deal with it then?
>>> print x
>>> print rx #attempt to turn x into a raw string for printing.
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
NameError: name 'rx' is not defined
How can I print x as though it were a raw string? Like I said, it's probably pretty obvious, I just don't "get it".
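A possible way to look at the variable case, as a sketch only: "raw" applies to literals typed in source code, so a string that arrives from outside the program already holds its backslashes and needs no conversion. Below, io.StringIO is a hypothetical stand-in for the database read, and the sample text is a shortened version of the RTF data quoted further down (Python 3 syntax).

```python
import io

# Stand-in for the database: the stored text contains real backslashes.
source = io.StringIO(r"{\rtf1\ansi Some text}")
x = source.read()

print(x)        # {\rtf1\ansi Some text}
print(repr(x))  # '{\\rtf1\\ansi Some text}'
```

print(x) shows the characters the string actually holds, while repr(x) shows the literal you would have to type to produce it. If x had instead been assigned from a non-raw literal such as "\rtf1", the backslash would already be gone and nothing could bring it back.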
Ian Sparks fed this fish to the penguins on Monday 03 February 2003
12:11 pm:
> I'm confused about this one. I'm reading some RTF formatted data from
> a database. The resulting string is :
> {\rtf1\ansi\ansicpg1252\deff0\deftab720{\fonttbl{\f0\fswiss MS Sans
> {Serif;}{\f1\froman\fcharset2 Symbol;}{\f2\fswiss Arial;}{\f3\fswiss
> {Arial;}} \colortbl\red0\green0\blue0;}
> \deflang1033\pard\plain\f3\fs16 Some text
> }
> obviously this is chock-full of escaped characters. I need to strip
> the RTF codes and all my regular expressions are expecting raw strings
> but I don't see a way of converting an escaped string to a raw string
> to use in the regex.
What escaped characters? The \ is a tag introducer (for lack of a
better word) and is part of the actual data. "\rtf1" is NOT <cr>tf1.
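To illustrate the point for the regex case, here is a rough sketch (Python 3) that strips control words from the last line of the sample. The pattern is an assumption about what a control word looks like (backslash, letters, optional number, optional single space) and is nowhere near a full RTF parser: it ignores groups, \'xx hex escapes and escaped braces.

```python
import re

# Data as it would arrive from the database: the backslashes are real characters.
rtf = r"\deflang1033\pard\plain\f3\fs16 Some text"

# The raw prefix here is for the *pattern*; "\\" in a regex matches one backslash.
control_word = re.compile(r"\\[a-zA-Z]+-?\d*[ ]?")

text = control_word.sub("", rtf)
print(text)  # Some text
```

Only the pattern is written as a raw literal. The data itself needs no conversion before it is passed to the regex.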
What I see in your sample (and I've not studied RTF) is:
RTF version 1 (hypothetical this)
Codepage 1252
define font 0 (guessing) define tab 720 decipoints (1inch)(guessing,
might be centipoints/0.1inch)
font table
font 0 "swiss" font (sans serif) is MS Sans Serif
font 1 "roman" font (serif) is character set 2 Symbol
font 2 "swiss" font is Arial
font 3 "swiss" font is Arial
color table
red 0
green 0
blue 0
define language 1033
plain (not bold or italic)
use font 3
font size 16
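The breakdown above amounts to splitting the data into control words and their numeric parameters. A minimal sketch of that split in Python 3, assuming each control word is a backslash, a run of letters, and an optional number (the names "token" and "words" are made up here):

```python
import re

# The opening of the sample data, with its backslashes kept as-is.
header = r"{\rtf1\ansi\ansicpg1252\deff0\deftab720"

# Group 1: the control word name. Group 2: its optional numeric parameter.
token = re.compile(r"\\([a-zA-Z]+)(-?\d+)?")

words = token.findall(header)
for name, param in words:
    print(name, param or "-")
# rtf 1
# ansi -
# ansicpg 1252
# deff 0
# deftab 720
```

What each control word means (units, defaults and so on) would still have to be looked up in the RTF specification; the sketch only separates names from numbers.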
> There must be some way out of here...
