newbie raw text question

Ian Sparks Ian.Sparks at etrials.com
Tue Feb 4 15:21:32 EST 2003


Thank you Terry for taking the time to give a full explanation! 


-----Original Message-----
From: Terry Reedy [mailto:tjreedy at udel.edu]
Sent: Tuesday, February 04, 2003 3:09 PM
To: python-list at python.org
Subject: Re: newbie raw text question


[post and cc]
"Ian Sparks" <Ian.Sparks at etrials.com> wrote in message
news:mailman.1044369911.27254.python-list at python.org...
>Thanks for the reply Dennis. Your breakdown of the meaning of the RTF
>codes is pretty-much spot on. However, I'm still not "getting it".

What you are not quite getting (and others have had the same problem)
is the difference between a string literal in your code and a string
object in your execution space.  String literals are used to give a
sequence-of-bytes value to string objects, but they are not the
objects themselves.  'Rawness' is a property only of code literals,
not of strings themselves, nor of non-code input, nor of output text.
And it consists only in the presence of the 'r' prefix.
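
A minimal sketch of that distinction, assuming a current Python 3
interpreter (the variable names are only for illustration):

```python
import io

cooked = '\n'    # escape is processed: one character, a newline
raw = r'\n'      # 'r' prefix: two characters, a backslash and an 'n'
same = '\\n'     # a cooked literal that yields the same two characters

# Both results are ordinary string objects; neither carries a 'raw' flag.
same_type = type(raw) is type(cooked)

# Non-code input is never cooked: reading a backslash and an 'n' from a
# stream (io.StringIO stands in for a file here) gives back exactly those
# two characters.
data = io.StringIO(raw).read()
```

Here raw == same even though the two literals are spelled differently,
which is the sense in which rawness belongs to the literal and not to
the object.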

In a sense, even 'raw literal' is a slight misnomer.  All string
literals are 'raw' as written.  The 'r' prefix is a "leave raw, do not
cook" directive to the interpreter.  So a 'raw' literal is a literal
that is left raw when fed to the interpreter and the expressions it
appears in.

The convention could be the opposite -- that string literals be left
as they are (raw) unless tagged with a 'cook' directive.  It is not, I
presume, because of the fairly frequent use, in some situations, of
'\n' and possibly '\t'.  The current situation is analogous to the
following: if I say "Feed X an egg" and X is human, I probably mean
"Feed X an (initially raw) egg that is cooked in the 'standard'
manner".  I would have to say "Feed X a raw egg" to disable the usual
processing.

As for output: let s be a string.  Then

'file.write(s)' writes the bytes of s to the device that 'file'
represents exactly as they are, with the possible exception of
'text-mode' expansion of '\n' (but that is a separate issue).

'str(s)', which 'print s' uses, produces a 'friendly' graphical
representation.

'repr(s)' (== `s`), which the interpreter uses to echo expressions in
interactive mode, produces an exact graphical representation that
'eval()'s back to s; i.e., eval(repr(s)) == s.  repr() does not use 'r'
prefixes (which are a somewhat recent addition to the language).
However, it could, and, if it were not for the problem of breaking code
that depends on the exact current behavior, I might even think it should.
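
The three output paths can be compared side by side.  This is only a
sketch, assuming Python 3 (where print is a function and the backtick
form is gone), with io.StringIO standing in for a real file:

```python
import io

s = 'a\\b\n'    # four characters: a, backslash, b, newline

buf = io.StringIO()        # in-memory stand-in for a file object
buf.write(s)               # writes the characters of s as they are
written = buf.getvalue()

friendly = str(s)          # for a string, str() gives the string itself
exact = repr(s)            # quotes added, backslash doubled, newline as \n

# repr() produces text that is itself a valid literal, so eval() turns
# it back into an equal string object.
round_trip = eval(exact)
```

Note that exact is eight characters long while s is four: repr()
describes the string as a cooked literal, never as an r-prefixed one.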

Pending a change, you could write your own function:  here is a start
(untested)

def r_rep(s):
  # modify repr() to use 'r' prefix when possible and useful
  rep = repr(s)
  # r'\\' is two backslashes and '\\' is one, so this undoubles them
  return r_able(rep) and 'r'+rep.replace(r'\\', '\\') or rep

where r_able(rep), left as an exercise for you, checks that rep
a) contains backslashes (so that 'r' prefixing is useful)
b) has only even-length runs of backslashes (so all can be undoubled)
c) ends with a run of backslashes whose length is a multiple of four
(possibly 0), so that an even number still remains after halving
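
One possible way to fill in the exercise is sketched below.  It assumes
Python 3 and that rep is always the output of repr() on a plain string,
so that the first and last characters are the quotes; the use of the re
module and a conditional expression in place of and/or are my choices
here, not part of the outline above.

```python
import re

def r_able(rep):
    """Check conditions (a) to (c) on rep, the repr() of a string."""
    body = rep[1:-1]                     # drop the surrounding quotes
    runs = re.findall(r'\\+', body)      # every maximal run of backslashes
    if not runs:
        return False                     # (a) no backslashes, 'r' is useless
    if any(len(run) % 2 for run in runs):
        return False                     # (b) an odd run is some other escape
    trailing = len(body) - len(body.rstrip('\\'))
    return trailing % 4 == 0             # (c) even count left after halving

def r_rep(s):
    """Like repr(), but use an 'r' prefix when possible and useful."""
    rep = repr(s)
    return 'r' + rep.replace('\\\\', '\\') if r_able(rep) else rep
```

For example, r_rep('C:\\temp\\new') gives the text r'C:\temp\new', while
a string containing a newline or ending in a single backslash falls back
to plain repr().  In each case eval(r_rep(s)) == s should still hold.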

Terry J. Reedy






