Raw String Output

Bengt Richter bokr at oz.net
Thu Jun 20 22:19:17 EDT 2002


On Thu, 20 Jun 2002 19:09:41 -0400, "mike" <gringro1 at umbc.edu> wrote:

>If I have a string variable with the value '\n' how do I output it to a file
>with the value of '\012' instead ?
>
First, let me get straight what you mean vs what I think you might mean ;-)
If you want to write a single line feed character with code value 012 octal (10 decimal),
don't worry about it, just write the string (in binary mode on Windows, or it will become
\r\n in the file). Note that:

 >>> ord('\n'), ord('\012'), ord('\x0a')
 (10, 10, 10)

and

 >>> 'one\ntwo\012three\x0a'
 'one\ntwo\nthree\n'
 >>> print 'one\ntwo\012three\x0a'
 one
 two
 three

To see the string in terms of the character sequence a file write would see, convert
to a list:

 >>> list('one\ntwo\012three\x0a')
 ['o', 'n', 'e', '\n', 't', 'w', 'o', '\n', 't', 'h', 'r', 'e', 'e', '\n']

I.e., the literals \n, \012, and \x0a all produce the same single internal character.
OTOH, if you need to pass custom escaped strings to some other platform or software
for evaluation there, that would be different.

If you type
    s1 = '\n'
You will get an internal string object with length one bound to the name 's1' on the left.

If you type
    s2 = '\012'
or
    s3 = '\x0a'
You will get string objects with the identical value (even sharing the identical immutable single-char
string object in current Python, I believe), but bound to 's2' and 's3' respectively. The numeric
character code is 10 decimal for all three, which ord() confirms further down. Bind all three names:
 >>> s1='\n'; s2='\012'; s3='\x0a'

When you inspect s1, s2 or s3 by just typing the names interactively, the interactive loop will
use the __repr__ methods of the respective strings and print the characters returned.
 >>> s1,s2,s3
 ('\n', '\n', '\n')
It's the same character, so the conventional repr shows it as '\n' each time.

 >>> ord(s1),ord(s2),ord(s3)
 (10, 10, 10)

"'\n'" is a string literal, which if you evaluate it will give you a length-1
string whose single character will have a numerical code with value 10 decimal.

If you write (in binary mode) any of the strings s1, s2 or s3 to a file, you will write only
one character, and it will be the same character each time, with the ASCII value 10.
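
To make the point about binary-mode writes concrete, here is a minimal sketch. It uses Python 3 syntax (so the text has to be encoded to bytes before a binary write), unlike the Python 2 session in the rest of this post, and the helper name bytes_written is made up for illustration:

```python
import os
import tempfile

# Three spellings of the same one-character string (line feed, code 10).
s1, s2, s3 = '\n', '\012', '\x0a'

def bytes_written(text):
    """Write text to a temporary file in binary mode and return the raw bytes read back."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(text.encode('ascii'))
        with open(path, 'rb') as f:
            return f.read()
    finally:
        os.remove(path)

results = [bytes_written(s) for s in (s1, s2, s3)]
print(results)  # [b'\n', b'\n', b'\n']
```

Each file ends up holding exactly one byte with value 10, whichever way the literal was spelled in the source.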

If for some reason you want to write a _representation_ of the internal encoded string, you
have choices. The conventional representation is produced by repr(s) or s.__repr__(), or `s`,
and a line feed will be represented as '\n' on the screen, which comes from two characters:
'\\' and 'n'. To create an alternate _representation_ for the same internal character, using
four characters (i.e., your '\\', '0', '1', '2') instead of two, you could produce the conventional
representation and substitute '\\012' for '\\n'. ('\\012' and '\\n' can be written r'\012' and r'\n').

Here is an example of text with two line feeds:
 >>> s="""\
 ... line 1
 ... line 2
 ... """

Triple quotes keep the line feeds, and the conventional string
representation printed shows the \n characters escaped:
 >>> s
 'line 1\nline 2\n'

Backquotes are shorthand for repr(), which returns a representation string that contains
actual backslash characters, each paired with the plain character it escapes. When that
string is in turn shown on the screen, _it_ is repr'd and printed, backslashes included,
so the backslashes are shown doubled:
 >>> sr1 = `s`
 >>> sr1
 "'line 1\\nline 2\\n'"

If you want to change the _representation_ r'\n' of chr(10) to _representation_ r'\012',
you can split out the former and join in the latter:
 >>> sr1.split(r'\n')
 ["'line 1", 'line 2', "'"]
 >>> sr2 = r'\012'.join(sr1.split(r'\n'))
 >>> sr2
 "'line 1\\012line 2\\012'"

Now compare the conventional representation sr1 and your sr2:

 >>> print sr1
 'line 1\nline 2\n'
 >>> print sr2
 'line 1\012line 2\012'

Note the single quotes printed. They are part of the repr string (though they could have been
double quotes if the text included single quotes, so don't depend on the quoting character's being
"'" or '"'). You might conceivably want to strip the outer quotes off, depending on the destination
of your custom string representation.
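
A minimal sketch of stripping the outer quotes, written so that it does not care which quote character repr() picked. It is Python 3 syntax, and the variable names are only for illustration:

```python
s = 'line 1\nline 2\n'
sr = repr(s)

# repr() of a str always wraps it in one matching pair of quotes,
# so dropping the first and last characters removes them.
body = sr[1:-1]
print(body)  # line 1\nline 2\n

# Text containing a single quote makes repr() switch to double quotes,
# but the same slice still works.
t = "it's"
print(repr(t))        # "it's"
print(repr(t)[1:-1])  # it's
```

Keep in mind that without the quotes the result is no longer something you can eval directly.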

And note that if you evaluate the representation strings, you get the original
identical string back:
 >>> eval(sr1)
 'line 1\nline 2\n'
 >>> eval(sr2)
 'line 1\nline 2\n'

as the original
 >>> s
 'line 1\nline 2\n'

And naturally if you print them, they print the same:
 >>> print eval(sr1)
 line 1
 line 2

 >>> print eval(sr2)
 line 1
 line 2
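
The eval round trip is also a handy check on the split/join substitution shown earlier. One caveat with that substitution: if the text itself contains a literal backslash followed by 'n', repr() doubles the backslash, and a plain textual replace of r'\n' then hits the wrong spot. Here is a sketch of a character-by-character alternative. The helper octal_newline_repr is hypothetical, not anything built in, and the code is Python 3 syntax:

```python
def octal_newline_repr(s):
    """Return a repr-like string in which each line feed is spelled \\012.

    Hypothetical helper: it converts one character at a time, so a
    literal backslash followed by 'n' in s is not mistaken for an
    escaped line feed.
    """
    pieces = []
    for ch in s:
        if ch == '\n':
            pieces.append(r'\012')
        elif ch == "'":
            # Escape single quotes, since the result is wrapped in them.
            pieces.append(r"\'")
        else:
            # repr() of a single character, minus its surrounding quotes.
            pieces.append(repr(ch)[1:-1])
    return "'" + ''.join(pieces) + "'"

# a, backslash, n, b, then a real line feed
tricky = 'a\\nb\n'
naive = repr(tricky).replace(r'\n', r'\012')
robust = octal_newline_repr(tricky)

print(eval(naive) == tricky)   # False
print(eval(robust) == tricky)  # True
```

For ordinary text with no literal backslashes, both approaches give the same answer.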

Now the question is, do you want to write a string to a file
where it takes one character to represent EOL (or two if Windows
cooks it in text mode), or do you actually want to write a representation
string that has to be eval'd to retrieve the original internal string?
And if the latter, why would you want r'\012' instead of r'\n'?
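
If you do go the representation route and read the strings back from a file later, bear in mind that eval() will run any expression it is handed. Later Python versions provide ast.literal_eval, which accepts only literals. A minimal sketch in Python 3 syntax, with the two representation strings from above typed in by hand:

```python
import ast

# The two representation strings from the session above.
sr1 = r"'line 1\nline 2\n'"
sr2 = r"'line 1\012line 2\012'"

a = ast.literal_eval(sr1)
b = ast.literal_eval(sr2)
print(a == b == 'line 1\nline 2\n')  # True

# Anything that is not a plain literal is refused rather than executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```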

Again, the key is the distinction between the abstraction (the internal string)
and its representations, which play different but related roles.

>Thank you in advance!
>
HTH

Regards,
Bengt Richter


