raw strings

Bengt Richter bokr at oz.net
Thu Oct 10 17:02:05 EDT 2002


On 10 Oct 2002 10:39:51 -0700, mis6 at pitt.edu (Michele Simionato) wrote:

>bokr at oz.net (Bengt Richter) wrote in message news:<ao08m6$dpv$0 at 216.39.172.122>...
>> >
>> No guarantees, but does this do what you want?
>> 
>>  >>> def raw_string(s):
>>  ...     return "r'%s'" % ''.join(
>>  ...         [(x,`x`[1:-1],'\\')[`x`[1:].startswith('\\')+(x=='\\')] for x in s]
>>  ...     )
>>  ...
>>  >>> s='\*hello\*\n'
>>  >>> print raw_string(s)
>>  r'\*hello\*\n'
>>  >>> r=r'\*hello\*\n'
>>  >>> print raw_string(r)
>>  r'\*hello\*\n'
>> 
>> Regards,
>> Bengt Richter
>
>Cool attempt, but it doesn't work: for instance 'hel\\lo' is turned into
>r'hel\lo' and not into r'hel\\lo'. And there are other drawbacks (try
>for instance to apply the function to '\1'). The problem seems much more
>complicated than I expected.

Well you realize that

 >>> assert '\*hello\*\n' == r'\*hello\*\n'
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 AssertionError

fails, I'm sure, so I went for a related problem, i.e., returning
an 'r' string representation for an 'r' string input. I didn't cover
all the bases there either, since many raw strings can also be written
in more than one way by using different quotes. What I did just happened
to cover part of what you asked ;-)
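
For anyone reading this on Python 3, where the backtick shorthand for repr() is gone, the function quoted above can be spelled out roughly as follows. This is only a sketch of the same per-character choice, assuming Python 3 and that repr() of a one-character str gives an acceptable escaped form. It does not fix the objections raised above.

```python
def raw_string(s):
    """Python 3 spelling of the backtick one-liner quoted above (a sketch)."""
    parts = []
    for ch in s:
        escaped = repr(ch)[1:-1]       # repr() without its surrounding quotes
        if ch == '\\':
            parts.append('\\')         # a backslash stays a single backslash
        elif escaped.startswith('\\'):
            parts.append(escaped)      # control chars become text like \n or \x01
        else:
            parts.append(ch)           # everything else is copied unchanged
    return "r'%s'" % ''.join(parts)

# A real newline and a backslash followed by 'n' produce the same output.
same_output = raw_string('\\*hello\\*\n') == raw_string(r'\*hello\*\n')
```

Both inputs come back as the same text, as in the session quoted above, and 'hel\\lo' still comes back as r'hel\lo'.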

There is no way to tell from inside raw_string whether to return
r'x' or r"x" etc., since the following literals

 >>> r'x', r"x", r'''x''', r"""x"""
 ('x', 'x', 'x', 'x')

are all the same byte string, so how to choose a source representation?

 >>> assert eval(raw_string(r'\*hello\*\n'))==r'\*hello\*\n'

is ok, and

 >>> assert eval(raw_string(r'\1'))==r'\1'

is ok (though the eval is not seeing a r'\1')

But there is no way that raw_string can tell which of the
various representations was evaluated to make the actual byte string
that got passed to it. E.g., typing in various source representations:

 >>> various = ['\1', '\x01','\001', '^A', chr(1)]
                                      ^^----(screen echo from typing the single character Ctrl-A)
we can see that they all generated the same data (and you could vary the quote chars too):

 >>> various = ['\1', '\x01','\001', '^A', chr(1)]
 >>> for v in various: print '%s %s' % (`v`, raw_string(v))
 ...
 '\x01' r'\x01'
 '\x01' r'\x01'
 '\x01' r'\x01'
 '\x01' r'\x01'
 '\x01' r'\x01'

There's no way to reconstitute the different sources in various from

 >>> various
 ['\x01', '\x01', '\x01', '\x01', '\x01']
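
The same point can be checked without typing a control character, by starting from the source texts themselves. A small sketch, assuming Python 3 and using ast.literal_eval to evaluate each literal; the names sources and values are just for illustration:

```python
import ast

# Source texts as they would be typed into a file, held in raw strings
# so that the backslashes survive until literal_eval sees them.
sources = [
    r"'\1'",        # short octal escape
    r"'\x01'",      # hex escape
    r"'\001'",      # three-digit octal escape
    r'"\x01"',      # same escape, double quotes
    r"'''\x01'''",  # same escape, triple quotes
]

values = [ast.literal_eval(src) for src in sources]

distinct_sources = len(set(sources))  # five different spellings
distinct_values = len(set(values))    # one resulting string
```

Five distinct spellings go in and one value comes out, so nothing that receives only the value can say which spelling produced it.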

So what was it you really wanted? ;-)

I think you could possibly define rules for generating a unique
canonical raw-string representation (chosen from multiple legal
possibilities) that would evaluate to the same bytes
as the input string. But repr is pretty consistent... So I'm
curious what use you have in mind.
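
To make the idea of a canonical form concrete, here is one possible rule set, sketched for Python 3. The function name canonical_literal and the particular rules are assumptions made for illustration: use a raw literal when every character is printable, the string does not end in an odd number of backslashes, and some quote style does not clash with the content. Otherwise fall back to repr().

```python
import ast

def canonical_literal(s):
    """Return source text that evaluates to s, preferring a raw literal.

    Hypothetical rules, one choice among many legal ones.
    """
    trailing_backslashes = len(s) - len(s.rstrip('\\'))
    # A raw literal cannot end in an odd number of backslashes, and
    # unprintable characters would have to appear literally in the source.
    if s.isprintable() and trailing_backslashes % 2 == 0:
        # Try quote styles in a fixed order so the answer is unique.
        for quote in ("'", '"', "'''", '"""'):
            if quote not in s and not s.endswith(quote[0]):
                return 'r' + quote + s + quote
    return repr(s)

samples = ['hel\\lo', '\\*hello\\*\n', '\x01', 'a\\', "it's", 'both \' and "', '']
round_trips = all(ast.literal_eval(canonical_literal(s)) == s for s in samples)
```

With these rules 'hel\\lo' is written as r'hel\lo', while chr(1) and strings containing a newline fall back to what repr gives. The result depends only on the value, never on how the value was originally typed.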

Regards,
Bengt Richter


