backslash plague
Bengt Richter
bokr at oz.net
Sat Oct 23 18:33:05 EDT 2004
On Fri, 22 Oct 2004 21:20:30 +0200, aleaxit at yahoo.com (Alex Martelli) wrote:
>Luis P. Mendes <luisXX_lupe2XX at netvisaoXX.pt> wrote:
> ...
>> I've already read many pages on this but I'm not able to separate the
>> string 'R0\1.2646\1.2649\D' in four elements, using the \ as the separator.
>
>x = r'R0\1.2646\1.2649\D'
>elements = x.split('\\')
>
>> and why must I write two '' after the \? If I hadn't used r I would
>> understand...
>
>A raw literal can't end with an odd number of backslashes (_some_ way
>has to be there to escape the quote char, after all).
>
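The points in the quoted answer can be checked with a short Python sketch. The raw literal keeps every backslash, the separator '\\' is an ordinary literal holding a single backslash, and a raw literal that ends in a lone backslash is rejected. The exact SyntaxError message differs between Python versions, so it is printed rather than quoted here. The path example and variable names are illustrative only.

```python
# The raw literal keeps each backslash as data; the separator '\\' is a
# normal (non-raw) literal whose value is one backslash character.
x = r'R0\1.2646\1.2649\D'
elements = x.split('\\')
print(elements)                 # ['R0', '1.2646', '1.2649', 'D']

# '\\' and r'\\' are different strings: one character versus two.
print(len('\\'), len(r'\\'))    # 1 2

# In a raw literal the backslash still "protects" a following quote, so a
# raw literal ending in a single backslash never gets terminated.
try:
    compile(r"s = r'c:\whatever\'", "<example>", "exec")
    lone_backslash_ok = True
except SyntaxError as err:
    lone_backslash_ok = False
    print("SyntaxError:", err.msg)

# A common workaround: append the trailing backslash from a normal literal.
path = r'c:\whatever' + '\\'
print(path)                     # c:\whatever\
```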
Hm, just had the thought that something analogous to HDLC bit-stuffing
could be used. In HDLC, frames are delimited by the flag 01111110, and
whenever the data contains five successive 1 bits, the sender stuffs an
extra 0 bit after them, so six 1s in a row (and hence a flag) can never
appear in the data. On decoding, a 0 that follows 11111 is dropped, while
a sixth 1 is recognized as part of a flag.
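For comparison, here is a minimal sketch of standard HDLC bit stuffing: a 0 is inserted after every run of five 1 bits, so the payload can never contain six 1s in a row and therefore never contains the 01111110 flag. Bits are modeled as strings of '0' and '1' purely for readability; real HDLC works on a serial bit stream and also defines abort sequences, which this sketch ignores. The function names are illustrative.

```python
FLAG = "01111110"  # HDLC flag: six 1 bits between two 0 bits


def stuff(bits: str) -> str:
    """Insert a 0 after every run of five consecutive 1 bits."""
    out = []
    ones = 0
    for b in bits:
        out.append(b)
        if b == "1":
            ones += 1
            if ones == 5:
                out.append("0")  # stuffed bit breaks up the run of 1s
                ones = 0
        else:
            ones = 0
    return "".join(out)


def unstuff(bits: str) -> str:
    """Remove the 0 that the sender stuffed after each run of five 1s."""
    out = []
    ones = 0
    drop_next = False
    for b in bits:
        if drop_next:
            if b != "0":
                # A sixth 1 cannot occur inside stuffed data.
                raise ValueError("six 1s in a row: not stuffed data")
            drop_next = False
            ones = 0
            continue
        out.append(b)
        if b == "1":
            ones += 1
            if ones == 5:
                drop_next = True
        else:
            ones = 0
    return "".join(out)


payload = "0111111011111"
print(stuff(payload))            # 011111010111110
frame = FLAG + stuff(payload) + FLAG
```

The receiver finds the frame boundaries by scanning for FLAG, then calls unstuff on the bits in between.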
Translating this to quoted character sequences, we could have an alternate
triple-quoted raw string format, with quote-stuffing instead of escapes.
I.e., to include three successive quote characters in the data, we stuff a
4th quote, which the tokenizer drops as it builds the string's internal
representation, so we don't need escapes in the usual sense.
Thus (using an f prefix to indicate the flagged, quote-stuffing syntax) you could write:
x = f'''c:\whatever\'''
and to quote the line above (without taking advantage of alternate quotes):
q = f''' x = f''''c:\whatever\'''''''
     ^^^      ^^^|            ^^^|^^^
where ^^^ marks a flag, and | marks a stuffed quote that turns the
preceding would-be flag into three quote characters of data.
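The stuffing rule can be sketched as a small Python encoder. The f''' prefix is the hypothetical syntax proposed here, not something Python accepts, so the function only produces the text such a literal would have; quote_stuff is an illustrative name. It reproduces both examples above.

```python
def quote_stuff(data: str, q: str = "'") -> str:
    """Return the text of a hypothetical f'''...''' literal for data.

    After every run of three q characters in the data, a fourth q is
    stuffed in; the proposed tokenizer would drop it. An unstuffed run of
    three q characters is left to act as the closing flag.
    """
    out = []
    run = 0
    for ch in data:
        out.append(ch)
        if ch == q:
            run += 1
            if run == 3:
                out.append(q)  # stuffed quote: the three before it are data
                run = 0
        else:
            run = 0
    return "f" + q * 3 + "".join(out) + q * 3


print(quote_stuff("c:\\whatever\\"))
# f'''c:\whatever\'''
print(quote_stuff(" x = f'''c:\\whatever\\'''"))
# f''' x = f''''c:\whatever\'''''''
```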
You could quote again (using the same quote type, again for illustrative
purposes, since obviously you could do better using both ' and "):
r = f'''f'''' x = f'''''c:\whatever\'''''''''''
         ^^^|      ^^^|             ^^^|^^^|^^^
(I think ;-)
I guess the worst-case data to quote would be a repeating pattern of
'''""" or """''', since neither type of quote character would give an
advantage, but an overhead of one stuffed quote per six data characters is
still not too bad, and such data would be rare.
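The 1-in-6 figure can be checked with a small sketch that counts the stuffed quotes for each choice of delimiter and keeps the cheaper one. The helper names are illustrative, and the delimiters and the f prefix are not counted. The figure is also an upper bound: whichever quote character is less frequent makes up at most half of the quotes and costs at most one extra character per three of its own, so the cheaper choice never exceeds one extra character per six.

```python
def stuffed_count(data: str, q: str) -> int:
    """Number of extra q characters quote-stuffing would add to data."""
    extra = 0
    run = 0
    for ch in data:
        if ch == q:
            run += 1
            if run == 3:
                extra += 1
                run = 0
        else:
            run = 0
    return extra


def best_overhead(data: str) -> float:
    """Stuffing overhead ratio using whichever quote type needs less of it."""
    return min(stuffed_count(data, q) for q in ("'", '"')) / len(data)


worst = "'''\"\"\"" * 100
print(best_overhead(worst))        # 0.16666666666666666
print(best_overhead("'" * 600))    # 0.0 (delimit with " instead)
```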
Is there a hole in this raw string quoting syntax?
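One way to probe the question is a sketch decoder for the ' form. It assumes, since the post does not say, that the tokenizer scans left to right, treating three quotes followed by a fourth as three quotes of data and three quotes followed by anything else as the closing flag. Under that assumption the examples above round-trip, but data ending in one or two quote characters cannot be closed with the same quote type: the trailing data quotes plus the flag read as a stuffed group, leaving nothing to terminate the literal. The r example is affected too, since its data ends in seven quotes: the nine encoded quotes plus a three-quote flag make twelve, which decode as nine quotes of data and no terminator. Switching to " as the delimiter avoids this, much as the other quote type sidesteps the odd-backslash restriction. The name decode is illustrative.

```python
def decode(literal: str) -> str:
    """Decode a hypothetical f'''...''' quote-stuffed literal.

    Assumes a left-to-right scan: four quotes are three quotes of data (the
    fourth is dropped); exactly three quotes close the literal.
    """
    q = "'"
    if not literal.startswith("f" + q * 3):
        raise ValueError("not an f''' literal")
    out = []
    i = 4
    while i < len(literal):
        if literal.startswith(q * 4, i):
            out.append(q * 3)  # stuffed group: keep three, drop the fourth
            i += 4
        elif literal.startswith(q * 3, i):
            if i + 3 != len(literal):
                raise ValueError("text after closing flag")
            return "".join(out)  # closing flag
        else:
            out.append(literal[i])
            i += 1
    raise ValueError("unterminated literal")


print(decode("f'''c:\\whatever\\'''"))    # c:\whatever\
try:
    decode("f'''x''''")  # what the stuffing rule gives for the data  x'
except ValueError as err:
    print(err)           # unterminated literal
```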
Regards,
Bengt Richter
More information about the Python-list mailing list