raw strings under windows

Bengt Richter bokr at oz.net
Sun Jun 15 18:18:46 EDT 2003


On Sun, 15 Jun 2003 08:12:35 GMT, Alex Martelli <aleax at aleax.it> wrote:

><posted & mailed>
>
>Cecil H. Whitley wrote:
>
>> Hi,
>> When doing the following:
>> 
>> #!/usr/bin/env python
>> 
>> path = r"c:\python23\"
>> 
>> I get a syntax error, unexpected EOL with singlequoted string.  It was my
>> (mis?) understanding that raw strings did not process escaped characters?
>
>They don't, in that the backslash remains in the string resulting from
>the raw literal, BUT so does the character right after the backslash,
That seems like a contradiction to me. I.e., the logic that says to
include "...the character right after the backslash, unconditionally." must
be noticing (processing) backslashes.

>unconditionally.  As a result, a raw string literal cannot end with an
>odd number of backslashes.  If they did otherwise, it would instead be
>impossible to include a single quote character in a single-quoted raw
So? 99.99% of those cases would be easy to get around with alternative quotes,
especially considering that """ and ''' are alternative quotes.

>string literal, etc.  Raw string literals are designed mainly to ease
>the task of entering regular expressions, and for that purpose an odd
>number of ending backslashes is never needed, while making inclusion of
>quote characters harder _would_ be an issue, so the design choice was
>easy to make.
ISTM only inclusion of same-as-initial quote characters at the end would
be a problem. Otherwise UIAM triple quotes take care of all but sequences with
embedded triple quotes, which are pretty unusual, and pretty easy to spell
alternatively (e.g. in tokenizer-concatenated pieces, with adjacent string
literals separated by optional whitespace).
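Here is a small illustration of these alternatives, checked against a modern Python 3 interpreter rather than the 2.3 of this thread. Switching the quote style lets a raw literal contain the other quote character, triple quotes allow both, and an embedded triple quote can be written as a separate literal that the compiler concatenates with its neighbours.

```python
# Switch quote styles to put the other kind of quote in a raw literal.
a = r'he said "\d+" twice'
b = r"it's \w+"

# Triple quotes allow both kinds of quote character inside.
c = r"""mixes ' and " freely: \s*"""

# An embedded triple quote can go in a separate literal; adjacent
# literals are concatenated at compile time, with optional whitespace.
d = r'before ' '"""' r' after\n'

print(a)   # he said "\d+" twice
print(b)   # it's \w+
print(c)   # mixes ' and " freely: \s*
print(d)   # before """ after\n
```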

Was the design choice made before triple quotes existed? If not, what is the use case
that would cause real difficulty? Of course, now there is a backwards-compatibility
constraint, so that r"""xxxx\"""" must mean r'xxxx\"' and not induce a syntax error.
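The current rule can be checked in Python 3. In this sketch, `compiles` is a hypothetical convenience helper written for the example. The backslash stays in the value, yet it still keeps the following quote from closing the literal. That is why `r"c:\python23\"` is rejected, and why the backward-compatible reading of `r"""xxxx\""""` holds.

```python
def compiles(src):
    """Return True if src is a valid Python expression (hypothetical helper)."""
    try:
        compile(src, "<literal>", "eval")
    except SyntaxError:
        return False
    return True

# The backslash is kept in the value, along with the quote after it.
assert r"\"" == '\\"'

# But the tokenizer still notices it: \" does not close the literal,
# so a raw literal cannot end with an odd number of backslashes.
print(compiles(r'r"c:\python23\"'))    # False
print(compiles(r'r"c:\python23\\"'))   # True (the value ends in two backslashes)

# The backslash protects the first of the four closing quotes; the
# remaining three terminate the triple-quoted literal.
assert r"""xxxx\"""" == r'xxxx\"' == 'xxxx\\"'
```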

>
>Of course people who use raw string literals to represent DOS paths might
>wish otherwise, but as has been pointed out it's not a big problem in
>any case -- not only, as you note:
>
>> Of course
>> path = "c:\\python23\\"
>> 
>> works just fine.

I wouldn't mind a raw-string format that really did treat backslashes
as ordinary characters. Perhaps upper case R could introduce that. E.g.,

   path = R"c:\python23\"
>
>but so, almost invariably, does 'c:/python23/' (Microsoft's C runtime
>libraries accept / interchangeably with \ as part of file path syntax,
>and Python relies on the C runtime libraries and so does likewise).
>
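In current Python 3, the prefix R is simply a synonym for r, so the R"c:\python23\" spelling proposed above is still rejected today. The workarounds from this exchange look like the following. The pathlib lines are added only for comparison and were not discussed in the thread. PureWindowsPath is used so the sketch behaves the same on any OS.

```python
from pathlib import PureWindowsPath

# Today R and r are the same prefix.
assert R"a\b" == r"a\b" == "a\\b"

p1 = "c:\\python23\\"        # ordinary literal with doubled backslashes
p2 = r"c:\python23" + "\\"   # raw literal, trailing separator added separately
p3 = "c:/python23/"          # forward slashes, accepted by the Microsoft C runtime
assert p1 == p2

# Windows path semantics treat / and \ as the same separator.
assert PureWindowsPath(p3) == PureWindowsPath(p1)

# pathlib joins components itself, so no literal has to end in a backslash.
p4 = PureWindowsPath("c:/python23") / "Lib"
print(p4)                    # c:\python23\Lib
```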
Another alternative would be a chosen-delimiter raw format, e.g.,

   path = d'|c:\python23\|

or

   path = d'$c:\python23\$

I.e., the first character after d' is the chosen delimiter.
Even matching-brackets delimiting could be possible

   d'[c:\python23\] == d'<c:\python23\> == d'{c:\python23\}

by recognizing [, <, or { delimiters specially. Space as a delimiter would be iffy practice.
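The d' literal is only a proposal and does not exist in Python. The following sketch shows how a tokenizer might scan it, under several assumptions:

- the character after d' is the delimiter;
- [, < and { close with the matching bracket, without nesting;
- any other delimiter closes itself;
- the body is taken verbatim.

The name `scan_delimited` is hypothetical.

```python
# Hypothetical closing partners for bracket-style delimiters; any other
# delimiter character closes itself. Nesting is not supported.
CLOSERS = {"[": "]", "<": ">", "{": "}"}

def scan_delimited(src, pos=0):
    """Scan a hypothetical d'<delim>body<delim> literal at src[pos].

    Returns (body, end), where end is the index just past the closing
    delimiter. Backslashes in the body are ordinary characters.
    """
    if not src.startswith("d'", pos) or pos + 2 >= len(src):
        raise SyntaxError("expected d' followed by a delimiter")
    opener = src[pos + 2]
    closer = CLOSERS.get(opener, opener)
    start = pos + 3
    end = src.find(closer, start)
    if end == -1:
        raise SyntaxError("unterminated d' literal")
    return src[start:end], end + 1

source = r"path = d'|c:\python23\| + rest"
body, end = scan_delimited(source, source.index("d'"))
print(body)                 # c:\python23\
print(repr(source[end:]))   # ' + rest'
```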

BTW, I think I would have preferred a chosen-delimiter form with alternate raw and normal
designators over triple quoting. If upper case meant raw, D'' or D"" would have been equivalent
to most current uses of r' and r" and there would be more flexibility. The d' normal format
would treat \ as non-magic in front of the chosen delimiter, and would otherwise recognize the standard
control-character escapes as now.
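For the non-raw d' variant, one reading of the rule is that the literal ends at the first delimiter regardless of any backslash before it, and only then are the usual escapes decoded. This is a minimal sketch under two assumptions: the delimiter is a simple non-bracket character, and only a few common escapes are handled. The names `decode_normal` and `ESCAPES` are hypothetical. Unrecognized escapes are kept unchanged, much as Python keeps unrecognized escapes.

```python
# A small, hypothetical subset of the standard escape spellings.
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", "'": "'", '"': '"'}

def decode_normal(body):
    """Decode the escapes listed in ESCAPES; other backslashes stay literal."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body) and body[i + 1] in ESCAPES:
            out.append(ESCAPES[body[i + 1]])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)

src = r"d'|a\tb\|"
delim = src[2]
body = src[3:src.index(delim, 3)]   # ends at the first |, even after a backslash
print(repr(decode_normal(body)))    # 'a\tb\\'
```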

While I'm at it (;-) and last but not least, to be able to paste arbitrary text into
a string literal, some other delimiting method is needed. A number exist for other text
contexts (e.g., perl sources, MIME format, etc.) so I guess it's a matter of whether
the itch is sufficient. Apparently not yet ;-)
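One of the approaches alluded to is the MIME-style boundary, similar in spirit to a shell or Perl here-document. It can be sketched as follows. This is not Python syntax. `wrap` and `unwrap` are hypothetical helpers that surround arbitrary text with a randomly generated boundary, chosen so that it does not occur in the text.

```python
import secrets

def choose_boundary(text, prefix="=_END_"):
    """Return a newline-free terminator that does not occur anywhere in text."""
    while True:
        candidate = prefix + secrets.token_hex(8)
        if candidate not in text:
            return candidate

def wrap(text):
    """Emit: boundary line, the unmodified text, then the boundary again."""
    boundary = choose_boundary(text)
    return boundary + "\n" + text + "\n" + boundary

def unwrap(stream):
    """Split a wrapped block off the front of stream; return (text, rest).

    Because the boundary is absent from the text and contains no newline,
    the first "\\n" + boundary after the header is the real terminator.
    """
    boundary, newline, rest = stream.partition("\n")
    if not newline:
        raise ValueError("missing boundary line")
    body, found, tail = rest.partition("\n" + boundary)
    if not found:
        raise ValueError("unterminated block")
    return body, tail
```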

BTW, has anyone heard of a terminating delimiter in the form of an escape-introduced hash
of the preceding text in hex? It would mean computing a running hash for such a string and
checking every time a hash-escape was encountered to see if the next characters were a
matching hex hash. In a lot of cases the "hash" could probably just be the byte count,
but in the extreme an md5 hash could be used. Twice if you want to get weird and hash
the first hash to check two successive hashes. That should let you delimit pretty much
any unmodified arbitrary text ;-)
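The hash-terminated delimiter can also be sketched. All of the following are assumptions for illustration:

- # is the escape character;
- the terminator is # followed by the 32-digit hex MD5 of the UTF-8 encoding of everything before it;
- the decoder keeps a running hash, updated as it reaches each #, instead of rehashing the prefix.

The only text that would be cut short is one that itself contains # followed by the MD5 of everything before that #.

```python
import hashlib

ESC = "#"      # hypothetical escape character introducing the terminator
HEX_LEN = 32   # an MD5 digest is 32 hex digits

def encode(text):
    """Terminate text with ESC + hex MD5 of the text; no escaping needed."""
    return text + ESC + hashlib.md5(text.encode("utf-8")).hexdigest()

def decode(stream):
    """Return (text, rest) where text ends at the first ESC that is followed
    by the hex MD5 of everything before that ESC."""
    running = hashlib.md5()   # hash of stream[:done]
    done = 0
    pos = stream.find(ESC)
    while pos != -1:
        running.update(stream[done:pos].encode("utf-8"))
        done = pos
        # hexdigest() does not finalize the object, so updating can continue.
        if stream[pos + 1:pos + 1 + HEX_LEN] == running.hexdigest():
            return stream[:pos], stream[pos + 1 + HEX_LEN:]
        pos = stream.find(ESC, pos + 1)
    raise ValueError("no matching hash terminator")

text = 'anything goes: # ## \\ " """ even #'
print(decode(encode(text) + " rest") == (text, " rest"))   # True
```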

enough ... ;-)

Regards,
Bengt Richter



