RFC PEP candidate: q'<delim>'quoted<delim> ?

Bengt Richter bokr at oz.net
Sun Mar 3 04:50:58 EST 2002


Problem: How to put quotes around an arbitrary program text?

Obviously a program may contain quoted material using all the
defined string quoting methods (and this new method as well),
so the problem is defining delimiters that won't occur in the text.

I propose using a variation of the MIME multipart delimiter idea:

The leading delimiter for this new quoted string will be written
just like a raw string, except using 'q' in place of 'r'.
The interpretation of '\' escapes within q'xxx' will be identical to r'xxx'.

Immediately following the trailing quote(s) of the q'xxx' (or q'''xxx''') string,
the quoted content starts. '\' characters will be scanned as just
another literal character, and have no escape effect in this content.
The content continues until the next occurrence of the delimiter defined by
the q-string, i.e., the xxx part of q'xxx' or q"""xxx""" etc.

Thus the whole value of the q'delim'...delim string representation is exactly
the characters between the delimiters. E.g., you could write

    assert q'<-=delim=->'content here<-=delim=-> == 'content here' # this would be true

without getting an error.
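
To make the scanning rule concrete, here is a minimal Python sketch of a decoder
for a single q-string. The name parse_q is hypothetical, and the sketch simplifies
the proposal in two ways: it finds the closing quote of the delimiter with a plain
search (ignoring raw-string backslash rules), and it treats an empty delimiter as
running to the end of the input, as suggested further down.

```python
def parse_q(text):
    """Decode one q'delim'content<delim> literal at the start of text.

    Returns (content, index just past the trailing delimiter).
    Hypothetical helper; not part of any existing Python API.
    """
    if not text.startswith("q"):
        raise ValueError("expected a q-string")
    # Triple quotes are tried first so q'''xxx''' is not misread as q''.
    for quote in ("'''", '"""', "'", '"'):
        if text.startswith(quote, 1):
            break
    else:
        raise ValueError("expected a quote after 'q'")
    start = 1 + len(quote)
    # Simplification: plain search, no raw-string backslash handling.
    close = text.find(quote, start)
    if close < 0:
        raise ValueError("unterminated delimiter")
    delim = text[start:close]
    body_start = close + len(quote)
    if delim == "":
        # Assumed reading of the null delimiter: content runs to end of input.
        return text[body_start:], len(text)
    end = text.find(delim, body_start)
    if end < 0:
        raise ValueError("trailing delimiter not found")
    # Backslashes in the content are never interpreted.
    return text[body_start:end], end + len(delim)
```

With this sketch, parse_q applied to the example above yields 'content here' as
the content, and the returned index says where ordinary scanning would resume.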

Note the lack of quotes around the final delimiter string, since it itself is
the final delimiter. This can also be used to solve the problem of a final
unescaped backslash when quoting Windows paths:

    q'|'c:\foo\bar\|

Also note nestability, assuming you guarantee unique delimiter strings
(which you could do by generating a GUID if paranoid, or, if you had access
to the whole string before outputting the delimiter, by using e.g. an MD5
hash of the content in hex as the delimiter):

    q'
C7593104AF1DE7534F169D5EF1579BF6
'q'|'c:\foo\bar\|
C7593104AF1DE7534F169D5EF1579BF6

(Note the two EOLs in the delimiter besides the MD5,
so the quoted content itself has no EOL in it here.)
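
The MD5 idea can be sketched in a few lines of Python. The function name q_quote
is hypothetical; it assumes the content is text that can be encoded as UTF-8
before hashing, and it writes the delimiter with an EOL on each side, as in the
example above.

```python
import hashlib


def q_quote(content):
    """Wrap content in a q-string whose delimiter is the MD5 of the content.

    Hypothetical helper illustrating the proposal; assumes UTF-8 text.
    """
    digest = hashlib.md5(content.encode("utf-8")).hexdigest().upper()
    # An EOL on each side keeps the delimiter on a line of its own.
    delim = "\n" + digest + "\n"
    # Practically impossible for a hash of the content, but cheap to check.
    if delim in content:
        raise ValueError("delimiter occurs in content")
    return "q'" + delim + "'" + content + delim
```

Because the delimiter depends on the whole content, wrapping an already wrapped
string produces a different outer delimiter, which is what makes nesting work.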

To make this line-oriented quoting a little cleaner
(i.e., no funny ' at the beginning of the first quoted line),
one could use Q'xxx'z123 to mean q'xxxz'123.
That is, the first raw character of the otherwise
super-quoted content is included as part of the delimiter, and
the content starts with the next character. A single-character
EOL is assumed below.

Thus you can include an EOL at both ends of the delimiter, and

    assert Q'
C7593104AF1DE7534F169D5EF1579BF6'
q'|'c:\foo\bar\|
C7593104AF1DE7534F169D5EF1579BF6
==  q'~'q'|'c:\foo\bar\|~

would not fail (we can write the last form, since we know the content).
Of course, knowing the content here, you could also wrap it with

    r"q'|'c:\foo\bar\|"
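
The Q form can be read as a purely textual rewrite into the q form. A minimal
sketch, assuming single-quote style only by default and a hypothetical function
name capital_q_to_q:

```python
def capital_q_to_q(text, quote="'"):
    """Rewrite Q'xxx'z... as the equivalent q'xxxz'... (hypothetical helper)."""
    if not text.startswith("Q" + quote):
        raise ValueError("expected a Q-string")
    start = 1 + len(quote)
    close = text.find(quote, start)
    if close < 0:
        raise ValueError("unterminated delimiter")
    after = close + len(quote)
    if after >= len(text):
        raise ValueError("no character after the delimiter to absorb")
    delim = text[start:close]
    absorbed = text[after]  # the 'z': first raw character, e.g. an EOL
    # The absorbed character becomes the last character of the delimiter.
    return "q" + quote + delim + absorbed + quote + text[after + 1:]
```

For instance, capital_q_to_q("Q'xxx'z123") gives "q'xxxz'123", matching the
definition above.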

A null q delimiter could be defined to imply delimiting by the end of the file or
other representation container. I.e., q''<-- content up to EOF -->

Escapes are recognized according to raw string rules inside the quotes of the
q delimiter string, so the delimiter itself does have a final backslash problem,
but that shouldn't be too hard to live with, since there's no such problem for the
quoted payload.
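
As an illustration of that limitation, here is a sketch of scanning the delimiter
part under raw-string rules, where a backslash keeps itself and the following
character, so a quote after a backslash does not close the string. The function
name scan_raw_delimiter and its parameters are hypothetical.

```python
def scan_raw_delimiter(text, start, quote="'"):
    """Scan a delimiter beginning at text[start], using raw-string rules.

    Returns (delimiter, index just past the closing quote).
    Hypothetical helper for illustration only.
    """
    i = start
    while i < len(text):
        if text[i] == "\\":
            # Backslash and the next character both stay in the delimiter.
            i += 2
            continue
        if text.startswith(quote, i):
            return text[start:i], i + len(quote)
        i += 1
    # A delimiter ending in a single backslash swallows its closing quote.
    raise ValueError("unterminated delimiter")
```

A delimiter written as q'ab\' therefore never terminates, just as r'ab\' does not,
while the payload after a properly closed delimiter is free to end in a backslash.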

A q string could theoretically allow putting unescaped arbitrary binary data in
a source file, though many editors would have problems dealing with it. Even so,
there might be some use for that. E.g., a binary .gif file could easily be
converted to an importable file binding a symbol to the gif data as a string.
Just prefix symbol = q'' to the data, as in

    symbol = q''<binary .gif data><EOF>

Or you could delimit on both ends of course.

BTW, this will provide a better raw data encapsulation mechanism than XML has ;-)

(XML's <![CDATA[ ... ]]> construct can't nest, because of the fixed delimiters --
though maybe they have a fix by now. I haven't looked for the most recent spec).

I don't know how hard q'xxx'...xxx would be to implement, but I would think a relatively
minor variant of raw string processing would do it: at the usual end of the string,
continue instead of wrapping up, and look for the string scanned so far as the trailing
delimiter of a new string starting with the next character.

The Q'xxx'z...xxxz variant isn't essential. It is just a way to put an EOL at the z
for aesthetics.

Regards,
Bengt Richter



