Mandis Quotes (aka retiring """ and ''')

Andrew Dalke adalke at mindspring.com
Mon Oct 4 19:41:57 CEST 2004


Russell Nelson wrote:
 > "Mandis quotes".

Trying it out ...

'Dave said 'Let's go to Bill's house and shoot a round of
pool.  Afterwards we'll watch a movie.  'Ere long 'twould
be 'ardly fittin' 'Dave said'

'quote
'When will we three three meet again?

'quote
'

'quote
'You see I had a spacesuit.  How it happened was
  this way.

'quote
'


The latter one will be tricky because of the spaces.  I
mixed "quote  \n", "quote \n", "quote \n" and "quote \n"
in the above, so that there's one string with the quotes
from Shakespeare and Heinlein rather than two.

Were this implemented I would suggest that whitespace
characters not be allowed, or at least be prohibited
from the terminal points of the Mandis quote indicators.

> More formally, a mandis quote is a pair of tokens surrounding a
> completely arbitrary sequence of bytes.  These tokens are comprised of
> a possibly null sequence of characters preceded by and followed by a
> single quote.

I was going to say it precludes reading in a huge block
of bytes (>1GB in size) and quoting it because you'll need
to buffer everything in memory.  Then I remembered string
concatenation.  Process 1MB at a time.
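To make the quoted definition concrete, here is a minimal sketch of how I read it, assuming the opening and closing tokens must be byte-for-byte identical. The names mandis_quote and mandis_unquote are made up for illustration; nothing like them exists in Python or in the proposal.

```python
def mandis_quote(payload: bytes, tag: bytes = b"") -> bytes:
    """Wrap payload in a pair of identical tokens of the form 'tag'.

    Hypothetical helper: a sketch of the quoted definition, not an
    implementation anyone has proposed.
    """
    if b"'" in tag:
        raise ValueError("tag must not contain a single quote")
    token = b"'" + tag + b"'"
    # The closing token must be the first occurrence of the token after
    # the payload starts.  Checking payload + token also catches a match
    # that straddles the end of the payload and the closing token.
    if (payload + token).find(token) != len(payload):
        raise ValueError("token collides with payload; pick another tag")
    return token + payload + token


def mandis_unquote(text: bytes) -> bytes:
    """Recover the payload from a string produced by mandis_quote."""
    if not text.startswith(b"'"):
        raise ValueError("missing opening quote")
    end_of_tag = text.index(b"'", 1)        # second quote closes the token
    token = text[: end_of_tag + 1]
    close = text.index(token, len(token))   # first repeat of the token
    return text[len(token):close]


quoted = mandis_quote(b"Let's go to Bill's house", b"quote")
print(quoted)
print(mandis_unquote(quoted))
```

Note that picking a safe tag means scanning the whole payload, which is where the size worry comes from.  Quoting each 1MB chunk with its own tag and concatenating the results would presumably sidestep that.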


> To save time, here's why this pre-PEP proposal sucks in decreasing
> order of severity:
> 
> o Python source is typically represented, not as an arbitrary string
>   of ASCII or Unicode characters, but instead as a sequence of lines
>   separated by the native line terminator (e.g. CRLF, LF, or CR).
> 
> o Editors are not all up to the task of inserting arbitrary
>   characters into strings (although they SHOULD).

One thought is that the actual quote identifier doesn't need
to be shown.  To start the quote, press '.  The computer
inserts '' and puts the cursor so the next character is
in between the two quotes.  Everything between those two
characters is treated as a string.  To stop the quote,
right arrow past the final quote or, in THE/HUMANE style,
LEAP to it.

When the text is saved, the editor is free to use an
arbitrarily created Mandis quote delimiter.

> o Email cannot withstand arbitrary strings of characters (although
>   quoted-printable suffices).

But doesn't that mean email can "withstand arbitrary strings
of characters"?

> o Some distinct Unicode characters are represented using the same
>   glyph, so that information is lost when text gets printed (but
>   that's more of a Unicode stupidism.)

When working with byte oriented data it's very helpful
to be able to see a text representation for non-printable
data.  For example, seeing "\r\n" instead of "
" (actually, that's only a "\n").  Similarly there are
non-visible Unicode characters, including
   5760 OGHAM SPACE MARK
   8192 EN QUAD
   8193 EM QUAD
   8194 EN SPACE
   8195 EM SPACE
   8196 THREE-PER-EM SPACE
   8197 FOUR-PER-EM SPACE
   8198 SIX-PER-EM SPACE
   8199 FIGURE SPACE
   8200 PUNCTUATION SPACE
   8201 THIN SPACE
   8202 HAIR SPACE
   8203 ZERO WIDTH SPACE


I would like to be able to see exactly what I've got.
For example, here's something I could do with Python
as it is now:

   if u"\N{EN QUAD}" in s:
     print "Has an 'en quad'"

How would I do that with Mandis quotes?  Would it
use editor support to show special characters vs.
normal ones?  How?
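For comparison, here is a rough sketch of what "seeing exactly what I've got" can look like with escapes and the standard unicodedata module.  It is written for a current Python 3 rather than the Python of this post, and describe_invisible is a made-up helper name.  The category test ("Zs" for space separators, "Cf" for format characters) is my assumption about what counts as invisible; it is not a complete list.

```python
import unicodedata


def describe_invisible(s):
    """Return (index, code point, name) for each hard-to-see character.

    Hypothetical helper.  Treats space separators (Zs) other than the
    plain ASCII space, plus format characters (Cf), as "invisible".
    """
    found = []
    for i, ch in enumerate(s):
        category = unicodedata.category(ch)
        if category in ("Zs", "Cf") and ch != " ":
            found.append((i, ord(ch), unicodedata.name(ch, "<unnamed>")))
    return found


s = "a\N{EN QUAD}b\N{ZERO WIDTH SPACE}c"
for index, codepoint, name in describe_invisible(s):
    print(index, codepoint, name)

# ascii() gives an escaped, unambiguous text form of the same string.
print(ascii(s))
```

The point is that the escape syntax lets the source text stay visible and unambiguous, which a raw arbitrary-bytes quote cannot do on its own.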


Any binary data can be inside the quote.  When does the
program know that that binary data is a representation
of unicode characters?  Not all binary data is valid
Unicode, and there are many possible encodings.  Or
would there still be an indicator like

   s'This is a character string'
   b'This is a byte string'
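To illustrate why some indicator seems necessary, here is a small sketch (again in current Python 3, with a made-up helper name try_decode) showing the same bytes decoding to different text, or failing to decode at all, depending on the encoding assumed.  The three encodings are arbitrary picks for the example.

```python
def try_decode(data, encodings=("utf-8", "utf-16-le", "latin-1")):
    """Map each encoding name to the decoded text, or None on failure.

    Hypothetical helper for illustration only.
    """
    results = {}
    for encoding in encodings:
        try:
            results[encoding] = data.decode(encoding)
        except UnicodeDecodeError:
            results[encoding] = None
    return results


# The UTF-8 encoding of EN QUAD is three bytes.
data = "\N{EN QUAD}".encode("utf-8")
for encoding, text in try_decode(data).items():
    print(encoding, ascii(text))

# A lone 0xFF byte is not valid UTF-8 but is a valid Latin-1 character.
print(try_decode(b"\xff"))
```

The bytes alone do not say which of these readings was intended, so the quote would have to carry that information somewhere.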

				Andrew
				dalke at dalkescientific.com


