[Doc-SIG] Idea: make double-space between sentences meaningful

Beni Cherniavsky cben at users.sf.net
Sat May 15 18:57:57 EDT 2004


Some formats (notably LaTeX) support the typographical convention (of
some languages, e.g. English but not French IIRC) of putting a bigger
space after the end of a sentence than between words.  LaTeX tries to
guess intelligently where sentences end, but it can fail.  Its guess can
be explicitly overridden [1]_.

Currently, reST provides no way to convey this information to the output
format.  Producing high-quality output requires this information.  There
already exists an obvious convention supported by programs (e.g. Emacs) 
for representing it in plain text: just use a double space after the end
of a sentence.  I propose to make this official for reStructuredText:
more than one space between words after punctuation [2]_ signifies a
sentence end [3]_.
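
A minimal sketch of how a parser might apply this rule, including the
end-of-line handling from footnote [3]_.  The function name
``sentence_end_offsets`` and the definition of "punctuation" (any
character that is neither alphanumeric nor whitespace) are assumptions
made for illustration; they are not part of docutils.

```python
import re

# A non-space character followed either by a run of spaces or by end of line.
_GAP = re.compile(r"(\S)( +|$)")


def sentence_end_offsets(text):
    """Return offsets of the characters judged to end a sentence.

    Assumed reading of the proposal:
    * punctuation followed by two or more spaces ends a sentence;
    * punctuation at end-of-line with no trailing whitespace ends a sentence;
    * punctuation at end-of-line with trailing whitespace is decided by the
      number of trailing spaces (one: not an end, two or more: an end).
    """
    ends = []
    offset = 0  # offset of the current line within text
    for line in text.split("\n"):
        for m in _GAP.finditer(line):
            last, gap = m.group(1), m.group(2)
            if last.isalnum():
                continue  # gap after a word, not after punctuation
            at_eol = m.end() == len(line)
            if (at_eol and not gap) or len(gap) >= 2:
                ends.append(offset + m.start(1))
        offset += len(line) + 1  # +1 for the newline removed by split()
    return ends


if __name__ == "__main__":
    sample = "See e.g. this one.  It works :-)  Really, Dr. \nWho said so.\nEnd."
    print(sentence_end_offsets(sample))
```

Note that "Dr." followed by a single trailing space at end-of-line is not
counted, while the smiley followed by two spaces is.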

Backward compatibility: at worst, it will force all sentence ends to
single spaces in existing documents that don't use the double-space
convention in the reST source.  It's a good bet that anybody who cares
about it in his LaTeX output also cares about his source, but it's a
good idea to make this a parser option (defaulting off?)...

It is even possible, if desired, to support this in HTML output, using
some hack (``&nbsp;`` won't do because we *want* it to be breakable -
it's even better there; perhaps ``<span class="sentence-end"> </span>``
with appropriate CSS?).
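
A rough sketch of the ``<span>`` idea, assuming the double-space rule
proposed above.  The function name ``mark_sentence_ends_html`` is
hypothetical, "punctuation" is again taken to mean any character that is
neither a word character nor whitespace, and the stylesheet rule that
would actually widen the span is left open (something like
``word-spacing`` might do, but that is untested).

```python
import html
import re

# Two or more spaces between punctuation and the next word.
_SENTENCE_GAP = re.compile(r"(?<=[^\w\s]) {2,}(?=\S)")

# The span holds an ordinary space, so the line can still break there.
_SPAN = '<span class="sentence-end"> </span>'


def mark_sentence_ends_html(text):
    """Escape text for HTML and wrap each sentence-end gap in a span."""
    parts = _SENTENCE_GAP.split(text)
    return _SPAN.join(html.escape(part) for part in parts)


if __name__ == "__main__":
    print(mark_sentence_ends_html("Wait.  Is a < b?  Yes, e.g. here."))
```

Single spaces, such as the one after "e.g.", pass through unchanged.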

.. [1] By using ``\@.`` for a sentence end and ``.\ `` for a sentence
        non-end.  See `The Not So Short Introduction to LaTeX 2ε`__,
        section 2.6.

        __ http://www.ctan.org/tex-archive/info/lshort/english/lshort.pdf
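
As an illustration of how a LaTeX writer could use these two overrides
once sentence ends are known from the source, here is a small sketch.
The function name ``to_latex_spacing`` is hypothetical; the sketch only
handles ``.``, ``?`` and ``!`` directly followed by spaces, and it does
not escape LaTeX special characters.

```python
import re

# Terminal punctuation followed by one or more spaces and then more text.
_AFTER_PUNCT = re.compile(r"([.?!])( +)(?=\S)")


def _override(match):
    punct, gap = match.group(1), match.group(2)
    if len(gap) >= 2:
        # Double space in the source: force a sentence end with \@.
        return "\\@" + punct + " "
    # Single space in the source: force an ordinary interword space.
    return punct + "\\ "


def to_latex_spacing(text):
    """Make sentence spacing explicit instead of relying on LaTeX's guess."""
    return _AFTER_PUNCT.sub(_override, text)


if __name__ == "__main__":
    print(to_latex_spacing("Ask Dr. Who at NASA.  He knows, etc. and so on!  Bye."))
```

Emitting the overrides everywhere is more verbose than necessary, but it
means the output never depends on LaTeX's own heuristics.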

.. [2] Punctuation should be taken in a wide sense of the word.  E.g.
        many people end a sentence with a smiley without putting a period
        after it ;-).

.. [3] A period at end-of-line should be considered a sentence end per
        Emacs conventions (it actually avoids putting non-sentence-end
        periods at end-of-line when refilling paragraphs!).  However, if
        there is trailing whitespace, it should be used to decide (in
        the style of RFC 2646 - wrap lines *after* the whitespace - which
        is the only unambiguous way to retain spacing info at line ends;
        some editors (pico/nano) use this only when there is more than
        one space - this algorithm will support them all).

-- 
Beni Cherniavsky <cben at users.sf.net>
Note: I can only read email on week-ends...


