[Doc-SIG] German Umlauts

Mon Jul 7 10:36:15 EDT 2003

On Mon, 7 Jul 2003, Dinu Gherman wrote:

> Christian Tismer:
>
> > Where I have problems are so-called "Umlauts".
> > I know this is kinda Unicode issue, but I'd just
> > like to know if and how this is handled in RST?
>
> I'm much of a reST newbie myself, but I do see Umlauts using the
> Latin-1 encoding and a call like this:
>
>    import docutils.core
>
>    # text holds the reST source
>    rest = docutils.core.publish_string(
>          text,
>          writer_name='html',
>          settings_overrides={'input_encoding': 'latin-1',
>                              'output_encoding': 'latin-1'})
>
> > The simple HTML way of &uuml; doesn't work (why?).
>
> Obviously, normal HTML snippets are not just recognised without
> using some kind of magic directives or escaping mechanisms...
> Finding that out is on my todo list as well...

staying ignorant as long as possible, here are my 2 cents:

0.01: we have two encodings

      a. the input encoding: which tells the reader (reST parser)
         what to expect.
      b. the output encoding: which tells the writer what to
         produce.
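
a rough sketch of 0.01 in plain python 3, not docutils code
(`transcode` is a made-up helper name): the input encoding is used to
decode the incoming bytes, the output encoding to encode the result.

```python
def transcode(data: bytes, input_encoding: str, output_encoding: str) -> bytes:
    """Hypothetical helper: decode with one encoding, encode with another."""
    text = data.decode(input_encoding)    # reader side: bytes -> unicode text
    return text.encode(output_encoding)   # writer side: unicode text -> bytes


latin1_bytes = "\u00c4".encode("latin-1")                  # "Ä" as b'\xc4'
utf8_bytes = transcode(latin1_bytes, "latin-1", "utf-8")   # b'\xc3\x84'
```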

0.02: the html writer escapes only the smallest possible set of
      characters, & < > ", and as a bonus writes @ as &#64; to
      maybe fool some foolish spamrobots.

      believing from seeing: when i make a document here (latin1 or iso-8859-1)

      a) running it through html.py without specifying an encoding gives me
         ``Ã„`` for an Ä.
      b) the option "-i iso-8859" gives me "Ä" for "Ä",
         and ``<?xml version="1.0" encoding="utf-8" ?>`` in the header.
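
one way a pair like ``Ã„`` can show up, sketched in plain python 3:
the utf-8 bytes of "Ä" get displayed by something that assumes a
single-byte codepage. this is only an assumption about what happens
in a), and cp1252 is a guess for the viewer side, picked because it
maps byte 0x84 to "„".

```python
original = "\u00c4"                      # "Ä"
utf8_bytes = original.encode("utf-8")    # b'\xc3\x84', two bytes

# a viewer that assumes cp1252 shows each byte as its own character
misread = utf8_bytes.decode("cp1252")    # "\u00c3\u201e", i.e. "Ã„"

# undoing the wrong guess recovers the original character
roundtrip = misread.encode("cp1252").decode("utf-8")
```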

0.03: a third cent would explain this behaviour, but i offered only 2.

although i can offer another one: to add encodings for any requested
html entity (within reasonable limits, not the whole unicode universe).
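
a sketch of both ideas in plain python 3, not the actual writer code:
`escape_minimal` mimics the minimal escaping from 0.02 (which would
also turn a typed &uuml; into literal text), and `with_named_entities`
is a hypothetical extra pass for named entities, using the standard
library table `html.entities.codepoint2name`. both function names are
made up for this example.

```python
from html.entities import codepoint2name


def escape_minimal(text: str) -> str:
    """Escape & < > " and write @ as a numeric reference."""
    replacements = (
        ("&", "&amp;"),   # must come first so later entities stay intact
        ("<", "&lt;"),
        (">", "&gt;"),
        ('"', "&quot;"),
        ("@", "&#64;"),
    )
    for char, entity in replacements:
        text = text.replace(char, entity)
    return text


def with_named_entities(text: str) -> str:
    """Replace non-ASCII characters that have a named HTML entity."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        if ord(ch) > 127 and name is not None:
            out.append("&%s;" % name)
        else:
            out.append(ch)
    return "".join(out)


typed_entity = escape_minimal("&uuml;")                      # "&amp;uuml;"
address = escape_minimal("a@b")                              # "a&#64;b"
umlauts = with_named_entities(escape_minimal("\u00c4 & \u00fc"))
# umlauts is "&Auml; &amp; &uuml;"
```

characters without a named entity pass through unchanged here, so the
output encoding still has to be able to represent them.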

-- 
 BINGO: This left unindentionally unblank
 --- Engelbert Gruber -------+
  SSG Fintl,Gruber,Lassnig  /
  A6170 Zirl   Innweg 5b   /
  Tel. ++43-5238-93535 ---+