[Doc-SIG] German Umlauts
grubert at users.sourceforge.net
Mon Jul 7 10:36:15 EDT 2003
On Mon, 7 Jul 2003, Dinu Gherman wrote:
> Christian Tismer:
>
> > Where I have problems are so-called "Umlauts".
> > I know this is kinda Unicode issue, but I'd just
> > like to know if and how this is handled in RST?
>
> I'm much of a reST newbie myself, but I do see Umlauts using the
> Latin-1 encoding and a call like this:
>
> rest = docutils.core.publish_string(
>     text,
>     writer_name='html',
>     settings_overrides={'input_encoding': 'latin-1',
>                         'output_encoding': 'latin-1'})
>
> > The simple HTML way of &uuml; doesn't work (why?).
>
> Obviously, normal HTML snippets are not just recognised without
> using some kind of magic directives or escaping mechanisms...
> Finding that out is on my todo list as well...
staying ignorant as long as possible, my 2 cents:
0.01: we have two encodings
a. the input encoding: which tells the reader (reST parser)
what to expect.
b. the output encoding: which tells the writer what to
produce.
0.02: the html writer escapes only the smallest possible set of
characters, & < > and ", and as a bonus @ will be
written as &#64; to maybe fool some foolish spam robots.
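
The two points above can be sketched in a few lines of Python. This is
only an illustration of the idea, not the docutils code: the names
minimal_escape and transcode are made up for this example, and the
escape list is just the one described in 0.02.

```python
def minimal_escape(text):
    """Escape only & < > " and @ (hypothetical helper, not docutils API)."""
    # '&' has to be replaced first, otherwise the '&' in the entities
    # produced by the later replacements would be escaped a second time.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    text = text.replace('"', "&quot;")
    text = text.replace("@", "&#64;")
    return text


def transcode(raw, input_encoding, output_encoding):
    """Decode bytes as the reader would, escape, encode as the writer would."""
    text = raw.decode(input_encoding)      # a. input encoding
    escaped = minimal_escape(text)         # 0.02: umlauts are left alone
    return escaped.encode(output_encoding)  # b. output encoding


# A Latin-1 byte string containing an umlaut A and an ampersand.
result = transcode(b"\xc4 & B", "latin-1", "utf-8")
```

Note that the umlaut is never turned into an entity here; it stays a
character, and only the output encoding decides which bytes it becomes.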
believing from seeing: when i make a document here (latin1 or iso-8859-1)
a) running through html.py without specifying an encoding gives me
``Ã`` for an Ä.
b) option "-i iso-8859" produces "Ä" for "Ä".
and ``<?xml version="1.0" encoding="utf-8" ?>`` in the header.
0.03: would explain this behaviour, but i offered only 2 cents.
although i can offer another one: to add encodings for any requested
html entity (within reasonable limits, not the whole unicode universe).
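
One plausible reading of the ``Ã`` seen above, offered as a guess and
not as something verified against html.py: an Ä written out as UTF-8
becomes two bytes, and a viewer that reads those bytes as Latin-1 shows
an Ã followed by a control character. The helper name view_as is made
up for this sketch.

```python
def view_as(text, written_as, read_as):
    """Encode text in one charset and read the bytes back in another."""
    return text.encode(written_as).decode(read_as)


# Written as UTF-8, read as Latin-1: two characters, the first is 'Ã'.
garbled = view_as("Ä", "utf-8", "latin-1")

# Written and read with the same charset: the umlaut survives.
intact = view_as("Ä", "latin-1", "latin-1")
```

If that is what happens, the bytes are fine and only the declared or
assumed charset of the viewer is off.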
--
BINGO: This left unindentionally unblank
--- Engelbert Gruber -------+
SSG Fintl,Gruber,Lassnig /
A6170 Zirl Innweg 5b /
Tel. ++43-5238-93535 ---+