[Python-ideas] Add "htmlcharrefreplace" error handler

Fri Jun 14 11:43:41 CEST 2013

On Fri, 14 Jun 2013 11:38:46 +0200
"M.-A. Lemburg" <mal at egenix.com> wrote:
> On 14.06.2013 10:49, Antoine Pitrou wrote:
> > On Fri, 14 Jun 2013 09:44:09 +0200
> > "M.-A. Lemburg" <mal at egenix.com> wrote:
> >>
> >>> IMHO character references (named or numerical) should never be used in
> >>> HTML (with the exception of " > and <).
> >>> They exist mainly for three reasons:
> >>> 1) provide a way to include characters that are not available in the
> >>> used encoding (e.g. if you are using an obsolete encoding like
> >>> windows-1252 but still want to use "fancy" characters);
> >>> 2) to keep the HTML source ASCII-only;
> >>
> >> This is the main reason for using them. HTML's default encoding
> >> is Latin-1, unlike XML.
> > 
> > I'd like to know which good reasons there are to not use utf-8 for HTML
> > pages in 2013.
> > "Keeping the HTML source ASCII-only" is just silly IMO, and it doesn't
> > warrant special support in Python's codec error handlers.
> 
> Ezio and I gave reasons, but you've cut them away ;-)

Uh, no, you cut Ezio's own rebuttals to those reasons.
Ezio's point still stands: named HTML character references are useful
when entering HTML text *manually* (though of course they are
cumbersome), but that doesn't warrant a codec error handler, which by
construction is used for *automatic* generation of HTML text.
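
For reference, anyone who really wants named references in generated
output can already get them without a new built-in handler. A minimal
sketch, assuming Python 3 and the existing codecs.register_error() hook;
the handler function below and the name "htmlcharrefreplace" it is
registered under are purely illustrative, not an existing handler:

```python
import codecs
from html.entities import codepoint2name


def htmlcharrefreplace(exc):
    """Illustrative handler: named reference if one exists, else numeric."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    pieces = []
    # exc.start:exc.end spans the run of characters the codec could not encode
    for ch in exc.object[exc.start:exc.end]:
        name = codepoint2name.get(ord(ch))
        pieces.append("&%s;" % name if name else "&#%d;" % ord(ch))
    # Return the replacement text and the position where encoding resumes
    return "".join(pieces), exc.end


# Registered under a hypothetical name; this is not a built-in handler
codecs.register_error("htmlcharrefreplace", htmlcharrefreplace)

text = "caf\u00e9 \u20ac5"
numeric = text.encode("ascii", "xmlcharrefreplace")  # built-in handler
named = text.encode("ascii", "htmlcharrefreplace")   # handler sketched above
print(numeric)  # b'caf&#233; &#8364;5'
print(named)    # b'caf&eacute; &euro;5'
```

Both outputs render identically in a browser; the only difference is
how readable the generated source is to a human, which is exactly the
manual-editing use case.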

Regards

Antoine.