[Python-ideas] Add "htmlcharrefreplace" error handler

M.-A. Lemburg mal at egenix.com
Fri Jun 14 11:58:44 CEST 2013


On 14.06.2013 11:25, Masklinn wrote:
> On 2013-06-14, at 10:49 , Antoine Pitrou wrote:
>> On Fri, 14 Jun 2013 09:44:09 +0200
>> "M.-A. Lemburg" <mal at egenix.com> wrote:
>>>
>>>> IMHO character references (named or numerical) should never be used in
>>>> HTML (with the exception of " > and <).
>>>> They exist mainly for three reasons:
>>>> 1) provide a way to include characters that are not available in the
>>>> used encoding (e.g. if you are using an obsolete encoding like
>>>> windows-1252 but still want to use "fancy" characters);
>>>> 2) to keep the HTML source ASCII-only;
>>>
>>> This is the main reason for using them. HTML's default encoding
>>> is Latin-1, unlike XML.
>>
>> I'd like to know which good reasons there are to not use utf-8 for HTML
>> pages in 2013.
>> "Keeping the HTML source ASCII-only" is just silly IMO, and it doesn't
>> warrant special support in Python's codec error handlers.
> 
> As far as I know M.A. is technically wrong, there is no such thing as
> a default HTML encoding (browsers have their own possibly configurable[0]
> defaults with "proprietary" heuristics, but no HTML spec defines
> any kind of default only a sequence of encoding extraction before
> falling back on heuristics).

AFAIK, the Latin-1 default was first defined in HTML 2.0, perhaps even
earlier:

http://tools.ietf.org/html/draft-ietf-html-spec-05#section-6.1
http://tools.ietf.org/html/draft-ietf-html-spec-05#section-9.5

It's still part of HTML 4.01:

http://www.w3.org/TR/html401/sgml/intro.html

HTTP also uses Latin-1 (ISO-8859-1) as the default charset for text
media types:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1

But this is getting off-topic.
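
To tie this back to the error handler under discussion, here is a
minimal sketch of what the encoding choice means in practice. It uses
the existing "xmlcharrefreplace" handler for Latin-1 output, plus a
hypothetical "htmlcharrefreplace" handler registered via
codecs.register_error(). The handler function, its fallback behaviour
and the sample text are assumptions made for illustration only; they
are not an agreed design and not part of the stdlib.

```python
import codecs
from html.entities import codepoint2name


def htmlcharrefreplace(exc):
    """Hypothetical handler: use a named HTML character reference where
    one exists, otherwise fall back to a numeric reference."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    pieces = []
    for ch in exc.object[exc.start:exc.end]:
        name = codepoint2name.get(ord(ch))
        pieces.append("&%s;" % name if name else "&#%d;" % ord(ch))
    # Resume encoding right after the characters we replaced.
    return "".join(pieces), exc.end


# The handler name is an assumption taken from the thread subject.
codecs.register_error("htmlcharrefreplace", htmlcharrefreplace)

text = "caf\u00e9 \u20ac 5 \u2603"

# Latin-1 can encode U+00E9 directly; the rest become numeric references.
latin1 = text.encode("latin-1", "xmlcharrefreplace")

# ASCII-only output with named references where available.
ascii_named = text.encode("ascii", "htmlcharrefreplace")

print(latin1)       # b'caf\xe9 &#8364; 5 &#9731;'
print(ascii_named)  # b'caf&eacute; &euro; 5 &#9731;'
```

With a Latin-1 target only the characters outside Latin-1 need
references, whereas an ASCII-only target needs one for every non-ASCII
character.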

-- 
Marc-Andre Lemburg
eGenix.com


