[Python-ideas] Add "htmlcharrefreplace" error handler
M.-A. Lemburg
mal at egenix.com
Fri Jun 14 11:58:44 CEST 2013
On 14.06.2013 11:25, Masklinn wrote:
> On 2013-06-14, at 10:49 , Antoine Pitrou wrote:
>> On Fri, 14 Jun 2013 09:44:09 +0200
>> "M.-A. Lemburg" <mal at egenix.com> wrote:
>>>
>>>> IMHO character references (named or numerical) should never be used in
>>>> HTML (with the exception of " > and <).
>>>> They exist mainly for three reasons:
>>>> 1) provide a way to include characters that are not available in the
>>>> used encoding (e.g. if you are using an obsolete encoding like
>>>> windows-1252 but still want to use "fancy" characters);
>>>> 2) to keep the HTML source ASCII-only;
>>>
>>> This is the main reason for using them. HTML's default encoding
>>> is Latin-1, unlike XML.
>>
>> I'd like to know which good reasons there are to not use utf-8 for HTML
>> pages in 2013.
>> "Keeping the HTML source ASCII-only" is just silly IMO, and it doesn't
>> warrant special support in Python's codec error handlers.
>
> As far as I know M.A. is technically wrong, there is no such thing as
> a default HTML encoding (browsers have their own possibly configurable[0]
> defaults with "proprietary" heuristics, but no HTML spec defines
> any kind of default only a sequence of encoding extraction before
> falling back on heuristics).

AFAIK, this was first defined in HTML 2.0, perhaps even earlier:

http://tools.ietf.org/html/draft-ietf-html-spec-05#section-6.1
http://tools.ietf.org/html/draft-ietf-html-spec-05#section-9.5

It's still part of HTML 4.0:

http://www.w3.org/TR/html401/sgml/intro.html

HTTP also uses Latin-1 as default:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1

But this is getting off-topic.
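
For reference, here is a rough sketch of what the handler named in the subject line might look like if written in pure Python on top of codecs.register_error(). It is only an illustration under stated assumptions, not a proposed implementation: the function body, the choice of html.entities.codepoint2name as the name table, and the fallback to numeric references are all assumptions. Python 3 is assumed as well.

```python
import codecs
from html.entities import codepoint2name


def htmlcharrefreplace(exc):
    """Hypothetical error handler: replace each unencodable character with
    a named HTML character reference where one exists, and fall back to a
    numeric reference (as the built-in "xmlcharrefreplace" does) otherwise.
    """
    if not isinstance(exc, UnicodeEncodeError):
        # This sketch only handles encoding errors.
        raise exc
    parts = []
    for ch in exc.object[exc.start:exc.end]:
        name = codepoint2name.get(ord(ch))
        parts.append("&%s;" % name if name else "&#%d;" % ord(ch))
    # Return the replacement text and the position at which to resume.
    return "".join(parts), exc.end


# The handler name is the one from the subject line; it is not built in.
codecs.register_error("htmlcharrefreplace", htmlcharrefreplace)

text = "caf\u00e9 \u20ac5 \u2603"

# Built-in handler: numeric references only.
print(text.encode("ascii", "xmlcharrefreplace"))
# b'caf&#233; &#8364;5 &#9731;'

# Sketch above: named references where available, ASCII-only output.
print(text.encode("ascii", "htmlcharrefreplace"))
# b'caf&eacute; &euro;5 &#9731;'

# With Latin-1 as the target, only characters outside Latin-1 are replaced.
print(text.encode("latin-1", "htmlcharrefreplace"))
# b'caf\xe9 &euro;5 &#9731;'
```

The last call shows the "characters not available in the used encoding" case from the quoted list: the e-acute is encoded directly as a Latin-1 byte, while the euro sign and the snowman become references.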
--
Marc-Andre Lemburg
eGenix.com