[Python-ideas] Add "htmlcharrefreplace" error handler

Serhiy Storchaka storchaka at gmail.com
Fri Jun 14 17:09:16 CEST 2013


14.06.13 11:49, Antoine Pitrou wrote:
> I'd like to know which good reasons there are to not use utf-8 for HTML
> pages in 2013.

Russian text requires 2 bytes per character in utf-8 (not counting 
spaces, punctuation and markup) and only 1 byte per character in any 
dedicated 8-bit encoding (cp1251/cp866/koi8-r). The same holds for other 
European non-Latin alphabets. Some old databases contain data in one of 
these 8-bit encodings, and generating an HTML page in the same encoding 
does not require any encoding/decoding at all.
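
The size difference can be checked with a minimal sketch. The sample word 
and the helper name encoded_size are illustrative assumptions, not taken 
from any real database:

```python
# Illustrative sample: six Cyrillic letters, no spaces or punctuation.
sample = "Привет"


def encoded_size(text, encoding):
    """Return the number of bytes that `text` occupies in `encoding`."""
    return len(text.encode(encoding))


# Compare utf-8 against the three 8-bit Cyrillic encodings named above.
sizes = {enc: encoded_size(sample, enc)
         for enc in ("utf-8", "cp1251", "cp866", "koi8-r")}

for enc, size in sizes.items():
    print(f"{enc}: {size} bytes")
```

For this sample, utf-8 needs 12 bytes and each 8-bit encoding needs 6. 
Real pages also contain ASCII markup, spaces and punctuation, so the 
overall ratio is smaller than 2:1.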

> "Keeping the HTML source ASCII-only" is just silly IMO, and it doesn't
> warrant special support in Python's codec error handlers.

"xmlcharrefreplace" is as good as "htmlentityreplace" and even better 
for this purpose.
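
As a rough illustration of the existing handler, the sketch below encodes 
a sample string (an assumption chosen for the example) to koi8-r and to 
ASCII. Characters missing from the target encoding become numeric 
character references, which HTML parsers understand:

```python
import html

# Illustrative sample; the euro sign has no code point in koi8-r.
text = "Привет €"

# Cyrillic letters fit into koi8-r; only the euro sign is replaced.
encoded = text.encode("koi8-r", errors="xmlcharrefreplace")

# With ASCII as the target, every non-ASCII character is replaced.
ascii_only = text.encode("ascii", errors="xmlcharrefreplace")

# Decoding the bytes and resolving the references restores the text.
restored = html.unescape(encoded.decode("koi8-r"))

print(encoded)
print(ascii_only)
print(restored == text)
```

Here html.unescape stands in for what a browser does when it resolves 
the references while parsing the page.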


