[Python-ideas] Add "htmlcharrefreplace" error handler

Serhiy Storchaka storchaka at gmail.com
Sat Jun 15 08:16:44 CEST 2013


14.06.13 18:25, Antoine Pitrou wrote:
> On Fri, 14 Jun 2013 18:09:16 +0300
> Serhiy Storchaka <storchaka at gmail.com>
> wrote:
>> 14.06.13 11:49, Antoine Pitrou wrote:
>>> I'd like to know which good reasons there are to not use utf-8 for HTML
>>> pages in 2013.
>>
>> Russian text requires 2 bytes per character in UTF-8 (not counting
>> spaces, punctuation and markup) but only 1 byte per character in any
>> of the dedicated 8-bit encodings (cp1251/cp866/koi8-r). The same holds
>> for other European non-Latin alphabets.
>
> And even latin-based (e.g. latin-1), but if you really care about this,
> it's certainly more efficient to compress your HTTP response than
> trying to save space at the character level.

In languages with a Latin-based alphabet, usually only a small fraction
of characters are non-ASCII, so UTF-8 adds only about 5-10% to the size.
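
A quick way to check such figures for a particular text is to encode it both ways and compare the byte counts. This is a minimal sketch: the helper name encoded_sizes and the sample strings are illustrative only, and short strings without markup will show a larger overhead than whole pages, where ASCII markup dilutes the non-ASCII share.

```python
def encoded_sizes(text, legacy_encoding):
    """Return (utf8_bytes, legacy_bytes) for the same text."""
    return len(text.encode("utf-8")), len(text.encode(legacy_encoding))


# Illustrative samples, not real page data.
samples = [
    ("Russian", "Привет, мир", "cp1251"),
    ("French", "Voilà, ça marche très bien", "latin-1"),
]

for language, text, legacy in samples:
    utf8_size, legacy_size = encoded_sizes(text, legacy)
    # Extra bytes UTF-8 needs, relative to the 8-bit encoding.
    overhead = 100 * (utf8_size - legacy_size) / legacy_size
    print(f"{language}: utf-8 {utf8_size} B, {legacy} {legacy_size} B "
          f"(+{overhead:.0f}%)")
```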

>> Some old databases contain data in one of these 8-bit encodings, and
>> generating an HTML page in the same encoding does not require any
>> encoding/decoding at all.
>
> If it doesn't require encoding/decoding, how are you going to specify
> an encoding error handler?

The main part of the page can be generated without encoding, but a small
part may contain text that still has to be encoded.
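
For that mixed case, here is a minimal sketch of one possible shape for the proposed handler, registered with codecs.register_error. The function is hypothetical and the name "htmlcharrefreplace" is the one proposed in this thread, not an existing standard handler. Python already ships "xmlcharrefreplace", which always emits decimal references such as &#937;. This sketch instead assumes the handler should prefer HTML named entities from html.entities.codepoint2name and fall back to a decimal reference otherwise.

```python
import codecs
from html.entities import codepoint2name


def htmlcharrefreplace(exc):
    """Hypothetical error handler: replace unencodable characters with an
    HTML named entity when one exists, else a decimal character reference."""
    if not isinstance(exc, UnicodeEncodeError):
        raise TypeError(f"don't know how to handle {type(exc).__name__}")
    parts = []
    for ch in exc.object[exc.start:exc.end]:
        cp = ord(ch)
        name = codepoint2name.get(cp)
        parts.append(f"&{name};" if name else f"&#{cp};")
    # Return the replacement text and the position to resume encoding from.
    return "".join(parts), exc.end


codecs.register_error("htmlcharrefreplace", htmlcharrefreplace)

# Bulk of the page: bytes already in cp1251 (e.g. read from a legacy
# database), passed through with no decoding or re-encoding.
body = "<p>Привет</p>".encode("cp1251")

# Small dynamic fragment that must be encoded; characters missing from
# cp1251 become HTML character references instead of raising an error.
fragment = "<p>Ω ≈ 3.14 → ∞</p>".encode("cp1251", "htmlcharrefreplace")

page = body + fragment
```

Only the fragment passes through the error handler. The pre-encoded bytes are concatenated unchanged, which is the situation where a handler is useful even though most of the page never goes through encode().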
