[Python-ideas] Support WHATWG versions of legacy encodings

Wed Jan 17 13:13:55 EST 2018

I'm going to push back on the idea that this should only be used for
decoding, not encoding.

The use case I started with -- showing people how to fix mojibake using
Python -- would *only* use these codecs in the encoding direction. To fix
the most common case of mojibake, you encode it as web-1252 and decode it
as UTF-8 (because you got the data from someone who did the opposite).
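
As a rough sketch of that round trip: the simple case already works with the
stdlib cp1252 codec, but it fails as soon as the mojibake contains one of the
five bytes cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), which the
WHATWG table maps to the C1 control characters of the same value. The names
encode_web_1252 and fix_mojibake below are hypothetical helpers written for
illustration, not an existing codec or API.

```python
# Bytes that Python's cp1252 leaves undefined; the WHATWG windows-1252
# index maps each one to the C1 control character with the same value.
WEB_1252_EXTRA = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def encode_web_1252(text: str) -> bytes:
    """Encode text as cp1252, passing the five extra C1 controls through."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            if ord(ch) in WEB_1252_EXTRA:
                out.append(ord(ch))
            else:
                raise
    return bytes(out)


def fix_mojibake(text: str) -> str:
    """Undo 'UTF-8 bytes decoded as web-1252' by doing the opposite."""
    return encode_web_1252(text).decode("utf-8")


# Simple case: no undefined bytes involved, so plain cp1252 is enough.
simple = "café".encode("utf-8").decode("cp1252")
simple_fixed = simple.encode("cp1252").decode("utf-8")

# Hard case: U+00C1 is C3 81 in UTF-8, and 0x81 is undefined in cp1252.
# latin-1 stands in here for what a WHATWG-style decoder yields for these bytes.
hard = "\u00c1".encode("utf-8").decode("latin-1")
try:
    hard.encode("cp1252")
    stdlib_can_encode = True
except UnicodeEncodeError:
    stdlib_can_encode = False

hard_fixed = fix_mojibake(hard)
```

The per-character loop is slow and only meant to show the mapping; a real
codec would use a charmap table instead.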

I have implemented some decode-only codecs (such as CESU-8), for exactly
the reason of "why would you want more text in this encoding", but the
situation is different here.

On Wed, 17 Jan 2018 at 13:00 Chris Barker <chris.barker at noaa.gov> wrote:

> On Tue, Jan 16, 2018 at 9:30 PM, Stephen J. Turnbull <
> turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:
>
>> In what context?  WHAT-WG's encoding standard is *all about browsers*.
>> If a codec is feeding text into a process that renders them all as
>> glyphs for a human to look at, that's one thing.  The codec doesn't
>> want to fatal there, and the likely fallback glyph is something from
>> the control glyphs block if even windows-125x doesn't have a glyph
>> there.  I guess it sort of makes sense.
>>
>
> sure it does -- and python is not a browser, and python itself has
> nothing visual -- but we sure want to be able to write code that produces
> visual representations of maybe messy text...
>
> if you're feeding a program
>
> ...
>
>> the codec has no idea when or how that's
>> going to get interpreted.
>
>
> sure -- which is why others have suggested that if WHATWG is supported,
> then it *should* only be used for decoding, not encoding. But we are
> supposed to be consenting adults here -- I see no reason to prevent
> encoding -- maybe it would be useful for testing???
>
> (as with JSON data, which I believe is
>> "supposed" to be UTF-8, but many developers use the legacy charsets
>> they're used to and which are often embedded in the underlying
>> databases etc, ditto XML),
>
>
> OK -- if developers do the wrong thing, then they do the wrong thing -- we
> can't prevent that!
>
> And Python's lovely "text is unicode" model actually makes that hard to do
> wrong. But we do need a way to decode messy text, and then send it off to
> JSON or whatever properly encoded.
>
> -CHB
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov