[Python-ideas] Chaining coders

Rob Speer rspeer at luminoso.com
Fri Jan 19 18:05:57 EST 2018


I see how this is another way to get what I was asking for: a way to decode
some unfortunately common text encodings, ones that Web browsers use, in
Python without having to import additional modules.

I appreciate other ideas about how to solve this problem, but the
generality here seems pretty unnecessary. The world isn't making any
_novel_ legacy encodings. There are 8 legacy encodings that Python has
missed, and there's no reason to expect there to be any more of them.

It's worrisome to support arbitrary compositions of encodings. Most of
these possible hybrid encodings haven't been used before, and using them
would be a bad idea because there would be no reason to expect any other
software in existence to be compatible with them.

Some of these legacy encodings (like the webbish version of windows-1255)
are not the composition of two encodings that already exist in Python. So
you'd have to define new encodings anyway.
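For contrast with composition, here is a sketch of simply defining a new flat-table encoding with the standard library's charmap helpers (`codecs.charmap_build`, `codecs.charmap_decode`, `codecs.charmap_encode`). This is the same mechanism the bundled `encodings.cp1252` module uses. The codec name `windows-1252-c1` is borrowed from the quoted message, and all helper names are hypothetical. A webbish windows-1255 would need its own table built the same way; its exact differences from Python's `cp1255` are not shown here.

```python
import codecs


def _build_decoding_table():
    """256-character table: cp1252, with its undefined bytes as C1 controls."""
    chars = []
    for b in range(256):
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            # cp1252 leaves this byte undefined; map it to U+00xx instead.
            chars.append(chr(b))
    return "".join(chars)


DECODING_TABLE = _build_decoding_table()
ENCODING_TABLE = codecs.charmap_build(DECODING_TABLE)


def _encode(text, errors="strict"):
    # Returns (bytes, number of characters consumed), as codecs expect.
    return codecs.charmap_encode(text, errors, ENCODING_TABLE)


def _decode(data, errors="strict"):
    # Returns (str, number of bytes consumed), as codecs expect.
    return codecs.charmap_decode(data, errors, DECODING_TABLE)


def _search(name):
    # Older Pythons pass the lowercased name; 3.9+ also turns "-" into "_".
    if name.replace("-", "_") == "windows_1252_c1":
        return codecs.CodecInfo(_encode, _decode, name="windows-1252-c1")
    return None


codecs.register(_search)

assert b"\x81\x80".decode("windows-1252-c1") == "\x81\u20ac"
```

Because every byte maps to a distinct character, the table is a bijection and all 256 bytes round-trip.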

On Fri, 19 Jan 2018 at 17:09 Soni L. <fakedme+py at gmail.com> wrote:

> windows-1252 is based on iso-8859-1. Thus, I'd like to be able to chain
> coders as follows:
>
> bytes.decode("windows-1252-ext", else=lambda r: r.decode("iso-8859-1"))
>
> What this "else" does is that it's a lambda, and it gets passed an
> object with a decode method identical to the bytes decode method, except
> that it doesn't affect already-decoded characters. In this case,
> "windows-1252-ext" only includes things in the \x80-\x9F range, leaving
> it up to "iso-8859-1" to handle the rest.
>
> A similar process would happen for encoding: encode with
> "windows-1252-ext", else = "iso-8859-1".
>
> (Technically, "windows-1252-ext" isn't needed - you can use the existing
> "windows-1252" and combine it with the "iso-8859-1" to get
> "windows-1252-c1".)
>
> This would be a novel way to think of encodings as not just flat
> translation tables but highly composable translation tables. I have a
> thing for composition.