[Python-ideas] .from and .to instead of .encode and .decode

Sat Mar 7 12:41:35 CET 2015

On Fri, Mar 6, 2015 at 12:40 AM, anatoly techtonik <techtonik at gmail.com> wrote:
> Hi,
>
> While looking at the code like:
>
>     'os': sysinfo['os'].decode('utf-8'),
>     'hostname': sysinfo['hostname'].decode('utf-8'),
>
> I can't really tell whether the result will be a unicode string or a
> binary string in utf-8.

If it says "decode", the result is a Unicode string. If it says
"encode", the result is bytes. I'm not sure what is difficult here.
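
As a rough sketch of that rule in Python 3 (the sample value and the
variable names are made up for illustration, they are not from the code
quoted above):

```python
# Python 3: decode goes bytes -> str, encode goes str -> bytes.
raw = b"caf\xc3\xa9"               # UTF-8 encoded bytes (hypothetical sample)

text = raw.decode("utf-8")         # bytes -> str (Unicode text)
print(type(text).__name__)         # the type name of the decoded value

round_trip = text.encode("utf-8")  # str -> bytes
print(type(round_trip).__name__)   # the type name of the encoded value

# The direction is also enforced by the types themselves:
# str has no .decode() and bytes has no .encode() in Python 3.
print(hasattr(text, "decode"), hasattr(raw, "encode"))
```

So the method name alone tells you which side of the text/bytes boundary
the result is on.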

> .encode/.decode are confusing, because in Python 2 it was:
>
>     str.encode(encoding) -> str
>     str.decode(encoding) -> str
>
> with no encoding info attached.

$ python
Python 2.7.3 (default, Mar 13 2014, 11:03:55)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> "test".decode("utf-8")
u'test'
>>> u"test".encode("utf-8")
'test'

Looks to me like str.decode() -> unicode, and unicode.encode() -> str,
at least for UTF-8 and other encodings that apply to Unicode. Yes,
there are some oddities in Py2:

>>> "74657374".decode("hex")
'test'

in which str.decode returns a str, but AFAIK all of those are buried
away in Python 3, where you have to go through the codecs module:

>>> import codecs
>>> codecs.decode(b"74657374", "hex")
b'test'
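
A small sketch of what "buried away" means in practice, assuming a
reasonably recent Python 3 (3.4 or later, as far as I know); the
variable names here are only illustrative:

```python
import codecs

hex_bytes = b"74657374"

# The bytes-to-bytes codecs are still reachable through the codecs module.
decoded = codecs.decode(hex_bytes, "hex")
restored = codecs.encode(decoded, "hex")

# The bytes.decode() method, however, only accepts text encodings,
# so asking it for "hex" is expected to fail with LookupError.
try:
    hex_bytes.decode("hex")
except LookupError:
    rejected = True
else:
    rejected = False

print(decoded, restored, rejected)
```

That way .decode() on bytes always yields text, and the odd codecs cannot
blur the direction.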

So where's the confusion?

ChrisA