[Python-ideas] .from and .to instead of .encode and .decode

Sat Mar 7 14:42:59 CET 2015

On Sat, Mar 7, 2015 at 8:41 AM, Chris Angelico <rosuav at gmail.com> wrote:
> If it says "decode", the result is a Unicode string. If it says
> "encode", the result is bytes. I'm not sure what is difficult here.

Yep. When I teach, I use this mnemonic, which I can now quote from my
book [1] ;-)

[TIP]
====
If you need a memory aid to distinguish `.decode()` from `.encode()`,
convince yourself that a Unicode `str` contains "human" text, while
byte sequences can be cryptic machine core dumps. Therefore, it makes
sense that we *decode* `bytes` to `str` to get human-readable text,
and we *encode* text to `bytes` for storage or transmission.
====

[1] http://shop.oreilly.com/product/0636920032519.do
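
A minimal Python 3 sketch of the mnemonic above, in case it helps. The
sample string and the variable names are my own illustration, not
anything from the thread:

```python
# "Human" text lives in a str; here "cafe" with an accented e.
text = "caf\u00e9"

# Encode text to bytes for storage or transmission.
data = text.encode("utf-8")
assert isinstance(data, bytes)
assert data == b"caf\xc3\xa9"  # the accented e takes two bytes in UTF-8

# Decode bytes back to str to get human-readable text again.
restored = data.decode("utf-8")
assert isinstance(restored, str)
assert restored == text
```

The direction is fixed by the types: `str` only offers `.encode()`, and
`bytes` only offers `.decode()`.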

Cheers,

Luciano

>
>> .encode/.decode are confusing, because in Python 2 it was:
>>
>>     str.encode(encoding) -> str
>>     str.decode(encoding) -> str
>>
>> with no encoding info attached.
>
> $ python
> Python 2.7.3 (default, Mar 13 2014, 11:03:55)
> [GCC 4.7.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> "test".decode("utf-8")
> u'test'
> >>> u"test".encode("utf-8")
> 'test'
>
> Looks to me like str.decode() -> unicode, and unicode.encode() -> str,
> at least for UTF-8 and other encodings that apply to Unicode. Yes,
> there are some oddities in Py2:
>
> >>> "74657374".decode("hex")
> 'test'
>
> in which str.decode returns a str, but AFAIK all of those are buried
> away in Python 3:
>
> >>> import codecs
> >>> codecs.decode(b"74657374","hex")
> b'test'
>
> So where's the confusion?
>
> ChrisA
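
For anyone who wants to try the Python 3 side of Chris's point, here is
a small sketch, assuming Python 3.4 or later (where the "hex" codec
alias is available through the codecs module). The variable names are
only illustrative:

```python
import codecs

# Bytes-to-bytes transforms such as "hex" go through the codecs module.
hex_bytes = b"74657374"
raw = codecs.decode(hex_bytes, "hex")
assert raw == b"test"

# The reverse transform is also bytes -> bytes.
back = codecs.encode(raw, "hex")
assert back == hex_bytes

# The methods themselves are one-directional in Python 3:
# bytes has no .encode(), and str has no .decode().
bytes_has_encode = hasattr(b"", "encode")
str_has_decode = hasattr("", "decode")
assert not bytes_has_encode
assert not str_has_decode
```

So the str -> str and bytes -> bytes oddities are still reachable, but
only by asking for them explicitly via `codecs`.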

-- 
Luciano Ramalho
Twitter: @ramalhoorg

Teacher at: http://python.pro.br
Twitter: @pythonprobr