[Python-ideas] .from and .to instead of .encode and .decode
Chris Angelico
rosuav at gmail.com
Sat Mar 7 12:41:35 CET 2015
On Fri, Mar 6, 2015 at 12:40 AM, anatoly techtonik <techtonik at gmail.com> wrote:
> Hi,
>
> While looking at the code like:
>
> 'os': sysinfo['os'].decode('utf-8'),
> 'hostname': sysinfo['hostname'].decode('utf-8'),
>
> I can't really read if the result will be unicode or binary string in
> utf-8.
If it says "decode", the result is a Unicode string. If it says
"encode", the result is bytes. I'm not sure what is difficult here.
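As a minimal Python 3 sketch of that rule: the sysinfo dict below is a hypothetical stand-in (the quoted code doesn't show how it is built), assumed to hold UTF-8 encoded bytes.

```python
# Hypothetical stand-in for the sysinfo mapping in the quoted code;
# the values are assumed to be UTF-8 encoded bytes.
sysinfo = {
    "os": b"Linux",
    "hostname": "caf\u00e9-box".encode("utf-8"),
}

# decode: bytes -> str (Unicode text)
decoded = {key: value.decode("utf-8") for key, value in sysinfo.items()}

# encode: str -> bytes
reencoded = decoded["hostname"].encode("utf-8")
```

The direction is carried by the method name: decode always ends in text, encode always ends in bytes.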
> .encode/.decode are confusing, because in Python 2 it was:
>
> str.encode(encoding) -> str
> str.decode(encoding) -> str
>
> with no encoding info attached.
$ python
Python 2.7.3 (default, Mar 13 2014, 11:03:55)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> "test".decode("utf-8")
u'test'
>>> u"test".encode("utf-8")
'test'
Looks to me like str.decode() -> unicode, and unicode.encode() -> str,
at least for UTF-8 and other encodings that apply to Unicode. Yes,
there are some oddities in Py2:
>>> "74657374".decode("hex")
'test'
in which str.decode returns a str, but AFAIK all of those are buried
away in Python 3:
>>> import codecs
>>> codecs.decode(b"74657374", "hex")
b'test'
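A rough sketch of how that split looks in Python 3, assuming a reasonably recent 3.x interpreter. The variable names are only for illustration:

```python
import codecs

text = "test"
data = text.encode("utf-8")   # str -> bytes

# In Python 3, str has no .decode() and bytes has no .encode(),
# so each method only runs in one direction.
str_has_decode = hasattr(text, "decode")
bytes_has_encode = hasattr(data, "encode")

# Bytes-to-bytes transforms such as hex go through the codecs module...
hex_decoded = codecs.decode(b"74657374", "hex")

# ...or through a dedicated constructor on bytes.
hex_from_str = bytes.fromhex("74657374")
```

Both hex routes give back bytes, never text, which keeps them out of the str/bytes encode/decode pair.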
So where's the confusion?
ChrisA