[Python-Dev] email package status in 3.X

Michael Urman murman at gmail.com
Tue Jun 22 15:24:28 CEST 2010


On Tue, Jun 22, 2010 at 00:28, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Michael Urman writes:
>
>  > It is somewhat troublesome that there doesn't appear to be an obvious
>  > built-in idempotent-when-possible function that gives back the
>  > provided bytes/str,
>
> If you want something idempotent, it's already the case that
> bytes(b'abc') => b'abc'.  What might be desirable is to make
> bytes('abc') work and return b'abc', but only if 'abc' is pure ASCII
> (or maybe ISO 8859/1).

By idempotent-when-possible, I mean to_bytes(str_or_bytes, encoding,
errors) that would pass an instance of bytes through, or encode an
instance of str. And of course a to_str that performs similarly,
passing str through and decoding bytes. While bytes(b'abc') will give
me b'abc', neither bytes('abc') nor bytes(b'abc', 'latin-1') gets me
the b'abc' I want to see; both raise TypeError.
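
A minimal sketch of the pair I have in mind follows. The names to_bytes
and to_str, and the UTF-8 default, are illustrative assumptions rather
than existing built-ins.

```python
def to_bytes(value, encoding='utf-8', errors='strict'):
    """Hypothetical helper: pass bytes through, encode str."""
    if isinstance(value, bytes):
        # Already bytes: return unchanged, ignoring encoding/errors.
        return value
    if isinstance(value, str):
        return value.encode(encoding, errors)
    raise TypeError('expected str or bytes, got %s' % type(value).__name__)


def to_str(value, encoding='utf-8', errors='strict'):
    """Hypothetical helper: pass str through, decode bytes."""
    if isinstance(value, str):
        # Already str: return unchanged, ignoring encoding/errors.
        return value
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    raise TypeError('expected str or bytes, got %s' % type(value).__name__)
```

With these, to_bytes('abc', 'latin-1') and to_bytes(b'abc', 'latin-1')
both give b'abc', and calling either function on its own result changes
nothing.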

These are trivial functions; I just don't fully understand why the
capability isn't built in. The one-argument call can pass its input
through unchanged; the two-argument call can't, as it only converts.

It's not a completely made-up requirement either. A cross-platform
piece of software may need to present to a user items that are
sometimes str and sometimes bytes - particularly filenames.

> Unfortunately, str(b'abc') already does work, but
>
> steve at uwakimon ~ $ python3.1
> Python 3.1.2 (release31-maint, May 12 2010, 20:15:06)
> [GCC 4.3.4] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> str(b'abc')
> "b'abc'"
> >>>
>
> Oops.  You can see why that probably "should" be the case

Sure, and I love having this there for debugging. But this is hardly
good enough for presenting to a user once you leave ASCII.
>>> u = '日本語'
>>> sjis = bytes(u, 'shift-jis')
>>> utf8 = bytes(u, 'utf-8')
>>> str(sjis), str(utf8)
("b'\\x93\\xfa\\x96{\\x8c\\xea'",
"b'\\xe6\\x97\\xa5\\xe6\\x9c\\xac\\xe8\\xaa\\x9e'")

When I happen to know the encoding, I can reverse it much more cleanly.
>>> str(sjis, 'shift-jis'), str(utf8, 'utf-8')
('日本語', '日本語')

But I can't mix this approach with str instances without writing a
different invocation.
>>> str(u, 'argh')
TypeError: decoding str is not supported
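
For the filename case, a single invocation that accepts either type
might look like the sketch below. The name display_name, the UTF-8
default, and the choice of errors='replace' are all assumptions made
for illustration.

```python
def display_name(name, encoding='utf-8'):
    """Hypothetical helper: return a str suitable for showing to a user."""
    if isinstance(name, str):
        # Already text: nothing to decode.
        return name
    # Undecodable bytes become U+FFFD instead of raising.
    return name.decode(encoding, errors='replace')


# A mixed list of str and bytes names, handled by one call.
names = ['日本語', '日本語'.encode('utf-8'), b'abc']
shown = [display_name(n) for n in names]
```

Here shown holds three str instances, so the caller never has to
branch on the type of each name.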

-- 
Michael Urman

