[Python-Dev] email package status in 3.X
Michael Urman
murman at gmail.com
Tue Jun 22 15:24:28 CEST 2010
On Tue, Jun 22, 2010 at 00:28, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Michael Urman writes:
>
> > It is somewhat troublesome that there doesn't appear to be an obvious
> > built-in idempotent-when-possible function that gives back the
> > provided bytes/str,
>
> If you want something idempotent, it's already the case that
> bytes(b'abc') => b'abc'. What might be desirable is to make
> bytes('abc') work and return b'abc', but only if 'abc' is pure ASCII
> (or maybe ISO 8859/1).
By idempotent-when-possible, I mean to_bytes(str_or_bytes, encoding,
errors) that would pass an instance of bytes through, or encode an
instance of str. And of course a to_str that performs similarly,
passing str through and decoding bytes. While bytes(b'abc') will give
me b'abc', neither bytes('abc') nor bytes(b'abc', 'latin-1') gets me
the b'abc' I want to see.
These are trivial functions; I just don't fully understand why the
capability isn't baked in. A one-argument call can be idempotent; a
two-argument call can't be, as it only converts.
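For concreteness, here is a minimal sketch of the pair of helpers described above. The names to_bytes and to_str come from this message; the 'utf-8' and 'strict' defaults and the TypeError for other types are assumptions made for illustration, not anything the standard library provides.

```python
def to_bytes(value, encoding='utf-8', errors='strict'):
    """Pass bytes through unchanged; encode str with the given encoding."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding, errors)
    # Assumption: anything else is a caller error rather than something to coerce.
    raise TypeError('expected str or bytes, got %s' % type(value).__name__)


def to_str(value, encoding='utf-8', errors='strict'):
    """Pass str through unchanged; decode bytes with the given encoding."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    raise TypeError('expected str or bytes, got %s' % type(value).__name__)


u = '日本語'
sjis = u.encode('shift-jis')

# The same invocation works for either input type.
assert to_str(sjis, 'shift-jis') == u
assert to_str(u, 'shift-jis') == u
assert to_bytes(u, 'shift-jis') == sjis
assert to_bytes(sjis, 'shift-jis') == sjis
```

Applying either helper to its own result changes nothing, which is the idempotent-when-possible behaviour being asked for. Note that the encoding is simply ignored when the value is already of the target type; that is a design choice of this sketch, not a requirement.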
It's not a completely made-up requirement either. A cross-platform
piece of software may need to present to a user items that are
sometimes str and sometimes bytes - particularly filenames.
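For the filename case specifically, Python 3.2 and later (newer than the 3.1 shown in this thread) provide os.fsencode() and os.fsdecode(), which accept either str or bytes and use the filesystem encoding. A small sketch, assuming such a version is available; display_names is a hypothetical helper name used only for illustration:

```python
import os


def display_names(names):
    """Return every name as str, whether it arrived as str or bytes."""
    # os.fsdecode passes str through and decodes bytes with the
    # filesystem encoding (requires Python 3.2 or later).
    return [os.fsdecode(name) for name in names]


mixed = ['notes.txt', b'data.bin']
print(display_names(mixed))
```

This only covers names that are in the filesystem encoding; bytes in some other known encoding, such as the shift-jis example, still need an explicit decode.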
> Unfortunately, str(b'abc') already does work, but
>
> steve at uwakimon ~ $ python3.1
> Python 3.1.2 (release31-maint, May 12 2010, 20:15:06)
> [GCC 4.3.4] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> str(b'abc')
> "b'abc'"
> >>>
>
> Oops. You can see why that probably "should" be the case
Sure, and I love having this there for debugging. But this is hardly
good enough for presenting to a user once you leave ASCII.
>>> u = '日本語'
>>> sjis = bytes(u, 'shift-jis')
>>> utf8 = bytes(u, 'utf-8')
>>> str(sjis), str(utf8)
("b'\\x93\\xfa\\x96{\\x8c\\xea'",
"b'\\xe6\\x97\\xa5\\xe6\\x9c\\xac\\xe8\\xaa\\x9e'")
When I happen to know the encoding, I can reverse it much more cleanly.
>>> str(sjis, 'shift-jis'), str(utf8, 'utf-8')
('日本語', '日本語')
But I can't mix this approach with str instances without writing a
different invocation.
>>> str(u, 'argh')
TypeError: decoding str is not supported
--
Michael Urman