<p dir="ltr">To port OpenStack to Python 3, I wrote four (2x2) helper functions that accept both bytes and Unicode as input. The xxx_as_bytes() functions return bytes; the xxx_as_text() functions return Unicode:<br>

<a href="http://docs.openstack.org/developer/oslo.serialization/api.html">http://docs.openstack.org/developer/oslo.serialization/api.html</a></p>
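
<p dir="ltr">A minimal sketch of the 2x2 idea, using only the standard library. This is not the actual oslo.serialization source: the function names simply follow the xxx_as_bytes()/xxx_as_text() pattern, and the UTF-8 default encoding is an assumption made for illustration.</p>

```python
import base64


def encode_as_bytes(data, encoding="utf-8"):
    """Base64-encode bytes or text and return bytes."""
    # Text input is converted to bytes first (hypothetical default: UTF-8).
    if isinstance(data, str):
        data = data.encode(encoding)
    return base64.b64encode(data)


def encode_as_text(data, encoding="utf-8"):
    """Base64-encode bytes or text and return str."""
    # Base64 output uses only ASCII characters, so this decode cannot fail.
    return encode_as_bytes(data, encoding=encoding).decode("ascii")


def decode_as_bytes(encoded, encoding="utf-8"):
    """Base64-decode bytes or text and return bytes."""
    if isinstance(encoded, str):
        encoded = encoded.encode(encoding)
    return base64.b64decode(encoded)


def decode_as_text(encoded, encoding="utf-8"):
    """Base64-decode bytes or text and return str."""
    # The decoded payload is assumed to be text in the given encoding.
    return decode_as_bytes(encoded, encoding=encoding).decode(encoding)
```

<p dir="ltr">With helpers of this shape, the caller picks the output type by name and no longer has to care whether the input was bytes or Unicode.</p>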

<p dir="ltr">Victor</p>

<div class="gmail_quote">On 14 June 2016 at 5:21 PM, "Steven D'Aprano" &lt;<a href="mailto:steve@pearwood.info">steve@pearwood.info</a>&gt; wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Normally I'd take a question like this to Python-List, but this question<br>

has turned out to be quite divisive, with people having strong opinions<br>

but no definitive answer. So I thought I'd ask here and hope that some<br>

of the core devs would have an idea.<br>

<br>

Why does base64 encoding in Python return bytes?<br>

<br>

base64.b64encode takes bytes as input and returns bytes. Some people are<br>

arguing that this is wrong behaviour, as RFC 3548 specifies that Base64<br>

should transform bytes to characters:<br>

<br>

<a href="https://tools.ietf.org/html/rfc3548.html" rel="noreferrer" target="_blank">https://tools.ietf.org/html/rfc3548.html</a><br>

<br>

albeit US-ASCII characters. E.g.:<br>

<br>

    The encoding process represents 24-bit groups of input bits<br>

    as output strings of 4 encoded characters.<br>

    [...]<br>

    Each 6-bit group is used as an index into an array of 64 printable<br>

    characters.  The character referenced by the index is placed in the<br>

    output string.<br>

<br>

Are they misinterpreting the standard? Has Python got it wrong? Is there<br>

a good reason for returning bytes?<br>
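<br>
The behaviour in question can be sketched in a few lines of Python 3. The input b"hello" is only an illustrative value, and the explicit ASCII decode is one possible way to get characters out of the result:<br>

```python
import base64

# b64encode accepts a bytes-like object and returns bytes, not str.
encoded = base64.b64encode(b"hello")
print(type(encoded).__name__, encoded)

# The output is restricted to ASCII, so turning it into text is an
# explicit but always-safe decode step.
text = encoded.decode("ascii")
print(type(text).__name__, text)

# Passing str directly is rejected in Python 3.
try:
    base64.b64encode("hello")
    str_accepted = True
except TypeError:
    str_accepted = False
print("str input accepted:", str_accepted)
```
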

<br>

I see that other languages choose different strategies. Microsoft's<br>

languages C#, F# and VB (plus their C++ compiler) take an array of bytes<br>

as input, and output a UTF-16 string:<br>

<br>

<a href="https://msdn.microsoft.com/en-us/library/dhx0d524%28v=vs.110%29.aspx" rel="noreferrer" target="_blank">https://msdn.microsoft.com/en-us/library/dhx0d524%28v=vs.110%29.aspx</a><br>

<br>

Java's base64 encoder takes and returns bytes:<br>

<br>

<a href="https://docs.oracle.com/javase/8/docs/api/java/util/Base64.Encoder.html" rel="noreferrer" target="_blank">https://docs.oracle.com/javase/8/docs/api/java/util/Base64.Encoder.html</a><br>

<br>

and JavaScript's Base64 encoder takes input as UTF-16 encoded text and<br>

returns the same:<br>

<br>

<a href="https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding" rel="noreferrer" target="_blank">https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding</a><br>

<br>

I'm not necessarily arguing that Python's strategy is the wrong one, but<br>

I am interested in what (if any) reasons are behind it.<br>

<br>

<br>

Thanks in advance,<br>

<br>

Steve<br>


</blockquote></div>