[Web-SIG] The write callable (vs. file-like object)
Alan Kennedy
py-web-sig at xhaus.com
Tue Aug 31 19:35:43 CEST 2004
[Phillip J. Eby]
> If Python currently had a "byte array" type, we'd be using that instead
> of strings. Direct writing of Unicode isn't intended to ever be
> directly supported by the standard, although in principle you could
> create some kind of "encoding middleware" that sits directly atop the
> application. (An application or framework written to it would
> technically not be WSGI-compliant.)
>
> I guess I need to add something about byte arrays to the spec,
> especially since Java/Jython may have this issue today (i.e. strings are
> Unicode, but for HTTP a byte array is needed).
Hmmm: looking under the jython covers, I think there is no problem with
binary strings.
org.python.core.PyFile implements the write method for *binary* data by
converting the Unicode string to bytes with the
java.lang.String.getBytes(int,int,byte[],int) method (which is
deprecated because it doesn't convert Unicode characters to bytes properly).
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#getBytes(int,%20int,%20byte[],%20int)
The javadoc says: "Copies characters from this string into the
destination byte array. Each byte receives the 8 low-order bits of the
corresponding character. The eight high-order bits of each character are
not copied and do not participate in the transfer in any way."
Which, AFAICT, is not a problem, because (I'm presuming) jython stores
binary data as one byte per character of a string, i.e. the low byte. So
the above transcoding would be fine, when you're dealing with bytes, not
actual characters.
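To make the low-byte behaviour concrete, here is a minimal sketch (not jython code; the class and method names are made up for illustration, and it assumes a newer JDK that has java.nio.charset.StandardCharsets). It shows that a string holding one byte value per character survives the deprecated call unchanged, while a real character above U+00FF loses its high byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of jython.
final class LowByteDemo {

    // Same idea as the "binary" case: keep only the low 8 bits of each char.
    @SuppressWarnings("deprecation")
    static byte[] lowBytes(String s) {
        byte[] buf = new byte[s.length()];
        s.getBytes(0, s.length(), buf, 0);
        return buf;
    }

    // Build a "byte string": each byte becomes one char in the range 0..255.
    static String fromBytes(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = {0, 65, 127, (byte) 128, (byte) 255};

        // Bytes stored one per character come back unchanged.
        byte[] back = lowBytes(fromBytes(raw));
        System.out.println("round trip ok: " + Arrays.equals(raw, back));

        // The euro sign is U+20AC; only the low byte 0xAC is kept.
        byte[] euro = lowBytes("\u20AC");
        System.out.println(String.format("euro low byte: %02X", euro[0] & 0xFF));
    }
}
```

So the call is lossless exactly when every character is in the range 0..255, which is the presumption made above about how binary data is stored.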
When the output is *character* data (i.e. the "if (binary)" clause is
false, see below), the java.lang.String.getBytes() method is used, which
transcodes properly to bytes, according to the "platform's default
charset", which is set at JVM startup time.
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html#getBytes()
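As a rough illustration of why the default charset matters for character data, the sketch below encodes the same string with two explicitly named charsets and with the platform default. The class name is hypothetical, and it assumes a newer JDK with java.nio.charset.StandardCharsets; it is not how PyFile does it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of jython.
final class CharsetDemo {

    // Encode with an explicitly named charset instead of the platform default.
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    public static void main(String[] args) {
        String s = "caf\u00e9"; // four characters, the last one is U+00E9

        // The no-argument getBytes() uses whatever the JVM picked at startup,
        // so its result can differ from machine to machine.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("default:    " + s.getBytes().length + " bytes");

        // With an explicit charset the result is the same everywhere.
        System.out.println("ISO-8859-1: "
                + encode(s, StandardCharsets.ISO_8859_1).length + " bytes");
        System.out.println("UTF-8:      "
                + encode(s, StandardCharsets.UTF_8).length + " bytes");
    }
}
```

The same four characters become four bytes in ISO-8859-1 and five in UTF-8, so the bytes written for character data depend on how the JVM was started.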
If anyone is interested, here is the code for the
PyFile.getBytes(String) method, called by PyFile.write().
    protected byte[] getBytes(String s)
    {
        // Yes, I known the method is depricated, but it is the fastest
        // way of converting between between byte[] and String
        if (binary)
        {
            byte[] buf = new byte[s.length()];
            s.getBytes(0, s.length(), buf, 0);
            return buf;
        }
        else
            return s.getBytes();
    }
So, I think all is well here: jython knows how to properly manage byte
strings vs. python strings.
Regards,
Alan.
P.S. The spelling mistakes in the code comments above are verbatim from
the jython 2.1 codebase. All other speeling misteaks are my own ;-)