[I18n-sig] Pre-PEP: Proposed Python Character Model

Paul Prescod paulp@ActiveState.com
Thu, 08 Feb 2001 15:14:40 -0800

"Martin v. Loewis" wrote:
> ...
> For other files, the ratio may vary. In general, I believe "binary"
> strings in source code, as many of the strings are typically processed
> by some other program which expects a specific byte sequence, rather
> than a character string.

I think that your counting methodology is highly suspect. I consider a
binary string to be a string that contains elements that the author did
not think of in terms of some subset of Unicode.

So for example:

sys_version = "Python/" + string.split(sys.version)[0]

Nobody would ever expect sys_version to have anything other than Unicode
characters in it. The pattern of strings produced here will always be
composed only of Unicode-legal elements. A GIF file is binary because
most bytes are not intended to be Unicode characters.
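
A minimal sketch of the distinction, written in present-day Python 3, whose
separate str and bytes types postdate this message. The GIF bytes are an
illustrative header for a 640x480 image, and the helper decodes_as_utf8 is
hypothetical. Failing to decode is only a symptom of binary data; the
definition above rests on what the author intended the elements to be.

```python
import sys

# Text: every element was written with characters in mind.
sys_version = "Python/" + sys.version.split()[0]

# It can be sent over the wire as bytes and recovered unchanged.
wire_bytes = sys_version.encode("ascii")
round_tripped = wire_bytes.decode("ascii")

# Binary: a GIF signature followed by a little-endian width (640),
# height (480), and a packed flags byte. These are numbers, not characters.
gif_prefix = b"GIF89a" + bytes([0x80, 0x02, 0xE0, 0x01, 0xF7])


def decodes_as_utf8(data: bytes) -> bool:
    """Hypothetical helper: report whether data is well-formed UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Here decodes_as_utf8(wire_bytes) is true and decodes_as_utf8(gif_prefix) is
false, because 0x80 cannot start a UTF-8 sequence.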

According to your definition, an XML document comprising a SOAP message
is "binary" rather than "text" despite what the XML specification says.
After all, what could be more "protocol" than SOAP?

Things like the Python version and SOAP messages are designed to be both
protocol and text. That's a major part of what distinguishes SOAP from
DCOM or IIOP, for example.
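
As a rough illustration of "both protocol and text", here is a sketch in
present-day Python 3 that builds a SOAP 1.1 envelope as a character string,
encodes it for the wire, and parses it back. The application namespace and
the greet element are made up for the example; only the envelope namespace
comes from SOAP 1.1.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
APP_NS = "urn:example:greeting"  # hypothetical application namespace

greeting_text = "Gr\u00fc\u00dfe"  # non-ASCII characters are still text

# The message is a protocol unit, yet it is authored as characters.
message = (
    '<soap:Envelope xmlns:soap="' + SOAP_NS + '">'
    '<soap:Body><g:greet xmlns:g="' + APP_NS + '">'
    + greeting_text +
    "</g:greet></soap:Body></soap:Envelope>"
)

# Encoding to bytes is a transport step, not a change in what the data is.
wire_bytes = message.encode("utf-8")

# The receiver parses the bytes back into elements and character data.
root = ET.fromstring(wire_bytes)
greeting = root.find(f"{{{SOAP_NS}}}Body/{{{APP_NS}}}greet").text
```

The receiver ends up with the same characters the sender wrote, which is
what one would expect of text and not of a GIF.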

 Paul Prescod