[I18n-sig] Strawman Proposal: Binary Strings

Paul Prescod paulp@ActiveState.com
Thu, 8 Feb 2001 09:44:08 -0800 (PST)


A binary string is a string that is declared by the user to be a carrier
of binary data and not (directly) of textual data (Unicode characters).

To encourage rapid adoption, binary strings are designed to be as similar
to Python strings as possible. This means that they have all of the same
methods, are immutable and so forth. They also follow Python's existing
string->Unicode coercion rules.

These rules are arguably too "loose" but experience shows that coercion
rules are often highly personal and the arguments one way or the other
tend to be philosophical rather than practical. For example, Java and
JavaScript automatically coerce objects to strings when they are added to
strings. Python does not. Neither choice seems a large mistake.

Binary strings differ from regular strings in the following ways:

 a) they have a unique type object named types.BinaryString

 b) they are constructed in Python code in one of three ways:

     1. using a "b" prefix on string literals

     2. using a function called binary()

     3. from some other C-coded function such as a file i/o library

 c) they repr() themselves with a b"" prefix, just as Unicode strings
    repr() themselves with a u"" prefix
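
As a rough illustration of points (a) through (c), here is a minimal
sketch that uses the bytes type of later Python 3 releases as a stand-in
for the proposed binary string. That substitution is an assumption:
types.BinaryString and binary() are the names proposed above, not
something this code defines, and bytes does not follow the
string->Unicode coercion rules described earlier.

```python
import io

# Stand-in: Python 3's built-in bytes plays the role of the proposed
# binary string type (an assumption made for illustration only).
payload = [0x89, 0x50, 0x4E, 0x47]

from_literal = b"\x89PNG"                   # 1. "b" prefix on a literal
from_function = bytes(payload)              # 2. constructor, in place of binary()
from_io = io.BytesIO(from_literal).read()   # 3. returned by an I/O library

# All three routes yield equal values of one distinct type.
assert from_literal == from_function == from_io
assert type(from_literal) is bytes

# repr() carries the b prefix, and the usual string methods are available.
print(repr(from_function))
print(from_literal.startswith(b"\x89"))

# Like ordinary strings, the value is immutable.
try:
    from_literal[0] = 0
except TypeError:
    print("immutable")
```

The point of the sketch is only that the three construction routes meet
in a single type whose repr() makes the binary nature visible.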

One reason to add the binary data type is that at some point in the
future we may deprecate the construction of binary data in ordinary string
literals. Although details remain to be worked out, it is a goal that in
the future string literals will always be interpreted as character
strings. That might mean that non-ASCII characters will some day be
disallowed or that they will be interpreted according to a declared Unicode
transformation encoding.

Conventions for binary file I/O will be worked out in a separate proposal.