[I18n-sig] Binary data b"strings" (Pre-PEP: Proposed Python Character Model)

Paul Prescod paulp@ActiveState.com
Thu, 08 Feb 2001 15:23:50 -0800


I've thought about this coercion issue some more, and I think we need to
auto-coerce these binary strings using a well-defined rule (NOT a
default encoding!).
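A minimal sketch of what "a well-defined rule" could look like. The rule is an assumption: the post doesn't name one, so this maps each byte to the code point with the same value (Latin-1). `BinaryString` is a hypothetical name for illustration only. The point is that the conversion is fixed and total, unlike a configurable process-wide default encoding.

```python
# Hypothetical sketch: BinaryString is an illustrative name, not a real type.
# The coercion rule (byte value N -> code point N, i.e. Latin-1) is an
# assumption; the point is that it is fixed, not a configurable default.


class BinaryString:
    """Immutable byte string that coerces to text by one fixed rule."""

    def __init__(self, data: bytes) -> None:
        self._data = bytes(data)

    def to_text(self) -> str:
        # Every byte 0..255 maps to the code point with the same value,
        # so this conversion never fails and never consults any setting.
        return self._data.decode("latin-1")

    def __add__(self, other):
        if isinstance(other, BinaryString):
            return BinaryString(self._data + other._data)  # binary + binary stays binary
        if isinstance(other, str):
            return self.to_text() + other  # binary + text coerces to text
        return NotImplemented

    def __radd__(self, other):
        if isinstance(other, str):
            return other + self.to_text()  # text + binary coerces to text
        return NotImplemented

    def __eq__(self, other):
        if isinstance(other, BinaryString):
            return self._data == other._data
        if isinstance(other, str):
            return self.to_text() == other
        return NotImplemented

    def __hash__(self):
        # Hash the coerced text so that objects comparing equal hash equally.
        return hash(self.to_text())


print("1" + BinaryString(b"2"))   # 12
print("3" == BinaryString(b"3"))  # True
```

Latin-1 is used here only because it covers all 256 byte values and is reversible; any other fixed rule would fit the same shape.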

"M.-A. Lemburg" wrote:
> 
> > ...
> >
> > I would want to avoid the need for a 2.0-style 'default encoding', so I
> > suggest it shouldn't be possible to mix this type with other strings:
> >
> > >>> "1"+b"2"
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in ?
> > TypeError: cannot add type "binary" to string
> > >>> "3"==b"3"
> > 0
> 
> Right. This will cause people to rethink whether they are
> using the object for text data or binary data. I still think that
> at the interface level, b"" and "" should be treated the same (except
> that b""-strings should not implement the char buffer interface).
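For comparison, the strict no-mixing rule quoted above is what Python 3 (which came long after this thread) adopted for `bytes` and `str`. A short check of that behaviour; `try_concat` is a hypothetical helper, and the exact TypeError wording varies between Python 3 versions, so only the exception type is relied on here.

```python
# Python 3: bytes and str never mix implicitly.


def try_concat(a, b):
    """Return a + b, or a string starting with 'TypeError' if the types refuse to mix."""
    try:
        return a + b
    except TypeError as exc:
        return f"TypeError: {exc}"


print(try_concat("1", b"2"))  # a TypeError message (wording depends on version)
print("3" == b"3")            # False: no coercion, so the values never compare equal
print(b"1" + b"2")            # b'12': binary + binary is fine
```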

If C functions auto-convert these things, then people will coerce them
anyway by passing them through C functions, e.g. the regular expression
engine, null encoding functions, or whatever.

If we do NOT auto-coerce these things, then they will not be compatible
with many parts of the Python infrastructure, the regular expression
engine and codecs being the most important examples. A clear requirement
from Andy Robinson was that string-like code should work on binary data
because often binary strings are "really" un-decoded strings. I think he
is speaking on behalf of a lot of serious internationalizers there.
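The requirement that string-like code work on undecoded binary data can be illustrated with Python 3's `re` module (which postdates this message): a bytes pattern searches bytes directly, with no decoding step, while a text pattern applied to bytes is rejected. The sample HTTP-style input is made up for illustration.

```python
import re

# Undecoded bytes, as they might arrive from a socket or file (sample data).
raw = b"GET /index.html HTTP/1.0\r\nContent-Length: 1234\r\n"

# A bytes pattern (rb"...") works on the bytes as-is.
match = re.search(rb"Content-Length: (\d+)", raw)
length = int(match.group(1))  # int() accepts an ASCII-digit bytes object
print(match.group(1), length)  # b'1234' 1234

# Mixing a text pattern with bytes data raises TypeError instead of coercing.
try:
    re.search(r"\d+", raw)
except TypeError as exc:
    print("TypeError:", exc)
```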

> OTOH, these b""-strings should implement the same methods as the
> array type and probably interact seamlessly with it too. I don't
> know which type should be considered "better" in coercion
> though, b""-strings or arrays (I guess b""-strings).

Let's keep arrays separate. Arrays are mutable! If users ask for some
particular features from arrays to be also implemented on byte strings,
so be it. Let's only add magic after we know we really need it.
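The mutability difference that argues for keeping the two types separate, shown with Python 3's `bytes` (standing in for the proposed b"" strings) and the standard `array` module:

```python
from array import array

frozen = b"abc"           # byte string: immutable
buf = array("B", frozen)  # array of unsigned bytes: a mutable copy

buf[0] = ord("x")         # in-place change is allowed on the array
print(buf.tobytes())      # b'xbc'
print(frozen)             # b'abc': the original byte string is untouched

try:
    frozen[0] = ord("x")  # byte strings reject item assignment
except TypeError as exc:
    print("TypeError:", exc)
```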

 Paul Prescod