[I18n-sig] Strawman Proposal: Binary Strings

M.-A. Lemburg mal@lemburg.com
Sat, 10 Feb 2001 15:56:10 +0100

Toby Dickenson wrote:
> On Thu, 08 Feb 2001 12:24:49 -0800, Paul Prescod
> <paulp@ActiveState.com> wrote:
> >> What if string.encode() returned a binary string.... would we need a
> >> 'binary()' builtin at all?

binary() is needed one way or another. It is standard Python
philosophy that all types need to have an exposed constructor and
these should do some form of implicit or explicit but well-defined
coercion from other data types to binary strings.
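
A rough sketch of what such a constructor could look like, written in present-day Python where the built-in bytes type stands in for the proposed binary string. The name binary() comes from the thread; the explicit encoding argument and the choice to reject everything else are assumptions for illustration only:

```python
def binary(obj, encoding=None):
    """Hypothetical constructor: coerce obj to a binary string (bytes)."""
    # Byte-like objects are copied through unchanged.
    if isinstance(obj, (bytes, bytearray, memoryview)):
        return bytes(obj)
    # Text has no single byte representation, so the caller must name one.
    if isinstance(obj, str):
        if encoding is None:
            raise TypeError("binary() needs an encoding to convert text")
        return obj.encode(encoding)
    # Anything else is refused rather than guessed at.
    raise TypeError("cannot coerce %s to binary" % type(obj).__name__)


data = binary(b"\x00\x01")
text_as_latin1 = binary("\u00e9", "latin-1")
```

Whether the coercion is implicit or explicit is a design choice; the sketch picks the explicit route so that the result is always well-defined.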

About changing .encode() or the existing codecs to return binary
strings instead of normal strings: I'm -1 on this one since it
will break existing code. The outcome of .encode() is totally
up to the codec doing the work, BTW (and this is by design), so
new codecs could choose to return binary strings.

For converting strings or Unicode to binary data, I'd suggest
adding a "binary" codec which then returns the raw bytes of the
string or Unicode object in question as a binary string.
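
As an illustration only, such a codec can be sketched with the codecs registry in present-day Python. The codec name "binary" is from the suggestion above; the helper names are hypothetical, and UTF-16-LE is merely an assumed stand-in for "the raw bytes" of a Unicode object, since the internal representation is not exposed:

```python
import codecs


def _binary_encode(obj, errors="strict"):
    # Assumption: UTF-16-LE stands in for the raw bytes of a Unicode object.
    if isinstance(obj, str):
        raw = obj.encode("utf-16-le")
    else:
        raw = bytes(obj)
    # Codecs return (output, number of input items consumed).
    return raw, len(obj)


def _binary_decode(data, errors="strict"):
    # Decoding hands the bytes back untouched; no interpretation is applied.
    raw = bytes(data)
    return raw, len(raw)


def _search(name):
    # The registry calls this with the normalized codec name.
    if name == "binary":
        return codecs.CodecInfo(
            encode=_binary_encode, decode=_binary_decode, name="binary"
        )
    return None


codecs.register(_search)

raw_from_text = codecs.encode("ab", "binary")
raw_from_bytes = codecs.encode(b"\x00\xff", "binary")
```

Existing codecs are left alone here, so code relying on their current return type keeps working.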

Note that changing e.g. .encode('latin-1') to return a binary string
doesn't really make sense, since here we know the encoding! Instead,
strings should probably carry along the encoding information in an
additional attribute (it is not always useful, but can help in
a few situations) provided that it is known.

This would give us three string types:

1. standard 8-bit strings with encoding attribute
2. binary 8-bit strings without encoding attribute or a constant
   value of 'binary' for this attribute
3. Unicode strings which don't need an encoding attribute :-)
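
To make the three-way split concrete, here is a minimal sketch in present-day Python. The class names EncodedStr and BinaryStr and the helper to_unicode() are hypothetical; bytes subclasses stand in for the two 8-bit types and str for the Unicode type:

```python
class EncodedStr(bytes):
    """8-bit string that carries its encoding, or None if unknown."""

    def __new__(cls, data, encoding=None):
        self = super().__new__(cls, data)
        self.encoding = encoding
        return self


class BinaryStr(EncodedStr):
    """8-bit string whose encoding attribute is the constant 'binary'."""

    def __new__(cls, data):
        return super().__new__(cls, data, encoding="binary")


def to_unicode(s):
    # Unicode strings need no encoding attribute and pass straight through.
    if isinstance(s, str):
        return s
    # Binary or untagged data cannot be decoded without guessing.
    if s.encoding in (None, "binary"):
        raise ValueError("no text encoding known for this string")
    return s.decode(s.encoding)


latin = EncodedStr(b"caf\xe9", "latin-1")
blob = BinaryStr(b"\x00\x01")
as_text = to_unicode(latin)
```

Note that in this sketch an operation such as latin + b"!" returns a plain bytes object, so the encoding attribute is lost as soon as the types are mixed.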

Hmm, getting all these to properly interoperate without breaking
existing code will be troublesome...

Marc-Andre Lemburg
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/