[Python-3000] Thoughts on new I/O library and bytecode

Sat Mar 3 17:02:52 CET 2007

On Tuesday 27 February 2007 00:39, Greg Ewing wrote:

> I can't help feeling the people arguing for b"..." as the
> repr format haven't really accepted the fact that text and
> binary data will be distinct things in py3k, and are thinking
> of bytes as being a replacement for the old string type. But
> that's not true -- most of the time, *unicode* will be the
> replacement for str when it is used to represent characters,
> and bytes will mostly be used only for non-text.
[etc.]

... but Guido prefers to use b"..." as the repr format,
on the grounds that byte-sequences quite often are
lightly encoded text, and that when that's true it
can be *much* better to report them as such.

Here's an ugly, impure, but possibly practical answer:
give each bytes object a single-bit flag meaning something
like "mostly textual"; make the bytes([1,2,3,4]) constructor
set it to false, the b"abcde" constructor set it to true,
and arbitrary operations on bytes objects do ... well,
something plausible :-). (Textuality/non-textuality is
generally preserved; combining textual and non-textual
yields non-textual.) Then repr() can look at that flag
and decide what to do on the basis of it.
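
A rough sketch of the flag idea, written against present-day
Python 3 (immutable bytes) rather than the py3k design under
discussion. The class name FlaggedBytes, the `textual` attribute
and the bytes([...]) fallback repr are all made up for
illustration; nothing here is a proposed API.

```python
class FlaggedBytes(bytes):
    """Hypothetical bytes subclass carrying a 'mostly textual' flag."""

    def __new__(cls, data=b"", textual=False):
        self = super().__new__(cls, data)
        self.textual = textual  # assumption: one boolean per object
        return self

    def __add__(self, other):
        # Textuality is preserved only if both operands are textual;
        # plain bytes objects are treated as non-textual here.
        other_textual = getattr(other, "textual", False)
        return FlaggedBytes(bytes(self) + bytes(other),
                            self.textual and other_textual)

    def __repr__(self):
        if self.textual:
            return repr(bytes(self))  # the b'...' form
        return "bytes([" + ", ".join(str(b) for b in self) + "])"

    # __eq__ and __hash__ are inherited from bytes, so the flag
    # plays no part in comparison or hashing.


text = FlaggedBytes(b"abc", textual=True)
raw = FlaggedBytes([97, 98, 99])
print(repr(text))       # b'abc'
print(repr(raw))        # bytes([97, 98, 99])
print(text == raw)      # True
print(repr(text + raw)) # bytes([97, 98, 99, 97, 98, 99])
```

Only concatenation is handled; slicing, join() and the rest
would each need their own "something plausible" rule.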

This would mean that x==y ==> repr(x)==repr(y) would fail;
it can already fail when x,y are of different types (3==3.0;
1==True) and perhaps in some weird situations where they are
of the same type (signed IEEE zeros). It would make the behaviour
of repr() less predictable, and that's probably bad; it would
mean (unlike the examples I gave above) that you can have
x==y, with x and y of the same type, but have repr(x)
and repr(y) not look at all similar.
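
For reference, the existing cases mentioned above can be
checked directly in current Python; this only illustrates
behaviour that already exists and assumes nothing about the
proposal.

```python
# Pairs that compare equal but have different reprs.
pairs = [(3, 3.0), (1, True), (0.0, -0.0)]

for x, y in pairs:
    print(x == y, repr(x), repr(y))
# True 3 3.0
# True 1 True
# True 0.0 -0.0
```

In each of these the two reprs still differ only slightly or
name closely related values.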

Obviously the flag wouldn't affect comparisons or hashing.

I can't say I like this much -- it's exactly the sort of
behaviour I've found painful in Perl, with too much magic
happening behind the scenes for perhaps-insufficient
reason -- but it still might be the best available
compromise. (The other obvious compromise approach
would be to sniff the contents of the bytes object
and see whether it "looks" like a lightly-encoded
string. That's a bit too much magic for fuzzy reasons
too.)
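
A sketch of what the sniffing alternative might look like. The
function names, the 90% threshold and the choice of "printable
ASCII plus tab/CR/LF" are arbitrary assumptions, picked only to
show how fuzzy such a rule has to be.

```python
def looks_textual(data, threshold=0.9):
    """Guess whether a byte sequence is lightly encoded text."""
    if not data:
        return True
    # Count printable ASCII bytes plus tab, LF and CR.
    printable = sum(1 for b in data
                    if 32 <= b < 127 or b in (9, 10, 13))
    return printable / len(data) >= threshold


def sniffing_repr(data):
    """Hypothetical repr that picks a format from the contents."""
    if looks_textual(data):
        return repr(bytes(data))
    return "bytes([" + ", ".join(str(b) for b in data) + "])"


print(sniffing_repr(b"GET / HTTP/1.1\r\n"))  # b'GET / HTTP/1.1\r\n'
print(sniffing_repr(bytes([1, 2, 3, 4])))    # bytes([1, 2, 3, 4])
```

Any fixed threshold will misclassify some inputs, which is the
"fuzzy reasons" problem in concrete form.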

-- 
Gareth McCaughan