[Python-ideas] Python 3000 TIOBE -3%

Thu Feb 16 20:34:16 CET 2012

Stephen J. Turnbull wrote:
> Ethan Furman writes:
>> The above is not arguing with the 'latin-1' nor 'surrogateescape' 
>> techniques, but only commenting on a different data type with probably 
>> different uses.
> 
> But there really aren't any uses that aren't equally well dealt with
> by 'surrogateescape' that I can see.  You have to process it code unit
> by code unit (just like surrogateescape) and if you find a non-
> character code unit, you then have an ad hoc decision to make about
> what to do with it.
> 
> surrogateescape makes one particular treatment blazingly efficient
> (namely, turning the surrogate back into a byte with no known
> meaning).  What other treatment of a byte of by-definition unknown
> semantics deserves the blazing efficiency that a new (presumably
> builtin) type could give?

It wasn't the 'unknown semantics' that I was responding to (latin-1 and 
surrogateescape deal with that just fine), but rather the idea of a new 
data type holding a mixture of valid Unicode (0-127) and raw bytes 
(128-255).  I don't think that would be common enough to justify.  I can 
also see confusion creeping in again when somebody (like myself ;) sees a 
datatype which seemingly supports a mixture of Unicode and raw bytes, 
only to find out that 'uni_raw(...)[5] != 32' because a u' ' was 
returned where an integer (or raw byte) was expected.
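
A minimal sketch of the confusion I mean, in Python 3.  The UniRaw class 
is purely hypothetical (no such type exists, and nobody in this thread 
specified one); I'm assuming it would hand back a one-character str for 
bytes 0-127 and an int for bytes 128-255.  The last few lines show the 
existing latin-1 and surrogateescape approaches for comparison, where 
indexing always gives a str.

```python
class UniRaw:
    """Hypothetical mixed type (an assumption for illustration only):
    bytes 0-127 index as str, bytes 128-255 index as int."""

    def __init__(self, data):
        self._data = bytes(data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        # Integer indexes only; slicing is left out of this sketch.
        value = self._data[index]          # bytes[int] -> int
        return chr(value) if value < 128 else value


data = b"hello w\xf6rld"                   # made-up sample with one high byte

mixed = UniRaw(data)
print(repr(mixed[5]), mixed[5] != 32)      # a str came back, so it never equals 32
print(repr(mixed[7]))                      # an int came back for the high byte

# Existing techniques: the element type is always str.
as_latin1 = data.decode("latin-1")
escaped = data.decode("ascii", errors="surrogateescape")
print(repr(as_latin1[7]), repr(escaped[7]))

# surrogateescape turns the lone surrogate back into the original byte.
assert escaped.encode("ascii", errors="surrogateescape") == data
```

With plain bytes, data[5] == 32 holds; with the hypothetical type the 
caller has to check the type of every element before comparing.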

~Ethan~