[Python-ideas] Python 3000 TIOBE -3%
Ethan Furman
ethan at stoneleaf.us
Thu Feb 16 20:34:16 CET 2012
Stephen J. Turnbull wrote:
> Ethan Furman writes:
>> The above is not arguing with the 'latin-1' nor 'surrogateescape'
>> techniques, but only commenting on a different data type with probably
>> different uses.
>
> But there really aren't any uses that aren't equally well dealt with
> by 'surrogateescape' that I can see. You have to process it code unit
> by code unit (just like surrogateescape) and if you find a non-
> character code unit, you then have an ad hoc decision to make about
> what to do with it.
>
> surrogateescape makes one particular treatment blazingly efficient
> (namely, turning the surrogate back into a byte with no known
> meaning). What other treatment of a byte of by-definition unknown
> semantics deserves the blazing efficiency that a new (presumably
> builtin) type could give?
It wasn't the 'unknown semantics' that I was responding to (latin-1 and
surrogateescape deal with that just fine), but rather the idea of a new
data type holding a mixture of valid Unicode (0-127) and raw bytes
(128-255). I don't think that would be common enough to justify, and I
can see confusion creeping in again when somebody (like myself ;) sees a
datatype which seemingly supports a mixture of Unicode and raw bytes,
only to find out that 'uni_raw(...)[5] != 32' because a u' ' was
returned where an integer (or raw byte) was expected.
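
To make that indexing surprise concrete, here is a minimal sketch in
Python 3. It uses the existing 'surrogateescape' error handler rather
than the hypothetical uni_raw type (which does not exist), and the sample
bytes are made up purely for illustration:

```python
# Sample data (invented for this example): ASCII text plus one byte
# with no known meaning.
raw = b"hello world\xff"

# surrogateescape maps each undecodable byte 0xNN to the lone
# surrogate U+DCNN instead of raising UnicodeDecodeError.
text = raw.decode("ascii", errors="surrogateescape")

# Indexing a str always gives back a one-character str, so comparing
# against the integer 32 is False even though the character is a space.
print(text[5] == " ")    # True
print(text[5] == 32)     # False

# Indexing bytes in Python 3 gives back an integer instead.
print(raw[5] == 32)      # True

# The unknown byte survives as a surrogate and is restored on encode.
round_trip = text.encode("ascii", errors="surrogateescape")
print(round_trip == raw)  # True
```

With surrogateescape every element is consistently a str, so at least
the 'is it a character or an integer?' question never comes up.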
~Ethan~