[Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?]

Ethan Furman ethan at stoneleaf.us
Tue Jan 7 06:51:11 CET 2014


On 01/06/2014 09:05 PM, Stephen J. Turnbull wrote:
> Ethan Furman writes:
>>
>> The binary data I deal with occupies the full 0-255 range,
>
> My proposal deals with such data.  It simply prevents the program from
> interpreting the 128-255 range as Unicode characters.  You can still
> use regexps etc on the full range 0-255.
>
>> some of which is actually encoded text (and I decode it before
>> passing it back to the user), some of which is simple binary data,
>> and some of which is simple ASCII (metadata about fields and
>> whatnot).
>
> You're wrong, it would help you.  Encoded text must be decoded, and in
> that case it doesn't help you.  Unless you can treat it as a single
> ASCII-compatible encoding (eg, this works for ISO-8859 or KOI8), when
> the proposal wins for you.  Binary data and pure ASCII, the proposal
> wins for you, unless you're worried about spurious recognition of the
> binary data as ASCII metadata.  In that last case, again, nothing is
> going to help you as it's a domain problem.  My proposal is undefeated
> in your use case.

I just read your proposal again, and must admit I don't understand how it would help me, but I look forward to testing an implementation!

One wrinkle, though -- the data is binary, so if it were read as text it would have to be read using the latin1 codec... although I suppose I could open it, read the first 32 bytes, close it, figure out the encoding, and reopen it with that encoding.... hmmmm -- yup, still not sure how it would all work, but looking forward to testing it.
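
For what it's worth, the two-pass idea above might look roughly like the sketch below. This is only an illustration under assumptions that are not part of the proposal: the header layout (one byte at offset 29 naming the encoding), the ENCODING_TABLE mapping, and the function names are all hypothetical. Only the 32-byte header size and the latin1 fallback come from the message.

```python
import os
import tempfile

HEADER_SIZE = 32  # from the message: "read the first 32 bytes"

# Hypothetical layout: a single header byte names the text encoding.
ENCODING_BYTE_OFFSET = 29
ENCODING_TABLE = {0x01: "cp437", 0x03: "cp1252"}


def sniff_encoding(path, default="latin-1"):
    """First pass: read only the header, in binary mode, and pick a codec."""
    with open(path, "rb") as fh:
        header = fh.read(HEADER_SIZE)
    if len(header) < HEADER_SIZE:
        return default
    # Indexing a bytes object yields an int, so it can key the table directly.
    return ENCODING_TABLE.get(header[ENCODING_BYTE_OFFSET], default)


def read_as_text(path):
    """Second pass: reopen the whole file in text mode with the sniffed codec."""
    encoding = sniff_encoding(path)
    # newline="" turns off newline translation, so CR and LF bytes in the
    # binary parts are not rewritten on the way in.
    with open(path, "r", encoding=encoding, newline="") as fh:
        return encoding, fh.read()


def demo():
    """Write a tiny sample file and read it back through both passes."""
    header = bytearray(HEADER_SIZE)
    header[ENCODING_BYTE_OFFSET] = 0x03  # hypothetical marker for cp1252
    payload = "café".encode("cp1252")
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(header) + payload)
    try:
        return read_as_text(tmp.name)
    finally:
        os.remove(tmp.name)
```

The latin-1 default is there because that codec maps each of the 256 byte values to exactly one code point, so decoding never fails and encoding the result gives back the original bytes. A codec such as cp1252 leaves a few byte values undefined in Python, so a text-mode read of arbitrary binary data with it can raise UnicodeDecodeError.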

--
~Ethan~

