[Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?]
Ethan Furman
ethan at stoneleaf.us
Tue Jan 7 06:51:11 CET 2014
On 01/06/2014 09:05 PM, Stephen J. Turnbull wrote:
> Ethan Furman writes:
>>
>> The binary data I deal with occupies the full 0-255 range,
>
> My proposal deals with such data. It simply prevents the program from
> interpreting the 128-255 range as Unicode characters. You can still
> use regexps etc on the full range 0-255.
>
>> some of which is actually encoded text (and I decode it before
>> passing it back to the user), some of which is simple binary data,
>> and some of which is simple ASCII (metadata about fields and
>> whatnot).
>
> You're wrong, it would help you. Encoded text must be decoded, and in
> that case it doesn't help you. Unless you can treat it as a single
> ASCII-compatible encoding (eg, this works for ISO-8859 or KOI8), when
> the proposal wins for you. Binary data and pure ASCII, the proposal
> wins for you, unless you're worried about spurious recognition of the
> binary data as ASCII metadata. In that last case, again, nothing is
> going to help you as it's a domain problem. My proposal is undefeated
> in your use case.
I just read your proposal again, and must admit I don't understand how it would help me, but I look forward to testing
an implementation!
One wrinkle, though -- the data is binary, and if read as text it would have to be read using the latin1 codec...
although, I suppose I could open it, read the first 32 bytes, close it, figure out the encoding, and reopen it with that
encoding... hmmmm -- yup, still not sure how it would all work, but looking forward to testing it.
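
For what it's worth, the open / read 32 bytes / close / reopen dance might look roughly like the sketch below. The
header layout (a codec marker at byte 29) and the CODEPAGES table are made up for illustration, and it assumes a
single-byte codec in which every header byte happens to decode.

```python
# Hypothetical mapping from a header byte to a codec name (illustration only).
CODEPAGES = {0x01: "cp437", 0x03: "cp1252"}


def sniff_encoding(path, header_size=32, marker_offset=29):
    """Read the header in binary mode and pick a codec from one marker byte."""
    with open(path, "rb") as f:
        header = f.read(header_size)
    if len(header) < header_size:
        raise ValueError("file too short to hold a header")
    # Fall back to latin1, which maps every byte 0-255 to a code point.
    return CODEPAGES.get(header[marker_offset], "latin1")


def read_text(path, header_size=32):
    """Sniff the codec, then reopen the same file in text mode with it."""
    encoding = sniff_encoding(path, header_size)
    # newline="" keeps stray \r and \n bytes from being translated.
    with open(path, "r", encoding=encoding, newline="") as f:
        return encoding, f.read()
```

With a single-byte codec the first 32 characters of the returned string are still the header, so the payload starts at
index 32.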
--
~Ethan~