[Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?]
Ethan Furman
ethan at stoneleaf.us
Tue Jan 7 20:43:49 CET 2014
On 01/07/2014 11:32 AM, MRAB wrote:
> On 2014-01-07 18:38, Ethan Furman wrote:
>> On 01/07/2014 10:22 AM, MRAB wrote:
>>>> On Jan 7, 2014, at 7:44, Steven D'Aprano <steve at pearwood.info> wrote:
>>>>
>>>>> Suppose we take a byte-string with a non-ASCII byte:
>>>>>
>>>>> b'abc\xFF'.decode('ascii-compatible')
>>>>>
>>> That would be:
>>>
>>> bytestring(b'abc\xFF')
>>>
>>> Bytes outside the ASCII range would be mapped to Unicode low
>>> surrogates:
>>>
>>> bytestring(b'abc\xFF') == bytestring('abc\uDCFF')
>>
>> Not sure what you mean here. The resulting bytes should be 'abc\xFF' and of length 4.
>>
> 'abc\xFF' is a Unicode string, but you wouldn't be able to convert it
> to a bytestring because '\xFF' is a codepoint outside the ASCII range
> and not a low surrogate.
I can see terminology is going to be a pain in this thread. ;)
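To pin the terms down, here is a small sketch of the two readings of 'abc\xFF' in current Python 3. It assumes the low-surrogate mapping MRAB describes would behave like the existing surrogateescape error handler; that is my assumption for illustration, not something settled in this thread. The names `raw`, `text`, `escaped` and `rejected` are only for the example.

```python
raw = b'abc\xFF'     # bytes: four bytes, the last one outside the ASCII range
text = 'abc\xFF'     # str: four code points, the last one is U+00FF

# surrogateescape maps each undecodable byte 0x80-0xFF to U+DC80-U+DCFF,
# so the byte 0xFF becomes the low surrogate U+DCFF.
escaped = raw.decode('ascii', 'surrogateescape')
assert escaped == 'abc\udcff'

# Encoding with the same handler restores the original bytes.
assert escaped.encode('ascii', 'surrogateescape') == raw

# U+00FF is neither ASCII nor a low surrogate, so it cannot be converted.
try:
    text.encode('ascii', 'surrogateescape')
except UnicodeEncodeError:
    rejected = True
else:
    rejected = False
```

Under that assumption, MRAB's objection is about the str on the second line, while my length-4 remark was about the bytes on the first.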
My vision for a bytestring type (more refined):
- made up of single bytes in the range 0 - 255 (no Unicode anywhere)
- indexing returns a bytestring of length 1, not an integer (which is what bytes returns)
- `bytestring(7)` either fails or returns `bytestring('\x07')`, not seven zero bytes (`bytestring(0, 0, 0, 0, 0, 0, 0)`), which is what `bytes(7)` gives
So the 'abc\xFF' in my statement above should not be read as a Unicode string... I guess I'll use 'y' as a
prefix for bytestring values for now: y'abc\xFF'.
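A rough sketch of what such a type could look like, written as a bytes subclass. This is only an illustration under my own assumptions: the class body is hypothetical, it picks the "returns a single byte" option for integer arguments rather than the "fails" option, and it leaves iteration untouched (iterating would still yield integers).

```python
class bytestring(bytes):
    """Hypothetical sketch of the idea; not an existing or agreed-on type."""

    def __new__(cls, value=b''):
        if isinstance(value, int):
            # bytes(7) would give seven zero bytes; here 7 means the one byte 0x07.
            # bytes([value]) raises ValueError if value is outside 0-255.
            value = bytes([value])
        # A str argument raises TypeError here, so no Unicode gets in.
        return super().__new__(cls, value)

    def __getitem__(self, index):
        # bytes returns an int for an integer index and bytes for a slice;
        # the constructor turns either one back into a bytestring.
        return type(self)(super().__getitem__(index))


value = bytestring(b'abc\xFF')
last = value[3]        # a bytestring of length 1, not the integer 255
middle = value[1:3]    # slicing also stays a bytestring
seven = bytestring(7)  # one byte, 0x07
```

With this sketch `len(value)` is 4 and `last == b'\xFF'` holds, since a bytes subclass still compares equal to plain bytes.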
--
~Ethan~