[Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?]

Tue Jan 7 20:43:49 CET 2014

On 01/07/2014 11:32 AM, MRAB wrote:
> On 2014-01-07 18:38, Ethan Furman wrote:
>> On 01/07/2014 10:22 AM, MRAB wrote:
>>>> On Jan 7, 2014, at 7:44, Steven D'Aprano <steve at pearwood.info> wrote:
>>>>
>>>>> Suppose we take a byte-string with a non-ASCII byte:
>>>>>
>>>>>    b'abc\xFF'.decode('ascii-compatible')
>>>>>
>>> That would be:
>>>
>>>      bytestring(b'abc\xFF')
>>>
>>> Bytes outside the ASCII range would be mapped to Unicode low
>>> surrogates:
>>>
>>>      bytestring(b'abc\xFF') == bytestring('abc\uDCFF')
>>
>> Not sure what you mean here.  The resulting bytes should be 'abc\xFF' and of length 4.
>>
> 'abc\xFF' is a Unicode string, but you wouldn't be able to convert it
> to a bytestring because '\xFF' is a codepoint outside the ASCII range
> and not a low surrogate.
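
For comparison, the mapping MRAB describes (non-ASCII bytes become lone low surrogates) already exists in Python as the `surrogateescape` error handler from PEP 383. The snippet below is a small illustration using only that standard handler. It is not the proposed `bytestring` type. It shows that the round trip works and that a str containing U+00FF cannot be encoded back.

```python
# The standard 'surrogateescape' error handler (PEP 383) maps each
# undecodable byte 0x80-0xFF to a lone low surrogate U+DC80-U+DCFF.
raw = b'abc\xFF'
text = raw.decode('ascii', 'surrogateescape')
print(ascii(text))  # 'abc\udcff'
print(len(text))    # 4

# The mapping is reversible: encoding with the same handler restores the bytes.
assert text.encode('ascii', 'surrogateescape') == raw

# A plain 'abc\xFF' str holds U+00FF, which is not a surrogate, so the
# handler cannot turn it back into a byte and encoding fails.
try:
    'abc\xFF'.encode('ascii', 'surrogateescape')
except UnicodeEncodeError as exc:
    print('cannot encode:', exc.reason)
```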

I can see terminology is going to be a pain in this thread.  ;)

My vision for a bytestring type (more refined):

   - made up of single bytes in the range 0-255 (no Unicode anywhere)

   - indexing returns a bytestring of length 1, not an integer (as bytes does)

   - `bytestring(7)` either fails or returns `bytestring('\x07')`, rather than seven zero bytes the way `bytes(7)` returns `b'\x00\x00\x00\x00\x00\x00\x00'`
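
A rough sketch of how these three points might look as a subclass of `bytes`. This is a hypothetical illustration, not an existing Python API. The class name `bytestring` is taken from the proposal. For `bytestring(7)`, this sketch chooses the "fails" option.

```python
class bytestring(bytes):
    """Hypothetical sketch of the proposed type; not part of Python.

    Subclassing bytes keeps the storage as single bytes in 0-255.
    """

    def __new__(cls, value=b''):
        # Proposal: bytestring(7) must not create seven zero bytes.
        # This sketch takes the "fails" option rather than returning '\x07'.
        if isinstance(value, int):
            raise TypeError('bytestring() does not accept an integer size')
        return super().__new__(cls, value)

    def __getitem__(self, index):
        result = super().__getitem__(index)
        if isinstance(index, slice):
            # bytes slicing returns plain bytes; rewrap it.
            return bytestring(result)
        # bytes indexing yields an int; wrap it as a length-1 bytestring.
        return bytestring(bytes([result]))

    def __repr__(self):
        return 'bytestring(%s)' % super().__repr__()


s = bytestring(b'abc\xFF')
print(repr(s[0]))      # bytestring(b'a')
print(repr(s[3]))      # bytestring(b'\xff')
print(b'abc\xFF'[0])   # 97 (plain bytes indexing returns an int)
```

Note that this sketch only overrides indexing. Iterating over a `bytestring` still yields ints, as it does for `bytes`.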

So the 'abc\xFF' in my statement above should not be read as a Unicode string. For now I'll use a 'y' prefix to mark a bytestring: y'abc\xFF'.

--
~Ethan~