On Jun 22, 2010, at 08:03 AM, Nick Coghlan wrote:
> On Tue, Jun 22, 2010 at 6:16 AM, P.J. Eby <pje@telecommunity.com> wrote:
>> True, but making it a separate type with a required encoding gets rid of the magical "I don't know" - the "I don't know" encoding is just a plain old bytes object.
>
> So, to boil down the ebytes idea, it is basically a request for a second string type that holds an octet stream plus an encoding name, rather than a Unicode character stream. Calling it "ebytes" seems to emphasise the wrong parallel in that case (you have a 'str' object with a different internal structure, not any kind of bytes object). For now I'll call it an "altstr". Then the idea can be described as
Actually no. We're introducing a second bytes type that holds an octet stream plus an encoding name. See the toy implementation I included in a previous message.

As opposed to, say, a bytes object that represents an image, which would make almost no sense to decode to unicode, this ebytes type would help bridge the gap between a pure bytes object and a pure unicode object. It would know how to accurately convert to unicode (i.e. __str__()) because it would know the encoding of the bytes. Obviously, it could also convert to a pure bytes object. Because it can be accurately stringified, it can have most if not all of the str API.

-Barry
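
For readers without the earlier message to hand, here is a minimal sketch of the kind of type described above. It is not the toy implementation referred to in the message; the class body, the eager validation in __init__, and the __getattr__ delegation are all assumptions made for illustration, written for Python 3.

```python
class ebytes:
    """Hypothetical sketch: an octet stream plus the name of its encoding."""

    def __init__(self, data, encoding):
        if not isinstance(data, bytes):
            raise TypeError("data must be a bytes object")
        # Assumption: validate eagerly, so that str() can never fail later.
        data.decode(encoding)
        self._data = data
        self.encoding = encoding

    def __bytes__(self):
        # Conversion to a pure bytes object just drops the encoding name.
        return self._data

    def __str__(self):
        # Accurate conversion to unicode, because the encoding is known.
        return self._data.decode(self.encoding)

    def __repr__(self):
        return "ebytes({!r}, {!r})".format(self._data, self.encoding)

    def __getattr__(self, name):
        # Only reached for attributes not defined above. Private names are
        # refused so that a half-initialised object cannot recurse here.
        if name.startswith("_"):
            raise AttributeError(name)
        attr = getattr(str(self), name)  # AttributeError if str lacks it
        if not callable(attr):
            return attr

        def wrapper(*args, **kwargs):
            # Let callers pass ebytes where the str method expects str.
            args = [str(a) if isinstance(a, ebytes) else a for a in args]
            result = attr(*args, **kwargs)
            if isinstance(result, str):
                # Re-wrap text results in the same encoding.
                return ebytes(result.encode(self.encoding), self.encoding)
            return result

        return wrapper


raw = "caf\u00e9".encode("latin-1")
e = ebytes(raw, "latin-1")
text = str(e)          # decoded with the stored encoding
shouted = e.upper()    # a str method, result re-wrapped as ebytes
```

Delegating through __getattr__ is only one way to borrow "most if not all of the str API"; methods that return lists or tuples of strings (split, partition) come back as plain str in this sketch, which a fuller version would need to handle.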