<p dir="ltr"><br>

On 6 Jan 2014 19:16, "Andrew Barnert" &lt;<a href="mailto:abarnert@yahoo.com">abarnert@yahoo.com</a>&gt; wrote:<br>

><br>

> From: Nick Coghlan &lt;<a href="mailto:ncoghlan@gmail.com">ncoghlan@gmail.com</a>&gt;<br>

> Sent: Sunday, January 5, 2014 2:57 PM<br>

><br>

><br>

> >I actually expected someone to have experimented with an "encodedstr" type by now. This would be a type that behaved like the Python 2 str type, but had an encoding attribute. On encountering Unicode text strings, it would encode them appropriately.<br>


><br>

> I did something like this when I was first playing with 3.0, and I managed to find it. <br>

><br>

> I tried two different implementations, a bytes subclass that fakes being a str as well as possible by decoding on the fly (or, in some cases, by encoding its arguments on the fly), and a str that fakes being a bytes as well as possible by doing the opposite.<br>


><br>

> >However, people have generally instead followed the model of decoding to text and operating in that domain, since it avoids a lot of subtle issues (like accidentally embedding byte order marks when concatenating strings).<br>


><br>

><br>

> It's also conceptually cleaner to work with text as text instead of as bytes that you can sort of use as text.<br>

><br>

> Also, one major reason people resist working with text (or upgrading to 3.x) is the perceived performance costs of dealing with Unicode. But if you want to do any kind of string processing on your text beyond searching for ASCII header names and the like, you pretty much have to do it as Unicode or it's wrong. So, you'd need something that allows you to do those ASCII header searches in 8-bit-land, but either doesn't allow full string processing, or automatically decodes and re-encodes on the fly (which obviously isn't going to be faster).<br>


><br>

> >This is likely encouraged by the fact that str, bytes and bytearray don't currently implement type coercion correctly (which in turn is due to a long-standing bug in the way the abstract C API handles sequence types defined in C rather than Python), so an encodedstr type would need to inherit from str or bytes to get interoperability, and then wouldn't interoperate with the other one.<br>


><br>

><br>

> What's the bug?</p>

<p dir="ltr"><a href="http://bugs.python.org/issue11477">http://bugs.python.org/issue11477</a></p>

<p dir="ltr">CPython doesn't check for NotImplemented results from sq_concat or sq_repeat, so the sequence implementations raise TypeError directly and the RHS doesn't get consulted to see if it can handle the operation. Subclassing works anyway because subclasses are always checked first even when they're the RHS.</p>
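<p dir="ltr">As a rough pure-Python analogy (not a reproduction of the C-level sq_concat handling itself), the sketch below shows the general protocol at stake: a left operand that returns NotImplemented lets the right operand's reflected method run, one that raises TypeError directly does not, and a subclass on the right is still consulted first. The class names Cooperative, Eager, Wrapper and EagerChild and the helper try_add are hypothetical, invented only for this illustration.</p>

```python
class Cooperative:
    """Left operand that returns NotImplemented for operands it does not know."""

    def __add__(self, other):
        if isinstance(other, Cooperative):
            return "Cooperative.__add__"
        return NotImplemented


class Eager:
    """Left operand that raises directly, so the right operand is never asked."""

    def __add__(self, other):
        if isinstance(other, Eager):
            return "Eager.__add__"
        raise TypeError("unsupported operand")


class Wrapper:
    """Unrelated right operand that could handle the addition itself."""

    def __radd__(self, other):
        return "Wrapper.__radd__"


class EagerChild(Eager):
    """Subclass whose reflected method is tried before the parent's __add__."""

    def __radd__(self, other):
        return "EagerChild.__radd__"


def try_add(left, right):
    """Return the result of left + right, or the string 'TypeError' on failure."""
    try:
        return left + right
    except TypeError:
        return "TypeError"


results = {
    # NotImplemented from the left operand hands control to Wrapper.__radd__.
    "cooperative": try_add(Cooperative(), Wrapper()),
    # The direct TypeError propagates; Wrapper.__radd__ is never called.
    "eager": try_add(Eager(), Wrapper()),
    # A subclass on the right-hand side is checked first, so it still works.
    "subclass": try_add(Eager(), EagerChild()),
}
print(results)
```

<p dir="ltr">The "eager" case is the shape of the problem for a hypothetical encodedstr that is unrelated to the left operand's type, and the "subclass" case is why inheriting from str or bytes gets around it.</p>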


<p dir="ltr">Thanks for the info on your experiences with attempting to implement an encodedstr type. I still feel there is potential merit to the concept, but it's certainly going to take some thought.</p>

<p dir="ltr">Cheers,<br>

Nick.<br>

</p>