On Fri, Jun 25, 2010 at 4:02 PM, Guido van Rossum <guido@python.org> wrote:
On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz wrote:
> I'd like a version of 'decode' which would give me a type that was, in every
> respect, unicode, and responded to all protocols exactly as other
> unicode objects (or "str objects", if you prefer py3 nomenclature ;-)) do,
> but wouldn't actually copy any of that memory unless it really needed to
> (for example, to pass to a C API that expected native wide characters), and
> that would hold on to the original bytes so that it could produce them on
> demand if encoded to the same encoding again. So, as others in this thread
> have mentioned, the 'ABC' really implies some stuff about C APIs as well.
> I'm not sure about the exact performance impact of such a class, which is
> why I'd like the ability to implement it *outside* of the stdlib and see how
> it works on a project, and return with a proposal along with some data.
> There are also different ways to implement this, and other optimizations
> (like ropes) which might be better.
> You can almost do this today, but the lack of things like the hypothetical
> "__rcontains__" does make it impossible to be totally transparent about it.

But you'd still have to validate it, right? You wouldn't want to go on
using what you thought was wrapped UTF-8 if it wasn't actually valid
UTF-8 (or you'd be worse off than in Python 2). So you're really just
worried about space consumption. I'd like to see a lot of hard memory
profiling data before I got overly worried about that.
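
For concreteness, here is a rough Python 3 sketch of the kind of wrapper being
discussed. The class name LazyUTF8 and its attributes are made up for
illustration; this is not an existing or proposed API, and it validates by
doing a throwaway decode, which is the simplest approach rather than the most
memory-efficient one. It shows the two points above: validation still costs a
full pass over the bytes, and a str on the left of "in" never consults the
wrapper, since there is no "__rcontains__" hook.

```python
import codecs


class LazyUTF8:
    """Hypothetical str-like wrapper that holds on to the original bytes."""

    def __init__(self, raw: bytes):
        # Validation requires scanning every byte. Decoding and discarding
        # the result is the simplest way to do that in pure Python.
        raw.decode("utf-8")  # raises UnicodeDecodeError on invalid input
        self._raw = raw
        self._text = None  # decoded copy, created only on demand

    def _decoded(self) -> str:
        if self._text is None:
            self._text = self._raw.decode("utf-8")
        return self._text

    def encode(self, encoding="utf-8"):
        # Re-encoding to the original encoding returns the stored bytes
        # instead of building a new object.
        if codecs.lookup(encoding).name == "utf-8":
            return self._raw
        return self._decoded().encode(encoding)

    def __str__(self):
        return self._decoded()

    def __len__(self):
        return len(self._decoded())

    def __contains__(self, item):
        # Handles "x in wrapper"; nothing handles "wrapper in some_str".
        return str(item) in self._decoded()


wrapped = LazyUTF8("h\u00e9llo".encode("utf-8"))
print(wrapped.encode("utf-8") is wrapped._raw)  # same bytes object returned
print("\u00e9" in wrapped)                      # wrapper on the right works

try:
    wrapped in "xh\u00e9llox"                   # wrapper on the left does not
except TypeError as exc:
    print("not transparent:", exc)
```

Whether skipping the decoded copy saves enough memory to matter is exactly the
question that profiling data would have to answer.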