[Python-Dev] bytes / unicode

Wed Jun 23 02:34:31 CEST 2010

On Jun 22, 2010, at 7:23 PM, Ian Bicking wrote:

> This is a place where bytes+encoding might also have some benefit.  XML is someplace where you might load a bunch of data but only touch a little bit of it, and the amount of data is frequently large enough that the efficiencies are important.

Different encodings have different characteristics, though, which make them amenable to different kinds of optimization. If you've got an ASCII or latin-1 string, the optimizations are pretty obvious, since every byte is exactly one code point. If you've got one in UTF-16 with no multi-code-unit sequences (no surrogate pairs), you could also hypothetically cheat for a while if you're on a UCS2 build of Python, since that data has the same two-bytes-per-code-point layout as the interpreter's internal representation.
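To make the fixed-width point concrete, here is a minimal sketch (Python 3; the helper names `latin1_char_at` and `utf16le_char_at` are hypothetical, not from any library). It reads a single character straight out of encoded bytes without decoding the whole buffer. It assumes latin-1 input for the first helper and little-endian UTF-16 with no surrogate pairs for the second.

```python
def latin1_char_at(data: bytes, index: int) -> str:
    # In latin-1 every byte is exactly one code point (U+0000..U+00FF),
    # so the character at `index` is just the byte value at `index`.
    return chr(data[index])


def utf16le_char_at(data: bytes, index: int) -> str:
    # Assumes UTF-16-LE with no surrogate pairs: every code point then
    # occupies exactly two bytes, so its offset is simply 2 * index.
    start = 2 * index
    if index < 0 or start + 2 > len(data):
        raise IndexError("index out of range")
    unit = int.from_bytes(data[start:start + 2], "little")
    if 0xD800 <= unit <= 0xDFFF:
        # A surrogate means the fixed-width shortcut no longer holds.
        raise ValueError("surrogate found; fixed-width shortcut does not apply")
    return chr(unit)


raw = "caf\u00e9 na\u00efve".encode("latin-1")
assert latin1_char_at(raw, 3) == "\u00e9"
assert len(raw) == len(raw.decode("latin-1"))  # byte count == code point count

wide = "caf\u00e9 \u20ac".encode("utf-16-le")
assert utf16le_char_at(wide, 5) == "\u20ac"
```

The surrogate check is what makes the UTF-16 trick only a "cheat for a while": as soon as a non-BMP character shows up, offsets can no longer be computed by multiplication and you have to fall back to a real decode.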

I suspect the practical problem here is that there's no CharacterString ABC in the collections module that third-party libraries could implement to provide their own peculiarly optimized string types, which could lazily turn into real 'str's as needed. I'd volunteer to write a PEP if I thought I could actually get it done :-\. If someone else wants to be the primary author though, I'll try to help out.
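As a rough illustration of the idea, here is a minimal sketch of what such an ABC might look like. `CharacterString` does not exist in the stdlib; it and the `Latin1String` implementation below are hypothetical, defined standalone rather than in `collections`. The interface shown (length, integer indexing, conversion via `__str__`) is an assumption about what a PEP might require, not a proposal from this thread.

```python
from abc import ABC, abstractmethod


class CharacterString(ABC):
    """Hypothetical ABC (not in the stdlib) for str-like objects that
    can be turned into a real str on demand."""

    @abstractmethod
    def __str__(self) -> str:
        """Return the full text as a real str."""

    @abstractmethod
    def __len__(self) -> int:
        """Return the length in code points."""

    @abstractmethod
    def __getitem__(self, index: int) -> str:
        """Return the code point at an integer index as a 1-char str."""


class Latin1String(CharacterString):
    """Hypothetical third-party implementation backed by latin-1 bytes.

    Length and integer indexing are answered directly from the bytes;
    the decoded str is only built (and then cached) when __str__ is called.
    Slicing is not supported in this sketch.
    """

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._decoded = None  # cached str, filled in lazily by __str__

    def __len__(self) -> int:
        return len(self._data)  # latin-1: one byte per code point

    def __getitem__(self, index: int) -> str:
        return chr(self._data[index])

    def __str__(self) -> str:
        if self._decoded is None:
            self._decoded = self._data.decode("latin-1")
        return self._decoded
```

A caller such as an XML library could then answer cheap questions (length, a character here and there) from the raw bytes, and pay for the full decode only when something really needs a `str`:

```python
s = Latin1String(b"na\xefve")
print(len(s), s[2])  # no decode has happened yet
print(str(s))        # decodes once, then reuses the cached str
```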