BTW, is the memory burden really such a big argument these days?
You should consider that malloc overhead is often 16 bytes per object. PyUnicodeObject is 24 bytes in 2.2, so with that overhead and 16-byte alignment, system malloc will hand out 48 bytes per Unicode object on modern architectures. I would think 100% overhead *is* a big argument.
If you relate this to the actual data, it gets worse: a Unicode string of length 1 keeps its characters in a separate buffer, and 2 bytes of UCS-2 data plus 16 bytes of overhead still come to 32 bytes on an allocator that aligns to 16. Together with the 48 bytes for the object itself, storing 2 bytes of real data costs 80 bytes of memory.
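The arithmetic can be sketched like this; the 16-byte overhead and 16-byte alignment figures are the assumptions from the discussion above, not measured values for any particular allocator:

```python
# Assumed allocator parameters (from the discussion; illustrative only).
MALLOC_OVERHEAD = 16   # per-allocation bookkeeping
ALIGN = 16             # allocation alignment

def allocated(payload):
    """Bytes the allocator hands out: payload + overhead, rounded up to ALIGN."""
    total = payload + MALLOC_OVERHEAD
    return (total + ALIGN - 1) // ALIGN * ALIGN

obj = allocated(24)    # PyUnicodeObject struct in 2.2
data = allocated(2)    # separate buffer holding one UCS-2 character
print(obj, data, obj + data)   # 48, 32, 80
```

So a length-1 string costs 48 + 32 = 80 bytes under these assumptions, i.e. 40x the 2 bytes of real data.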
I don't know exactly how much overhead pymalloc adds, though; I believe it is significantly lower.