Python usage numbers

Steven D'Aprano steve+comp.lang.python at
Sun Feb 12 18:29:51 EST 2012

On Sun, 12 Feb 2012 17:27:34 -0500, Roy Smith wrote:

> In article <mailman.5739.1329084873.27778.python-list at>,
>  Chris Angelico <rosuav at> wrote:
>> On Mon, Feb 13, 2012 at 9:07 AM, Terry Reedy <tjreedy at> wrote:
>> > The situation before ascii is like where we ended up *before*
>> > unicode. Unicode aims to replace all those byte encoding and
>> > character sets with *one* byte encoding for *one* character set,
>> > which will be a great simplification. It is the idea of ascii applied
>> > on a global rather than local basis.
>> Unicode doesn't deal with byte encodings; UTF-8 is an encoding, but so
>> are UTF-16, UTF-32, and as many more as you could hope for. But broadly
>> yes, Unicode IS the solution.
> I could hope for one and only one, but I know I'm just going to be
> disappointed.  The last project I worked on used UTF-8 in most places,
> but also used some C and Java libraries which were only available for
> UTF-16.  So it was transcoding hell all over the place.

Um, surely the solution to that is to always call a simple wrapper 
function around the UTF-16 code to handle the transcoding? What do the Design 
Patterns people call it, a facade? No, an adapter. (I never remember the 
difference.)

Instead of calling foo(), which only outputs UTF-16, write a 
wrapper myfoo() which calls foo, captures its output and transcodes it to 
UTF-8. You have to do that once (per function), but now it works from 
everywhere, so long as you remember to always call myfoo instead of foo.
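The adapter idea above can be sketched in a few lines. (Here foo() is a 
hypothetical stand-in for the library function that only produces UTF-16 
bytes; the real library call would go in its place.)

```python
def foo():
    # Hypothetical library function: returns UTF-16-encoded bytes,
    # standing in for the C/Java library that only speaks UTF-16.
    return "hello".encode("utf-16")

def myfoo():
    # Adapter: call foo(), decode its UTF-16 output, re-encode as UTF-8.
    # Callers never see UTF-16 at all.
    return foo().decode("utf-16").encode("utf-8")

print(myfoo())
```

Write the adapter once per wrapped function and the rest of the codebase 
only ever deals in UTF-8.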

> Hopefully, we will eventually reach the point where storage is so cheap
> that nobody minds how inefficient UTF-32 is and we all just start using
> that.  Life will be a lot simpler then.  No more transcoding, a string
> will be just as many bytes as it has characters, and everybody will be happy
> again.

I think you mean 4 times as many bytes as characters. Unless you have 32 
bit bytes :)
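For illustration, Python's codecs show the four-bytes-per-character cost 
directly (using the explicit little-endian variant so no byte-order mark 
is prepended):

```python
s = "spam"
encoded = s.encode("utf-32-le")  # UTF-32, little-endian, no BOM
# 4 characters become 16 bytes: exactly 4 bytes per character.
print(len(s), len(encoded))
```

The plain "utf-32" codec would add a 4-byte BOM on top of that.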
