Python usage numbers
steve+comp.lang.python at pearwood.info
Mon Feb 13 00:29:51 CET 2012
On Sun, 12 Feb 2012 17:27:34 -0500, Roy Smith wrote:
> In article <mailman.5739.1329084873.27778.python-list at python.org>,
> Chris Angelico <rosuav at gmail.com> wrote:
>> On Mon, Feb 13, 2012 at 9:07 AM, Terry Reedy <tjreedy at udel.edu> wrote:
>> > The situation before ascii is like where we ended up *before*
>> > unicode. Unicode aims to replace all those byte encodings and
>> > character sets with *one* byte encoding for *one* character set,
>> > which will be a great simplification. It is the idea of ascii applied
>> > on a global rather than local basis.
>> Unicode doesn't deal with byte encodings; UTF-8 is an encoding, but so
>> are UTF-16, UTF-32, and as many more as you could hope for. But broadly
>> yes, Unicode IS the solution.
> I could hope for one and only one, but I know I'm just going to be
> disappointed. The last project I worked on used UTF-8 in most places,
> but also used some C and Java libraries which were only available for
> UTF-16. So it was transcoding hell all over the place.
Um, surely the solution to that is to always call a simple wrapper
function around the UTF-16 code to handle the transcoding? What do the
Design Patterns people call it, a facade? No, an adapter. (I never
remember which is which.)
Instead of calling library.foo(), which only outputs UTF-16, write a
wrapper myfoo() which calls foo, captures its output and transcodes it to
UTF-8. You only have to do that once (per function), and then it works
from everywhere, so long as you remember to always call myfoo instead of
foo.
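A minimal sketch of that adapter idea in Python. Here legacy_foo() is a
hypothetical stand-in for the library function that only speaks UTF-16;
the wrapper is the one place that knows about the transcoding:

```python
def legacy_foo():
    # Stand-in for the third-party function that returns UTF-16 bytes.
    return "hello".encode("utf-16-le")

def myfoo():
    # Adapter: call the legacy function, then transcode its output
    # from UTF-16 to UTF-8 so callers never see UTF-16 at all.
    utf16_bytes = legacy_foo()
    return utf16_bytes.decode("utf-16-le").encode("utf-8")

print(myfoo())  # b'hello'
```

The rest of the codebase then calls myfoo() exclusively and stays in
UTF-8 end to end.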
> Hopefully, we will eventually reach the point where storage is so cheap
> that nobody minds how inefficient UTF-32 is and we all just start using
> that. Life will be a lot simpler then. No more transcoding, a string
> will have just as many bytes as it has characters, and everybody will be
> happy.
I think you mean four times as many bytes as characters. Unless you have
32-bit bytes :)
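The four-bytes-per-character point is easy to check interactively. UTF-32
spends a fixed four bytes on every code point (the little-endian codec is
used here to avoid the extra byte-order mark the generic "utf-32" codec
prepends):

```python
s = "happy"  # 5 characters
print(len(s.encode("utf-32-le")))  # 20 bytes: 4 per character
print(len(s.encode("utf-8")))      # 5 bytes: ASCII stays compact in UTF-8
```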