[Python-ideas] Python 3000 TIOBE -3%

Greg Ewing greg.ewing at canterbury.ac.nz
Thu Feb 16 02:37:12 CET 2012


On 16/02/12 02:39, Oleg Broytman wrote:
> On Wed, Feb 15, 2012 at 11:15:36AM +1100, Ben Finney wrote:
>> If people want to remain wilfully ignorant of text encoding in the third
>> millennium
>
>     This returns us to the very beginning of the thread. The original
> complaint was: Python3 requires users to learn too much about unicode,
> more than they really need.

I don't think it's helpful to label everyone who wants to use the
techniques being discussed here as lazy or ignorant. As we've seen,
there are cases where you truly *can't* know the true encoding,
and at the same time it *doesn't matter*, because all you want to
do is treat the unknown bytes as opaque data. To tell someone in
that position that they're being lazy is both wrong and insulting.

It seems to me that what surrogateescape is effectively doing is
creating a new data type that consists of a mixture of ASCII
characters and raw bytes, and enables you to tell which is which.
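
To make that concrete, here is a minimal sketch of the behaviour being
described, using the standard "surrogateescape" error handler with the
ASCII codec. The sample bytes and the helper name is_escaped_byte are
made up for illustration; they are assumptions, not anything from the
thread.

```python
# Bytes of unknown encoding: ASCII text plus two non-ASCII bytes (illustrative data).
raw = b"name=\xe9\xff.txt"

# Decode as ASCII. Each undecodable byte 0x80-0xFF becomes a lone
# surrogate in the range U+DC80..U+DCFF instead of raising an error.
text = raw.decode("ascii", errors="surrogateescape")


def is_escaped_byte(ch):
    """Hypothetical helper: True if ch is a lone surrogate standing in for a raw byte."""
    return 0xDC80 <= ord(ch) <= 0xDCFF


# Label each position as a real ASCII character or an escaped raw byte.
kinds = ["byte" if is_escaped_byte(ch) else "char" for ch in text]

# Encoding with the same handler restores the original bytes exactly.
roundtrip = text.encode("ascii", errors="surrogateescape")
```

The resulting str carries both kinds of item in one sequence, and the
surrogate range is what lets you tell them apart.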

Maybe there should be a real data type like this, or a flag on
the unicode type. The data would be stored in the same way as a
latin1-decoded string, but anything with the high bit set would
be regarded as a byte instead of a character. This might make it
easier to interoperate with external libraries that expect
well-formed unicode.
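
A rough sketch of what such a type might look like, purely as an
illustration of the idea: the class name AsciiOrBytes and its methods
are hypothetical and do not exist in Python. It assumes the input is
ASCII text mixed with bytes of unknown meaning.

```python
class AsciiOrBytes:
    """Hypothetical sketch only; not an existing Python type."""

    def __init__(self, raw: bytes):
        # latin-1 maps every byte 0x00-0xFF to the code point with the
        # same value, so this decode cannot fail and loses no information.
        self._data = raw.decode("latin-1")

    def items(self):
        """Yield ('char', str) for ASCII and ('byte', int) for high-bit values."""
        for ch in self._data:
            code = ord(ch)
            if code < 0x80:
                yield ("char", ch)
            else:
                # High bit set: treat as an opaque byte, not a character.
                yield ("byte", code)

    def to_bytes(self) -> bytes:
        """Return the original bytes unchanged."""
        return self._data.encode("latin-1")


sample = AsciiOrBytes(b"a\xe9")
parts = list(sample.items())
```

Because the underlying storage only ever holds code points up to
U+00FF, it contains no lone surrogates, which is the property that
would matter to a library expecting well-formed unicode.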

-- 
Greg


