Cult-like behaviour [was Re: Kindness]
python at bladeshadow.org
Mon Jul 16 21:52:51 EDT 2018
On Mon, Jul 16, 2018 at 08:56:11PM +0100, Rhodri James wrote:
> The problem everyone is having with you, Marko, is that you are
> using the terminology incorrectly. [...] When you call UTF-32 a
> variable-width encoding, you are incorrect.
But please don't overlook that the "terminology" is in fact rather
specialized jargon, far less common than even most computer jargon.
Unless you're uncommonly familiar with the subject matter, you simply
don't have this vocabulary. Under the circumstances, it seems not
horribly unreasonable to expect such a person to take the number of
bytes required to represent a glyph as an encoding's width, and you,
as "experts," should rightly expect lay people (let's call them that)
to make this mistake and adjust for it, or politely correct it, without the
> You are of course welcome to use whatever terminology you personally
> like, like Humpty Dumpty. However when you point to a duck and say
> "That's a gnu," people are likely to stop taking you seriously.
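To make the width distinction concrete, here is a rough Python 3 sketch (Python 3 rather than 2.7, and `bytes_per_code_point` is just a hypothetical helper name). It counts bytes per code point: UTF-8 varies, UTF-32 is always 4, yet a single visible glyph such as a decomposed "é" still takes two code points, and therefore 8 bytes, in UTF-32.

```python
import unicodedata
from typing import List


def bytes_per_code_point(text: str, encoding: str) -> List[int]:
    """Return how many bytes each code point of `text` takes in `encoding`."""
    return [len(ch.encode(encoding)) for ch in text]


sample = "A\u00e9\u20ac\U0001F600"  # A, é, €, 😀

# UTF-8 is variable-width per code point: 1, 2, 3 and 4 bytes here.
print(bytes_per_code_point(sample, "utf-8"))      # [1, 2, 3, 4]

# UTF-32 is fixed-width per code point ("utf-32-le" omits the 4-byte BOM
# that plain "utf-32" would prepend to every encode() call).
print(bytes_per_code_point(sample, "utf-32-le"))  # [4, 4, 4, 4]

# One visible glyph can still span several code points, so "bytes per
# glyph" varies even in UTF-32: NFD splits é into 'e' + U+0301.
decomposed = unicodedata.normalize("NFD", "\u00e9")
print(len(decomposed), len(decomposed.encode("utf-32-le")))  # 2 8
```

So both sides are describing something real: UTF-32 is fixed-width in code points, which is what the precise term means, while the bytes-per-glyph count a lay reader might reach for can still vary.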
Shouldn't experts "be generous in what they accept, but conservative
in what they emit?" If your goal here is to educate and come to a
common understanding, rather than simply to prove how superior you
(the generic "you") are, then perhaps both you and the community would
be better served if you strove to understand Marko's points, rather
than just pointing out how horribly wrong he is? The tone here is often
extremely adversarial, which I think mostly serves to incite others to
respond adversarially. I certainly know I've fallen into that trap
more than once myself.
I work primarily in Unix environments, and I daresay the way Unix
treats text as bytes works just fine, barring certain very specialized
applications that require knowing which bytes correspond to which units
of linguistic representation, like reversing strings (which FWIW I've
never found a use for, other than academic ones). You can (and I do,
or at least have) write non-ASCII Unicode strings as bytes in your
Python 2.7 code, or read them from a file or whatever other input your
program desires, and send them to whatever terminal or GUI program you
want, and they will appear as they should to the user, provided the
system is configured appropriately (which these days mostly means
configured to use UTF-8, and which these days is generally the case).
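As a rough illustration of the string-reversal case, here is a minimal Python 3 sketch (the helper names are hypothetical, and it relies on Python 3's str/bytes split rather than 2.7 byte strings). Reversing raw UTF-8 bytes splits multi-byte sequences; reversing decoded text works per code point but still detaches combining accents.

```python
import unicodedata


def reverse_utf8_bytes(data: bytes) -> bytes:
    """Reverse raw bytes, as a bytes-only program naively would."""
    return data[::-1]


def reverse_code_points(text: str) -> str:
    """Reverse decoded text; correct per code point, not per glyph."""
    return text[::-1]


def is_valid_utf8(data: bytes) -> bool:
    """True if `data` decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


word = "caf\u00e9"          # "café"
raw = word.encode("utf-8")  # b'caf\xc3\xa9'

# Byte-wise reversal splits é's two-byte sequence into invalid UTF-8.
print(reverse_utf8_bytes(raw), is_valid_utf8(reverse_utf8_bytes(raw)))  # b'\xa9\xc3fac' False

# Reversing the decoded str keeps the precomposed é intact.
print(reverse_code_points(word))  # éfac

# With a decomposed é (e + U+0301), the combining accent ends up first,
# detached from its base letter: str slicing ignores grapheme clusters.
print(ascii(reverse_code_points(unicodedata.normalize("NFD", word))))  # '\u0301efac'
```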
It's reasonable to assume users either know what encoding their
systems are using, or don't have a clue but won't change it, so it
will always be "right." And if you sensibly used UTF-8 encoded byte
strings in your program, but the system is (correctly) configured for
some other encoding, it's a fairly trivial matter to use iconv to
convert to the system's encoding (which I have also done, though
perhaps not in Python; I can't recall), assuming the data can be
converted (and if not, you're kinda screwed anyway). In the
overwhelming majority of cases, this gets you everything you need,
and having the language understand Unicode internally (especially if
that understanding requires more work from the programmer) mostly
gets you very little. Yes, of course there are specific applications
for which that intelligence is necessary, and in those cases it should
be used. The rest of the time, which is the overwhelming majority of
the time, it's just superfluous complexity.
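For comparison with the iconv step, here is a minimal Python 3 sketch of the same conversion, assuming the input is valid UTF-8. `to_system_encoding` is a hypothetical helper, and its default target is whatever `locale.getpreferredencoding(False)` reports; it roughly mirrors `iconv -f UTF-8 -t TARGET`.

```python
import locale
from typing import Optional


def to_system_encoding(utf8_data: bytes, target: Optional[str] = None,
                       errors: str = "strict") -> bytes:
    """Re-encode UTF-8 bytes into `target` (default: the locale's preferred
    encoding), roughly what `iconv -f UTF-8 -t TARGET` does.

    With errors="strict", raises UnicodeEncodeError if some character has
    no representation in the target encoding.
    """
    if target is None:
        target = locale.getpreferredencoding(False)
    return utf8_data.decode("utf-8").encode(target, errors)


data = "na\u00efve \u20ac5".encode("utf-8")  # "naïve €5"

# ISO-8859-15 has both ï and the euro sign.
print(to_system_encoding(data, "iso-8859-15"))  # b'na\xefve \xa45'

# ISO-8859-1 (Latin-1) has no euro sign, so strict conversion fails...
try:
    to_system_encoding(data, "iso-8859-1")
except UnicodeEncodeError as exc:
    print("cannot convert", ascii(exc.object[exc.start:exc.end]))  # '\u20ac'

# ...unless you accept a lossy substitute.
print(to_system_encoding(data, "iso-8859-1", errors="replace"))  # b'na\xefve ?5'
```

The strict default matches the "if not, you're kinda screwed anyway" case: `errors="replace"` gets you output, but only by throwing characters away.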
So, sure, in uncommon cases knowing about Unicode may reduce (but not
eliminate) complications in dealing with different languages, but in
the common cases it may only serve to make more work for the
programmer. I don't know about you, but I prefer to do less, if less
is all that's required.
If these features exist because Windows needs them in order to
reliably get the common cases right, then maybe, just maybe, Unix
really did get it right after all.
More information about the Python-list