Cult-like behaviour [was Re: Kindness]

Chris Angelico rosuav at
Sat Jul 14 23:38:01 EDT 2018

On Sun, Jul 15, 2018 at 12:55 PM, Steven D'Aprano
<steve+comp.lang.python at> wrote:
> On Sun, 15 Jul 2018 09:07:17 +1000, Chris Angelico wrote:
>> On Sun, Jul 15, 2018 at 8:15 AM, Marko Rauhamaa <marko at>
>> wrote:
>>> Chris Angelico <rosuav at>:
>>>> On Sun, Jul 15, 2018 at 5:54 AM, Marko Rauhamaa <marko at>
>>>> wrote:
>>>>> True enough. Modern-day protocols as well as Linux file formats and
>>>>> commands intentionally blur the line between strings and bytes. The
>>>>> software in question deals with all of the above. It is virtually
>>>>> impossible to keep track of what is "really" text and what is
>>>>> "really" binary.
> Of course we have no idea what Marko's software is, or what it is doing,
> but frankly that seems pretty implausible to me. On the face of it, it
> seems as ridiculous as the claim that he can't tell which variables are
> quote-unquote "really" lists of weights and which are lists of distances.
> On the face of things, this really sounds more like an admission that
> Marko is working with a shitty code base, not a fundamental problem with
> Python. But dealing with shitty code bases is the reality.

Fair point - but that doesn't justify hating on Python 3 for making it
easier to work with good code than with bad code. I've had to work
with ridiculous data formats before (forty-ish lines of block comment
concluding with "Cthulhu's got nothing on a determined bank"), but
when that happens, I *know* that my code is being warped to fit the
requirements. It's not something to replicate elsewhere.

>>>>> In the end, the Gordian Knot was sliced by using
>>>>> Python3's strings for everything and restricting oneself to Latin-1
>>>>> codepoints (almost) everywhere.
> [...]
> I wonder whether Marko's Python 2.7 code base was ever actually tested
> with non-Latin1 text. I suspect that if Marko had (let's say) Japanese
> users expecting to use CJK characters in the application, his affection
> for the 2.7 version would be a lot less.

I very much doubt it has. He *restricted* to Latin-1, which means that
he threw away all the support Python offers, restricting himself to 256
of the 1,114,112 available code points (roughly one in four thousand),
or about one in five hundred of the allocated ones.
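
To make the restriction concrete, here is a minimal sketch of what
"Latin-1 only" rules out. The helper name fits_latin1 and the sample
strings are made up for illustration; the only real API used is
str.encode().

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character in text is in the Latin-1 range."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical sample inputs: two Western European words, one Japanese
# word, and one Russian word.
samples = ["caf\u00e9", "na\u00efve", "日本語", "правительство"]

# Map each sample to whether it survives a Latin-1-only code base.
results = {s: fits_latin1(s) for s in samples}

# The same text encodes without complaint as UTF-8.
utf8_lengths = {s: len(s.encode("utf-8")) for s in samples}
```

The Japanese and Russian samples fail the check, which is the sort of
thing a user in Tokyo or Moscow would presumably find out first.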

>> Text strings and sequences of bytes *are different*.
> At an implementation level, everything is bytes. People do so insist on
> conflating implementation with interface, even when they don't need to...

And at a different implementation level, everything is electrical signals.

> (Sometimes I think people should be required to implement algorithms on
> analogue computing devices before they're allowed to write code for
> digital computers, just to drive home the point that neither bytes nor
> bits are fundamental to computing, but are mere implementation details.)

Every week, I live-stream a workshop on data structures and
algorithms. (You're all most welcome to come by; it's Friday lunchtime
in the US, or Saturday early morning in Australia.) I use JavaScript
(because a lot of people know it), Python (because it's a really
expressive language), or a deck of cards. Have you ever seen
merge-sort implemented on a deck of cards? It's beautifully simple and
elegant. Interestingly, quick-sort looks very different from
merge-sort when implemented in C, but they're fairly similar when
implemented in cards.
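
For anyone who hasn't seen it, the card version of merge-sort maps
fairly directly onto code. This is just a sketch of the textbook
algorithm, not what I use in the workshop; the function name and the
sample hand are invented for illustration.

```python
def merge_sort(cards):
    """Return a new sorted list, leaving the input untouched."""
    # A pile of zero or one cards is already sorted.
    if len(cards) <= 1:
        return list(cards)

    # Split the pile in half and sort each half separately.
    mid = len(cards) // 2
    left = merge_sort(cards[:mid])
    right = merge_sort(cards[mid:])

    # Merge: repeatedly take the smaller of the two top cards.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1

    # One pile has run out; the rest of the other is already in order.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


# Hypothetical hand of card ranks (11 = Jack, 13 = King).
hand = [7, 2, 11, 5, 13, 2, 9]
sorted_hand = merge_sort(hand)
```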

Bytes? Bits? Arrays? If you want them, you have to first implement them.

> At a semantic level, byte strings and text strings represent
> fundamentally different things, as distinct as weights and lengths.

Or, as I keep running into when I try to mod Team Fortress 2, entities
and clients and users. They're all represented by the "int" data type,
and I have to spend an insane amount of effort trying to keep them
straight - does this function take a user or a client? Oh wait, this
isn't the user at all, it's the entity ID of that user's gun. But it's
still just an int... *sigh* SourcePawn (the language in question)
lacks a type system strong enough to handle this.

Having different data types for fundamentally different types of data
is not a weakness. It is a strength.
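
As a rough illustration of what I wish I had in SourcePawn, here is how
the same three kinds of ID might be kept apart in Python with
typing.NewType. The names EntityId, ClientId, UserId and kick_client
are hypothetical, and the checking is done by a static type checker
such as mypy, not at run time.

```python
from typing import NewType

# Three distinct types that are all plain ints underneath.
EntityId = NewType("EntityId", int)
ClientId = NewType("ClientId", int)
UserId = NewType("UserId", int)


def kick_client(client: ClientId) -> str:
    """Hypothetical function that only makes sense for a client ID."""
    return f"kicking client {client}"


client = ClientId(3)
gun = EntityId(3)

message = kick_client(client)  # fine

# kick_client(gun)
# A static checker rejects the line above, because an EntityId is not a
# ClientId. At run time both values are still just the int 3.
```

The cost is a few extra lines of declarations; the benefit is that the
"is this a user or that user's gun?" question gets answered by a tool
instead of by me at 2 AM.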

> One might as well say that bytes b'@=<\xed\x91hr\xb0' really is the
> number 29.238 and expect to multiply your name by 12.5 and get your
> height in seconds.
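
A small sketch of the point, assuming those eight bytes are meant to be
read as a big-endian IEEE 754 double (that is my guess at how the
example was built, not something stated above). The same bytes give a
completely different answer depending on what you decide they mean.

```python
import struct

raw = b'@=<\xed\x91hr\xb0'

# Interpretation 1: a big-endian 64-bit float (assumed layout).
as_float = struct.unpack(">d", raw)[0]

# Interpretation 2: a big-endian unsigned integer.
as_int = int.from_bytes(raw, "big")

# Interpretation 3: eight Latin-1 characters.
as_text = raw.decode("latin-1")
```

None of the three is what the bytes "really" are; the meaning comes
from the type you attach to them.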


>>> No, as a large number of Python3 facilities require str objects as
>>> arguments. Consider urllib.request.urlopen(), for example, which
>>> requires a URL to be an str object.
> That's because URLs are fundamentally text strings.
> Quick quiz: which of the following are real URLs?
> (a)  http://правительство.рф
> (b)  http://παράδειγμα.δοκιμή
> (c)  http://실례.테스트
> (d)  All of the above.

I had to actually check two of those to be sure they really truly were
*real* URLs, not merely *correctly formatted* URLs. But yes, URLs are
fundamentally text. For hysterical raisins, DNS has some oddities to
it, so when you dive into how these are actually represented, the
Korean example is actually http://xn--9n2bp8q.xn--9t4b11yi5a - but I
don't believe there are any byte-based encodings involved. This is
encoding text using other text, where the encoded form uses an
extremely restricted alphabet (a-z 0-9 and hyphen).
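
If you want to see the text-to-text mapping for yourself, Python's
built-in "idna" codec will do it. This is only a sketch: that codec
implements the older IDNA 2003 rules, and it hands back bytes, which I
decode as ASCII here to keep everything as str.

```python
hostname = "실례.테스트"

# Encode each label to its "xn--" form, then view the result as text.
ascii_form = hostname.encode("idna").decode("ascii")

# Going the other way recovers the original host name.
round_trip = ascii_form.encode("ascii").decode("idna")
```

Here ascii_form should match the xn-- host name quoted above, and
round_trip should equal the original Korean one.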

