Cult-like behaviour [was Re: Kindness]
steve+comp.lang.python at pearwood.info
Sat Jul 14 22:55:27 EDT 2018
On Sun, 15 Jul 2018 09:07:17 +1000, Chris Angelico wrote:
> On Sun, Jul 15, 2018 at 8:15 AM, Marko Rauhamaa <marko at pacujo.net>
> wrote:
>> Chris Angelico <rosuav at gmail.com>:
>>> On Sun, Jul 15, 2018 at 5:54 AM, Marko Rauhamaa <marko at pacujo.net>
>>> wrote:
>>>> True enough. Modern-day protocols as well as Linux file formats and
>>>> commands intentionally blur the line between strings and bytes. The
>>>> software in question deals with all of the above. It is virtually
>>>> impossible to keep track of what is "really" text and what is
>>>> "really" binary.
Of course we have no idea what Marko's software is, or what it is doing,
but frankly that seems pretty implausible to me. On the face of it, it
seems as ridiculous as the claim that he can't tell which variables are
quote-unquote "really" lists of weights and which are lists of distances.
This really sounds more like an admission that Marko is working with a
shitty code base than a fundamental problem with Python. But dealing
with shitty code bases is the reality.
>>>> In the end, the Gordian Knot was sliced by using
>>>> Python3's strings for everything and restricting oneself to Latin-1
>>>> codepoints (almost) everywhere.
I wonder whether Marko's Python 2.7 code base was ever actually tested
with non-Latin1 text. I suspect that if Marko had (let's say) Japanese
users expecting to use CJK characters in the application, his affection
for the 2.7 version would be a lot less.
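For what it's worth, here is a minimal sketch of the str-as-bytes trick and of where it stops working. The sample values are made up for illustration; nothing here is taken from Marko's code.

```python
# Latin-1 maps bytes 0..255 one-to-one onto code points U+0000..U+00FF,
# so any byte string survives a decode/encode round trip.
raw = bytes(range(256))
as_text = raw.decode("latin-1")
round_trip_ok = as_text.encode("latin-1") == raw

# Text outside that range cannot be pushed back through the same codec.
japanese = "\u65e5\u672c\u8a9e"  # hypothetical user input ("Japanese")
try:
    japanese.encode("latin-1")
    cjk_fits = True
except UnicodeEncodeError:
    cjk_fits = False

print(round_trip_ok, cjk_fits)
```

The round trip succeeds for arbitrary bytes, which is why the trick appears to work; it only fails once real non-Latin-1 text turns up.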
>> What I'm saying is that I'm using Python3
>> strings as holders for bytes. Since every byte is a valid Unicode code
>> point, a Python3 string can hold any sequence of bytes.
> Since every byte is also a valid IEEE 754 64-bit binary floating point
> value, a sequence of floats can hold any sequence of bytes, too. Is it a
> good idea to use floats to represent bytes?
3.6e-322 1.6e-322 4.8e-322 5.1e-322 5.63e-322 5e-322 5e-322 1.63e-322
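If you want to read the line above, the following sketch may help. It assumes each float is a subnormal IEEE 754 double whose 64-bit pattern, read as an integer, is a single byte value; that reading is an interpretation, not something stated in the post.

```python
import struct

floats = [3.6e-322, 1.6e-322, 4.8e-322, 5.1e-322,
          5.63e-322, 5e-322, 5e-322, 1.63e-322]

def float_to_int(x):
    # Reinterpret the 8 bytes of an IEEE 754 double as a signed
    # 64-bit integer (same byte order on both sides).
    return struct.unpack("<q", struct.pack("<d", x))[0]

# Assumption: every resulting integer is in range(256), i.e. one byte.
decoded = bytes(float_to_int(x) for x in floats).decode("ascii")
print(decoded)
```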
> Text strings and sequences of bytes *are different*.
At an implementation level, everything is bytes. People do so insist on
conflating implementation with interface, even when they don't need to...
(Sometimes I think people should be required to implement algorithms on
analogue computing devices before they're allowed to write code for
digital computers, just to drive home the point that neither bytes nor
bits are fundamental to computing, but are mere implementation details.)
At a semantic level, byte strings and text strings represent
fundamentally different things, as distinct as weights and lengths.
Unfortunately, due to the long influence of ASCII in computing, a lot of
people have internalised that "byte 0x41 *really is* the letter A" when
that's merely an encoding convention. You wouldn't add 5kg to 5cm and
expect to get a meaningful result, but people expect to combine bytes and
text and "just make it work".
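A small sketch of both points in Python 3: combining the two types is an error rather than a silent guess, and the byte used for the letter A depends on the encoding. The choice of encodings here is only illustrative.

```python
# Combining bytes and text raises instead of guessing an encoding.
try:
    b"abc" + "def"
    mixed_ok = True
except TypeError:
    mixed_ok = False

# The same letter maps to different bytes under different conventions.
ascii_a = "A".encode("ascii")       # the familiar byte 0x41
ebcdic_a = "A".encode("cp037")      # an EBCDIC code page
utf16_a = "A".encode("utf-16-le")   # two bytes, not one

print(mixed_ok, ascii_a, ebcdic_a, utf16_a)
```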
One might as well say that the bytes b'@=<\xed\x91hr\xb0' really are the
number 29.238 and expect to multiply your name by 12.5 and get your
height in seconds.
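To see that the interpretation lives in the reader and not in the bytes, here is a sketch that reads those same eight bytes three ways. It assumes the float reading is a big-endian IEEE 754 double.

```python
import struct

data = b'@=<\xed\x91hr\xb0'

as_float = struct.unpack(">d", data)[0]  # big-endian 64-bit float
as_int = int.from_bytes(data, "big")     # one large unsigned integer
as_text = data.decode("latin-1")         # eight Latin-1 characters

print(as_float, as_int, repr(as_text))
```

None of the three is what the bytes "really" are; each is a convention applied to them.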
>> Couldn't you use bytes objects everywhere for the same purpose?
>> Yes and no.
>> Yes, but it would be ugly as hell and would involve changing a large
>> percentage of the source code.
It would also require re-inventing the entire Unicode infrastructure
already provided -- unless you intended to just say No to 99% of human
languages in the world, including English, in favour of restricting
everyone, including English speakers, to an artificial subset of the
characters they use in real life.
(Even Latin1 doesn't cover all the English punctuation marks I expect to
be able to use in text.)
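A quick way to check this is to try encoding some common punctuation as Latin-1. The sample characters below are my own picks, not a list from the post.

```python
# Hypothetical sample of punctuation seen in ordinary English text.
samples = {
    "left double quote": "\u201c",
    "right single quote": "\u2019",
    "ellipsis": "\u2026",
    "plain hyphen": "-",
}

def fits_latin1(ch):
    # True if the character has a Latin-1 encoding, False otherwise.
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

results = {name: fits_latin1(ch) for name, ch in samples.items()}
for name, ok in results.items():
    print(name, ok)
```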
It's not 1970 any more. Under what circumstances is that acceptable?
>> No, as a large number of Python3 facilities require str objects as
>> arguments. Consider urllib.request.urlopen(), for example, which
>> requires a URL to be an str object.
That's because URLs are fundamentally text strings.
Quick quiz: which of the following are real URLs?
(d) All of the above.
> Well, duh. It also doesn't accept a list of floats, just because you
> COULD represent a text string that way.
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson