Unicode [was Re: Cult-like behaviour]

Jim Lee jlee54 at gmail.com
Mon Jul 16 14:43:27 EDT 2018

On 07/16/18 11:31, Steven D'Aprano wrote:
> On Mon, 16 Jul 2018 10:27:18 -0700, Jim Lee wrote:
>> Had you actually read my words with *intent* rather than *reaction*, you
>> would notice that I suggested the *option* of turning off Unicode.
> Yes, I know what you wrote, and I read it with intent.
> Jim, you seem to be labouring under the misapprehension that anytime
> somebody spots a flaw in your argument, or an unpleasant implication of
> your words, it can only be because they must not have read your words
> carefully. Believe me, that is not the case.
> YOU are the one who raised the specter of politically correct groupthink,
> not me. That's dog-whistle politics. But okay, let's move on from that.
> You say that all you want is a switch to turn off Unicode (and replace it
> with what? Kanji strings? Cyrillic? Shift_JIS? No, of course not, I'm being
> absurd -- replace it with ASCII, what else could any right-thinking
> person want, right?). Let's look at this from a purely technical
> perspective:
> Python already has two string data types, bytes and text. You want
> something that is almost functionally identical to bytes, but to call it
> text, presumably because you don't want to have to prefix your strings
> with a b"" (that was also Marko's objection to byte strings).
> Let's say we do it. Now we have three string implementations that need to
> be added, documented, tested, maintained, instead of two.
> (Are you volunteering to do this work?)
> Now we need to double the testing: every library needs to be tested
> twice, once with the "Unicode text" switch on, once with it off, to
> ensure that features behave as expected in the appropriate mode.
> Is this switch a build-time option, so that we have interpreters built
> with support for Unicode and interpreters built without it? We've been
> there: it's a horribly bad idea. We used to have Python builds with
> threading support, and others without threading support. We used to have
> Python builds with "wide Unicode" and others with "narrow Unicode".
> Nothing good comes of this design.
> Or perhaps the switch is a runtime global option?
> Surely you can imagine the opportunities for bugs, both obvious crashing
> bugs and non-obvious silent failure bugs, that will occur when users run
> libraries intended for one mode under the other mode. Not every library
> is going to be fully tested under both modes.
> Perhaps it is a compile-time option that only affects the current module,
> like the __future__ imports. That's a bit more promising, it might even
> use the __future__ infrastructure -- but then you have the problem of
> interaction between modules that have this switch enabled and those that
> have it disabled.
> More complexity, more cruft, more bugs.
> It's not clear that your switch gives us *any* advantage at all, except
> the warm fuzzy feelings that no dirty foreign characters might creep into
> our pure ASCII strings. Hmm, okay, but frankly apart from when I copy and
> paste code from the internet and it ends up bringing in en-dashes and
> curly quotes instead of hyphens and typewriter quotes, that never
> happens to me by accident, and I'm having a lot of trouble seeing how it
> could.
> If you want ASCII byte strings, you have them right now -- you just have
> to use the b"" string syntax.
> If you want ASCII strings without the b prefix, you have them right now.
> Just use only ASCII characters in your strings.
> I'm simply not seeing the advantage of:
>      from __future__ import no_unicode
>      print("Hello World!")  # stand in for any string handling on ASCII
> over
>      print("Hello World!")
> which works just as well if you control the data you are working with and
> know that it is pure ASCII.
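
For concreteness, the two options the quoted message says already exist (byte strings written with b"", and ordinary text strings that happen to contain only ASCII) can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name is_pure_ascii is hypothetical and appears nowhere in the thread or the standard library.

```python
def is_pure_ascii(text: str) -> bool:
    """Return True if every character of text can be encoded as ASCII.

    Hypothetical helper, for illustration only.
    """
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# Option 1: a byte string, written with the b"" prefix.
data = b"Hello World!"

# Option 2: an ordinary text string that only uses ASCII characters.
text = "Hello World!"

# The two are distinct types, but pure-ASCII content converts
# losslessly in either direction.
assert isinstance(data, bytes) and isinstance(text, str)
assert text.encode("ascii") == data
assert data.decode("ascii") == text

# A check like this can stand in for a "no Unicode" switch at the
# point where outside data enters a program.
assert is_pure_ascii(text)
assert not is_pure_ascii("caf\u00e9")  # contains a non-ASCII character
```

If data you do not control might contain non-ASCII characters, a check of this kind at the input boundary flags it explicitly, without needing a language-level mode.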

Had you spoken this way from the start instead of ridiculing and 
name-calling, perhaps we could have reached an agreement.

However, the point is moot, as I have unsubscribed from the list. The 
conversations here (especially yours) are too condescending to waste 
more time with.
