Python 3 is killing Python
steve+comp.lang.python at pearwood.info
Wed Jul 16 14:10:16 CEST 2014
On Wed, 16 Jul 2014 13:46:45 +0300, Marko Rauhamaa wrote:
> Python 3 really is on a mission to elevate text into the mainstream at
> the expense of bytes. I'm guessing this is done primarily to promote the
> cross-platform transparency of Python code.
Ahead of bytes? Possibly. At the expense of bytes? Certainly not. If
there is anything that you cannot conveniently do with bytes, that you
could do in Python 2, it's likely a bug, or at least an obviously missing
feature. The core devs recognise that they missed some use-cases (e.g.
mixed bytes and text) which are now harder than they should be, and are on a
mission to rectify that as much as possible within the constraints of
backwards compatibility.

E.g. having b"abc"[0] return 97 instead of b"a" was probably a mistake,
but there are four versions of Python 3.x that do it that way and it's
too late to change until Python 5000. (Python 4 is unlikely to break
backwards compatibility in a big way.)
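
For anyone who hasn't run into it, here is a minimal sketch of the indexing behaviour in question, assuming Python 3. The variable names are only illustrative.

```python
data = b"abc"

# Indexing a bytes object in Python 3 gives an int (the byte's value)...
first = data[0]

# ...whereas slicing gives back a bytes object of length 1.
first_as_bytes = data[0:1]

# Iterating over bytes likewise yields ints, not length-1 bytes objects.
ordinals = list(data)

print(first, first_as_bytes, ordinals)
```

The usual workaround when you want a one-byte bytes object is to slice rather than index.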
> For me, a linux system and network programmer, that layer of frosting
> only gets in my way and I need to wash it off.
Linux, like all Unixes, is primarily a text-based platform. With a few
exceptions, /etc is filled with text files, not binary files, and half
the executables on the system are text (Python, Perl, bash, sh, awk,
etc.).

To say that *dealing with text* gets in your way on a Linux system is
rather like saying that you love Mac OS X except for its gosh-awful GUI.

Of course, as a network programmer, you have to deal with bytes, so I'll
give you a bit of leeway.
>> Most programming languages I know of default to opening files in text
>> mode, not binary mode, and I don't see any strong reason for Python to
>> go against the tide there.
> In unix and linux, there never was a separate text mode for files. When
> you open a file, you open a file -- and stuff bytes in it. There is no
> commonly accepted text file encoding. UTF-8 comes close to being a
> standard, but I know somebody who sticks to an ISO-8859-1 locale.
And they should be dragged out into the street and beaten with a Clue
Stick. They're the sort of people who are holding us back from the
shining utopia of UTF-8 everywhere!
(only half joking)
But seriously, I cannot imagine any *rational* reason for using a legacy
encoding, but I'm willing to give this person the benefit of the doubt
that he's not a raving lunatic or old West European-centric curmudgeon
trying to deny the existence of the rest of the world.
That being the case, then good luck to him. As for everyone else:
>> Having len('λ') == 1 is not an advanced text processing feature.
> There are (relative rare) occasions where you'd like to treat text as
> text.
Relatively rare. Like, um, email, news, html, Unix config files, Windows
ini files, source code in just about every language ever, SMSes, XML,
JSON, YAML, instant messenger apps, word processors... even *graphic*
applications invariably have a text tool. Now, it may be true that some
of those things may not use text under the hood, but even so, text is
Even binary protocols often include chunks of recognisable human-readable
text in them:
[steve@ando Pictures]$ hexdump -n 64 -C picture.jpg
00000000 ff d8 ff e0 00 10 4a 46 49 46 00 01 01 00 00 01 |......JFIF......|
00000010 00 01 00 00 ff e2 0f 38 49 43 43 5f 50 52 4f 46 |.......8ICC_PROF|
00000020 49 4c 45 00 01 01 00 00 0f 28 61 70 70 6c 02 10 |ILE......(appl..|
00000030 00 00 6d 6e 74 72 52 47 42 20 58 59 5a 20 07 de |..mntrRGB XYZ ..|
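
The same point can be sketched in Python: take the 64 bytes from the dump above and pull out the runs of printable ASCII, roughly what strings(1) does. This is only an illustration. The names `header` and `text_runs` and the four-character threshold are my own choices, not anything defined by the JPEG format.

```python
import re

# The 64 bytes shown in the hexdump above.
header = bytes.fromhex(
    "ff d8 ff e0 00 10 4a 46 49 46 00 01 01 00 00 01 "
    "00 01 00 00 ff e2 0f 38 49 43 43 5f 50 52 4f 46 "
    "49 4c 45 00 01 01 00 00 0f 28 61 70 70 6c 02 10 "
    "00 00 6d 6e 74 72 52 47 42 20 58 59 5a 20 07 de "
)

# Find runs of four or more printable ASCII bytes (0x20 to 0x7e)
# and decode each run to a str.
text_runs = [
    run.decode("ascii")
    for run in re.findall(rb"[\x20-\x7e]{4,}", header)
]

print(text_runs)
# ['JFIF', '8ICC_PROFILE', '(appl', 'mntrRGB XYZ ']
```

Note that the leading "8" and "(" are binary values that merely happen to fall in the printable range, which is precisely why you have to know which parts of a byte stream are text and which aren't.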
> Then, it's nice to be able to move the data on the operating table
> with .decode() and when the patient has been sewn back together, you can
> release them with .encode().
> More often, len(b'λ') is what I want.
Oh really? Are you sure? What exactly is b'λ'?
I couldn't have made up a better example of the confusion between bytes
and text if I had tried. Thank you.
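
To make that concrete, here is a small sketch, assuming Python 3 and a UTF-8 source file. The character has a length of one; the number of bytes depends entirely on which encoding you pick, and the literal b'λ' isn't even valid syntax. The variable names are illustrative only.

```python
text = "λ"

# As text, it is a single character.
char_count = len(text)

# As bytes, the length depends on the encoding chosen.
utf8 = text.encode("utf-8")          # two bytes
greek = text.encode("iso-8859-7")    # one byte in the legacy Greek encoding

# Python 3 rejects non-ASCII characters in a bytes literal outright.
try:
    compile("b'λ'", "<example>", "eval")
    literal_ok = True
except SyntaxError:
    literal_ok = False

# The decode/encode round trip works once the encoding is stated.
round_trip = utf8.decode("utf-8")

print(char_count, len(utf8), len(greek), literal_ok)
```

So "len(b'λ')" has no answer until you say which encoding you mean, at which point you are calling .encode() on text anyway.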