[Python-Dev] Python3 "complexity" (was RFC: PEP 460: Add bytes...)
breamoreboy at yahoo.co.uk
Thu Jan 9 02:04:19 CET 2014
On 09/01/2014 00:12, Kristján Valur Jónsson wrote:
> Just to avoid confusion, let me state up front that I am very well aware of encodings and all that, having internationalized one largish app in python 2.x. I know the problems that 2.x had with tracking down the source of errors and understand the beautiful concept of encodings on the boundary.
> For a lot of data processing and tools, encoding isn't an issue. Either you assume ascii, or you're working with something like latin1. A single byte encoding. This is because you're working with a text file that _you_ wrote. And you're not assigning any semantics to the characters. If there is actual "text" in there it is just english, not Norwegian or Turkish. A byte read at code 0xfa doesn't mean anything special. It's just that, a byte with that value. The file system doesn't have any default encoding. A file on disk is just a file on disk consisting of bytes. There can never be any wrong encoding, no mojibake.
> With python 2, you can read that file into a string object. You can scan for your field delimiter, e.g. a comma, split up your string, interpolate some binary data, spit it out again. All without ever thinking about encodings.
> Even though the file is conceptually encoded in something, if you insist on attaching a particular semantic meaning to every ordinal value, whatever that meaning is is in many cases irrelevant to the program.
> I understand that surrogateescape allows you to do this. But it is an awkward extra step and forces an extra layer of needles semantics on to that guy that just wants to read a file. Sure, vegetarians and alergics like to read the list of ingredients on everything that they eat. But others are just omnivores and want to be able to eat whatever is on the table, and not worry about what it is made of.
> And yes, you can read the file in binary mode but then you end up with those bytes objects that we have just found that are tedious to work with.
All I can say is that I've been using python 3 for years and wouldn't
know what a surrogateescape was if you were to hit me around the head
with it. I open my files, I process them, and Python kindly closes them
for me via a context manager. So if you're not bothered about encoding,
where has the "awkward extra step and forces an extra layer of needles
semantics" bit come from?
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.
More information about the Python-Dev