Python is a cross-platform language, but I often find myself writing sections specific to Windows and to Linux, and sometimes even code specific to OS settings. In these moments I feel that Python is no more cross-platform than C, for example.
What could be done?
Normalized Python - a set of default, standard behaviors that back up common user expectations about cross-platform and system-independent behavior regardless of backward compatibility and code compatibility concerns.
This is needed, for example, to collect these two features:
1. open files in binary mode by default
   why? because "text file" is a human abstraction; for the operating system it is just another format of binary data, so the default operation is to read this data without any preprocessing
2. open text files in utf-8 encoding
   why? because users cannot know the encoding of the operating system and their programs cannot choose the right encoding, so the best guess is to expect the most widely used standard
3. treat stdout/stdin streams as binary
   why? because you don't want your data to be corrupted when you pass it in and out of Python via standard streams
Having a separate "Normalized Python" concept is needed to set the context for developing and engineering ideas, instead of concentrating on the sad reality of the backward compatibility curse.
--
anatoly t.
On 29/01/2014 09:11, anatoly techtonik wrote:
Python is a cross-platform language, but I often find myself writing sections specific to Windows and to Linux, and sometimes even code specific to OS settings. In these moments I feel that Python is no more cross-platform than C, for example.
What could be done?
Normalized Python - a set of default, standard behaviors that back up common user expectations about cross-platform and system-independent behavior regardless of backward compatibility and code compatibility concerns.
This is needed, for example, to collect these two features:
1. open files in binary mode by default
   why? because "text file" is a human abstraction; for the operating system it is just another format of binary data, so the default operation is to read this data without any preprocessing
2. open text files in utf-8 encoding
   why? because users cannot know the encoding of the operating system and their programs cannot choose the right encoding, so the best guess is to expect the most widely used standard
3. treat stdout/stdin streams as binary
   why? because you don't want your data to be corrupted when you pass it in and out of Python via standard streams
Having a separate "Normalized Python" concept is needed to set the context for developing and engineering ideas, instead of concentrating on the sad reality of the backward compatibility curse.
I support what Chris Angelico has said on another thread: fork Python, and if it's good enough everybody will flock to it. This also avoids the problem with the CLA.
--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.
Mark Lawrence
On Wed, Jan 29, 2014 at 8:11 PM, anatoly techtonik
Normalized Python - a set of default, standard behaviors that back up common user expectations about cross-platform and system-independent behavior regardless of backward compatibility and code compatibility concerns.
Having a separate "Normalized Python" concept is needed to set the context for developing and engineering ideas, instead of concentrating on the sad reality of the backward compatibility curse.
You can achieve the first two simply by opening files with parameters. There is NOTHING Windows-specific or Linux-specific in that. As of Python 3, opening in text mode is the default... but you can override that so easily. Why change the default (which breaks back compat) when you can just change your code? And I believe you can reopen stdin/stdout as binary, if you really want to, but that is a little harder. It's still not going to have any platform-specific code in it. (As I've never written a filter for binary files in Python, I've never had the need to read/write standard streams in binary. But I've no doubt that someone who has can show you how easy it is - I'd guess it's less than five lines of code, knowing Python.)
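A minimal sketch of what "opening files with parameters" and "binary standard streams" can look like in Python 3. The helper names (read_both_ways, copy_stream, run_filter, demo) are made up for illustration; only open(), sys.stdin.buffer and sys.stdout.buffer come from the standard library.

```python
import os
import sys
import tempfile


def read_both_ways(path):
    """Return the file's raw bytes and its UTF-8 decoded text."""
    # Binary mode: no decoding and no newline translation.
    with open(path, "rb") as f:
        raw = f.read()
    # Text mode with an explicit encoding; newline="" keeps line endings as stored.
    with open(path, "r", encoding="utf-8", newline="") as f:
        text = f.read()
    return raw, text


def copy_stream(src, dst, chunk_size=65536):
    """Copy bytes from src to dst unchanged; return the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def run_filter():
    """Byte-for-byte stdin -> stdout filter using the binary layer of the text streams."""
    return copy_stream(sys.stdin.buffer, sys.stdout.buffer)


def demo():
    """Write a small UTF-8 file with a CRLF line ending and read it back both ways."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write("caf\u00e9\r\n".encode("utf-8"))
        return read_both_ways(path)
    finally:
        os.remove(path)
```

Nothing here branches on the platform: the same parameters give the same bytes and the same text on Windows and on Linux.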
This is needed, for example, to collect these two features:
(Among our features are such diverse elements as... oh, wrong Pythons.)
1. open files in binary mode by default
   why? because "text file" is a human abstraction; for the operating system it is just another format of binary data, so the default operation is to read this data without any preprocessing
A reasonably plausible argument. C++ follows that sort of model (you shouldn't pay for anything you're not using). SQL mostly follows that model (it generally takes more keywords to get the database to do more work - compare "SELECT x FROM y" and "SELECT x FROM y ORDER BY z", where the latter adds a sort phase; there are exceptions to this, like UNION ALL vs UNION, but they're notable _because_ they're exceptions). But it's nothing like a strong enough argument for changing. Creating two subtly different languages is a major problem, especially when the exact same syntax means different things. Imagine if I create a fork of Python that's absolutely identical except that you create a set with [1,2,3] and a list with {1,2,3}. All your code will be syntactically correct, but suddenly it does something quite different. That is a BAD idea. It would have to be *immensely* better to justify the breakage; and this is only "arguably better". (The most obvious contrary argument is that the default should do the thing most people want most often, which is working with text files. This same argument justifies the use of arbitrary-precision integers by default, instead of requiring an explicit "long" type; I'm sure you'll agree that the Py3 unification of these types was an advantage.)
2. open text files in utf-8 encoding
   why? because users cannot know the encoding of the operating system and their programs cannot choose the right encoding, so the best guess is to expect the most widely used standard
Yes, this one is an issue. Python lets the OS recommend a default encoding, on the expectation that a Python script should fit into its host platform, rather than that all platforms should conform to what Python wants. A judgment call, and I'm sure there can be endless debates about what Python should do, but since it can be overridden with a single parameter on the open call, not a big deal IMO.
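As a rough illustration of why the default matters and how the "single parameter" sidesteps it: the sketch below asks the locale module for the encoding the platform recommends (the fallback for open() in the Python 3 versions current at the time of this thread) and shows the same bytes decoded two ways. The function names are hypothetical, and cp1252 is only an example of a non-UTF-8 platform default.

```python
import locale


def default_text_encoding():
    """Encoding the platform recommends for text files when none is given."""
    return locale.getpreferredencoding(False)


def decode_as(data, encoding):
    """Decode the same bytes under a chosen encoding."""
    return data.decode(encoding)


# UTF-8 bytes for a string containing two non-ASCII characters.
DATA = "na\u00efve caf\u00e9".encode("utf-8")

AS_UTF8 = decode_as(DATA, "utf-8")      # the text that was written
AS_CP1252 = decode_as(DATA, "cp1252")   # mojibake: each 2-byte sequence becomes 2 characters
```

Passing encoding="utf-8" to open() makes the result independent of whatever default_text_encoding() reports on a given machine.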
3. treat stdout/stdin streams as binary
   why? because you don't want your data to be corrupted when you pass it in and out of Python via standard streams
Most definitely NOT. The standard streams should, by default, be text streams, and should have their encodings set according to what the other side wants. If there's a way for the OS and Python to communicate an encoding, that's absolutely perfect. Yes, there'll be a few edge cases involving redirection, but that's pretty much unsolvable anyway.
The normal usage of Python MUST include Unicode; and that means the most obvious way to produce output (the print function) needs to write Unicode. So if stdout is a binary stream, what's print going to do with a str? Encode it? If so, you just move the issue - and print can send to multiple streams, so it'd need to know which are text and which are binary, etc, etc. Or should it throw an error, and force the programmer to do stuff like this:
    CONSOLE_ENCODING = "utf-8" # add some logic for guessing this
    s = "Hello, world!"
    print(s.encode(CONSOLE_ENCODING))
just to ensure that every programmer has to battle with the encodings manually, in lots of places, instead of configuring it once (or, more likely, having the default be right) and then having clean code everywhere?
The only way that opening stdin/out as binary will prevent the corruption of your data is if your data is fundamentally bytes. Most programs, in any language, work with data that's fundamentally text; granted, a lot of languages don't distinguish, but if you look at what the programmer's doing, it's still text. Anything that prints "Hello, world!" is printing text, not bytes, and if the console's encoding is UTF-16, that should emit 26 bytes (plus any newline that's appropriate). Forcing the programmer to think about this is completely unnecessary.
How many times do you actually come across these issues in porting? How much effort would you really save if these measures were implemented?
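The two concrete claims above can be checked with in-memory streams. This is only a sketch: io.BytesIO stands in for a hypothetical binary stdout, io.TextIOWrapper stands in for a console whose encoding is known, and the helper names are invented. "utf-16-le" is used so that no byte order mark is added to the count.

```python
import io

MESSAGE = "Hello, world!"  # 13 characters


def print_to_binary_stream(text):
    """Try print() on a binary stream; return the exception type name, or None on success."""
    stream = io.BytesIO()
    try:
        print(text, file=stream)
    except TypeError as exc:
        return type(exc).__name__
    return None


def printed_bytes(text, encoding):
    """Return the bytes that print() produces on a text stream with the given encoding."""
    buffer = io.BytesIO()
    wrapper = io.TextIOWrapper(buffer, encoding=encoding, newline="")
    print(text, end="", file=wrapper)
    wrapper.flush()
    data = buffer.getvalue()
    wrapper.detach()  # keep the buffer from being closed along with the wrapper
    return data
```

With a text stream the calling code stays print(MESSAGE) whatever the encoding is; only the stream's configuration changes.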
If it's that important to you, fork CPython and create this "Normalized Python" that does everything you want (and then, linking this with the other thread, continue development of Normalized Python according to an Agile model and see if people join you rather than CPython). Good luck. ChrisA
Chris, I pretty much agree with you, but there are two major additional points you didn't mention.
On Jan 29, 2014, at 6:33, Chris Angelico
On Wed, Jan 29, 2014 at 8:11 PM, anatoly techtonik wrote:
3. treat stdout/stdin streams as binary
   why? because you don't want your data to be corrupted when you pass it in and out of Python via standard streams
Most definitely NOT. The standard streams should, by default, be text streams, and should have their encodings set according to what the other side wants.
Note that when the other side is a Windows console, what it _really_ wants is for you not to use stdio, but to instead use the separate UTF-16-specific console APIs.
Fitting this into Python 3's cross-platform io model is a bit challenging, and not yet done, but certainly doable. (It's been discussed multiple times, both on this list and elsewhere.)
Fitting this into a Python 2-style io model as Anatoly suggests is completely impossible. Instead, every single program would have to either check that stdout.isatty and platform is Windows and explicitly use something other than stdout, or figure out the console encoding (which is hard to do from inside Python if you take away the stdout.encoding that Python provides for the text stdout today) and explicitly encode every string to be printed.
There's also the fact that the print function implicitly converts everything to a str for you, which wouldn't do any good if stdout were a binary file. Unlike Python 2, Python 3 has no way to convert arbitrary objects to byte strings, which means you would need a mandatory encoding keyword arg on every call to print that took any args that weren't bytes-compatible.
Between these two issues, the proposal would effectively give Python 3 all of the stdio/print problems that Python 2 had, and more, without any of Python 2's partial solutions to those problems.
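To make the second point concrete, here is a sketch of what a print for a binary stdout would have to look like. print_bytes and can_convert_to_bytes are hypothetical helpers, not anything proposed in this thread; the point is only that the encoding argument becomes mandatory because bytes() has no general object-to-text conversion the way str() does.

```python
import io


def can_convert_to_bytes(obj):
    """Report whether bytes(obj) works at all for this object."""
    try:
        bytes(obj)
    except TypeError:
        return False
    return True


def print_bytes(*args, encoding, file, sep=" ", end="\n"):
    """Hypothetical print() for a binary stream: str() each non-bytes arg, then encode it."""
    parts = [a if isinstance(a, bytes) else str(a).encode(encoding) for a in args]
    file.write(sep.encode(encoding).join(parts) + end.encode(encoding))


def demo():
    """Print a mix of str, float and bytes arguments to an in-memory binary stream."""
    out = io.BytesIO()
    print_bytes("x =", 3.5, b"raw", encoding="utf-8", file=out)
    return out.getvalue()
```

Note that bytes(3) does succeed, but it yields three zero bytes rather than b"3", so even the cases that "work" are not a substitute for str().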
On Thu, Jan 30, 2014 at 4:24 AM, Andrew Barnert
Note that when the other side is a Windows console, what it _really_ wants is for you not to use stdio, but to instead use the separate UTF-16-specific console APIs.
Fitting this into Python 3's cross-platform io model is a bit challenging, and not yet done, but certainly doable. (It's been discussed multiple times, both on this list and elsewhere.)
In the theoretical ideal, all that should be buried within the definition of the print function (or what it calls on). I should be able to write a program that says:
    print("Copyright © 2014 My Name")
even if my name includes non-ASCII, even non-BMP, characters; and that program should produce that output in whatever way is appropriate to the platform. (If it's running on a printer, that should produce a hard copy.) Now, maybe that ideal can't be attained, due to some platforms' limitations or stupidity, and clean code is of value too, but certainly the notion of "write a Unicode string to the most obvious place of output" is one that ought *conceptually* to be supported equally on all platforms, without my having to figure out one from another.
Obviously if your terminal expects one encoding but announces another, there's going to be a mess. The theoretical ideal works only when negotiations are done properly. But again, that's outside of Python; and if the next version of SomeWeirdOS introduces a new means of announcing its console encoding, it should simply be a matter of coding that into Python, *not* into every single script.
ChrisA
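A small sketch of that division of labour, using in-memory streams in place of real consoles. The emit helper and the three encodings are assumptions chosen for illustration; the script-side call is the same print() each time, and only the stream's configuration differs.

```python
import io

NOTICE = "Copyright \u00a9 2014"  # contains the non-ASCII copyright sign


def emit(text, encoding, errors="strict"):
    """Return the bytes print() sends to a text stream configured with this encoding."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding, errors=errors, newline="")
    print(text, end="", file=stream)
    stream.flush()
    data = buffer.getvalue()
    stream.detach()  # keep the buffer from being closed along with the wrapper
    return data


UTF8_BYTES = emit(NOTICE, "utf-8")                       # two bytes for the sign
LATIN1_BYTES = emit(NOTICE, "latin-1")                   # one byte for the sign
ASCII_BYTES = emit(NOTICE, "ascii", "backslashreplace")  # escaped, since ASCII cannot represent it
```

How a limited console should degrade (error, replacement, escape) is a policy that lives with the stream, which is where the message above argues it belongs.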
On Wed, Jan 29, 2014, at 12:24, Andrew Barnert wrote:
Fitting this into a Python 2-style io model as Anatoly suggests is completely impossible. Instead, every single program would have to either check that stdout.isatty
As a side note, isatty is broken on Windows: it considers NUL to be a tty. This is because it wraps a C function which in MSVC has the same flaw.
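A quick way to observe this on a given machine, as a sketch only: os.devnull is "NUL" on Windows and "/dev/null" on POSIX systems, and the helper name is made up. The result on Windows is the behaviour reported above, not something verified here.

```python
import os


def devnull_isatty():
    """Open the platform's null device and report what isatty() says about it."""
    with open(os.devnull) as f:
        return f.isatty()
```

On Linux this is expected to return False; per the note above, a True result on Windows means isatty() alone cannot be trusted to detect a real console.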
participants (5)
- anatoly techtonik
- Andrew Barnert
- Chris Angelico
- Mark Lawrence
- random832@fastmail.us