I asked Guido to provide comments on one of the chapters in our book:
I was discussing appending the mode ("t" or "b") to the open() call
p.10, bottom: text mode is the default -- I've never seen the 't' option described! (So even if it exists, better be silent about it.) You need to append 'b' to get binary mode instead.
This brings up an interesting issue. MSVC exposes a global variable that contains the default mode - i.e., you can change the default to binary. (_fmode for those with the docs.)
This has some implications and questions:
* Will Guido ever bow to pressure (when it arrives :) to expose this via the "msvcrt" module? I can imagine where it may be useful in a limited context. A reasonable argument would be that, like _setmode and other MS specific stuff, if it exists it should be exposed.
* But even if not, due to the shared CRTL, in COM and other worlds we really can't predict what the default is. Although Python does not touch it, that does not stop someone else touching it. A web-server built using MSVC on Windows may use it?
Thus, it appears that to be 100% sure what mode you are using, you should not rely on the default, but should _always_ use "b" or "t" on the file mode.
Any thoughts or comments? The case for abandoning the CRTL's text mode gets stronger and stronger!
Mark.
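A minimal sketch of the "always spell out the mode" advice above. It assumes a present-day Python 3, where the built-in open() parses the mode string itself instead of handing it to fopen; that assumption does not hold for the Python discussed in this thread. The helper name roundtrip is hypothetical.

```python
import os
import tempfile


def roundtrip(data: bytes) -> bytes:
    """Write data in explicit binary mode and read it back, also in binary mode."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:  # explicit "b": no newline translation on write
            f.write(data)
        with open(path, "rb") as f:  # explicit "b" again on the read side
            return f.read()
    finally:
        os.remove(path)


# Bytes that a translating text mode could alter: CR LF, bare LF, and Ctrl-Z.
payload = b"line one\r\nline two\n\x1a trailing bytes"
assert roundtrip(payload) == payload
```

With the mode written out on both sides, the result does not depend on whatever default the C runtime, or anything else sharing it, happens to have.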
[Mark Hammond]
... MSVC exposes a global variable that contains the default [fopen] mode - i.e., you can change the default to binary. (_fmode for those with the docs)
This has some implications and questions:
* Will Guido ever bow to pressure (when it arrives :) to expose this via the "msvcrt" module?
No. It changes the advertised semantics of Python builtins, and no option ever does that. If it went in at all, it would have to be exposed as a Python-level feature that changed the semantics similarly on all platforms -- and even then Guido wouldn't put it in <wink>.
... Thus, it appears that to be 100% sure what mode you are using, you should not rely on the default, but should _always_ use "b" or "t" on the file mode.
And on platforms that have libc options to treat "t" as if it were "b"? There's no limit to how perverse platform options can get! There's no fully safe ground to stand on, so Python stands on the minimal guarantees libc provides. If a user violates those, tough, they can't use Python. Unless, of course, they contribute a lot of money to the PSA <wink>.
... Any thoughts or comments? The case for abandoning the CRTL's text mode gets stronger and stronger!
C's text mode is, alas, a bad joke. The only thing worse is Microsoft's half-assed implementation of it <0.5 wink>.
ctrl-z-=-eof-even-gets-in-the-way-under-windows!-ly y'rs - tim
I asked Guido to provide comments on one of the chapters in our book:
I was discussing appending the mode ("t" or "b") to the open() call
p.10, bottom: text mode is the default -- I've never seen the 't' option described! (So even if it exists, better be silent about it.) You need to append 'b' to get binary mode instead.
In addition, 't' probably isn't even supported on many Unix systems!
This brings up an interesting issue.
MSVC exposes a global variable that contains the default mode - i.e., you can change the default to binary. (_fmode for those with the docs)
The best thing to do with this variable is to ignore it. In large programs like Python that link together pieces of code that never ever heard about each other, making global changes to the semantics of standard library functions is a bad thing. Code that sets it or requires you to set it is broken.
This has some implications and questions:
* Will Guido ever bow to pressure (when it arrives :) to expose this via the "msvcrt" module? I can imagine where it may be useful in a limited context. A reasonable argument would be that, like _setmode and other MS specific stuff, if it exists it should be exposed.
No. (And I've never bought that argument before -- I always use "is there sufficient need and no other way.")
* But even if not, due to the shared CRTL, in COM and other worlds we really can't predict what the default is. Although Python does not touch it, that does not stop someone else touching it. A web-server built using MSVC on Windows may use it?
But it would be stupid for it to do so, and I would argue that the web server was broken. Since they should know better than this, I doubt they do this (this option is more likely to be used in small, self-contained programs). Until you find a concrete example, let's ignore the possibility.
Thus, it appears that to be 100% sure what mode you are using, you should not rely on the default, but should _always_ use "b" or "t" on the file mode.
Stop losing sleep over it.
Any thoughts or comments? The case for abandoning the CRTL's text mode gets stronger and stronger!
OK, you write the code :-)
--Guido van Rossum (home page: http://www.python.org/~guido/)
I was discussing appending the mode ("t" or "b") to the open() call
In addition, 't' probably isn't even supported on many Unix systems!
't' is not ANSI C, so there's no guarantee that it's portable. Hate to say it, but Python should really strip t out before passing a mode string to fopen!
Tim Peters wrote:
I was discussing appending the mode ("t" or "b") to the open() call
In addition, 't' probably isn't even supported on many Unix systems!
't' is not ANSI C, so there's no guarantee that it's portable. Hate to say it, but Python should really strip t out before passing a mode string to fopen!
Should we also filter the socket type when creating sockets? Or the address family?
What if I pass "bamboozle" as the fopen mode? Should that become "bab" after filtering? Oh, but what about those two "b" characters? Maybe just reduce it to one? We also can't forget to filter chmod() arguments... can't have unknown bits set.
etc etc
In other words, I think the idea of "stripping out the t" is bunk. Python is not fatherly. It gives you the rope and lets you figure it out for yourself. You should know that :-)
Cheers, -g
-- Greg Stein, http://www.lyra.org/
[Tim]
't' is not ANSI C, so there's no guarantee that it's portable. Hate to say it, but Python should really strip t out before passing a mode string to fopen!
[Greg Stein]
Should we also filter the socket type when creating sockets? Or the address family?
Filtering 't' is a matter of increasing portability by throwing out an option that doesn't do anything on the platforms that accept it, yet can cause a program to die on platforms that don't -- despite that it says nothing. So it's helpful to toss it, not restrictive.
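As a rough illustration of the filtering being argued for here, and only that: the function name strip_text_flag is hypothetical, and nothing like it is claimed to exist in Python itself. The sketch drops 't' and passes every other character through untouched.

```python
def strip_text_flag(mode: str) -> str:
    """Return mode with every 't' removed; all other characters pass through."""
    return mode.replace("t", "")


print(strip_text_flag("rt"))   # r
print(strip_text_flag("w+t"))  # w+
print(strip_text_flag("rb"))   # rb
```

Modes that never contained 't' come back unchanged, so the filter only touches the one character whose portability is in doubt.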
What if I pass "bamboozle" as the fopen mode? Should that become "bab" after filtering? Oh, but what about those two "b" characters?
Those go far beyond what I suggested, Greg. Even so <wink>, it would indeed help a great many non-C programmers if Python defined the mode strings it accepts & barfed on others by default. The builtin open is impossible for a non-C weenie to understand from the docs (as a frustrated sister delights in reminding me). It should be made friendlier. Experts can use a new os.fopen if they need to pass "bamboozle"; fine by me; I do think the builtins should hide as much ill-defined libc crap as possible (btw, "open" is unique in this respect).
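One way the "define the mode strings it accepts and barf on others" idea could look, as a sketch under assumptions: the names ACCEPTED_MODES, check_mode and checked_open are hypothetical, and the accepted set is taken to be the mode strings ANSI C lists for fopen. It is not how the built-in open is implemented.

```python
# Mode strings that ANSI C defines for fopen (assumed here to be the accepted set).
ACCEPTED_MODES = frozenset({
    "r", "w", "a",
    "rb", "wb", "ab",
    "r+", "w+", "a+",
    "r+b", "w+b", "a+b",
    "rb+", "wb+", "ab+",
})


def check_mode(mode: str) -> str:
    """Return mode unchanged if it is in the accepted set, else raise ValueError."""
    if mode not in ACCEPTED_MODES:
        raise ValueError(f"unsupported file mode: {mode!r}")
    return mode


def checked_open(path, mode="r"):
    """Open path only after the mode string has been validated."""
    return open(path, check_mode(mode))
```

Under this sketch both "bamboozle" and "rt" raise ValueError before any file is touched, which is the friendlier failure being asked for.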
Maybe just reduce it to one? We also can't forget to filter chmod() arguments... can't have unknown bits set.
I at least agree that chmod has a miserable UI <wink>.
etc etc
In other words, I think the idea of "stripping out the t" is bunk. Python is not fatherly. It gives you the rope and lets you figure it out for yourself. You should know that :-)
So should Mark -- but we have his testimony that, like most other people, he has no idea what's "std C" and what isn't. In this case he should have noticed that Python's "open" docs don't admit to "t"'s existence either, but even so I see no reason to take comfort in the expectation that he'll eventually be hanged for this sin.
yup-i'd-rather-"open"-died-when-passed-"t"-ly y'rs - tim
't' is not ANSI C, so there's no guarantee that it's portable. Hate to say it, but Python should really strip t out before passing a mode string to fopen!
OK - thanks all - it is clear that this MS aberration is not, and never will be, supported by Python.
Not being a standards sort of guy <wink> I must admit I assumed both the "t" and "b" were standards. Thanks for the clarifications!
Mark.
[Mark]
I asked Guido to provide comments on one of the chapters in our book:
I was discussing appending the mode ("t" or "b") to the open() call
[Guido]
p.10, bottom: text mode is the default -- I've never seen the 't' option described! (So even if it exists, better be silent about it.) You need to append 'b' to get binary mode instead.
I hadn't either, until I made the mistake of helping Mr took-6-exchanges-before-he-used-the-right-DLL Embedder, who used it in his code. Certainly not mentioned in man fopen on my Linux box.
This brings up an interesting issue.
MSVC exposes a global variable that contains the default mode - i.e., you can change the default to binary. (_fmode for those with the docs)
Mentally prepend another underscore. This is something for that other p-language.
... The case for abandoning the CRTL's text mode gets stronger and stronger!
If you're tying this in with Tim's Icon worship, note that in these days of LANs, the issue is yet more complex. It would be dandy if I could read any old text file and have it look sane, but I may be writing it to a different machine without any way of knowing that.
When I bother to manipulate these things, I usually choose to use *nix style text files. But I don't deal with Macs, and the only common Windows tool that can't deal with plain \n is Notepad.
and-stripcr.py-is-everywhere-available-on-my-Linux-box-ly y'rs - Gordon
[Mark]
... The case for abandoning the CRTL's text mode gets stronger and stronger!
[Gordon]
If you're tying this in with Tim's Icon worship,
Icon inherits stdio behavior-- for the most part --too. It does define its own mode string characters, though (like "t" for translated and "u" for untranslated); Icon has been ported to platforms that can't even spell libc, let alone support it.
note that in these days of LANs, the issue is yet more complex. It would be dandy if I could read any old text file and have it look sane, but I may be writing it to a different machine without any way of knowing that.
So where's the problem? No matter *what* machine you end up on, Python could read the thing fine. Or are you assuming some fantasy world in which people sometimes run software other than Python <wink>?
Caveat: give the C std a close reading. It guarantees much less about text mode than anyone who hasn't studied it would believe; e.g., text mode doesn't guarantee to preserve chars with the high bit set, or most control chars either (MS's treatment of CTRL-Z as EOF under text mode conforms to the std!). Also doesn't guarantee to preserve a line-- even if composed of nothing but printable chars --if it's longer than 509(!) characters. That's what I mean when I say stdio's text mode is a bad joke.
When I bother to manipulate these things, I usually choose to use *nix style text files. But I don't deal with Macs, and the only common Windows tool that can't deal with plain \n is Notepad.
I generally create text files in binary mode, faking the \n convention by hand. Of course, I didn't do this before I became a Windows Guy <0.5 wink>.
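The habit described here, creating text files in binary mode and supplying the line terminator by hand, might be sketched as follows. The helper name write_lines and its terminator parameter are hypothetical, and the sample lines are assumed to be plain ASCII.

```python
import os
import tempfile


def write_lines(path, lines, terminator=b"\n"):
    """Write each line in binary mode, appending the terminator by hand."""
    with open(path, "wb") as f:
        for line in lines:
            f.write(line.encode("ascii") + terminator)


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    write_lines(path, ["first", "second"])
    with open(path, "rb") as f:
        raw = f.read()
finally:
    os.remove(path)

print(raw)  # b'first\nsecond\n'
```

Because the file is opened with "wb", the bytes on disk are exactly the ones written, whatever the platform's text-mode conventions are.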
and-stripcr.py-is-everywhere-available-on-my-Linux-box-ly y'rs
A plug for my linefix.py (Python FTP contrib, under System), which converts among Unix/Windows/Mac in any direction (by default, from any to Unix).
who-needs-linux-when-there's-a-python-in-the-window-ly y'rs - tim
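For readers without the script at hand, here is a sketch of the kind of conversion described, from any of the three conventions to a chosen one, defaulting to Unix. This is not the code of linefix.py; the name convert_eol and its signature are made up for illustration.

```python
import re

# Match CR LF first so a Windows line ending is not counted as two terminators.
_ANY_EOL = re.compile(rb"\r\n|\r|\n")


def convert_eol(data: bytes, target: bytes = b"\n") -> bytes:
    """Replace every Unix, Windows or old-Mac line ending in data with target."""
    return _ANY_EOL.sub(lambda match: target, data)


mixed = b"unix\nwindows\r\nmac\rend"
print(convert_eol(mixed))           # b'unix\nwindows\nmac\nend'
print(convert_eol(mixed, b"\r\n"))  # b'unix\r\nwindows\r\nmac\r\nend'
```

It works on bytes, so the input should be read in binary mode; otherwise the platform may have translated the line endings before the function sees them.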
participants (5)
- Gordon McMillan
- Greg Stein
- Guido van Rossum
- Mark Hammond
- Tim Peters