Hello everyone,

I'm currently working on a reimplementation of io.FileIO, which would allow cross-platform file range locking and all kinds of other safety features; however, I'm slightly stuck due to some specification fuzziness in the IO docs. Cf. http://bugs.python.org/issue6939

The main points that annoy me at the moment:

- It is unclear what truncate() methods do with the file pointer. The current implementation simply moves it to the truncation point, which is contrary to the standard behaviour under Unix, where the file pointer is normally left unchanged. Shouldn't we specify that the file pointer remains unmoved, and fix the _fileio module accordingly?
- Exceptions are not always specified, and even if most of them are IOErrors, weirdly, in some cases an OSError is raised instead (i.e., if we try to wrap a wrong file descriptor when instantiating a new FileIO). This might lead to bad program crashes if some people don't "refuse the temptation to guess" and only prepare to catch IOErrors.
- The doc sometimes says that when we receive an empty string from a read() operation, without exceptions, it means the file is empty. However, with the current implementation, if we call file.read(0), we simply receive "", even though it doesn't mean that we're at EOF. Shouldn't we avoid this (rare, I admit) ambiguity on the return value by preventing read(0)? Or at least note in the doc that (we receive an empty string) <-> (the file is at EOF OR we called read with 0 as parameter)?

Are there some arguments that I don't know, which led to this or that particular implementation choice? I'd strongly advocate very detailed specifications, leaving no room for cross-platform subtleties (that's also a strong goal of my reimplementation), since that new IO system (which saved me a lot of coding time, by the way) should become the base of many programs.

So wouldn't it be a good idea to write some kind of mini-PEP, just to fix the corner cases of the current IO documentation? I might handle it, if no more knowledgeable people feel like it.

Regards,
Pascal
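To make the read(0) ambiguity from the third point concrete, here is a minimal sketch in Python 3. The helper name `read_chunk` is hypothetical and not part of the io module; it simply refuses to treat a zero-size request as an EOF signal. It assumes a blocking binary stream (a non-blocking raw stream may also return None, which this sketch does not handle).

```python
import io

def read_chunk(stream, size):
    """Read up to `size` bytes and report whether EOF was reached.

    Hypothetical helper: returns (data, at_eof). A zero-size request is
    never reported as EOF, because read(0) returns b"" at any position.
    """
    if size < 0:
        raise ValueError("size must be non-negative")
    data = stream.read(size)
    # An empty result only means EOF when at least one byte was requested.
    at_eof = size > 0 and data == b""
    return data, at_eof

stream = io.BytesIO(b"abc")
print(read_chunk(stream, 0))  # empty data, but not EOF
print(read_chunk(stream, 3))  # the three bytes
print(read_chunk(stream, 3))  # empty data, genuinely at EOF
```

The point of the sketch is only that the ambiguity can be resolved by the caller, since the caller always knows which size it asked for.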
Why not propose an update to the existing PEP to clarify the vaguenesses?
On Fri, Sep 18, 2009 at 12:17 PM, Pascal Chambon
Hello everyone
I'm currently working on a reimplementation of io.FileIO, which would allow cross-platform file range locking and all kinds of other safety features ; however I'm slightly stuck due to some specification fuzziness in the IO docs. CF http://bugs.python.org/issue6939
The main points that annoy me at the moment:
- It is unclear what truncate() methods do with the file pointer. The current implementation simply moves it to the truncation point, which is contrary to the standard behaviour under Unix, where the file pointer is normally left unchanged. Shouldn't we specify that the file pointer remains unmoved, and fix the _fileio module accordingly?
- Exceptions are not always specified, and even if most of them are IOErrors, weirdly, in some cases an OSError is raised instead (i.e., if we try to wrap a wrong file descriptor when instantiating a new FileIO). This might lead to bad program crashes if some people don't "refuse the temptation to guess" and only prepare to catch IOErrors.
- The doc sometimes says that when we receive an empty string from a read() operation, without exceptions, it means the file is empty. However, with the current implementation, if we call file.read(0), we simply receive "", even though it doesn't mean that we're at EOF. Shouldn't we avoid this (rare, I admit) ambiguity on the return value by preventing read(0)? Or at least note in the doc that (we receive an empty string) <-> (the file is at EOF OR we called read with 0 as parameter)?
Are there some arguments that I don't know, which led to this or that particular implementation choice? I'd strongly advocate very detailed specifications, leaving no room for cross-platform subtleties (that's also a strong goal of my reimplementation), since that new IO system (which saved me a lot of coding time, by the way) should become the base of many programs.
So wouldn't it be a good idea to write some kind of mini-PEP, just to fix the corner cases of the current IO documentation? I might handle it, if no more knowledgeable people feel like it.
Regards, Pascal _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
-- --Guido van Rossum (home page: http://www.python.org/~guido/)
Pascal Chambon wrote:
Hello everyone
I'm currently working on a reimplementation of io.FileIO, which would allow cross-platform file range locking and all kinds of other safety features ; however I'm slightly stuck due to some specification fuzziness in the IO docs. CF http://bugs.python.org/issue6939
The main points that annoy me at the moment : - it is unclear what truncate() methods do with the file pointer, and even if the current implementation simply moves it to the truncation point, it's very contrary to the standard way of doing under unix, where the file pointer is normally left unchanged. Shouldn't we specify that the file pointer remains unmoved, and fix the _fileio module accordingly ?
I think that this should be an invariant: 0 <= file pointer <= file size so the file pointer might sometimes have to be moved.
- Exceptions are not always specified, and even if most of them are IOErrors, weirdly, in some cases an OSError is raised instead (i.e., if we try to wrap a wrong file descriptor when instantiating a new FileIO). This might lead to bad program crashes if some people don't "refuse the temptation to guess" and only prepare to catch IOErrors.
- The doc sometimes says that when we receive an empty string from a read() operation, without exceptions, it means the file is empty. However, with the current implementation, if we call file.read(0), we simply receive "", even though it doesn't mean that we're at EOF. Shouldn't we avoid this (rare, I admit) ambiguity on the return value by preventing read(0)? Or at least note in the doc that (we receive an empty string) <-> (the file is at EOF OR we called read with 0 as parameter)?
If you ask for 0 bytes then you get 0 bytes.
Are there some arguments that I don't know, which led to this or that particular implementation choice? I'd strongly advocate very detailed specifications, leaving no room for cross-platform subtleties (that's also a strong goal of my reimplementation), since that new IO system (which saved me a lot of coding time, by the way) should become the base of many programs.
So wouldn't it be a good idea to write some kind of mini-PEP, just to fix the corner cases of the current IO documentation? I might handle it, if no more knowledgeable people feel like it.
[Oops! Hit Send too soon]
Pascal Chambon wrote:
Hello everyone
I'm currently working on a reimplementation of io.FileIO, which would allow cross-platform file range locking and all kinds of other safety features ; however I'm slightly stuck due to some specification fuzziness in the IO docs. CF http://bugs.python.org/issue6939
The main points that annoy me at the moment : - it is unclear what truncate() methods do with the file pointer, and even if the current implementation simply moves it to the truncation point, it's very contrary to the standard way of doing under unix, where the file pointer is normally left unchanged. Shouldn't we specify that the file pointer remains unmoved, and fix the _fileio module accordingly ?
I think that this should be an invariant: 0 <= file pointer <= file size so the file pointer might sometimes have to be moved.
- Exceptions are not always specified, and even if most of them are IOErrors, weirdly, in some cases an OSError is raised instead (i.e., if we try to wrap a wrong file descriptor when instantiating a new FileIO). This might lead to bad program crashes if some people don't "refuse the temptation to guess" and only prepare to catch IOErrors.
- The doc sometimes says that when we receive an empty string from a read() operation, without exceptions, it means the file is empty. However, with the current implementation, if we call file.read(0), we simply receive "", even though it doesn't mean that we're at EOF. Shouldn't we avoid this (rare, I admit) ambiguity on the return value by preventing read(0)? Or at least note in the doc that (we receive an empty string) <-> (the file is at EOF OR we called read with 0 as parameter)?
If you ask for 0 bytes then you get 0 bytes.
Are there some arguments that I don't know, which led to this or that particular implementation choice? I'd strongly advocate very detailed specifications, leaving no room for cross-platform subtleties (that's also a strong goal of my reimplementation), since that new IO system (which saved me a lot of coding time, by the way) should become the base of many programs.
So wouldn't it be a good idea to write some kind of mini-PEP, just to fix the corner cases of the current IO documentation? I might handle it, if no more knowledgeable people feel like it.
As for the question of whether 'truncate' should be able to lengthen a file, the method name suggests no; if the method name were 'resize', for example, then maybe yes, zeroing the new bytes for security.
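For reference, the proposed invariant 0 <= file pointer <= file size is not something seek() enforces today. The sketch below (Python 3, with a hypothetical function name `seek_past_end_demo`) seeks well past the end of a temporary file, checks that the size is unchanged, and then writes one byte there, which extends the file. It is an illustration under the assumption of an ordinary regular file on a mainstream filesystem, not a statement about every platform.

```python
import os
import tempfile

def seek_past_end_demo():
    """Return (position after seek, size after seek, size after write)."""
    with tempfile.TemporaryFile() as f:   # binary read/write temporary file
        f.write(b"0123456789")
        f.flush()
        pos = f.seek(100)                 # past the 10-byte end; no exception
        size_after_seek = os.fstat(f.fileno()).st_size
        f.write(b"x")                     # writing at offset 100 extends the file
        f.flush()
        size_after_write = os.fstat(f.fileno()).st_size
        return pos, size_after_seek, size_after_write

print(seek_past_end_demo())
```

Seeking alone leaves the size untouched; only the later write grows the file, so any rule that clamps the pointer in truncate() would have to be applied to seek() as well to be consistent.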
On Sep 18, 2009, at 3:55 PM, MRAB wrote:
I think that this should be an invariant:
0 <= file pointer <= file size
so the file pointer might sometimes have to be moved.
As for the question of whether 'truncate' should be able to lengthen a file, the method name suggests no; if the method name were 'resize', for example, then maybe yes, zeroing the new bytes for security.
Why are you just making things up? There is a *vast* amount of precedent for how file operations should work. Python should follow that precedent and do like POSIX unless there's a compelling reason not to. Quoting:

If fildes refers to a regular file, the ftruncate() function shall cause the size of the file to be truncated to length. If the size of the file previously exceeded length, the extra data shall no longer be available to reads on the file. If the file previously was smaller than this size, ftruncate() shall either increase the size of the file or fail. XSI-conformant systems shall increase the size of the file. If the file size is increased, the extended area shall appear as if it were zero-filled. The value of the seek pointer shall not be modified by a call to ftruncate().

James
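The quoted ftruncate() semantics can be checked from Python with the os module's thin wrappers. This is a sketch assuming a POSIX (XSI-conformant) system and Python 3; the function name `ftruncate_demo` is hypothetical, and `os.pread` is only available on Unix-like platforms.

```python
import os
import tempfile

def ftruncate_demo():
    """Return (offset after shrinking, size after growing, extended bytes)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")
        os.lseek(fd, 8, os.SEEK_SET)          # put the seek pointer at offset 8
        os.ftruncate(fd, 4)                   # shrink below the pointer
        offset_after_shrink = os.lseek(fd, 0, os.SEEK_CUR)
        os.ftruncate(fd, 20)                  # grow the file again
        size_after_grow = os.fstat(fd).st_size
        extended = os.pread(fd, 16, 4)        # bytes 4..19, read without moving the pointer
        return offset_after_shrink, size_after_grow, extended
    finally:
        os.close(fd)
        os.remove(path)

print(ftruncate_demo())
```

On such a system the seek pointer stays at 8 even though the file was cut to 4 bytes, and the extended area reads back as zero bytes, matching the last two sentences of the quoted specification.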
James Y Knight wrote:
On Sep 18, 2009, at 3:55 PM, MRAB wrote:
I think that this should be an invariant:
0 <= file pointer <= file size
so the file pointer might sometimes have to be moved.
As for the question of whether 'truncate' should be able to lengthen a file, the method name suggests no; if the method name were 'resize', for example, then maybe yes, zeroing the new bytes for security.
Why are you just making things up? There is a *vast* amount of precedent for how file operations should work. Python should follow that precedent and do like POSIX unless there's a compelling reason not to. Quoting:
If fildes refers to a regular file, the ftruncate() function shall cause the size of the file to be truncated to length. If the size of the file previously exceeded length, the extra data shall no longer be available to reads on the file. If the file previously was smaller than this size, ftruncate() shall either increase the size of the file or fail. XSI-conformant systems shall increase the size of the file. If the file size is increased, the extended area shall appear as if it were zero-filled. The value of the seek pointer shall not be modified by a call to ftruncate().
"making things up"? I'm just expressing an opinion!
James Y Knight
Why are you just making things up? There is a *vast* amount of precedent for how file operations should work. Python should follow that precedent and do like POSIX unless there's a compelling reason not to.
Actually, Python is cross-platform and therefore does not necessarily follow POSIX behaviour, especially when it is desired to hide inconsistencies between different platforms. (I do agree, of course, that the IO APIs are quite heavily inspired by the POSIX APIs.)

Regards

Antoine.
Antoine Pitrou writes:
James Y Knight
writes: Why are you just making things up? There is a *vast* amount of precedent for how file operations should work. Python should follow that precedent and do like POSIX unless there's a compelling reason not to.
Actually, Python is cross-platform and therefore does not necessarily follow POSIX behaviour, especially when it is desired to hide inconsistencies between different platforms.
That's what *James* said, except that I prefer his standard, because I believe POSIX documentation to be more accessible to a variety of Python developers than other systems', and it's better documented: rationales are included, history is available, etc.
On Sat, 19 Sep 2009 09:19:53 +0900, Stephen J. Turnbull wrote:
Antoine Pitrou writes:
James Y Knight
writes: Why are you just making things up? There is a *vast* amount of precedent for how file operations should work. Python should follow that precedent and do like POSIX unless there's a compelling reason not to.
Actually, Python is cross-platform and therefore does not necessarily follow POSIX behaviour, especially when it is desired to hide inconsistencies between different platforms.
That's what *James* said, except that I prefer his standard,
I don't believe that POSIX compliance is a sufficient argument to ask someone to shut up in the discussion of a cross-platform API. Which is more or less what James' answer was trying to do. So, no, not exactly the same thing that I said.
I believe POSIX documentation to be more accessible to a variety of Python developers than other systems', and it's better documented: rationales are included, history is available, etc.
I'm not sure that's true. Various Unix/Linux man pages are readily available on the Internet, but they regard specific implementations, which often depart from the spec in one way or another. POSIX specs themselves don't seem to be easily reachable; you might even have to pay for them. cheers Antoine.
On Sep 18, 2009, at 8:58 PM, Antoine Pitrou wrote:
I'm not sure that's true. Various Unix/Linux man pages are readily available on the Internet, but they regard specific implementations, which often depart from the spec in one way or another. POSIX specs themselves don't seem to be easily reachable; you might even have to pay for them.
The POSIX specs are quite easily accessible, without payment.

I got my quote by doing:
    man 3p ftruncate

I had previously done:
    apt-get install manpages-posix-dev
to install the POSIX manpages. That package contains the POSIX standard as of 2003, which is good enough for most uses. It seems to be available here, if you don't have a Debian system: http://www.kernel.org/pub/linux/docs/man-pages/man-pages-posix/

There's also a webpage containing the official POSIX 2008 standard: http://www.opengroup.org/onlinepubs/9699919799/

And to navigate to ftruncate from there, click "System Interfaces" in the left pane, "System Interfaces" in the bottom pane, and then "ftruncate" in the bottom pane.

James
@pitrou: non-blocking IO in Python? Which ones are you thinking about? I currently have no plan to work on asynchronous IO like win32's ReadFileEx() etc. (too much trouble for the benefit); however, I'd be interested in getting non-blocking operations on IPC pipes (I've come across several people in trouble with that, having a process never end on some OSes because they couldn't stop threads blocked on pipes).

This reimplementation is actually necessary to get file locking, because advanced win32 operations only work on real file handles, not the handles that underlie the C API layer. Furthermore, some interesting features (like O_EXCL | O_CREAT) are not possible with the current io implementations. So well, reimplementation required ^^

Otherwise, all right, I'll try to summarize the various points in a PEP update.

Concerning the "truncate" method however, on second thought I feel we might distance ourselves from the POSIX API for naming, precisely since it's too "platform-specific" anyway (Windows knows nothing about POSIX, and even common Unix-like systems modify it in one way or another; several systems don't zero-fill files when extending them). When seeing "truncate", in my opinion, most people will think "it's only to reduce file size" (for beginners), or will immediately bring to mind all the subtleties of POSIX-like systems (for more experienced developers). Shouldn't we, like other cross-platform APIs, use a less ambiguous notion, like "setLength" (Java) or "resize" (Qt)? And leave the file pointer untouched, simply because there is no reason to move it, especially when extending the file (yes, on Windows we're forced to move the pointer, but it's easy to fix)? If it's too late to modify the IO API, too bad, but I don't feel comfortable with the word "truncate".

And I don't like the fact that we move the file pointer to prevent it from exceeding the file size, whereas on the other hand we can seek() anywhere without getting exceptions (and so set the file pointer past the end of file). Having 0 <= file pointer <= EOF is OK to me, but then we have to enforce it for all functions, not just truncate.

Concerning exceptions, which one is raised is not so important to me, as long as it's well documented and not tricky (e.g. WindowsErrors are OK to me, because they subclass OSError, so most cross-platform programs won't even have to know about them). I had the feeling that IOErrors were for operations on file streams (opening, writing/reading, closing...), whereas OSErrors were for manipulations on filesystems (renaming, linking, stating...) and processes. This semantic would be perfect for me, and it's already 95% there; we would just have to fix some unwelcome OSError exceptions in the io module. Isn't that worth it? It'd simplify programmers' jobs a lot, and allow a more subtle treatment of exceptions (if everyone just catches EnvironmentError, without being sure of which subclass is actually raised, we miss the point of IOError and OSError).

Regards,
Pascal

James Y Knight wrote:
On Sep 18, 2009, at 8:58 PM, Antoine Pitrou wrote:
I'm not sure that's true. Various Unix/Linux man pages are readily available on the Internet, but they regard specific implementations, which often depart from the spec in one way or another. POSIX specs themselves don't seem to be easily reachable; you might even have to pay for them.
The POSIX specs are quite easily accessible, without payment.
I got my quote by doing: man 3p ftruncate
I had previously done: apt-get install manpages-posix-dev to install the posix manpages. That package contains the POSIX standard as of 2003. Which is good enough for most uses. It seems to be available here, if you don't have a debian system: http://www.kernel.org/pub/linux/docs/man-pages/man-pages-posix/
There's also a webpage, containing the official POSIX 2008 standard: http://www.opengroup.org/onlinepubs/9699919799/
And to navigate to ftruncate from there, click "System Interfaces" in the left pane, "System Interfaces" in the bottom pane, and then "ftruncate" in the bottom pane.
James
On Sat, Sep 19, 2009 at 2:46 AM, Pascal Chambon
This reimplementation is actually necessary to get file locking, because advanced win32 operations only work on real file handles, not the handles that are underlying the C API layer. Furthermore, some interesting features (like O_EXCL | O_CREAT) are not possible with the current io implementations. So well, reimplementation required ^^
Concerning exceptions, which one is raised is not so important to me, as long as it's well documented and not tricky (e.g. WindowsErrors are OK to me, because they subclass OSError, so most cross-platform programs won't even have to know about them).
If you use real Windows file handles (instead of the POSIX-ish Windows API), won't you need to return WindowsErrors?
I had the feeling that IOErrors were for operations on file streams (opening, writing/reading, closing...), whereas OSErrors were for manipulations on filesystems (renaming, linking, stating...) and processes.
If that were documented and a firm rule, that would certainly be great. It's not too hard to find counterexamples in the current codebase. Also, I'm not sure how one could avoid needing to raise WindowsError in some cases. Maybe someone with more knowledge of the history of IOError vs. OSError could chime in. Python 2.6:
>>> os.write(f.fileno(), 'blah')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 9] Bad file descriptor
>>> f.write('blah')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 9] Bad file descriptor
-- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com
Good example with "os.write(f.fileno(), 'blah')"; you obtain the same error if you try to open an io.FileIO by providing a file descriptor instead of a file name as first argument. This would really deserve unification.

Actually, since Windows error codes concern any possible error (IO, file permissions, memory problems...), I thought the best would be to convert them to the most appropriate Python standard exception, only defaulting to WindowsError (i.e., OSError's hierarchy) when no other exception type matches. So at the moment, I use a decorator to automatically convert all errors on stream operations into IOErrors. Error codes are indeed not the same as the Unix ones, but I don't know if it's really important (imo, most people just want to know if the operation was successful; I don't know if many developers scan error codes to act accordingly). For IOError types that really matter (e.g. file already locked, buffer full), the easiest is actually to use subclasses of IOError (the io module already does that, even though I may have to create new exceptions for errors like "file already exists" or "file already locked by another process").

Regards,
Pascal

Daniel Stutzbach wrote:
On Sat, Sep 19, 2009 at 2:46 AM, Pascal Chambon
<chambon.pascal@gmail.com> wrote:
This reimplementation is actually necessary to get file locking, because advanced win32 operations only work on real file handles, not the handles that are underlying the C API layer. Furthermore, some interesting features (like O_EXCL | O_CREAT) are not possible with the current io implementations. So well, reimplementation required ^^
Concerning exceptions, which one is raised is not so important to me, as long as it's well documented and not tricky (e.g. WindowsErrors are OK to me, because they subclass OSError, so most cross-platform programs won't even have to know about them).
If you use real Windows file handles (instead of the POSIX-ish Windows API), won't you need to return WindowsErrors?
I had the feeling that IOErrors were for operations on file streams (opening, writing/reading, closing...), whereas OSErrors were for manipulations on filesystems (renaming, linking, stating...) and processes.
If that were documented and a firm rule, that would certainly be great. It's not too hard to find counterexamples in the current codebase. Also, I'm not sure how one could avoid needing to raise WindowsError in some cases.
Maybe someone with more knowledge of the history of IOError vs. OSError could chime in.
Python 2.6:
>>> os.write(f.fileno(), 'blah')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 9] Bad file descriptor
>>> f.write('blah')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 9] Bad file descriptor
-- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com
On Sat, Sep 19, 2009 at 5:31 AM, Pascal Chambon
Actually, since Windows Error Codes concern any possible error (IO, file permissions, memory problems...), I thought the best would be to convert them to the most appropriate python standard exception, only defaulting to WindowsError (i.e, OSError's hierarchy) when no other exception type matches. So at the moment, I use a decorator to automatically convert all errors on stream operations into IOErrors. Error codes are not the same as unix ones indeed, but I don't know if it's really important (imo, most people just want to know if the operation was successful, I don't know if many developers scan error codes to act accordingly).
I don't often need to check the error code at runtime but seeing the corresponding message is often critical for debugging. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com
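A conversion decorator of the kind described above does not have to discard the code or the message. The following is only a sketch in Python 3 under stated assumptions: `StreamError`, `stream_operation`, and `write_to_closed_descriptor` are hypothetical names, not part of the io module or of Pascal's code, and in Python 3.3+ IOError is simply an alias of OSError, so the custom subclass stands in for "the exception type stream operations should raise".

```python
import errno
import functools
import os

class StreamError(OSError):
    """Hypothetical exception type for failed stream operations."""

def stream_operation(func):
    """Re-raise any OSError from `func` as StreamError, keeping errno and message."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except StreamError:
            raise                       # already the right type
        except OSError as exc:
            # Preserve the numeric code, the system message and the filename,
            # and chain the original exception for debugging.
            raise StreamError(exc.errno, exc.strerror, exc.filename) from exc
    return wrapper

@stream_operation
def write_to_closed_descriptor():
    """Deliberately write to a descriptor that has just been closed."""
    read_end, write_end = os.pipe()
    os.close(read_end)
    os.close(write_end)
    os.write(write_end, b"blah")        # fails with EBADF

try:
    write_to_closed_descriptor()
except StreamError as exc:
    print(exc.errno == errno.EBADF, exc)
```

Callers who only care about success can catch the single type, while the error code and message remain available to those who need them for debugging.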
On Sat, 19 Sep 2009 at 12:31, Pascal Chambon wrote:
stream operations into IOErrors. Error codes are not the same as unix ones indeed, but I don't know if it's really important (imo, most people just want to know if the operation was successful, I don't know if many developers scan error codes to act accordingly). For IOError types that really matter (eg.
Doesn't matter if it isn't very many, I think, just that it can be done. But I suspect it is fairly common. I know I have inspected OSError codes (though I can't remember if I've done it for file operations). --David
Pascal Chambon wrote:
(imo, most people just want to know if the operation was successful, I don't know if many developers scan error codes to act accordingly).
And as a user of applications written by those developers, it is a practice I detest with a passion. Debugging environmental problems is painful enough without further encouraging applications that lie to their users as to what has actually gone wrong.

For example, a file not existing, a file being locked by another process, and the user not having write permissions to the file are problems that demand very different responses from the user. Applications that give the same error message for all three problems are far too common.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick Coghlan wrote:
For example, a file not existing, a file being locked by another process, and the user not having write permissions to the file are problems that demand very different responses from the user.
You can display an error-specific message without having to inspect the error code, e.g.

    try:
        something_with_file(path)
    except EnvironmentError as e:
        report_error("Couldn't do that with %s: %s" % (path, e))

This is a pattern I use a lot, and it seems to work pretty well.

-- Greg
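A self-contained variant of this pattern, as a sketch for Python 3: `describe_failure` is a hypothetical stand-in for the `something_with_file` / `report_error` pair, and it returns the message instead of reporting it. EnvironmentError still exists in Python 3 as an alias of OSError.

```python
import os
import tempfile

def describe_failure(path):
    """Try to read one byte from `path`; return an error message or None."""
    try:
        with open(path, "rb") as f:
            f.read(1)
    except EnvironmentError as e:
        # str(e) carries the system's own description of what went wrong.
        return "Couldn't read %s: %s" % (path, e)
    return None

missing = os.path.join(tempfile.mkdtemp(), "missing.txt")
print(describe_failure(missing))
```

The message distinguishes a missing file from a permission problem without the code ever looking at the error number, because the exception's string form already includes the system's explanation.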
On Sat, 19 Sep 2009 08:31:15 pm Pascal Chambon wrote:
Error codes are not the same as unix ones indeed, but I don't know if it's really important (imo, most people just want to know if the operation was successful, I don't know if many developers scan error codes to act accordingly).
I do. Please don't throw away useful information. -- Steven D'Aprano
Pascal Chambon wrote:
And let the filepointer untouched, simply because there are no reasons to move it,
On some systems it may not be possible for the file pointer to be positioned past the end of the file (I think Classic MacOS was like that).
I had the feeling that IOErrors were for operations on file streams (opening, writing/reading, closing...), whereas OSErrors were for manipulations on filesystems (renaming, linking, stating...)
I always thought the distinction was that IOError contains a C library errno value, whereas OSError contains an OS-specific error code. So system calls that aren't part of the C stdlib need to use OSError, at least on some platforms. I don't see that file errors vs. everything else is a very useful distinction to make when catching exceptions. I almost always end up catching EnvironmentError to make sure I get both, particularly when working cross-platform. What we could do with is better platform-independent ways of distinguishing particular error conditions, such as file not found, out of space, etc., either using subclasses of IOError or mapping error codes to a set of platform-independent ones. -- Greg
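As a rough illustration of the "platform-independent ways of distinguishing particular error conditions" mentioned above, here is a minimal sketch using the standard errno module. The helper names classify_os_error and try_open are hypothetical, not part of any library. It assumes a current Python 3 (3.3 or later), where IOError and EnvironmentError are aliases of OSError, so a single except clause catches both families.

```python
import errno
import os
import tempfile


def classify_os_error(exc):
    """Map an OSError to a coarse label (hypothetical helper, not a stdlib API)."""
    if exc.errno == errno.ENOENT:
        return "not found"
    if exc.errno in (errno.EACCES, errno.EPERM):
        return "permission denied"
    if exc.errno == errno.ENOSPC:
        return "out of space"
    return "other"


def try_open(path):
    """Return None if the file can be opened for reading, else a failure label."""
    try:
        with open(path, "rb"):
            return None
    except OSError as exc:  # also catches IOError/EnvironmentError (aliases in 3.3+)
        return classify_os_error(exc)


# A path inside a fresh temporary directory is guaranteed not to exist yet.
with tempfile.TemporaryDirectory() as tmp_dir:
    label = try_open(os.path.join(tmp_dir, "missing.txt"))
```

The full exception text is still available through str(exc), so this kind of classification can sit alongside the "report the message as-is" pattern rather than replace it.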
What we could do with is better platform-independent ways of distinguishing particular error conditions, such as file not found, out of space, etc., either using subclasses of IOError or mapping error codes to a set of platform-independent ones.
Well, mapping all errors (including C ones and windows-specific ones) to a common set would be extremely useful for developers indeed. I guess some advanced windows errors will never have equivalents elsewhere, but does anyone know an error code set which would be relevant to cover all memory, filesystem, io and locking aspects ? Regards, Pascal
Hello,
Pascal Chambon writes:
@pitrou: non-blocking IO in python ? which ones are you thinking about ?
I was talking about the existing support for non-blocking IO in the FileIO class (look up EAGAIN in fileio.c), as well as in the Buffered* objects.
If it's too late to modify the IO API, too bad, but I don't feel comfortable with the "truncate" word.
It's certainly too late to modify the IO API only for naming purposes.
And I don't like the fact that we move the filepointer to prevent it from exceeding the file size,
I don't see what you mean:
>>> with open('foobar', 'wb') as f:
...     f.truncate(0)
...
0
>>> os.stat('foobar').st_size
0
>>> with open('foobar', 'wb') as f:
...     f.truncate(16)
...     f.tell()
...
16
16
>>> os.stat('foobar').st_size
16
I had the feeling that IOErrors were for operations on file streams (opening, writing/reading, closing...), whereas OSErrors were for manipulations on filesystems (renaming, linking, stating...) and processes.
Ok, but the distinction is certainly fuzzy in many cases. I have no problem with trying to change the corner cases you mention, though. Regards Antoine.
Antoine Pitrou wrote:
Hello,
Pascal Chambon writes:
@pitrou: non-blocking IO in python ? which ones are you thinking about ?
I was talking about the existing support for non-blocking IO in the FileIO class (look up EAGAIN in fileio.c), as well as in the Buffered* objects.
All right, I'll check that EAGAIN stuff, which I hadn't even noticed :)
And I don't like the fact that we move the filepointer to prevent it from exceeding the file size,
I don't see what you mean:
Well, the sample code you showed is not shocking, but I'd like it to be consistent with file.seek(): if truncate() prevents an out-of-bounds file pointer, other methods should do the same (and raise IOError when seeking out of file bounds).
I had the feeling that IOErrors were for operations on file streams (opening, writing/reading, closing...), whereas OSErrors were for manipulations on filesystems (renaming, linking, stating...) and processes.
Ok, but the distinction is certainly fuzzy in many cases. I have no problem with trying to change the corner cases you mention, though.
The case which could be problematic there is the file opening, because it can involve problems at all levels of the OS (filesystem not existing, permission problems, file locking...), so we should keep it in the "EnvironmentError" area. But as soon as a file is open, I guess only IOErrors can be involved (no space left, range locked etc), so enforcing all this to raise IOError would be OK I think.
Antoine Pitrou writes:
I don't believe that POSIX compliance is a sufficient argument to ask someone to shut up in the discussion of a cross-platform API. Which is more or less what James' answer was trying to do.
No, as I read it, James said, "when there's precedent, even a standard, don't make stuff up". He then referred to the POSIX standard, which I assume means that's a standard he likes. But he didn't say it had to be POSIX, or else shut up. He said, "respect precedent" (and implied that ignorance is no excuse, I guess, which is a little harsh on the Internet<wink>). And I think he'd agree to weaken his dictum to add "and if there's good reason to vary from the standard, either quote other standards or give rationale referring to why you want to vary from the standard."
I'm not sure that's true [that POSIX is readily available]. [...] POSIX specs themselves don't seem to be easily reachable; you might even have to pay for them.
Five minutes with Google gives http://standards.ieee.org/catalog/olis/arch_posix.html (IEEE members only) http://www.unix.org/version3/ (registration but no fee required) http://www.opengroup.org/onlinepubs/000095399/toc.htm (title page for above, lets you sneak past the registration, and has higher Googlefu) Older versions also seem to be readily available.
I believe POSIX documentation to be more accessible to a variety of Python developers than other system's, and it's better documented: rationales are included, history is available, etc.
I'm not sure that's true. Various Unix/Linux man pages are readily available on the Internet, but they regard specific implementations, which often depart from the spec in one way or another. POSIX specs themselves don't seem to be easily reachable; you might even have to pay for them.
Then please rest assured that it actually *is* true: - the Linux man pages are often a literal copy of the POSIX man pages, so when you look at a Linux man page, there is a good chance that it either is the wording of the POSIX spec, or points out what details are specific to POSIX. - The Open Group publishes POSIX free of charge, and these days also free of registration. - these days, specific implementations typically do *not* deviate from POSIX. Some provide additional features, but in a way that does not harm compatibility. Regards, Martin
On Sat, 19 Sep 2009 06:08:23 am James Y Knight wrote:
On Sep 18, 2009, at 3:55 PM, MRAB wrote:
I think that this should be an invariant:
0 <= file pointer <= file size
so the file pointer might sometimes have to be moved.
As for the question of whether 'truncate' should be able to lengthen a file, the method name suggests no; if the method name were 'resize', for example, then maybe yes, zeroing the new bytes for security.
Why are you just making things up?
Well, why not? The POSIX standard wasn't derived from the physical laws of the universe, somebody -- or more likely, some committee -- made them up. It's not like we are forced to accept MRAB's suggestions, but surely he's allowed to make them?
There is a *vast* amount of precedent for how file operations should work. Python should follow that precedent and do like POSIX unless there's a compelling reason not to. Quoting:
If fildes refers to a regular file, the ftruncate() function shall cause the size of the file to be truncated to length. If the size of the file previously exceeded length, the extra data shall no longer be available to reads on the file. If the file previously was smaller than this size, ftruncate() shall either increase the size of the file or fail. XSI-conformant systems shall increase the size of the file. If the file size is increased, the extended area shall appear as if it were zero-filled. The value of the seek pointer shall not be modified by a call to ftruncate().
Standard or not, it has a seriously misleading name. Truncate means "to cut off", and is a standard English word and mathematical term. In English and mathematics, truncating something can NEVER lead it to increase in size, the object being truncated can only decrease or remain the same size. Using "truncate" to mean "increase in size" makes about as much sense as having a list method called "remove" used to insert items. I can't imagine what the committee who approved this were thinking. Java's FileChannel object uses truncate to only shrink a file, never increase it. To increase it, you have to explicitly write to the file. http://java.sun.com/j2se/1.4.2/docs/api/java/nio/channels/FileChannel.html#t...) In any case, some platforms don't allow the file pointer to exceed the size of the file, so it's not clear that the POSIX standard is appropriate for Python. -- Steven D'Aprano
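The quoted ftruncate() wording can be checked with a small experiment. This is only a sketch, assuming a POSIX-like system where os.ftruncate() is available and the file system zero-fills extended files. The function name ftruncate_demo is made up for the example.

```python
import os
import tempfile


def ftruncate_demo():
    """Return (offset_before, offset_after, size_after, tail) around os.ftruncate()."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")            # file is 10 bytes long
        os.lseek(fd, 4, os.SEEK_SET)           # move the offset to 4
        before = os.lseek(fd, 0, os.SEEK_CUR)  # query the current offset
        os.ftruncate(fd, 16)                   # "truncate" to a larger size
        after = os.lseek(fd, 0, os.SEEK_CUR)   # offset after the call
        size = os.fstat(fd).st_size
        os.lseek(fd, 10, os.SEEK_SET)
        tail = os.read(fd, 6)                  # contents of the extended area
        return before, after, size, tail
    finally:
        os.close(fd)
        os.unlink(path)


before, after, size, tail = ftruncate_demo()
```

On such a system the offset should be the same before and after the call, the size should grow to 16, and the extended area should read back as zero bytes.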
Steven D'Aprano wrote:
Using "truncate" to mean "increase in size" makes about as much sense as having a list method called "remove" used to insert items. I can't imagine what the committee who approved this were thinking.
I expect the reason is historical. Some time back in the early days of Unix, someone wanted a way of chopping back files, so they added a truncate() system call. Then someone else noticed that it would happily accept an argument greater than the existing length, and it seemed like that could be useful behaviour, so they documented it and left it that way. Then the POSIX committee came along and incorporated it into the standard so as to be compatible with existing practice. -- Greg
Well, system compatibility argues strongly in favor of not letting filepointer > EOF. However, is it really necessary to move the pointer to EOF in ANY case ? I mean, if I extend the file, or if I reduce it without going lower than my current filepointer, I really don't expect the io system to move my pointer to the end of file, "just for fun". In these patterns, people would have to remember their current filepointer, to come back to where they were, and that's not pretty imo... If we agree on the simple mandatory expression 0 <= filepointer <= EOF (for cross-platform safety), then we just have to enforce it when the rule is broken : reducing the size lower than the filepointer, and seeking past the end of file. All other conditions should leave the filepointer where the user put it. Shouldn't it be so ? Concerning the naming of truncate(), would it be possible to deprecate it and alias it to "resize()" ? It's not very gratifying to have duplicated methods at the beginning of a major release, but I feel too that "truncate" is a misleading term, that had better be replaced asap. Regards, Pascal
Pascal Chambon wrote:
Concerning the naming of truncate(), would it be possible to deprecate it and alias it to "resize()" ? It's not very gratifying to have duplicated methods at the beginning of a major release, but I feel too that "truncate" is a misleading term, that had better be replaced asap.
There's something to be said for that, but there's also something to be said for following established conventions, and there's a long-established precedent from the C library for having a function called truncate() that behaves this way. -- Greg
Hello
After weighing this and that, here is what I have come up with. Comments and issue notifications more than welcome, of course. The exception thingy is not yet addressed.
Regards, Pascal
*Truncate and file pointer semantics*
Rationale :
The current implementation of truncate() always moves the file pointer to the new end of file.
This behaviour is interesting for compatibility, if the file has been reduced and the file pointer is now past its end, since some platforms might require 0 <= filepointer <= filesize.
However, there are several arguments against this semantic:
* Most common standards (posix, win32...) allow the file pointer to be past the end of file, and define the behaviour of other stream methods in this case
* In many cases, moving the filepointer when truncating has no reason to happen (if we're extending the file, or reducing it without going beneath the file pointer)
* Making 0 <= filepointer <= filesize a global rule of the python IO module doesn't seem possible, since it would require modifications of the semantic of other methods (eg. seek() should raise exceptions or silently disobey when asked to move the filepointer past the end of file), and lead to incoherent situations when concurrently accessing files without locking (what if another process truncates to 0 bytes the file you're writing ?)
So here is the proposed semantic, which matches established conventions:
*RawIOBase.truncate(n: int = None) -> int*
*(same for BufferedIOBase.truncate(pos: int = None) -> int)*
Resizes the file to the size specified by the positive integer n, or by the current filepointer position if n is None.
The file must be opened with write permissions.
If the file was previously larger than n, the extra data is discarded. If the file was previously shorter than n, its size is increased, and the extended area appears as if it were zero-filled.
In any case, the file pointer is left unchanged, and may point beyond the end of file.
Note: trying to read past the end of file returns an empty string, and trying to write past the end of file extends it by zero-ing the gap. On rare platforms which don't support file pointers to be beyond the end of file, all these behaviours shall be faked thanks to internal storage of the "wanted" file pointer position (silently extending the file, if necessary, when a write operation occurs).
*Proposition of doc update*
*RawIOBase*.read(n: int) -> bytes
Read up to n bytes from the object and return them. Fewer than n bytes may be returned if the operating system call returns fewer than n bytes. If 0 bytes are returned, and n was not 0, this indicates end of file. If the object is in non-blocking mode and no bytes are available, the call returns None.
*RawIOBase*.readinto(b: bytes) -> int
Read up to len(b) bytes from the object and stores them in b, returning the number of bytes read. Like .read, fewer than len(b) bytes may be read, and 0 indicates end of file if b is not 0. None is returned if a non-blocking object has no bytes available. The length of b is never changed.
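The note about reading and writing past the end of file can be tried out with an unbuffered file object. This is a sketch under the assumption of a platform that allows the file pointer to sit beyond the end of file and zero-fills the gap on write (the common posix and win32 behaviour described above). The name past_eof_demo is hypothetical.

```python
import os
import tempfile


def past_eof_demo():
    """Seek beyond the end of a 3-byte file, then read and write there."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w+b", buffering=0) as f:  # buffering=0 gives a raw FileIO
            f.write(b"abc")
            f.seek(8)            # position is now past the 3-byte end of file
            empty = f.read(4)    # nothing to read there
            f.write(b"Z")        # writing at offset 8 leaves a 5-byte gap
            f.seek(0)
            content = f.read()   # read the whole file back
        return empty, content
    finally:
        os.unlink(path)


empty, content = past_eof_demo()
```

Here read(4) past the end should give an empty bytes object, and the final content should be the original three bytes, five zero bytes, and the byte written at offset 8.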
Pascal Chambon wrote:
Hello
After weighing this and that, here is what I have come up with. Comments and issue notifications more than welcome, of course. The exception thingy is not yet addressed.
Regards, Pascal
*Truncate and file pointer semantics*
Rationale :
The current implementation of truncate() always moves the file pointer to the new end of file.
This behaviour is interesting for compatibility, if the file has been reduced and the file pointer is now past its end, since some platforms might require 0 <= filepointer <= filesize.
However, there are several arguments against this semantic:
* Most common standards (posix, win32…) allow the file pointer to be past the end of file, and define the behaviour of other stream methods in this case
* In many cases, moving the filepointer when truncating has no reason to happen (if we’re extending the file, or reducing it without going beneath the file pointer)
* Making 0 <= filepointer <= filesize a global rule of the python IO module doesn’t seem possible, since it would require modifications of the semantic of other methods (eg. seek() should raise exceptions or silently disobey when asked to move the filepointer past the end of file), and lead to incoherent situations when concurrently accessing files without locking (what if another process truncates to 0 bytes the file you’re writing ?)
So here is the proposed semantic, which matches established conventions:
*RawIOBase.truncate(n: int = None) -> int*
*(same for BufferedIOBase.truncate(pos: int = None) -> int)*
Resizes the file to the size specified by the positive integer n, or by the current filepointer position if n is None.
The new size could be positive or zero.
The file must be opened with write permissions.
If the file was previously larger than n, the extra data is discarded. If the file was previously shorter than n, its size is increased, and the extended area appears as if it were zero-filled.
In any case, the file pointer is left unchanged, and may point beyond the end of file.
Note: trying to read past the end of file returns an empty string, and trying to write past the end of file extends it by zero-ing the gap. On rare platforms which don’t support file pointers to be beyond the end of file, all these behaviours shall be faked thanks to internal storage of the “wanted” file pointer position (silently extending the file, if necessary, when a write operation occurs).
*Proposition of doc update*
*RawIOBase*.read(n: int) -> bytes
Read up to n bytes from the object and return them. Fewer than n bytes may be returned if the operating system call returns fewer than n bytes. If 0 bytes are returned, and n was not 0, this indicates end of file. If the object is in non-blocking mode and no bytes are available, the call returns None.
*RawIOBase*.readinto(b: bytes) -> int
Read up to len(b) bytes from the object and stores them in b, returning the number of bytes read. Like .read, fewer than len(b) bytes may be read, and 0 indicates end of file if b is not 0. None is returned if a non-blocking object has no bytes available. The length of b is never changed.
I thought 'bytes' was immutable! If you're going to read into a list or array, it would be nice to also be able to give the start index and either the end index or the count (start defaults to 0, end defaults to len).
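The immutability point is easy to check. The following sketch uses io.BytesIO as a stand-in stream (an assumption made for brevity, since any object with readinto() would do): a bytearray is filled in place, while a bytes object is rejected.

```python
import io

stream = io.BytesIO(b"hello world")

buf = bytearray(5)             # mutable, so readinto() can fill it in place
count = stream.readinto(buf)   # number of bytes stored in buf

try:
    stream.readinto(bytes(5))  # immutable buffer: not writable
    rejected = False
except TypeError:
    rejected = True
```

The length of buf stays at 5 throughout, in line with the "length of b is never changed" wording of the proposal.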
Hello,
*Truncate and file pointer semantics* [snip]
The major problem here is that you are changing the current semantics. I don't think the argument you are making for it is strong enough to counter-balance the backwards compatibility issue. The situation would be different if 3.0 hadn't been released yet. Besides, we already broke compatibility with 3.0/3.1, let's not give users the impression that we don't care about compatibility anymore. Regards Antoine.
On Sun, Sep 20, 2009 at 4:48 AM, Pascal Chambon wrote:
*RawIOBase*.readinto(b: bytes) -> int
"bytes" are immutable. The signature is: *RawIOBase*.readinto(b: bytearray) -> int Your efforts in working on clarifying these important corner cases is appreciated. :-) -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com
Daniel Stutzbach wrote:
On Sun, Sep 20, 2009 at 4:48 AM, Pascal Chambon <chambon.pascal@gmail.com> wrote:
*RawIOBase*.readinto(b: bytes) -> int
"bytes" are immutable. The signature is:
*RawIOBase*.readinto(b: bytearray) -> int
Your efforts in working on clarifying these important corner cases is appreciated. :-)
You're welcome B-)
Indeed my copy/paste of the current pep was an epic fail - you'll all have recognized readinto actually dealt with bytearrays, contrary to what the current pep says -> http://www.python.org/dev/peps/pep-3116/
RawIOBase.read(int) takes a positive-or-zero integer indeed (I am used to understanding this, as opposed to "strictly positive")
Does MRAB's suggestion of providing beginning and end offsets for the bytearray meet people's expectations ? Personally, I feel readinto is a very low-level method, mostly used by read() to get a result from low-level native functions (fread, readfile), and read() always provides a buffer with the proper size... are there cases in which these two additional arguments would provide some real gain ?
Concerning the "backward compatibility" problem, I agree we should not break specifications, but breaking implementation details is another thing for me. It's a golden rule in programmers' world : thou shalt NEVER rely on implementation details. Programs that count on these (eg. thinking that listdir() will always return "." and ".." as first items... until it doesn't anymore) encounter huge problems when changing platform or API version. When programming with the current truncate(), I would always have moved the file pointer after truncating the file, simply because I have no idea of what might happen to it (nothing was documented on this at the moment, and looking at the sources is really not a sustainable behaviour). So well, it's a pity if some early 3.1 users relied on it, but if we stick to the current semantic we still have a real coherency problem - seek() is not limited in range, and some experienced programmers might be trapped by this non-conventional truncate() if they rely on posix or previous python versions... I really dislike the idea that truncate() might move my file offset even when there are no reasons for it.
Regards, Pascal
Pascal Chambon wrote:
Daniel Stutzbach wrote:
On Sun, Sep 20, 2009 at 4:48 AM, Pascal Chambon <chambon.pascal@gmail.com> wrote:
*RawIOBase*.readinto(b: bytes) -> int
"bytes" are immutable. The signature is:
*RawIOBase*.readinto(b: bytearray) -> int
Your efforts in working on clarifying these important corner cases is appreciated. :-)
You're welcome B-)
Indeed my copy/paste of the current pep was an epic fail - you'll all have recognized readinto actually dealt with bytearrays, contrary to what the current pep says -> http://www.python.org/dev/peps/pep-3116/
RawIOBase.read(int) takes a positive-or-zero integer indeed (I am used to understanding this, as opposed to "strictly positive")
Does MRAB's suggestion of providing beginning and end offsets for the bytearray meet people's expectations ? Personally, I feel readinto is a very low-level method, mostly used by read() to get a result from low-level native functions (fread, readfile), and read() always provides a buffer with the proper size... are there cases in which these two additional arguments would provide some real gain ?
It's useful if you want to fill the buffer but 'read' might return fewer bytes than you asked for because it returns only what's available. That might not happen for files, but it might for other forms of I/O. Other languages, like Delphi and Java, which read into pre-existing arrays, have or allow the extra parameters.
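For what it's worth, the "fill part of a buffer, coping with short reads" use case can be sketched without new parameters by passing a memoryview slice to readinto(). The helper readinto_range and its start/end arguments are hypothetical, and io.BytesIO is only used here as a convenient test stream.

```python
import io


def readinto_range(raw, buf, start=0, end=None):
    """Fill buf[start:end] from raw, looping over short reads (hypothetical helper).

    Returns the number of bytes stored. Stops early at end of file, or when a
    non-blocking stream has nothing available (readinto() returns None).
    """
    view = memoryview(buf)[start:end]   # slicing a memoryview does not copy
    total = 0
    while total < len(view):
        n = raw.readinto(view[total:])
        if not n:                       # 0 means EOF, None means "would block"
            break
        total += n
    return total


buf = bytearray(b"." * 10)
stored = readinto_range(io.BytesIO(b"abcdef"), buf, 2, 6)
short = readinto_range(io.BytesIO(b"xy"), bytearray(5))
```

Only the four bytes of buf between indexes 2 and 6 are overwritten in the first call, and the second call stops at end of file after two bytes.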
Concerning the "backward compatibility" problem, I agree we should not break specifications, but breaking implementation details is another thing for me. It's a golden rule in programmers' world : thou shalt NEVER rely on implementation details. Programs that count on these (eg. thinking that listdir() will always return "." and ".." as first items... until it doesn't anymore) encounter huge problems when changing platform or API version. When programming with the current truncate(), I would always have moved the file pointer after truncating the file, simply because I have no idea of what might happen to it (nothing was documented on this at the moment, and looking at the sources is really not a sustainable behaviour). So well, it's a pity if some early 3.1 users relied on it, but if we stick to the current semantic we still have a real coherency problem - seek() is not limited in range, and some experienced programmers might be trapped by this non-conventional truncate() if they rely on posix or previous python versions... I really dislike the idea that truncate() might move my file offset even when there are no reasons for it.
Well, if it's consistent and documented (and not totally stupid), I can't really complain. :-)
Hello
Below is a corrected version of the PEP update, adding the start/end indexes proposition and fixing functions signatures. Does anyone disagree with these specifications ? Or can we consider it as a target for the next versions of the io module ? I would have no problem to implement this behaviour in my own pure python FileIO system, however if someone is willing to patch the _fileio implementation, it'd save a lot of time - I most probably won't have the means to set up a C compilation environment under windows and linux, and properly update/test this, before January (when I get freelance...).
I launch another thread on other to-be-discussed IO points B-)
Regards, Pascal
================ PEP UPDATE for new I/O system - v2 ===========
**Truncate and file pointer semantics**
Rationale :
The current implementation of truncate() always moves the file pointer to the new end of file.
This behaviour is interesting for compatibility, if the file has been reduced and the file pointer is now past its end, since some platforms might require 0 <= filepointer <= filesize.
However, there are several arguments against this semantic:
* Most common standards (posix, win32…) allow the file pointer to be past the end of file, and define the behaviour of other stream methods in this case
* In many cases, moving the filepointer when truncating has no reason to happen (if we’re extending the file, or reducing it without going beneath the file pointer)
* Making 0 <= filepointer <= filesize a global rule of the python IO module doesn’t seem possible, since it would require modifications of the semantic of other methods (eg. seek() should raise exceptions or silently disobey when asked to move the filepointer past the end of file), and lead to incoherent situations when concurrently accessing files without locking (what if another process truncates to 0 bytes the file you’re writing ?)
So here is the proposed semantic, which matches established conventions:
*IOBase.truncate(n: int = None) -> int*
Resizes the file to the size specified by the positive integer n, or by the current filepointer position if n is None.
The file must be opened with write permissions.
If the file was previously larger than size, the extra data is discarded. If the file was previously shorter than size, its size is increased, and the extended area appears as if it were zero-filled.
In any case, the file pointer is left unchanged, and may point beyond the end of file.
Note: trying to read past the end of file returns an empty string, and trying to write past the end of file extends it by zero-ing the gap. On rare platforms which don't support file pointers to be beyond the end of file, all these behaviours shall be faked thanks to internal storage of the "wanted" file pointer position (silently extending the file, if necessary, when a write operation occurs).
*Propositions of doc update*
*RawIOBase*.read(n: int) -> bytes
Read up to n bytes from the object and return them. Fewer than n bytes may be returned if the operating system call returns fewer than n bytes. If 0 bytes are returned, and n was not 0, this indicates end of file. If the object is in non-blocking mode and no bytes are available, the call returns None.
*RawIOBase*.readinto(b: bytearray, [start: int = None], [end: int = None]) -> int
start and end are used as slice indexes, so that the bytearray taken into account is actually range = b[start:end] (or b[start:], b[:end] or b[:], depending on the arguments which are not None).
Read up to len(range) bytes from the object and store them in b, returning the number of bytes read. Like .read, fewer than len(range) bytes may be read, and 0 indicates end of file if len(range) is not 0. None is returned if a non-blocking object has no bytes available. The length of b is never changed.
Hello,
So here is the proposed semantic, which matches established conventions:
*IOBase.truncate(n: int = None) -> int* [...]
I still don't think there is a sufficient benefit in breaking compatibility. If you want the file pointer to remain the same, you can save it first and restore it afterwards manually.
*Propositions of doc update*
Please open tracker issues for these kinds of suggestions. Regards Antoine.
On Sun, Sep 27, 2009 at 4:33 AM, Antoine Pitrou wrote:
So here is the proposed semantic, which matches established conventions:
*IOBase.truncate(n: int = None) -> int* [...]
I still don't think there is a sufficient benefit in breaking compatibility. If you want the file pointer to remain the same, you can save it first and restore it afterwards manually.
What compatibility, though? f.truncate() behaves different in 2.x than in 3.x, and in 2.x it seems to match the POSIX semantics (i.e. the seek position is unchanged even though the file size is). Perhaps the changed semantics was an oversight or a mistake? -- --Guido van Rossum (home page: http://www.python.org/~guido/)
On Sun, 27 Sep 2009 14:24:52 -0700, Guido van Rossum wrote: [truncate()]
What compatibility, though?
Compatibility across the 3.x line.
f.truncate() behaves different in 2.x than in 3.x, and in 2.x it seems to match the POSIX semantics (i.e. the seek position is unchanged even though the file size is). Perhaps the changed semantics was an oversight or a mistake?
Perhaps it was, indeed. I don't know who made that decision in the first place. Regards Antoine.
On Sun, Sep 27, 2009 at 3:44 PM, Antoine Pitrou wrote:
On Sun, 27 Sep 2009 14:24:52 -0700, Guido van Rossum wrote: [truncate()]
What compatibility, though?
Compatibility across the 3.x line.
Well, in this case, maybe compatibility with 2.x is more important -- this isn't something we can easily warn about in 2.6+. In addition there's the POSIX rules.
f.truncate() behaves different in 2.x than in 3.x, and in 2.x it seems to match the POSIX semantics (i.e. the seek position is unchanged even though the file size is). Perhaps the changed semantics was an oversight or a mistake?
Perhaps it was, indeed. I don't know who made that decision in the first place.
It might well have been me (when implementing the earliest version of io.py), and I might well have *thought* that I implemented the same rules as 2.x, and never bothered to check. :-( All in all I think we should change this before it's too late; it will affect a very small number of apps (perhaps none?), but I would rather have the right semantics in the future. Also, it's trivial to write code that doesn't care (in fact code running under 2.x and 3.x probably will have to be written so that it doesn't care) so it's not like changing this is going to make life harder for people wanting multiple-version support. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
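One way to write "code that doesn't care" is to save the position, truncate, and seek back, as suggested earlier in the thread. A minimal sketch follows; the helper name truncate_keep_position is hypothetical, and io.BytesIO is used only to keep the example self-contained.

```python
import io


def truncate_keep_position(f, size=None):
    """Resize f and leave the stream position where it was (hypothetical helper)."""
    pos = f.tell()
    new_size = f.truncate(pos if size is None else size)
    f.seek(pos)   # a no-op if truncate() did not move the position
    return new_size


f = io.BytesIO(b"0123456789")
f.seek(7)
new_size = truncate_keep_position(f, 4)
position = f.tell()
```

Whichever way truncate() treats the file pointer, the position afterwards is the one the caller had before the call, here 7, even though the data is now only 4 bytes long.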
Guido van Rossum writes:
All in all I think we should change this before it's too late; it will affect a very small number of apps (perhaps none?), but I would rather have the right semantics in the future. Also, it's trivial to write code that doesn't care (in fact code running under 2.x and 3.x probably will have to be written so that it doesn't care) so it's not like changing this is going to make life harder for people wanting multiple-version support.
Ok, it sounds reasonable to me :) Now somebody just has to write a patch (it shouldn't be too difficult, since the position restoring code exists in the 2.x file object). Regards Antoine.
Antoine Pitrou wrote:
Hello,
So here is the proposed semantic, which matches established conventions:
*IOBase.truncate(n: int = None) -> int*
[...]
I still don't think there is a sufficient benefit in breaking compatibility. If you want the file pointer to remain the same, you can save it first and restore it afterwards manually.
Sure, but won't this truncate become some kind of a burden for py3k, if it's twice misleading (it's not a real truncation since it can extend the file, and it's not even a truncation or resizing in posix/win32 style, since the filepointer is moved) ? Since it was an undocumented behaviour, and py3k doesn't seem to be present yet in production environments (or is it ?), I'd promote this late-but-maybe-not-too-late change. But if the consensus prefers the current behaviour, well, it'll be OK to me too, as long as it's sufficiently documented and advertised.
*Propositions of doc update*
Please open tracker issues for these kinds of suggestions.
Is the tracker OK for simple suggestions too ? I thought it was rather for obvious bugfixes, and to-be-discussed propositions had better be in mailing-lists... OK then, I'll open bugtracker issues for these. B-)
Instead of "than size", perhaps "than n".
Whoups, indeed >_< Actually the signature would rather be: *IOBase.truncate(size: int = None) -> int* And I forgot to mention that truncate returns the new file size (according to the current PEP)...
Should an exception be raised if start and/or end are out of range?
I'd advocate it yep, for the sake of "explicit errors". However, should it be a ValueError (the ones io functions normally use) or an IndexError (which is technically more accurate, but might confuse the user) ?
Regards, Pascal
Pascal Chambon wrote:
Hello
Below is a corrected version of the PEP update, adding the start/end indexes proposition and fixing function signatures. Does anyone disagree with these specifications ? Or can we consider it as a target for the next versions of the io module ? I would have no problem implementing this behaviour in my own pure python FileIO system, however if someone is willing to patch the _fileio implementation, it'd save a lot of time - I most probably won't have the means to set up a C compilation environment under windows and linux, and properly update/test this, before January (when I get freelance...).
I'll launch another thread on other to-be-discussed IO points B-)
Regards, Pascal
================ PEP UPDATE for new I/O system - v2 ===========
**Truncate and file pointer semantics**
Rationale :
The current implementation of truncate() always moves the file pointer to the new end of file.
This behaviour is interesting for compatibility, if the file has been reduced and the file pointer is now past its end, since some platforms might require 0 <= filepointer <= filesize.
However, there are several arguments against this semantic:
* Most common standards (posix, win32…) allow the file pointer to be past the end of file, and define the behaviour of other stream methods in this case
* In many cases, moving the filepointer when truncating has no reason to happen (if we’re extending the file, or reducing it without going beneath the file pointer)
* Making 0 <= filepointer <= filesize a global rule of the python IO module doesn’t seem possible, since it would require modifications of the semantic of other methods (eg. seek() should raise exceptions or silently disobey when asked to move the filepointer past the end of file), and lead to incoherent situations when concurrently accessing files without locking (what if another process truncates to 0 bytes the file you’re writing ?)
So here is the proposed semantic, which matches established conventions:
*IOBase.truncate(n: int = None) -> int*
Resizes the file to the size specified by the positive integer n, or by the current filepointer position if n is None.
The file must be opened with write permissions.
If the file was previously larger than size, the extra data is discarded. If the file was previously shorter than size, its size is increased, and the extended area appears as if it were zero-filled.
Instead of "than size", perhaps "than n".
In any case, the file pointer is left unchanged, and may point beyond the end of file.
Note: trying to read past the end of file returns an empty string, and trying to write past the end of file extends it by zero-ing the gap. On rare platforms which don't support file pointers to be beyond the end of file, all these behaviours shall be faked thanks to internal storage of the "wanted" file pointer position (silently extending the file, if necessary, when a write operation occurs).
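The proposed semantics can be tried out with a small sketch. It assumes a current CPython release, where io.FileIO.truncate() is documented as leaving the stream position unchanged, and a platform that zero-fills the gap when writing past the end of file (the io documentation describes the content of an extended area as platform-dependent). The function name truncate_demo is only illustrative.

```python
import tempfile


def truncate_demo():
    """Shrink a file below the file pointer, then read and write past EOF."""
    # buffering=0 yields a raw FileIO object, the class under discussion.
    with tempfile.TemporaryFile(buffering=0) as f:
        f.write(b"0123456789")
        f.seek(8)
        new_size = f.truncate(4)  # shrink the file below the file pointer
        pos = f.tell()            # pointer position after truncating
        past_eof = f.read(10)     # read attempt beyond the end of file
        f.write(b"X")             # write beyond the end of file
        f.seek(0)
        content = f.read()        # whole file, including any filled gap
    return new_size, pos, past_eof, content


if __name__ == "__main__":
    print(truncate_demo())
```

Under those assumptions the pointer stays at offset 8, the read returns no bytes, and the write leaves a zero-filled gap between offsets 4 and 8.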
*Propositions of doc update*
*RawIOBase*.read(n: int) -> bytes
Read up to n bytes from the object and return them. Fewer than n bytes may be returned if the operating system call returns fewer than n bytes. If 0 bytes are returned, and n was not 0, this indicates end of file. If the object is in non-blocking mode and no bytes are available, the call returns None.
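The three possible outcomes described above can be told apart with a few lines of code. This is only a sketch: classify_read and its status strings are hypothetical names, and io.BytesIO stands in for a raw stream (it never returns None, so the non-blocking branch is not exercised here).

```python
import io


def classify_read(raw, n):
    """Return (status, data) for one read(n) call on a binary stream.

    status is "would-block" when read() returns None, "eof" when no bytes
    come back although some were requested, and "data" otherwise, which
    includes the empty result of read(0).
    """
    data = raw.read(n)
    if data is None:
        return "would-block", data
    if data == b"" and n != 0:
        return "eof", data
    return "data", data


if __name__ == "__main__":
    src = io.BytesIO(b"ab")
    print(classify_read(src, 0))  # empty result, but not end of file
    print(classify_read(src, 2))
    print(classify_read(src, 2))  # empty result at end of file
```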
*RawIOBase*.readinto(b: bytearray, [start: int = None], [end: int = None]) -> int
start and end are used as slice indexes, so that the bytearray taken into account is actually range = b[start:end] (or b[start:], b[:end] or b[:], depending on the arguments which are not None).
Read up to len(range) bytes from the object and store them in b, returning the number of bytes read. Like .read, fewer than len(range) bytes may be read, and 0 indicates end of file if len(range) is not 0. None is returned if a non-blocking object has no bytes available. The length of b is never changed.
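The start/end proposal can already be emulated on top of the existing readinto() by passing a memoryview slice of the bytearray. The sketch below does that; readinto_range is a hypothetical helper, not an io API. Note that because it uses ordinary slice semantics, out-of-range indexes are silently clamped rather than rejected, which is only one of the possible answers to the question of error handling.

```python
import io


def readinto_range(raw, b, start=None, end=None):
    """Read into b[start:end] in place and return the number of bytes read.

    Hypothetical helper emulating the proposed signature. The memoryview
    slice shares memory with b, so b is filled without being resized.
    """
    with memoryview(b)[start:end] as window:
        return raw.readinto(window)


if __name__ == "__main__":
    buf = bytearray(b"........")
    count = readinto_range(io.BytesIO(b"abcdef"), buf, 2, 5)
    print(count, buf)
```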
Should an exception be raised if start and/or end are out of range?
Greg Ewing wrote:
Pascal Chambon wrote:
Concerning the naming of truncate(), would it be possible to deprecate it and alias it to "resize()" ? It's not very gratifying to have duplicated methods at the beginning of a major release, but I feel too that "truncate" is a misleading term, that had better be replaced asap.
There's something to be said for that, but there's also something to be said for following established conventions, and there's a long-established precedent from the C library for having a function called truncate() that behaves this way.
But Python isn't C. :-)
On Fri, Sep 18, 2009 at 2:17 PM, Pascal Chambon
- it is unclear what truncate() methods do with the file pointer, and even if the current implementation simply moves it to the truncation point, it's very contrary to the standard way of doing under unix, where the file pointer is normally left unchanged. Shouldn't we specify that the file pointer remains unmoved, and fix the _fileio module accordingly ?
+1 on having consistent, documented behavior (regardless of what that behavior is :) ).
- exceptions are not always specified, and even if most of them are IOErrors, weirdly, in some cases, an OSError is raised instead (ie, if we try to wrap a wrong file descriptor when instantiating a new FileIO). This might lead to bad program crashes if some people don't "refuse the temptation to guess" and only get prepared to catch IOErrors
I'd wager that you may also get a WindowsError in some cases, on Windows systems. Part of the reason for having several different, but similar, exceptions is that they may contain operating system specific error codes and the type of exception helps the programmer figure out how to interpret those codes. I'm not really clear on when IOError is preferred over OSError, but I know that WindowsError necessarily uses a completely different error numbering system. The careful programmer should catch EnvironmentError, which is the base class of all of these different kinds of related errors. +1 on documenting that the methods may raise an EnvironmentError.
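A minimal sketch of the "catch EnvironmentError" advice, using the wrong-file-descriptor case from the original question. It assumes Python 3.3 or later, where PEP 3151 made IOError, EnvironmentError and WindowsError aliases of OSError, so a single except clause covers all of them; the function name wrap_closed_descriptor is illustrative.

```python
import errno
import io
import os


def wrap_closed_descriptor():
    """Try to wrap an already closed descriptor in a FileIO object.

    Returns (exception class name, True if errno is EBADF), or None if
    no exception was raised.
    """
    read_fd, write_fd = os.pipe()
    os.close(read_fd)
    os.close(write_fd)
    try:
        io.FileIO(read_fd)
    except EnvironmentError as exc:  # same class as OSError and IOError
        return type(exc).__name__, exc.errno == errno.EBADF
    return None


if __name__ == "__main__":
    print(EnvironmentError is OSError, IOError is OSError)
    print(wrap_closed_descriptor())
```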
- the doc sometimes says that when we receive an empty string from a read() operation, without exceptions, it means the file is empty. However, with the current implementation, if we call file.read(0), we simply receive "", even though it doesn't mean that we're at EOF. Shouldn't we avoid this (rare, I admit) ambiguity on the return value, by preventing read(0) ? Or at least, note in the doc that (we receive an empty string) <-> (the file is at EOF OR we called read with 0 as parameter) ?
Some programs may rely on read(0) and the behavior matches the behavior of POSIX, so I'm -1 on changing the behavior. +1 on documenting the exception to the rule.
Are there some arguments that I don't know, which lead to this or that particular implementation choice ?
The original I/O PEP and some of the code was put together during a sprint at PyCon 2007. Our primary goal was to get a decent first cut of the new I/O system put together quickly. For nitty-gritty details and corner-cases like these, there's a good chance that the current undocumented behavior is simply an accident of implementation. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com
On Fri, 18 Sep 2009 21:17:29 +0200, Pascal Chambon wrote: Hello, First, thanks for experimenting with this. (as a sidenote, we lack real-world testing of non-blocking features, perhaps you want to take a look)
I'm currently working on a reimplementation of io.FileIO, which would allow cross-platform file range locking and all kinds of other safety features ;
Hmm, do you *have* to reimplement it in order to add these new features, or is it just a personal preference?
- it is unclear what truncate() methods do with the file pointer, and even if the current implementation simply moves it to the truncation point, it's very contrary to the standard way of doing under unix, where the file pointer is normally left unchanged. Shouldn't we specify that the file pointer remains unmoved, and fix the _fileio module accordingly ?
Well, first Python and its IO library are cross-platform, so the behaviour is not always identical to POSIX behaviour, especially where Windows and Unix have different specs. Second, now that 3.1 is in the wild, we should be reluctant to change the behaviour just to make it more conformant to what POSIX people can expect. What might be convincing would be an actual use case where POSIX-like behaviour would be significantly more useful than the current one.
- exceptions are not always specified, and even if most of them are IOErrors, weirdly, in some cases, an OSError is raised instead (ie, if we try to wrap a wrong file descriptor when instantiating a new FileIO).
This is not different than with 2.x here. If you want to trap both OSError and IOError, use EnvironmentError (which is the common base class for both). I agree it is slightly annoying and not well-defined, however. Also, Python can hardly determine by itself whether an error is caused by IO problems or an OS-level malfunction, so the distinction is a bit fallacious.
However, with the current implementation, if we call file.read(0), we simply receive "", even though it doesn't mean that we're at EOF. Shouldn't we avoid this (rare, I admit) ambiguity on the return value, by preventing read(0) ?
Well, if you are asking for 0 bytes, it returns 0 bytes. It's not that ambiguous, and it helps avoid special-casing the 0 case :)
So wouldn't it be a good idea to write some kind of mini-pep, just to fix the corner cases of the current IO documentation ?
Improvements, either to the docs or to the implementation, are always welcome. I think you already know where to post them! Regards Antoine.
participants (14)
- "Martin v. Löwis"
- Antoine Pitrou
- Daniel Stutzbach
- Greg Ewing
- Guido van Rossum
- James Y Knight
- Matthew Barnett
- MRAB
- Nick Coghlan
- Pascal Chambon
- Pascal Chambon
- R. David Murray
- Stephen J. Turnbull
- Steven D'Aprano