For some time now, I've wanted to suggest a better abstraction for the <file> type in Python. It currently uses an antiquated, low-level C-style interface for moving around in a file, with methods like tell() and seek(). But after attributes were introduced to Python, it seems like it should be re-evaluated.
Let the file type have an attribute, .pos, for position. Now you can get rid of the seek() and tell() methods and manipulate the file pointer with the more standard and familiar arithmetic operations:
    file.pos = 0x0ae1     # move file pointer to an absolute address
    file.pos += 1         # increment the file pointer one byte
    curr_pos = file.pos   # read current file pointer
You've now simplified the API by the removal of two obscure legacy methods (where one has to learn the additional concept of "absolute" and "relative" addressing) and replaced them with a more basic one called "position".
Thoughts?
markj
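The idea above can be tried out today without touching the built-in file type. The following is only a rough sketch: the `PositionedFile` wrapper and its `pos` property are hypothetical names invented for illustration, and the property simply delegates to the existing tell() and seek() methods of a binary file object.

```python
import io


class PositionedFile:
    """Hypothetical wrapper exposing the file position as a `pos` attribute."""

    def __init__(self, raw):
        self._raw = raw  # any seekable binary file object

    @property
    def pos(self):
        # Reading the attribute is just tell().
        return self._raw.tell()

    @pos.setter
    def pos(self, offset):
        # Assigning the attribute is an absolute seek().
        self._raw.seek(offset)

    def read(self, size=-1):
        return self._raw.read(size)


f = PositionedFile(io.BytesIO(b"abcdefghij"))
f.pos = 0x03        # absolute move
first = f.read(1)   # reads one byte and advances the position
f.pos += 1          # tell() followed by an absolute seek()
second = f.read(1)
curr_pos = f.pos
```

Note that `f.pos += 1` costs two calls (a tell() and a seek()), where `seek(1, io.SEEK_CUR)` is a single call.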
On 24 September 2012 18:49, Mark Adam <dreamingforward@gmail.com> wrote:
Thoughts?
-1
This is not so distant from what can be achieved trivially with tell and seek. Moreover, even though changes to attributes _can_ be made to have side effects in Python objects, that does not make the code easier to read and maintain in every case.
What I think we need is a better way of dealing with constants: the "whence" argument to "seek" takes raw ints for "from start", "from end" and "relative". But that is another subject entirely.
js -><-
markj
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
http://mail.python.org/mailman/listinfo/python-ideas
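On the "whence" values mentioned in the reply above: the standard io module does provide names for the three raw ints (SEEK_SET, SEEK_CUR and SEEK_END). A small sketch, using an in-memory BytesIO as a stand-in for a real binary file, shows that the named and numeric spellings behave the same:

```python
import io

# In-memory stand-in for a binary file.
buf = io.BytesIO(b"0123456789")

# seek() returns the new absolute position.
end_via_int = buf.seek(0, 2)              # raw int for "from end"
end_via_name = buf.seek(0, io.SEEK_END)   # same call with the named constant

# The three names are plain ints.
whence_values = (io.SEEK_SET, io.SEEK_CUR, io.SEEK_END)
```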
On 9/24/12, Mark Adam <dreamingforward@gmail.com> wrote:
For some time now, I've wanted to suggest a better abstraction for the <file> type in Python. It currently uses an antiquated, low-level C-style interface for moving around in a file, with methods like tell() and seek().
I agree, but I'm not sure the improvement can be *enough* of an improvement to justify the cost of change.
    file.pos = 0x0ae1     # move file pointer to an absolute address
    file.pos += 1         # increment the file pointer one byte
For text files, I would expect it to be a character count rather than a byte count. So this particular proposal might end up adding as much confusion as it hopes to remove.
-jJ
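To make the character-versus-byte concern concrete, here is a small illustration. The sample string and the choice of UTF-8 are assumptions made for the example; the point is only that a byte offset and a character index stop agreeing as soon as a multi-byte character appears.

```python
import io

text = "na\u00efve caf\u00e9"      # "naïve café"
encoded = text.encode("utf-8")

char_count = len(text)             # counts characters
byte_count = len(encoded)          # counts bytes; "ï" and "é" take two each

raw = io.BytesIO(encoded)
raw.seek(3)                        # byte offset 3 falls inside the two-byte "ï"
byte_at_3 = raw.read(1)            # a UTF-8 continuation byte, not a character
char_at_3 = text[3]                # character index 3 is "v"
```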
Also you can't express lseek()'s "relative to end of file" mode using the proposed API. -1 on the whole thing.
On Thu, Sep 27, 2012 at 12:40 PM, Jim Jewett <jimjjewett@gmail.com> wrote:
For text files, I would expect it to be a character count rather than a byte count. So this particular proposal might end up adding as much confusion as it hopes to remove.
-- --Guido van Rossum (python.org/~guido)
On Thu, Sep 27, 2012 at 4:00 PM, Guido van Rossum <guido@python.org> wrote:
Also you can't express lseek()'s "relative to end of file" mode using the proposed API. -1 on the whole thing.
You could use negative indexes, which is consistent with subscript and slice interfaces. I still don't know that this is a good idea, but I'm just saying. If someone wants a more sequence-like interface to files, they should use mmap.
-- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy
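As a rough illustration of the mmap suggestion in the message above, the sketch below maps a small temporary file (the file contents are made up for the example) and reads it through subscripts and slices, including negative indexes:

```python
import mmap
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b"0123456789")
    f.flush()  # make sure the bytes are in the file before mapping it

    # Length 0 maps the whole file; ACCESS_READ makes the view read-only.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
        last_three = view[-3:]   # negative slice, as with bytes
        middle = view[4:6]
        size = len(view)
```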
On 28/09/12 06:00, Guido van Rossum wrote:
Also you can't express lseek()'s "relative to end of file" mode using the proposed API. -1 on the whole thing.
For what it's worth, there was extensive discussion on comp.lang.python, which eventually concluded that while you could express all the various invocations of seek using file.pos, at best you save two characters of typing and the whole thing isn't worth the change.
http://mail.python.org/pipermail/python-list/2012-September/thread.html#6315...
Personally, I think the proposal has died a natural death, but if anyone wants to resuscitate it, I encourage them to read the above thread before doing so.
-- Steven
On 2012-09-27 20:40, Jim Jewett wrote:
For text files, I would expect it to be a character count rather than a byte count. So this particular proposal might end up adding as much confusion as it hopes to remove.
In the talk about how to seek to the end of the file with file.pos, it was suggested that negative positions and None could be used. I wonder whether they could be used with seek. For example:
    file.seek(-10)   # Seek 10 bytes from the end.
    file.seek(None)  # Seek to the end.
On Sep 27, 2012, at 1:07 PM, MRAB wrote:
In the talk about how to seek to the end of the file with file.pos, it was suggested that negative positions and None could be used.
I wonder whether they could be used with seek. For example:
    file.seek(-10)   # Seek 10 bytes from the end.
    file.seek(None)  # Seek to the end.
file.seek(0, os.SEEK_END) is a lot clearer than file.seek(None).
-- Philip Jenvey
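For comparison, the two end-relative seeks discussed above can already be written with the existing API on a binary file. A minimal sketch, using a temporary file with made-up contents:

```python
import os
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(bytes(range(32)))

    # Seek to the end; the return value is the new absolute position.
    size = f.seek(0, os.SEEK_END)

    # Seek to 10 bytes before the end, then read the tail.
    tail_start = f.seek(-10, os.SEEK_END)
    tail = f.read()
```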
Jim Jewett wrote:
For text files, I would expect it to be a character count rather than a byte count. So this particular proposal might end up adding as much confusion as it hopes to remove.
I'm given to understand that the file positions used by the C standard library are supposed to be treated as opaque tokens -- you're not guaranteed to be able to perform arithmetic on them.
-- Greg
On 2012-09-28 03:07, Greg Ewing wrote:
I'm given to understand that the file positions used by the C standard library are supposed to be treated as opaque tokens -- you're not guaranteed to be able to perform arithmetic on them.
Yet you're allowed to do relative seeks? Does that mean that the file position basically works with some undefined units (bytes, characters, whatever)?
2012/9/28 MRAB <python@mrabarnett.plus.com>:
Yet you're allowed to do relative seeks? Does that mean that the file position basically works with some undefined units (bytes, characters, whatever)?
See the documentation: http://docs.python.org/library/io.html#io.TextIOBase.seek
With text streams, SEEK_CUR and SEEK_END only accept offset=0 (i.e. no move, or go to EOF), and SEEK_SET accepts a "cookie" that was returned by a previous tell(). This cookie will often look like the absolute file position, but it also has to contain the codec state, which will be nontrivial for variable-length encodings.
-- Amaury Forgeot d'Arc
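The text-stream rules described above can be observed with a short experiment. This is only a sketch: it wraps an in-memory BytesIO in a TextIOWrapper rather than opening a real file, and the sample text is invented for the example. The tell() value is treated purely as an opaque cookie and is never used in arithmetic.

```python
import io

data = "h\u00e9llo w\u00f6rld".encode("utf-8")
stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")

head = stream.read(2)
cookie = stream.tell()      # opaque cookie, only meaningful to seek()
rest = stream.read()

stream.seek(cookie)         # SEEK_SET with a cookie from tell()
reread = stream.read()

# Nonzero relative seeks are rejected on text streams.
try:
    stream.seek(1, io.SEEK_CUR)
    relative_seek_rejected = False
except io.UnsupportedOperation:
    relative_seek_rejected = True

stream.seek(0, io.SEEK_END)  # offset 0 from the end is allowed
tail = stream.read()
```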
participants (11)
- Amaury Forgeot d'Arc
- Calvin Spealman
- Devin Jeanpierre
- Greg Ewing
- Guido van Rossum
- Jim Jewett
- Joao S. O. Bueno
- Mark Adam
- MRAB
- Philip Jenvey
- Steven D'Aprano