[docs] [issue26158] File truncate() not defaulting to current position as documented

Eryk Sun report at bugs.python.org
Tue Jan 19 18:34:46 EST 2016


Eryk Sun added the comment:

FYI, you can parse the cookie using struct or ctypes. For example:

    class Cookie(ctypes.Structure):
        _fields_ = (('start_pos',     ctypes.c_longlong),
                    ('dec_flags',     ctypes.c_int),
                    ('bytes_to_feed', ctypes.c_int),
                    ('chars_to_skip', ctypes.c_int),
                    ('need_eof',      ctypes.c_byte))

In the simple case only the buffer start_pos is non-zero, and the result of tell() is just the 64-bit file pointer. In Serhiy's UTF-7 example it needs to also convey the bytes_to_feed and chars_to_skip values:

    >>> f.tell()
    680564735109527527154978616360239628288
    >>> cookie_bytes = f.tell().to_bytes(ctypes.sizeof(Cookie), sys.byteorder)
    >>> state = Cookie.from_buffer_copy(cookie_bytes)
    >>> state.start_pos
    0
    >>> state.dec_flags
    0
    >>> state.bytes_to_feed
    16
    >>> state.chars_to_skip
    2
    >>> state.need_eof
    0

So a seek(0, SEEK_CUR) in this case has to seek the buffer to 0, read and decode 16 bytes, and skip 2 characters. 

Isn't this solvable at least for the case of truncating, Martin? It could do a tell(), seek to the start_pos, read and decode the bytes_to_feed, re-encode the chars_to_skip, seek back to the start_pos, write the encoded characters, and then truncate.

    >>> f = open('temp.txt', 'w+', encoding='utf-7')
    >>> f.write(b'+BDAEMQQyBDMENA-'.decode('utf-7'))
    5
    >>> _ = f.seek(0); f.read(2)
    'аб'
    >>> cookie_bytes = f.tell().to_bytes(sizeof(Cookie), byteorder)
    >>> state = Cookie.from_buffer_copy(cookie_bytes)
    >>> f.buffer.seek(state.start_pos)
    0
    >>> buf = f.buffer.read(state.bytes_to_feed)
    >>> s = buf.decode(f.encoding)[:state.chars_to_skip]
    >>> f.buffer.seek(state.start_pos)
    0
    >>> f.buffer.write(s.encode(f.encoding))
    8
    >>> f.buffer.truncate()
    8
    >>> f.close()
    >>> open('temp.txt', encoding='utf-7').read()
    'аб'

Rewriting the encoded bytes is necessary to properly terminate the UTF-7 sequence, which makes me doubt whether this simple approach will work for all codecs. But something like this is possible, no?

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue26158>
_______________________________________


More information about the docs mailing list