Use __bytes__ to access buffer-protocol from "user-land" - Python-ideas

15 Nov 2020

I am working on a toolbox for computer-archaeology where old data media are "excavated" and presented on a web-page. (https://github.com/Datamuseum-DK/AutoArchaeologist for anybody who cares).

Since these data media can easily run to tens of gigabytes, mmap and virtual memory are my weapons of choice, and that has brought me into an obscure corner of Python where few people seem to venture: I want to access the buffer-protocol from "user-land".

The fundamental problem is that if I have an image of a disk and it has 2 partitions, I end up with the "mmap.mmap" object that mapped the raw disk image, and two "bytes" or "bytearray" objects, each containing one partition, for a total memory footprint of twice the size of the disk.

As the tool dives into the filesystems in the partitions and creates objects for the individual files in the filesystem, that grows to three times the size of the disk etc.
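To make the footprint problem concrete, here is a minimal sketch of the difference between slicing an mmap object (which copies) and slicing a memoryview of it (which does not). The tiny temporary "disk image" and its partition offsets are made up for illustration; real offsets would come from the partition table.

```python
import mmap
import tempfile

# Hypothetical tiny "disk image": two 4 KiB partitions, back to back.
PART_SIZE = 4096
with tempfile.TemporaryFile() as f:
    f.write(b"A" * PART_SIZE + b"B" * PART_SIZE)
    f.flush()

    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    # Slicing the mmap object itself returns a new bytes object: a copy.
    copy0 = mm[0:PART_SIZE]

    # Slicing a memoryview returns another view onto the same mapping.
    view = memoryview(mm)
    part0 = view[0:PART_SIZE]
    part1 = view[PART_SIZE:2 * PART_SIZE]

    copy_is_bytes = isinstance(copy0, bytes)
    view_shares_mapping = part1.obj is mm
    first_bytes = (part0[0], part1[0])

    # All views must be released before the mapping can be closed.
    part0.release()
    part1.release()
    view.release()
    mm.close()
```

This only helps while each partition or file is one contiguous range of the image; anything fragmented still needs something like a scatter-gather object.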

To avoid this, I am writing a "bytes-like" scatter-gather class (not yet committed), and that is fine as far as it goes.
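Since the class is not committed yet, the following is only a guess at the general shape of such an object, not the actual implementation. The name `ScatterGather`, the `chunks()` method and the sample "disk" contents are all hypothetical.

```python
class ScatterGather:
    """Hypothetical bytes-like view over several non-contiguous segments."""

    def __init__(self, *segments):
        # Keep memoryviews so that no segment is copied on construction.
        self._segments = [memoryview(s).cast("B") for s in segments]

    def __len__(self):
        return sum(len(s) for s in self._segments)

    def chunks(self):
        """Yield each segment in order, without copying."""
        yield from self._segments

    def __bytes__(self):
        # The one place where everything is copied into a single object.
        return b"".join(self._segments)


disk = bytearray(b"boot....DATA1...DATA2...")
view = memoryview(disk)
# A "file" made of two extents that are not adjacent on the disk.
obj = ScatterGather(view[8:13], view[16:21])
```

Here `len(obj)` is 10 and `bytes(obj)` is `b"DATA1DATA2"`, but the only full copy is made when `bytes()` is actually called.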

If I want to write one of my scatter-gather objects to disk, I have to:

    fd.write(bytes(myobj))

As a preliminary point, I think that is just wrong:  A class with a __bytes__ method should satisfy any needs the buffer-protocol might have, so this should work:

    fd.write(myobj)
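For reference, a small sketch of the current behaviour being objected to. The class name `Blob` is invented for the example, and `io.BytesIO` stands in for a real file opened in binary mode.

```python
import io


class Blob:
    """Hypothetical class that implements __bytes__ and nothing else."""

    def __bytes__(self):
        return b"012"


fd = io.BytesIO()  # stands in for a file opened with "wb"

try:
    fd.write(Blob())  # write() wants a bytes-like object and ignores __bytes__
    direct_write_ok = True
except TypeError:
    direct_write_ok = False

fd.write(bytes(Blob()))  # works, but materialises the whole object first
```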

But taking this a little bit further, I think __bytes__ should be allowed to be an iterator, provided the object also offers __len__, so that this would work:

    class bar():

        def __len__(self):
            return 3

        def __bytes__(self):
            yield b'0'
            yield b'1'
            yield b'2'

    open("/tmp/_", "wb").write(bar())

This example is of course trivial, but have the yield statements hand out hundreds of megabytes, and the savings in time and malloc-space become very tangible.
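As things stand, a similar effect can be approximated by handing the chunks to `writelines()`, which accepts any iterable of bytes-like objects and, despite its name, adds no line separators. A sketch, with a made-up `chunks()` method playing the role of the proposed iterator-style `__bytes__`, and `io.BytesIO` standing in for the file:

```python
import io


class Bar:
    def __len__(self):
        return 3

    def chunks(self):
        # Hypothetical stand-in for the proposed generator-style __bytes__.
        yield b"0"
        yield b"1"
        yield b"2"


fd = io.BytesIO()  # stands in for open("/tmp/_", "wb")
# Each chunk is written in turn; no single joined bytes object is built first.
fd.writelines(Bar().chunks())
written = fd.getvalue()
```

The difference from the proposal is that the caller has to know about `chunks()`; a plain `fd.write(obj)` still would not work.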

Poul-Henning


phk＠freebsd.dk
