That's incredibly interesting. I've never used mmap before.
However, there's a problem.
I've run a few experiments with mmap now. This is the latest:
import mmap
import pathlib
import re

path = pathlib.Path(r'P:\huge_file')
with path.open('rb') as file:
    # Name the mapping `mm` so it doesn't shadow the mmap module.
    mm = mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ)
    for match in re.finditer(b'.', mm):
        pass
The file is 338 GB, and it seems that Python is trying to load all of it into memory. The process is now using 4 GB of RAM and still growing. I saw the same behavior when searching for a pattern that has no match.
Should I open a Python bug for this?
On Sun, Oct 7, 2018 at 7:49 PM email@example.com wrote:
On 18-10-07 16.15, Ram Rachum wrote:
I tested it now and indeed bytes patterns work on memoryview objects. But how do I use this to scan for patterns through a stream without loading it into memory?
An mmap object is one of the things you can make a memoryview of. Looking again, though, it seems you don't even need to: you can just re.search the mmap object directly.
re.search'ing the mmap object means the operating system takes care of the streaming for you, reading in parts of the file only as necessary.
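A minimal sketch of searching an mmap object directly, for illustration only. The helper name find_first and the small temporary demo file are assumptions made up for this example, not something from the thread, and the sketch says nothing about how memory use behaves on a very large file:

```python
import mmap
import os
import re
import tempfile


def find_first(path, pattern):
    """Return the offset of the first match of a bytes pattern, or None.

    Hypothetical helper: maps the file read-only and hands the mapping
    straight to re.search, without calling file.read().
    """
    with open(path, 'rb') as f:
        # Length 0 maps the whole file; ACCESS_READ makes it read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            match = re.search(pattern, mm)
            return match.start() if match else None


# Small stand-in file for the demo (a real use case would pass a big file).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'x' * 10_000 + b'needle' + b'y' * 10_000)
    demo_path = tmp.name

offset = find_first(demo_path, rb'needle')
missing = find_first(demo_path, rb'absent')
os.remove(demo_path)
```

Here offset is the byte position of the first match and missing is None, since the pattern must be a bytes pattern to be used against an mmap.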
Python-ideas mailing list
Pythonfirstname.lastname@example.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/