The file is 338GB in size, and it seems that Python is trying to load it into memory: the process is now using 4GB of RAM, and that number keeps growing. I saw the same behavior when searching for a pattern that doesn't match anything. Here is the code:

    import mmap
    import pathlib
    import re

    path = pathlib.Path(r'P:\huge_file')
    with path.open('rb') as file:
        mm = mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ)
        for match in re.finditer(b'.', mm):
            pass
On 18-10-07 16.15, Ram Rachum wrote:
> I tested it now and indeed bytes patterns work on memoryview objects.
> But how do I use this to scan for patterns through a stream without
> loading it into memory?
An mmap object is one of the things you can make a memoryview of,
although, looking again, it seems you don't even need to: you can
just re.search the mmap object directly.
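A minimal sketch of that suggestion, assuming Python 3 and a small temporary file standing in for the real one: a bytes pattern can be handed to re.search either as the mmap object itself or as a memoryview of it, and both searches find the same match. The file contents and pattern are made up for illustration.

```python
import mmap
import os
import re
import tempfile

# Small throwaway file standing in for the huge one from the thread.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"header\x00needle-42\x00trailer")
path = tmp.name

try:
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # A bytes pattern works on the mmap object directly...
            direct = re.search(rb"needle-(\d+)", mm)
            # ...and on a memoryview of it, since both expose the same buffer.
            view = memoryview(mm)
            via_view = re.search(rb"needle-(\d+)", view)
            result = (
                direct.span(), bytes(direct.group(1)),
                via_view.span(), bytes(via_view.group(1)),
            )
            # Release the view so the mmap can be closed without a BufferError.
            view.release()
finally:
    os.remove(path)

print(result)
```

The memoryview adds nothing here beyond an extra object to release, which is why searching the mmap directly is the simpler option.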
re.search'ing the mmap object means the operating system takes care of
the streaming for you, reading in parts of the file only as necessary.
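A sketch of what that looks like for a full scan, assuming Python 3: a generator maps the file read-only, walks it with re.finditer, and yields only the integer start offset of each match, so no match text is kept around. The helper name iter_match_offsets is hypothetical, and the demo uses a tiny temporary file instead of the real one. One likely reason the process in the first message still appears to grow is that pages of a mapped file that have been read are counted in the process's resident set (its working set, on Windows), even though the OS can discard those clean pages again under memory pressure; the thread itself doesn't confirm this.

```python
import mmap
import os
import re
import tempfile
from typing import Iterator


def iter_match_offsets(path: str, pattern: bytes) -> Iterator[int]:
    """Yield the start offset of every match of a bytes pattern in a file.

    Hypothetical helper for illustration. The file is mapped read-only,
    so the OS pages it in on demand while re.finditer walks the mapping;
    only integer offsets are handed back to the caller.
    """
    regex = re.compile(pattern)
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for match in regex.finditer(mm):
                yield match.start()


# Usage demo on a small temporary file (a stand-in for the huge file).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abcXabcXXabc")
demo_path = tmp.name
try:
    offsets = list(iter_match_offsets(demo_path, rb"abc"))
finally:
    os.remove(demo_path)

print(offsets)
```

Yielding offsets rather than Match objects keeps the caller from holding references to the mapping after the generator has closed it.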
Python-ideas mailing list