[Python-ideas] Support parsing stream with `re`
Ram Rachum
ram at rachum.com
Mon Oct 8 03:56:15 EDT 2018
That's incredibly interesting. I've never used mmap before.
However, there's a problem.
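I've now run a few experiments with mmap; this is the latest:

    import mmap
    import pathlib
    import re

    path = pathlib.Path(r'P:\huge_file')
    with path.open('rb') as file:
        # Length 0 maps the whole file; the name mm avoids shadowing the mmap module.
        mm = mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ)
        for match in re.finditer(b'.', mm):
            pass
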
The file is 338 GB in size, and it seems that Python is trying to load it
into memory. The process is now taking 4 GB of RAM and still growing. I saw
the same behavior when searching for a pattern that never matches.
Should I open a Python bug for this?
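
For comparison, here is a minimal sketch of what scanning a stream in bounded chunks could look like. It is not something `re` provides; `iter_stream_matches`, `chunk_size` and `max_match` are hypothetical names I made up for illustration. It assumes that no match is longer than `max_match` bytes, that the pattern never matches the empty string, and that it uses no anchors or lookarounds that depend on data outside the buffer.

```python
import io
import re


def iter_stream_matches(stream, pattern, chunk_size=1 << 20, max_match=1 << 10):
    """Yield (absolute_offset, matched_bytes) for a binary stream.

    Hypothetical helper: correct only if every match is at most
    max_match bytes long and non-empty (assumptions, not checked).
    """
    regex = re.compile(pattern)
    buf = b""
    base = 0  # absolute stream offset of buf[0]
    while True:
        data = stream.read(chunk_size)
        buf += data
        at_eof = not data
        # Before EOF, only trust matches that start early enough for their
        # whole extent (at most max_match bytes) to already be in the buffer.
        limit = len(buf) if at_eof else max(len(buf) - max_match, 0)
        pos = 0  # end of the last match yielded from this buffer
        for m in regex.finditer(buf):
            if m.start() >= limit:
                break
            yield base + m.start(), m.group()
            pos = m.end()
        if at_eof:
            return
        # Drop everything already decided; keep the undecided tail.
        keep_from = max(pos, limit)
        buf = buf[keep_from:]
        base += keep_from


if __name__ == "__main__":
    sample = io.BytesIO(b"xxabcxxabcxabc")
    for offset, found in iter_stream_matches(sample, rb"abc", chunk_size=4, max_match=3):
        print(offset, found)
```

The buffer never holds much more than `chunk_size + max_match` bytes, which is the point of the exercise, but the `max_match` bound is exactly the limitation that real streaming support in `re` would remove.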
On Sun, Oct 7, 2018 at 7:49 PM <2015 at jmunch.dk> wrote:
> On 18-10-07 16.15, Ram Rachum wrote:
> > I tested it now and indeed bytes patterns work on memoryview objects.
> > But how do I use this to scan for patterns through a stream without
> > loading it to memory?
>
> An mmap object is one of the things you can make a memoryview of,
> although looking again, it seems you don't even need to, you can
> just re.search the mmap object directly.
>
> re.search'ing the mmap object means the operating system takes care of
> the streaming for you, reading in parts of the file only as necessary.
>
> regards, Anders