[Python-ideas] Support parsing stream with `re`

Ram Rachum ram at rachum.com
Mon Oct 8 08:11:14 EDT 2018


"Windows will aggressively fill up your RAM in cases like this because
after all why not?  There's no use to having memory just sitting around
unused."

Two questions:

1. Is the "why not" sarcastic, as in you're agreeing it's a waste?
2. Will this be different on Linux? Which command do I run on Linux to
verify that the process isn't taking too much RAM?
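On Linux, `ps -o rss= -p PID` or `top` report the resident set size (RSS), and RSS also counts resident pages of a file-backed mapping, so a process scanning a large mapped file can look large even when it allocates little itself. The minimal sketch below assumes a Linux kernel 4.5 or newer, where `/proc/<pid>/status` splits RSS into `RssAnon`, `RssFile` and `RssShmem`. `rss_breakdown_kib` is a hypothetical helper name. Pages of the mapped file should appear under `RssFile`, while memory allocated by the regex engine would appear under `RssAnon`.

```python
import sys


def rss_breakdown_kib(pid="self"):
    """Return resident-memory fields from /proc/<pid>/status, in KiB.

    Linux-only (assumption). RssAnon/RssFile/RssShmem exist on kernels
    4.5 and newer; older kernels report only VmRSS.
    """
    fields = {}
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            key, _, value = line.partition(":")
            if key in ("VmRSS", "RssAnon", "RssFile", "RssShmem"):
                # Values look like "   123456 kB"; keep only the number.
                fields[key] = int(value.split()[0])
    return fields


if __name__ == "__main__":
    # Pass the PID of the scanning process, or nothing to inspect this one.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    for key, kib in rss_breakdown_kib(pid).items():
        print(f"{key:>8}: {kib / 1024:10.1f} MiB")
```

Run it with the PID of the scanning process (for example `python rss_breakdown.py 12345`). A growing `RssFile` alongside a flat `RssAnon` would be consistent with the OS paging in the file, not with the process leaking memory.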


Thanks,
Ram.


On Mon, Oct 8, 2018 at 3:02 PM Erik Bray <erik.m.bray at gmail.com> wrote:

> On Mon, Oct 8, 2018 at 12:20 PM Cameron Simpson <cs at cskk.id.au> wrote:
> >
> > On 08Oct2018 10:56, Ram Rachum <ram at rachum.com> wrote:
> > >That's incredibly interesting. I've never used mmap before.
> > >However, there's a problem.
> > >I did a few experiments with mmap now, this is the latest:
> > >
> > >path = pathlib.Path(r'P:\huge_file')
> > >
> > >with path.open('r') as file:
> > >    mmap = mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ)
> >
> > Just a remark: don't tromp on the "mmap" name. Maybe "mapped"?
> >
> > >    for match in re.finditer(b'.', mmap):
> > >        pass
> > >
> > >The file is 338GB in size, and it seems that Python is trying to load it
> > >into memory. The process is now taking 4GB RAM and it's growing. I saw the
> > >same behavior when searching for a non-existing match.
> > >
> > >Should I open a Python bug for this?
> >
> > Probably not. First figure out what is going on. BTW, how much RAM have you
> > got?
> >
> > As you access the mapped file the OS will try to keep it in memory in case you
> > need that again. In the absence of competition, most stuff will get paged out
> > to accommodate it. That's normal. All the data are "clean" (unmodified) so the
> > OS can simply release the older pages instantly if something else needs the
> > RAM.
> >
> > However, another possibility is that the regexp is consuming lots of memory.
> >
> > The regexp seems simple enough (b'.'), so I doubt it is leaking memory like
> > mad; I'm guessing you're just seeing the OS page in as much of the file as it
> > can.
>
> Yup. Windows will aggressively fill up your RAM in cases like this
> because after all why not?  There's no use to having memory just
> sitting around unused.  For read-only, non-anonymous mappings it's not
> much of a problem for the OS to drop pages that haven't been recently
> accessed and use them for something else.  So I wouldn't be too
> worried about the process chewing up RAM.
>
> I feel like this is veering more into python-list territory for
> further discussion though.
>
>
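The experiment quoted above can be rewritten as a self-contained sketch. It binds the mapping to `mapped` instead of shadowing the `mmap` module, opens the file in binary mode, and lets `with` close both the file and the mapping. To tell apart the two explanations offered in the thread (the OS paging in the file versus the regex allocating memory), it also wraps the scan in `tracemalloc`, which traces only allocations made through Python's allocators and not the file pages the OS maps in. `count_matches` is a hypothetical helper name, and the newline pattern in the `__main__` block is an illustrative assumption, not the original `b'.'`.

```python
import mmap
import pathlib
import re
import sys
import tracemalloc


def count_matches(path, pattern):
    """Count matches of a bytes regex in a file via a read-only memory map.

    Returns (match_count, peak_traced_bytes). tracemalloc sees only memory
    allocated through Python's allocators, not file pages the OS maps in,
    so the peak reflects what the regex scan itself allocates.
    """
    regex = re.compile(pattern)  # compile before tracing starts
    # Binary mode: the mapping exposes bytes, so the pattern must be bytes too.
    with pathlib.Path(path).open("rb") as file, \
            mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        tracemalloc.start()
        try:
            # finditer yields one match at a time and searches the mapping
            # in place; each match object is released after it is counted.
            count = sum(1 for _ in regex.finditer(mapped))
            _, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
    return count, peak


if __name__ == "__main__":
    matches, peak = count_matches(sys.argv[1], rb"\n")
    print(f"{matches} matches, peak Python allocation {peak / 1024:.1f} KiB")
```

If the reported peak stays in the kilobyte range while the process's resident memory climbs, the growth is the OS caching the mapped file rather than the regex. Note that mapping an empty file raises `ValueError`, so a real script would check the file size first.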