" Windows will aggressively fill up your RAM in cases like this because after all why not?  There's no use to having memory just sitting around unused."

Two questions:

1. Is the "why not" sarcastic, as in you're agreeing it's a waste?
2. Will this be different on Linux? Which command do I run on Linux to verify that the process isn't taking too much RAM?


Thanks,
Ram.


On Mon, Oct 8, 2018 at 3:02 PM Erik Bray <erik.m.bray@gmail.com> wrote:
On Mon, Oct 8, 2018 at 12:20 PM Cameron Simpson <cs@cskk.id.au> wrote:
>
> On 08Oct2018 10:56, Ram Rachum <ram@rachum.com> wrote:
> >That's incredibly interesting. I've never used mmap before.
> >However, there's a problem.
> >I did a few experiments with mmap now, this is the latest:
> >
> >path = pathlib.Path(r'P:\huge_file')
> >
> >with path.open('r') as file:
> >    mmap = mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ)
>
> Just a remark: don't tromp on the "mmap" name. Maybe "mapped"?
>
> >    for match in re.finditer(b'.', mmap):
> >        pass
> >
> >The file is 338GB in size, and it seems that Python is trying to load it
> >into memory. The process is now taking 4GB RAM and it's growing. I saw the
> >same behavior when searching for a non-existing match.
> >
> >Should I open a Python bug for this?
>
> Probably not. First figure out what is going on. BTW, how much RAM have you
> got?
>
> As you access the mapped file the OS will try to keep it in memory in case you
> need it again. In the absence of competition, most other stuff will get paged out
> to accommodate it. That's normal. All the data are "clean" (unmodified), so the
> OS can simply release the older pages instantly if something else needs the
> RAM.
>
> However, another possibility is that the regexp is consuming lots of memory.
>
> The regexp seems simple enough (b'.'), so I doubt it is leaking memory like
> mad; I'm guessing you're just seeing the OS page in as much of the file as it
> can.

Yup. Windows will aggressively fill up your RAM in cases like this
because after all why not?  There's no use to having memory just
sitting around unused.  For read-only, non-anonymous mappings it's not
much of a problem for the OS to drop pages that haven't been recently
accessed and use them for something else.  So I wouldn't be too
worried about the process chewing up RAM.
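A minimal sketch of the pattern under discussion, with the binding renamed to "mapped" as Cameron suggests and the file opened in binary mode (a small generated file stands in for the 338 GB one):

```python
import mmap
import pathlib
import re
import tempfile

# A small stand-in file so the sketch runs anywhere; substitute the real path.
path = pathlib.Path(tempfile.gettempdir()) / 'huge_file_demo'
path.write_bytes(b'some sample data\n' * 1000)

# Open in binary mode ('rb'): an mmap is a bytes-like object, so the
# regexp pattern must be a bytes pattern as well.
with path.open('rb') as file:
    # Bind to "mapped", not "mmap", so the module name isn't shadowed.
    with mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        count = sum(1 for _ in re.finditer(b'.', mapped))

print(count)  # b'.' doesn't match newlines, so 16 matches per 17-byte line
path.unlink()
```

Because the mapping is read-only, all of its pages stay clean and the OS can discard them on demand, which is why a growing resident size here isn't a leak.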

I feel like this is veering more into python-list territory for
further discussion though.
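On the Linux side of Ram's second question: interactively, top, htop, or ps -o rss -p <pid> show a process's memory use; programmatically, the resident set size can be read from /proc. A minimal Linux-only sketch (the rss_kib helper is illustrative, not a standard API):

```python
import os

def rss_kib(pid=None):
    """Return the VmRSS (resident set size) of a process in KiB. Linux-only."""
    pid = os.getpid() if pid is None else pid
    with open(f'/proc/{pid}/status') as status:
        for line in status:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])  # the kernel reports this in kB

print(rss_kib())  # this process's resident memory, in KiB
```

Note that VmRSS counts page-cache pages currently mapped into the process, so it will grow while scanning an mmap'd file even though that memory is instantly reclaimable.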