[Python-ideas] Support parsing stream with `re`

Chris Angelico rosuav at gmail.com
Mon Oct 8 10:28:01 EDT 2018


On Mon, Oct 8, 2018 at 11:15 PM Anders Hovmöller <boxed at killingar.net> wrote:
>
>
> However, another possibility is that the regexp is consuming lots of memory.
>
> The regexp seems simple enough (b'.'), so I doubt it is leaking memory like
> mad; I'm guessing you're just seeing the OS page in as much of the file as it
> can.
>
>
> Yup. Windows will aggressively fill up your RAM in cases like this
> because after all why not?  There's no use to having memory just
> sitting around unused.  For read-only, non-anonymous mappings it's not
> much problem for the OS to drop pages that haven't been recently
> accessed and use them for something else.  So I wouldn't be too
> worried about the process chewing up RAM.
>
> I feel like this is veering more into python-list territory for
> further discussion though.
>
>
> Last time I worked on Windows, which admittedly was a long time ago, the file cache was not attributed to a process, so this doesn't seem to be relevant to this situation.

Depends whether it's a file cache or a memory-mapped file, though. On
Linux, if I open a file, read it, then close it, I'm not using that
file any more, but it might remain in cache (which will mean that
re-reading it will be fast, regardless of whether that's from the same
or a different process). That usage shows up as either "buffers" or
"cache", and doesn't belong to any process.
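
As a rough illustration of where that usage shows up on Linux, here is a
minimal sketch that pulls the "Buffers" and "Cached" figures out of the
text of /proc/meminfo. The helper name page_cache_kb is made up for this
example, and it assumes the usual "Name:   value kB" line layout.

```python
def page_cache_kb(meminfo_text):
    """Return the Buffers and Cached values (in kB) from /proc/meminfo text."""
    wanted = {"Buffers", "Cached"}
    found = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in wanted:
            # Lines look like "Cached:   1234567 kB"; keep the number only.
            found[name] = int(rest.split()[0])
    return found
```

On a Linux box, passing in open("/proc/meminfo").read() before and after
reading a large file should show the numbers move, without any single
process's memory growing to match.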

In contrast, a mmap'd file is memory that you do indeed own. If the
system runs short of physical memory, it can simply discard those
pages (rather than saving them to the swap file), but they're still
owned by one specific process, and should count in that process's
virtual memory.
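
For reference, the mmap case looks roughly like the sketch below: a
read-only mapping of the whole file handed straight to re. The function
name count_matches is only illustrative, and this assumes a non-empty
file and a bytes pattern.

```python
import mmap
import re


def count_matches(path, pattern):
    """Count matches of a bytes regex over a read-only memory-mapped file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; mapping an empty file raises an error.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # re accepts bytes-like objects, so the mapping is scanned in
            # place and pages are brought in as the scan reaches them.
            return sum(1 for _ in re.finditer(pattern, mm))
```

Every page the scan touches is part of this process's mapping, which is
why a monitor may show the process's memory climbing even though nothing
is being copied into Python objects.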

(That's based on my knowledge of Linux today and OS/2 back in the 90s.
It may or may not be accurate to Windows, but I suspect it won't be
very far wrong.)

ChrisA

