[Python-ideas] Support parsing stream with `re`

Stephen J. Turnbull turnbull.stephen.fw at u.tsukuba.ac.jp
Wed Oct 10 01:27:56 EDT 2018


Chris Angelico writes:
 > On Wed, Oct 10, 2018 at 5:09 AM Stephen J. Turnbull
 > <turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:
 > >
 > > Chris Angelico writes:
 > >
 > >  > Both processes are using the virtual memory. Either or both could be
 > >  > using physical memory. Assuming they haven't written to the pages
 > >  > (which is the case with executables - the system mmaps the binary into
 > >  > your memory space as read-only), and assuming that those pages are
 > >  > backed by physical memory, which process is using that memory?
 > >
 > > One doesn't know.  Clever side-channel attacks aside, I don't care,
 > > and I don't see how it could matter.
 > 
 > It matters a lot when you're trying to figure out what your system
 > is doing.

Sure, but knowing how your system works is far more important.  E.g.,
create a 1TB file on a POSIX system, then delete it while a process
still has it open: no matter how you process the output of du or ls,
you still have 1TB of used file space unaccounted for.  The same
applies to swapfiles.  But "df" knows, and will tell you.
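
For the curious, the effect is easy to reproduce.  Here is a minimal
sketch in Python (POSIX only; the 100 MiB size and /tmp path are just
illustrative stand-ins for the 1TB example):

    import os

    def avail_bytes(path):
        st = os.statvfs(path)             # the same numbers "df" reports
        return st.f_bavail * st.f_frsize

    tmpdir = "/tmp"                       # any writable POSIX filesystem
    path = tmpdir + "/ghost.dat"
    before = avail_bytes(tmpdir)

    f = open(path, "wb")
    f.write(b"\0" * (100 * 1024 * 1024))  # allocate ~100 MiB of real blocks
    f.flush()
    os.fsync(f.fileno())

    os.unlink(path)          # directory entry gone: ls and du see nothing
    after = avail_bytes(tmpdir)
    print("bytes still held by deleted-but-open file:", before - after)

    f.close()                # only now does the filesystem reclaim the blocks

(Ignoring other filesystem activity, the printed figure is roughly the
100 MiB that df can see but du can't.)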

In fact, "ps" will tell you how much shared memory a process is using.
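
On Linux you can read the breakdown straight from /proc as well.  A
sketch, assuming a kernel recent enough to provide smaps_rollup (see
proc(5)); this is the same kind of data that ps and top summarize:

    # Resident-set breakdown for the current process: how much is
    # shared with other processes vs. accounted to this one.
    FIELDS = ("Rss:", "Pss:", "Shared_Clean:", "Shared_Dirty:")

    with open("/proc/self/smaps_rollup") as fh:
        for line in fh:
            if line.startswith(FIELDS):
                print(line.rstrip())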

I just don't see a problem here, on the "I'm not getting the data I
need" side.  You do have access to the data you need.

 > >  > >  > Tell me, which process is responsible for libc being in memory?
 > >  > >  > Other than, like, all of them?
 > >  > >
 > >  > > Yes.  Why would you want a different answer?
 > >  >
 > >  > Because that would mean that I have way more *physical* memory in use
 > >  > than I actually have chips on the motherboard for.
 > >
 > > No, that's like saying that because you have multiple links to a file
 > > on disk you're using more physical disk than you have.
 > 
 > Actually, that's exactly the same problem, with exactly the same
 > consequences. How do you figure out why your disk is full? How do you
 > enforce disk quotas? How can you get any sort of reasonable metrics
 > about anything when the sum of everything vastly exceeds the actual
 > usage?

You add up the right things, of course, and avoid paradoxes.
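
For the disk-full example, "the right things" means counting each
inode once rather than each name.  A du-style sketch in Python (POSIX,
where st_blocks is in 512-byte units):

    import os

    def disk_usage(root):
        """Total allocated bytes under root, counting each inode once."""
        seen = set()
        total = 0
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                st = os.lstat(os.path.join(dirpath, name))
                key = (st.st_dev, st.st_ino)   # the inode, not the name
                if key in seen:
                    continue                   # hard link already counted
                seen.add(key)
                total += st.st_blocks * 512
        return total

However many hard links a file has, it contributes its blocks to the
total exactly once, so the sum can't exceed what the disk holds.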

The disk quota enforcement problem is indeed hard.  It sounds to me
like a special case of a problem studied in cost accounting, one that
was solved in a sense (an allocation satisfying certain axioms was
shown to exist and be unique) by Aumann and Shapley in the 1970s.
Aumann-Shapley prices have been used by telephone carriers to allocate
the costs of fixed assets with capacity constraints to individual
calls, though I don't know whether the method is still in use.  I'm
not sure the disk quota problem satisfies the conditions of the A-S
theorem (which imposes certain monotonicity requirements), but the
general paradigm applies.
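
As a toy illustration of the idea (the ordinary Shapley value with
symmetric users, not the full Aumann-Shapley machinery; the object
names and sizes below are made up): split each shared object's cost
equally among its users.  The shares sum exactly to the true total, so
no paradox arises.  Incidentally, this equal split is how Linux's Pss
metric charges shared pages to processes.

    from collections import defaultdict

    # object -> (resident KiB, processes mapping it); numbers invented
    shared = {
        "libc.so":      (1800, {"bash", "vim", "python"}),
        "libpython.so": (2400, {"python"}),
    }

    charge = defaultdict(float)
    for cost, users in shared.values():
        for proc in users:
            charge[proc] += cost / len(users)  # equal share per user

    # Shares sum to the real total: nothing is double-counted.
    assert sum(charge.values()) == sum(c for c, _ in shared.values())
    for proc, kib in sorted(charge.items()):
        print(f"{proc}: {kib:.0f} KiB")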

However, the quota problem (and in general, the problem of allocating
overhead) is "hard" even if you have complete information about the
system, because it's a values problem: which events are bad?  which
are worse?  which are unacceptable (i.e., result in bankruptcy and
abandonment of the system)?  Getting very complete, accurate
information about the physical consequences of individual events in
the system (linking to a file on disk, allocating a large quantity of
virtual memory) is not difficult, in the sense that you can throw
money and engineers at it and get "df".  Getting very complete,
accurate information about the values you're trying to satisfy is
possible only for an omniscient god, even if, as in business, those
values can be measured in currency units.

Steve

