Newbie question about text encoding
Dave Angel
davea at davea.name
Fri Feb 27 02:30:46 EST 2015
On 02/27/2015 12:58 AM, Steven D'Aprano wrote:
> Dave Angel wrote:
>
>> (Although I believe Seymour Cray was quoted as saying that virtual
>> memory is a crock, because "you can't fake what you ain't got.")
>
> If I recall correctly, disk access is about 10000 times slower than RAM, so
> virtual memory is *at least* that much slower than real memory.
>
It's so much more complicated than that, that I hardly know where to
start. I'll describe a generic processor/OS/memory/disk architecture;
there will be huge differences between processor models even from a
single manufacturer.
First, as soon as you add swapping logic to your
processor/memory-system, you theoretically slow it down. And in the
days of that quote, Cray's memory was maybe 50 times as fast as the
memory used by us mortals. So adding swapping logic would have slowed
it down quite substantially, even when it was not swapping. But that
logic is inside the CPU chip these days, and presumably thoroughly
optimized.
Next, statistically, a program touches only a small subset of its total
program & data space at any one time (its working set), and that working
set should reside in real memory. But when the program greatly increases
that working set and it approaches the amount of physical memory, swapping
becomes more frenzied, and we say the program is thrashing. Simple example:
try sorting an array that's about the size of available physical memory.
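To get a feel for where that threshold sits on a given box, here's a quick
sketch (the os.sysconf names are Linux/Unix-only, and the ~32 bytes per
element is just a rough guess for a list of Python floats):

import os

# Rough sketch: how big can an in-memory sort get before it starts
# competing with physical RAM?  These sysconf names exist on Linux and
# most Unixes; Windows would need a different approach.
page_size = os.sysconf("SC_PAGE_SIZE")     # bytes per page, commonly 4096
phys_pages = os.sysconf("SC_PHYS_PAGES")   # pages of physical RAM
phys_bytes = page_size * phys_pages

# Assume roughly 32 bytes per element for a list of Python floats
# (the float object plus the list's pointer to it) -- a rough estimate.
bytes_per_element = 32
max_elements = phys_bytes // bytes_per_element

print(f"Physical RAM: {phys_bytes / 2**30:.1f} GiB")
print(f"A list of around {max_elements:,} floats would fill it; "
      "sorting anything near that size is asking to thrash.")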
Next, even physical memory is divided into a few levels of caching, some
on-chip and some off. And the caching is done in what I call strips (the
usual term is cache lines), where accessing just one byte causes the whole
strip to be loaded from non-cached memory. The size varies by processor;
64 bytes is typical today, though some architectures use more.
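Back-of-the-envelope arithmetic (assuming a 64-byte line, which is only
typical, not universal) shows why the access pattern matters more than the
byte count:

# Assume a 64-byte cache line (varies by processor) and a 1 MB array.
LINE = 64
N = 1_000_000

# Sequential scan: every byte of each loaded line gets used.
seq_bytes_used = N
seq_bytes_transferred = (N // LINE) * LINE

# Touch one byte every 256 bytes: far fewer bytes used, but every touch
# still drags a whole 64-byte line in from non-cached memory.
stride = 256
strided_bytes_used = N // stride
strided_bytes_transferred = (N // stride) * LINE

print(f"sequential: used {seq_bytes_used:,} bytes, "
      f"transferred {seq_bytes_transferred:,}")
print(f"stride-256: used {strided_bytes_used:,} bytes, "
      f"transferred {strided_bytes_transferred:,}")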
If there are multiple processors (not multicore, but actual separate
processors), then each one has such internal caches, and a write on one
processor may have to trigger flushes or invalidations in the caches of
all the other processors that happen to have the same strip loaded.
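A sketch of that effect from Python, using two processes hammering on
shared counters (the classic "false sharing" experiment; interpreter
overhead mutes it compared with C, but a gap is often still measurable):

import ctypes
import time
from multiprocessing import Process, sharedctypes

def bump(arr, index, count):
    # Each worker increments only its own slot, so no lock is needed --
    # but the cache hardware still has to keep the strip coherent.
    for _ in range(count):
        arr[index] += 1

def run(gap, count=2_000_000):
    # gap=1 puts the two counters in the same cache line;
    # gap=64 puts them 512 bytes apart, in different lines.
    arr = sharedctypes.RawArray(ctypes.c_long, 128)
    workers = [Process(target=bump, args=(arr, i * gap, count))
               for i in (0, 1)]
    start = time.perf_counter()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"counters in the same strip : {run(1):.2f}s")
    print(f"counters in separate strips: {run(64):.2f}s")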
The processor not only prefetches the next few instructions, but decodes
and tentatively executes them, subject to being discarded if a
conditional branch doesn't go the way the processor predicted. So some
instructions execute in zero time, some of the time.
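A toy cost model makes the stakes concrete (the cycle counts below are
assumptions, not measurements for any particular chip):

# Toy model: a correctly predicted branch is nearly free; a mispredicted
# one throws away the speculative work, costing roughly a pipeline's
# worth of cycles.  Both numbers are assumptions.
PREDICTED_COST = 1      # cycles
MISPREDICT_COST = 15    # cycles

def avg_branch_cost(mispredict_rate):
    return ((1 - mispredict_rate) * PREDICTED_COST
            + mispredict_rate * MISPREDICT_COST)

for rate in (0.01, 0.05, 0.50):   # 0.50 is roughly a branch on random data
    print(f"mispredict rate {rate:4.0%}: "
          f"~{avg_branch_cost(rate):.1f} cycles per branch")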
Every instruction fetch, and every data fetch or store, goes through a
couple of layers of translation. Segment register plus offset gives a
linear address. That gets looked up in the page tables to get a physical
address, and if the needed table entry isn't already in the on-chip cache
(the TLB), it has to be fetched first. If the page isn't actually present
in physical memory, a processor exception (a page fault) causes the OS to
potentially swap something out, and something else in.
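The page-table step is just integer arithmetic plus a lookup; here's a toy
version with a made-up one-entry table and 4 KiB pages (both assumptions):

PAGE_SIZE = 4096                    # 4 KiB pages (typical, but an assumption)
page_table = {0x12345: 0x00777}     # made-up mapping: linear page -> frame

def translate(linear_addr):
    page, offset = divmod(linear_addr, PAGE_SIZE)
    try:
        frame = page_table[page]
    except KeyError:
        # On real hardware this is the page-fault exception: the OS has to
        # find or swap in a frame and fix the table before retrying.
        raise RuntimeError("page fault") from None
    return frame * PAGE_SIZE + offset

print(hex(translate(0x12345ABC)))   # -> 0x777abc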
Once we're paging from the swapfile, the size of the read is typically
4 KB (one page). And that whole page gets read regardless of whether we're
going to use one byte of it or all of it.
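You can ask Python what that granularity is on your machine:

import mmap

# The OS's paging granularity.  mmap.PAGESIZE works on all platforms;
# os.sysconf("SC_PAGE_SIZE") gives the same answer on Unix.
print(mmap.PAGESIZE)    # commonly 4096, though some systems use larger pages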
The ratio between an access which was in the L1 cache and one which
required a page to be swapped in from disk? Much bigger than your
10,000 figure. But hopefully it doesn't happen a big percentage of the
time.
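Plugging in some very rough latency figures (assumptions only; real
numbers vary a lot by hardware) shows the spread:

# Very rough latency guesses, in nanoseconds (assumptions, not measurements).
latencies_ns = {
    "L1 cache hit":          1,
    "main memory (DRAM)":    100,
    "SSD page-in":           100_000,       # ~100 microseconds
    "spinning-disk page-in": 10_000_000,    # ~10 milliseconds
}

l1 = latencies_ns["L1 cache hit"]
for name, ns in latencies_ns.items():
    print(f"{name:22s}: {ns:>12,} ns  (~{ns // l1:,} times an L1 hit)")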
Many, many other variables, like the fact that RAM chips are not directly
addressable by individual bytes, but are organized in rows and columns.
So if you access many bytes in the same row, it can be much quicker than
random access. Simple access-time specifications therefore don't mean as
much as they would seem to; the memory controller has to balance the RAM
spec against the various cache requirements.
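Here's a sketch of how the access order alone changes things, even with
all the data already in RAM (in CPython the interpreter overhead per
access mutes the gap, but it's usually still visible at this size):

import random
import time

N = 2_000_000
data = [float(i) for i in range(N)]     # objects allocated roughly in order
seq_idx = list(range(N))
rnd_idx = seq_idx[:]
random.shuffle(rnd_idx)

def walk(indices):
    # Sum the same elements either in allocation order or in random order;
    # the random order defeats cache-line and row locality.
    start = time.perf_counter()
    total = 0.0
    for i in indices:
        total += data[i]
    return time.perf_counter() - start

print(f"sequential order: {walk(seq_idx):.2f}s")
print(f"shuffled order  : {walk(rnd_idx):.2f}s")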
--
DaveA