Writing an emulator in python - implementation questions (for performance)

Santiago Romero sromero at gmail.com
Thu Nov 12 15:37:55 CET 2009

> >  I'm trying to port (just for fun), my old Sinclair Spectrum emulator,
> > ASpectrum, from C to Python + pygame.
> The answer to your question is, "Use numpy".  More details below.

 Let's see :-)

> >  How can I implement this in Python, I mean, define a 16 bit variable
> > so that high and low bytes can be accessed separately and changing W,
> > H or L affects the entire variable? I would like to avoid doing BIT
> > masks to get or change HIGH or LOW parts of a variable and let the
> > compiled code to do it by itself.
> You can do clever memory slicing like this with numpy.  For instance:
> breg = numpy.zeros((16,),numpy.uint8)
> wreg = numpy.ndarray((8,),numpy.uint16,breg)
> This causes breg and wreg to share the same 16 bytes of memory.  You
> can define constants to access specific registers:
> R1L = 2   # low byte of word 1 (assuming a little-endian host)
> R1H = 3   # high byte of word 1
> R1 = 1
> breg[R1H] = 2
> print(wreg[R1])
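
 A minimal sketch of the shared-memory idea, applied to the Z80 pair BC.
The index names B, C and BC are illustrative, and the word view is forced
to little-endian ("<u2") so that the low byte sits at the even index
whatever the host CPU is. That layout is an assumption of this sketch,
not something stated in the thread.

```python
import numpy as np

# 16 bytes of register storage, seen both as bytes and as 16-bit words.
breg = np.zeros(16, dtype=np.uint8)
wreg = breg.view("<u2")   # 8 little-endian words sharing breg's memory

# Hypothetical index constants: word n covers bytes 2n (low) and 2n+1 (high).
BC = 0
C = 2 * BC        # low byte of BC
B = 2 * BC + 1    # high byte of BC

# Writing the two halves changes the 16-bit value...
breg[B] = 0x12
breg[C] = 0x34
bc_after_byte_writes = int(wreg[BC])

# ...and writing the 16-bit value changes both halves.
wreg[BC] = 0xABCD
b_after_word_write = int(breg[B])
c_after_word_write = int(breg[C])
```

 No masks or shifts appear in the emulator code itself; numpy does the
byte/word aliasing through the shared buffer.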

 And how about speed?

Assuming a 16 bit register named BC which contains two 8 bit registers (B
and C)...

 Will the above be faster than shifts and bit operations (<<, >> and &)
with new B and C values to "recalculate" BC when reading or changing
either B, C or BC?
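
 For comparison, this is roughly what the shift-and-mask alternative
looks like with plain Python ints. The helper names (make_bc, split_bc,
inc_bc) are made up for this sketch. Which approach is faster has to be
measured, so no timing is claimed here.

```python
def make_bc(b, c):
    """Combine two 8-bit halves into one 16-bit value (B is the high byte)."""
    return ((b & 0xFF) << 8) | (c & 0xFF)


def split_bc(bc):
    """Split a 16-bit value into its (high, low) 8-bit halves."""
    return (bc >> 8) & 0xFF, bc & 0xFF


def inc_bc(b, c):
    """16-bit increment of the pair, wrapping at 0xFFFF; returns new (b, c)."""
    return split_bc((make_bc(b, c) + 1) & 0xFFFF)
```

 Each 16-bit operation pays for one combine and one split, while 8-bit
operations touch B or C directly.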

> >  Is python's array module the best (and fastest) implementation to
> > "emulate" the memory?
> I'd use numpy for this as well.  (I doubt the Z80 had a 16-bit bus,
> but if it did you could use the same memory sharing trick I showed you
> with the registers to simulate word reads and writes.)

 Z80 has a 16 bit ADDRESS bus and an 8 bit DATA bus. This means you can
address memory cells 0 to 65535, each 8 bits wide. Z80 has 16 bit bus
operations, but currently in C I write 16 bit values as two 8 bit
consecutive values without using (unsigned short *) pointers. But it
seems that numpy would allow me to do it better than C in this case...
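
 The "two consecutive 8 bit values" approach carries over to Python
directly. A small sketch, assuming a flat 64KB bytearray as memory and
addresses that wrap around at 0xFFFF (both are choices made for this
example; read_word and write_word are hypothetical names). The Z80
stores the low byte first.

```python
mem = bytearray(64 * 1024)   # flat 64KB address space, one byte per cell


def write_word(mem, addr, value):
    """Store a 16-bit value as two bytes, low byte at the lower address."""
    mem[addr & 0xFFFF] = value & 0xFF
    mem[(addr + 1) & 0xFFFF] = (value >> 8) & 0xFF


def read_word(mem, addr):
    """Read two consecutive bytes back as one 16-bit value."""
    return mem[addr & 0xFFFF] | (mem[(addr + 1) & 0xFFFF] << 8)
```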

> Note that, when performing operations on single values, neither numpy
> nor array module are necessarily a lot faster than Python lists, might
> even be slower.  But they use a LOT less memory, which is important
> for largish arrays.

 Well, we're talking about a 128KB 1-byte array, that's the maximum
memory size a Sinclair Spectrum can have, and always by using page
swapping of 16KB blocks in a 64KB addressable space...

 If you really think that python lists can be faster than numpy or
array modules, let me know.

 Maybe I'll make a "benchmark test", by making some millions of read /
write operations and timing the results.
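
 Such a benchmark could look roughly like this: the same single-byte
write/read loop run over a list, an array.array, a bytearray and a numpy
array, timed with timeit. The function names, the access pattern and the
iteration counts are arbitrary choices for this sketch, and the numbers
will depend on the interpreter and library versions.

```python
import timeit
from array import array

import numpy as np

SIZE = 128 * 1024   # 128KB, the largest Spectrum memory size


def poke_peek(mem, count):
    """Write then read back `count` single bytes, one element at a time."""
    mask = SIZE - 1
    for i in range(count):
        addr = (i * 7919) & mask   # scatter the accesses over the array
        mem[addr] = i & 0xFF
        _ = mem[addr]


candidates = {
    "list": [0] * SIZE,
    "array": array("B", [0]) * SIZE,
    "bytearray": bytearray(SIZE),
    "numpy": np.zeros(SIZE, dtype=np.uint8),
}


def run_benchmark(count=100000, repeat=3):
    """Return the best time in seconds for each container type."""
    results = {}
    for name, mem in candidates.items():
        timer = timeit.Timer(lambda: poke_peek(mem, count))
        results[name] = min(timer.repeat(repeat=repeat, number=1))
    return results


if __name__ == "__main__":
    ranked = sorted(run_benchmark().items(), key=lambda kv: kv[1])
    for name, seconds in ranked:
        print("%-10s %.3f s" % (name, seconds))
```

 This measures element-at-a-time access only, which is what an opcode
fetch loop does; bulk slice operations would need a separate test.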

 I want to write the emulator as "pythonic" as possible...

> >  I don't know how to emulate paging in python...
> numpy again.  This would mean you'd have to fiddle with addresses a
> bit, but that shouldn't be too big a deal.  Create the physical
> memory:
> mem = numpy.zeros((128*1024,),numpy.uint8)

 A 128K array of zeroes...

> Then create the pages.  (This is a regular Python list containing
> numpy slices. numpy slices share memory so there is no copying of
> underlying data.)
> page = [mem[0:16*1024],
>         mem[16*1024:32*1024],
>         mem[32*1024:48*1024],
>         mem[48*1024:64*1024]]

 Those are just like pointers to the "mem" numpy array, pointing to
concrete start indexes, aren't they?

> To access the byte at address 42432, you'd have use bit operations to
> get a page number and index (2 and 9664 in this case), then you can
> access the memory like this:

 Do you mean:

    page = address // 16384
    index = address % 16384


 Or, better, with:

  page = address >> 14
  index = address & 16383
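
 Both forms give the same result for any 16 bit address, because 16384
is 2**14. A quick sketch to check it (split_address and the constants
are names invented for this example):

```python
PAGE_BITS = 14                 # 2**14 = 16384 bytes per page
PAGE_SIZE = 1 << PAGE_BITS


def split_address(address):
    """Return (page number, index inside the page) for a 16-bit address."""
    return address >> PAGE_BITS, address & (PAGE_SIZE - 1)


# The shift/mask version agrees with division/modulo over the whole
# 64KB address space.
all_match = all(
    split_address(a) == divmod(a, PAGE_SIZE) for a in range(64 * 1024)
)
```

 For the address used above, split_address(42432) gives page 2 and
index 9664.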


> page[2][9664] = 33
> p = page[3][99]
> To swap a page, reassign the slice of main memory,
> page[2] = mem[96*1024:112*1024]
> Now, accessing address 42432 will access a byte from a different page.
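
 Putting the pieces of the paging suggestion together, a minimal sketch
could look like this. The helpers peek, poke and map_bank are
hypothetical names, and numbering the 16KB banks 0 to 7 inside the
128KB array is an assumption of the example.

```python
import numpy as np

PAGE_SIZE = 16 * 1024

# 128KB of physical memory and four 16KB windows into it.
mem = np.zeros(128 * 1024, dtype=np.uint8)
page = [mem[i * PAGE_SIZE:(i + 1) * PAGE_SIZE] for i in range(4)]


def peek(address):
    """Read one byte through the page table."""
    return int(page[address >> 14][address & 16383])


def poke(address, value):
    """Write one byte through the page table."""
    page[address >> 14][address & 16383] = value & 0xFF


def map_bank(slot, bank):
    """Point one of the four slots at a different 16KB bank (no copying)."""
    start = bank * PAGE_SIZE
    page[slot] = mem[start:start + PAGE_SIZE]
```

 Since the slices are views, map_bank only rebinds a list entry; the
cost is paid on every peek and poke instead.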

 But wouldn't the above calculations (page, index) be slower than a
plain 64KB numpy array (with no paging mechanism when reading or
writing), copying memory slices when page swaps are needed?

> If you don't want to fiddle with indirect pages and would just rather
> copy memory around when a page swap occurs, you can do that, too.
> (Assigning to a slice copies the data rather than shares.)  I don't
> know if it's as fast as memset but it should be pretty quick.

 That's what I was talking about.
 With "page and index" I'm slowing down EACH memory read and write,
and that includes opcode and data fetching...

 With "memory copying in page swaps", memory is always read and written
quickly, and if "slice copies" are fast enough, the emulation will run
at more than 100% of speed (I hope) for a 3.5MHz system ...
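
 A rough sketch of the "copy on page swap" variant: a flat 64KB array is
what the CPU reads and writes, and a swap copies the outgoing 16KB
window back to its bank before copying the new bank in. The names
(banks, ram, mapped, swap_in) are invented for the example, and it
assumes a bank is never mapped into two slots at once.

```python
import numpy as np

PAGE_SIZE = 16 * 1024

banks = np.zeros((8, PAGE_SIZE), dtype=np.uint8)   # 8 x 16KB = 128KB
ram = np.zeros(4 * PAGE_SIZE, dtype=np.uint8)      # flat 64KB CPU view
mapped = [0, 1, 2, 3]                              # bank held by each slot


def swap_in(slot, bank):
    """Save the slot's current contents, then copy another bank into it."""
    window = ram[slot * PAGE_SIZE:(slot + 1) * PAGE_SIZE]
    banks[mapped[slot]][:] = window   # slice assignment copies the data
    window[:] = banks[bank]
    mapped[slot] = bank
```

 Reads and writes are then plain ram[address] accesses, and the two 16KB
copies happen only when a swap is requested.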

> Hope these brief suggestions help.  If you don't want third party
> libraries, then numpy will be of no use.  But I guess if you're using
> pygame third party modules are ok.  So get numpy, it'll make things a
> lot easier.  It can be a bit daunting to learn, though.

 Yes, you've been very helpful!

 How about numpy portability? Is array more portable?

 And finally, do you think I'm doing right by using global variables
for registers, memory, and so on, or should I put ALL into a single
object and pass it to functions?

 Ummm ... I think I'm going to do some tests with some "for i in range
(1,100000000000)" + time :-)

 Thanks a lot for your help :-)

 Any other suggestion is welcome.
