Hi all,

Thought I would chime in since this is also of interest to me.  I have two distinct projects related to hardware simulation, one of which is very well suited towards the RPython toolchain (Pydgin), and another which is much more closely related to MyHDL (PyMTL).  Sarah kindly pointed out that Pydgin is relevant here (thanks Sarah!), but I thought it might be useful to briefly summarize both Pydgin and PyMTL to explain why one uses RPython and the other does not.

[[ This turned out to be really long, so feel free to check out the TL;DR at the end. ]]

== PyMTL ==

PyMTL (https://github.com/cornell-brg/pymtl) is a framework for hardware design. If you just use PyMTL for RTL modeling, it could very much be considered an alternative to MyHDL: special Python types are provided to model fixed-bitwidth logic that can be simulated directly in Python. Once this behavioral logic is verified in Python simulation, and provided the logic is described in a sufficiently restricted subset of Python, we can then translate it into Verilog. We do not have a VHDL backend.
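
As a small flavor of what that restricted subset looks like, here is a sketch of a combinational adder in a PyMTL-like style (simplified, and the exact class/decorator names here should be taken as illustrative rather than as the precise API):

    from pymtl import *

    class Adder( Model ):
      def __init__( s, nbits=8 ):
        # Structural interface: fixed-bitwidth input and output ports.
        s.in0 = InPort ( nbits )
        s.in1 = InPort ( nbits )
        s.out = OutPort( nbits )

        # Behavioral logic, written in the restricted Python subset
        # that the translator knows how to turn into Verilog.
        @s.combinational
        def comb_logic():
          s.out.value = s.in0 + s.in1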

More generally, PyMTL supports multi-level simulation, so that functional-level, cycle-level, and register-transfer-level logic can be described, composed, and simulated together. This, along with the way we construct our models (we use Python classes like Verilog modules), makes us a bit different from MyHDL.  However, I think the mechanism we use to translate RTL logic into Verilog is quite similar: we have our own "translator" that manually walks the AST of the behavioral logic, infers types, and generates Verilog source.

[[ Aside: for a clarification of the distinctions between functional-level, cycle-level, and RTL modeling you can see a brief summary here: http://morepypy.blogspot.com/2015/03/pydgin-using-rpython-to-generate-fast.html#simulators-designing-hardware-with-software ]]

== Pydgin ==

Pydgin (https://github.com/cornell-brg/pydgin) is a framework for constructing fast **functional-level** simulators for processors, also known as Instruction-Set Simulators (ISSs).  Pydgin uses the RPython translation toolchain to take simple Python descriptions of a processor's state and instructions and generate a JIT-enabled ISS.  This simulator is capable of executing compiled ELF binaries (however, your compiler and your Pydgin ISS must agree on what the target architecture is!).

Pydgin is a great match for RPython because writing a simple processor simulator is very similar to writing a bytecode interpreter.  Instructions are fetched, decoded, and then executed; the primary difference is that instead of bytecodes we are executing binary instructions extracted from a compiled executable.
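
As a rough sketch of what that loop looks like when written for the RPython toolchain (heavily simplified; `decode`, the instruction semantics, and the `state` object are stand-ins for the real Pydgin machinery):

    from rpython.rlib.jit import JitDriver

    # The program counter is the "green" variable that identifies a
    # position in the guest program, just like the bytecode index in a
    # language VM; everything else is "red" (mutable) state.
    jitdriver = JitDriver( greens=['pc'], reds=['state'] )

    def run( state ):
      pc = state.pc
      while True:
        jitdriver.jit_merge_point( pc=pc, state=state )
        inst    = state.mem.read_word( pc )    # fetch
        execute = decode( inst )               # decode
        pc      = execute( pc, inst, state )   # execute, returns next pc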

However, this does not necessarily work for arbitrary hardware. Processors just happen to behave a lot like a language VM. I'm not sure this approach could be used to create, say, a fast functional-level model of a router, or of a specialized vision-processing accelerator. It certainly wouldn't make sense for something simple like a ripple-carry adder (bad example, but I think you catch what I mean).

== Using the RPython Toolchain for HDL Generation ==

PyMTL certainly shares the same challenge Henry mentioned in terms of translator/converter/compiler robustness: we only support a subset of Python which is in many ways more restrictive than RPython. I've thought a little bit about using the RPython translation toolchain to make Verilog generation easier; its type analysis is certainly more sophisticated than my simple translator.  To make this possible, the type annotator would need support for fixed-bitwidth types, and in our case we would really need support for Python properties because we use them quite a bit.

However, I think the RPython translation toolchain wouldn't really address the biggest challenge here, which is lowering the behavioral logic into a representation translatable into synthesizable Verilog.  I've actually found the type annotation to be manageable (although my type inference for local temporaries really could/should be improved); the hard part for me has been transforming the AST into a representation that maps to valid Verilog.  This often involves unrolling lists of ports into individual ports, expanding attribute accesses into temporary wires and assignments, determining which signals should be marked as type wire or reg, and creating wire lists so that indexed accesses map to the correct input/output ports.  More importantly, it would be really nice to know when it simply **is not possible** to convert a Python construct into valid Verilog. My analysis is currently not great at this.
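
To make the "lowering" point a bit more concrete, here is a toy illustration (not PyMTL's actual translator) of what it takes to turn even a single Python assignment into a Verilog continuous assignment, and how quickly you fall off the edge of what is translatable:

    import ast

    # Toy example: lower a Python assignment like "out = a & b | c"
    # into a Verilog continuous assignment.  A real translator also has
    # to infer signal widths, decide wire vs. reg, unroll port lists, etc.
    VERILOG_OPS = { ast.BitAnd: '&', ast.BitOr: '|', ast.BitXor: '^' }

    def lower_expr( node ):
      if isinstance( node, ast.Name ):
        return node.id
      if isinstance( node, ast.BinOp ) and type( node.op ) in VERILOG_OPS:
        return '(%s %s %s)' % ( lower_expr( node.left ),
                                VERILOG_OPS[ type( node.op ) ],
                                lower_expr( node.right ) )
      # This is the case we really want to detect reliably: a Python
      # construct with no valid Verilog equivalent.
      raise NotImplementedError( 'cannot translate %r to Verilog' % node )

    def lower_assign( src ):
      stmt = ast.parse( src ).body[0]
      assert isinstance( stmt, ast.Assign )
      return 'assign %s = %s;' % ( stmt.targets[0].id,
                                   lower_expr( stmt.value ) )

    print( lower_assign( 'out = a & b | c' ) )
    # assign out = ((a & b) | c);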

I'm not sure the intermediate representation used by the RPython toolchain would map very well onto Verilog, so I'm not entirely sure there is much besides the annotator we could really leverage. That, combined with the relative complexity of the RPython toolchain, makes it seem like it might not be the right way to go.  The toolchain also has to support much more complex Python code than we could ever hope to convert into Verilog, which both slows down translation and potentially complicates maintainability. However, I'm not a compiler expert, so I could be (and hopefully I am!) wrong here.

== How Can RPython Help RTL Designers? ==

That being said, I think there are some neat opportunities for using RPython to speed up hardware simulation. Pydgin was certainly one approach, but like I mentioned earlier it only works well for models that behave like an interpreter. However, both MyHDL and PyMTL use Python for hardware simulation, and both have shown rather good speedups when using PyPy. I think the addition of a few features could potentially improve our simulation performance on PyPy even more:

=== Backend support for fixed bit-width datatypes ===

RPython already has this for 32-bit integers, but in RTL we are dealing with arbitrary bitwidths. I currently model these in Python using a special Bits class which handles all shifting, slicing, bitmasking, and conversions when interacting with Python ints.  I suspect these operations are pretty expensive.  Maybe PyPy already does a good job of optimizing these, but it seems like there might be an opportunity here.
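
The core idea of the Bits class is just "mask after every operation" (a minimal sketch of the idea, not the real implementation), which is exactly what a native fixed-width datatype would get for free:

    class Bits( object ):
      # Minimal sketch of a fixed-bitwidth value: every operation masks
      # the result back down to 'nbits'.
      def __init__( self, nbits, value=0 ):
        self.nbits = nbits
        self.mask  = ( 1 << nbits ) - 1
        self.value = value & self.mask

      def __add__( self, other ):
        other = other.value if isinstance( other, Bits ) else other
        return Bits( self.nbits, self.value + other )

      def __getitem__( self, idx ):
        # Bit slicing, e.g. x[0:4] for what Verilog writes as x[3:0].
        if isinstance( idx, slice ):
          return Bits( idx.stop - idx.start, self.value >> idx.start )
        return Bits( 1, self.value >> idx )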

=== Support for tracing loop-less logic ===

RTL models have very data-dependent control flow; however, this control flow very rarely uses loops and instead consists of large if/else trees.  If/else trees model muxes, which are common in hardware, whereas loops are only sometimes synthesizable into real hardware and are therefore used less frequently (loops in HDL descriptions are generally only synthesizable if they represent logic that will ultimately be unrolled in a hardware implementation).

In addition, PyMTL models combinational logic with many small functions and an event queue: the order and frequency in which these small functions are placed on the queue to execute is extremely data-dependent.  I suspect the PyPy tracer has a very hard time optimizing this logic. However, we know ahead of time that these small functions will be executed very frequently and should be JIT'd; we just don't know the order in which they will execute, since this will likely change each iteration through the simulator.
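
Concretely, the simulation kernel looks something like this (a simplified sketch of the scheme, not the actual PyMTL scheduler):

    from collections import deque

    # Simplified event-driven kernel: writing a signal schedules every
    # combinational block that reads it.  Which blocks end up on the
    # queue, and in what order, depends entirely on the input data each
    # cycle, which is what makes the hot path hard for a tracing JIT.
    def simulate_cycle( event_queue, readers ):
      while event_queue:
        block = event_queue.popleft()
        changed_signals = block()          # run one small function
        for signal in changed_signals:
          for reader in readers[ signal ]:
            if reader not in event_queue:
              event_queue.append( reader )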

For these reasons, we think RTL models are very challenging for PyPy to optimize, and we have data that supports this.  If you look at Figure 14 of our PyMTL paper (http://csl.cornell.edu/~cbatten/pdfs/lockhart-pymtl-micro2014.pdf), you'll see that PyPy gives us a 25x speedup on a functional-level model of a simple mesh network (~1 primary logic function), a 12x speedup on a cycle-level model (~3*64 primary logic functions), and only a 6x improvement on an RTL model (~14*64 logic functions).  A 6x performance improvement is nothing to sneeze at, but it seems like a more detailed model should have an opportunity to get **higher** speedups than less-detailed models, because there are more operations to optimize.

=== Hooks for guaranteeing JITing of specific functions ===

To address the problems mentioned above, it would be really helpful to be able to tell PyPy that specific functions are guaranteed to be hot and should always be JIT'd.  We know these will be executed very frequently, so we don't even need/want a visit counter; just JIT the function as soon as you see it.  Or maybe set the trace boundaries to be the entry and exit of the decorated function. Unfortunately, this is more of a method-based approach and I don't think it's easy to implement in RPython/PyPy.
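
Just to make the wish concrete, something like the following is what I have in mind (purely hypothetical; no such hook exists today as far as I know):

    import pypyjit  # the module exists, but this decorator does not

    # Hypothetical API sketch: always compile this function, skip the
    # warm-up counters, and treat its entry/exit as trace boundaries.
    @pypyjit.always_compile( trace_boundary=True )
    def mux_logic( s ):
      s.out.value = s.in0 if s.sel else s.in1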

= TL;DR =

RPython is great for creating fast simulators for hardware models that act like an interpreter (e.g. processors). I have doubts about how useful it can be for generating Verilog for a Python HDL, although I hope I'm wrong here and someone smarter than me can figure it out.  However, I think both MyHDL and PyMTL simulations could greatly benefit from certain PyPy optimizations.


Derek


On Mon, Mar 23, 2015 at 9:02 AM, Sarah Mount <mount.sarah@gmail.com> wrote:
Hi,

On Mon, Mar 23, 2015 at 1:00 PM, Henry Gomersall <heng@cantab.net> wrote:
> On 23/03/15 12:50, Sarah Mount wrote:
> <snip>
>>>
>>> Well, potentially, but the big win is in being allowed a broader range of
>>> >convertible constructs. For example, there is currently no way to handle
>>> >general iterables (only loops of the form `for i in range(N):` are
>>> > allowed).
>>> >Clearly, this is very restrictive for writing nice, expressive code.
>>> >
>>> >Stepping back a minute. The ultimate goal IMO would be a tool that takes
>>> > a
>>> >MyHDL instance block (that is, that represents the function of a
>>> > hardware
>>> >block), along with the associated static namespace, and converts into
>>> >something that downstream tools can understand (VHDL or Verilog), with
>>> > as
>>> >much expressive power in the code as makes sense given the target
>>> >restrictions.
>>> >
>>
>> Hmm. So, what I understand from this is that your current front-end
>> implements a very small subset of Python, you have noticed that
>> RPython implements a slightly larger subset of Python, so you want to
>> replace your front-end and maybe some internals with the RPython
>> equivalents? I'm not sure how well this will work. For one thing,
>> RPython is intended to be translated to a native format (i.e. you take
>> a description of an interpreter or VM in RPython and after a very long
>> compile you get a native executable that is your interpreter, with any
>> RPython internals, such as JITs and GCs, included). I'm not sure if
>> this is a win for you or not, because I'm not sure if an RPython
>> front-end can really be made to fit a MyHDL back-end, without just
>> re-writing the whole thing as an interpreter.
>
>
> So, the thinking initially, which I still think might be the route to go, is
> to tap in to the flow object space. Having a really good representation of
> the flow graph would be hugely useful in generating the relevant HDL code.
> If it's possible to also tap into some of the annotation code, this might be
> useful, but then again it might not :)
>

I don't know enough about the internals of RPython etc. to speak to
this. It sounds like it would be more useful to you to try and extract
the modules and packages you need and move them over to MyHDL (if your
licenses match suitably) rather than using the full toolchain.

Perhaps I have the wrong end of the stick though.

Sarah

--
Dr. Sarah Mount, Senior Lecturer, University of Wolverhampton
website:  http://www.snim2.org/
twitter: @snim2