[pypy-dev] Custom types for annotating a flow object space

Mon Mar 23 16:26:10 CET 2015

Hi all,

Thought I would chime in since this is also of interest to me.  I have two
distinct projects related to hardware simulation, one of which is very well
suited to the RPython toolchain (Pydgin), and another which is much
more closely related to MyHDL (PyMTL).  Sarah kindly pointed out that
Pydgin is relevant here (thanks Sarah!), but I thought it might be useful
to briefly summarize both Pydgin and PyMTL to explain why one uses RPython
and the other does not.

[[ This turned out to be really long, so feel free to check out the TL;DR
at the end. ]]

== PyMTL ==

PyMTL (https://github.com/cornell-brg/pymtl) is a framework for hardware
design. If you just use PyMTL for RTL-modeling it could very much be
considered an alternative to MyHDL: special Python types are provided to
model fixed-bitwidth logic that can be simulated directly in Python. Once
this behavioral logic is verified in Python simulation, and provided the
logic is described in a sufficiently restricted subset of Python, we can
then translate it into Verilog. We do not have a VHDL backend.

More generally, PyMTL supports multi-level simulation, so that
functional-level, cycle-level, and register-transfer level logic can be
described, composed, and simulated together. This, along with the way we
construct our models (we use Python classes like Verilog modules), makes us
a bit different than MyHDL.  However, I think the mechanism we use to
translate RTL logic into Verilog is quite similar: we have our own
"translator" to manually walk the AST of behavioral logic, infer types, and
generate Verilog source.
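
[[ To make the "walk the AST and generate Verilog" idea concrete, here is a
minimal sketch using Python's standard ast module. It is not PyMTL's
translator: the function names (translate, emit_stmt, emit_expr) and the
example block are hypothetical, it handles only names, integer constants, a
few binary operators, assignments, and if/else, and it skips type and
bitwidth inference entirely. ]]

```python
import ast
import textwrap

# Hypothetical behavioral block; names are illustrative only.
SRC = """
def comb_logic(sel, a, b, out):
    if sel:
        out = a
    else:
        out = a + b
"""

_BINOPS = {ast.Add: "+", ast.BitAnd: "&", ast.BitOr: "|"}


def emit_expr(node):
    """Render a very small subset of Python expressions as Verilog text."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return str(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        return "(%s %s %s)" % (emit_expr(node.left),
                               _BINOPS[type(node.op)],
                               emit_expr(node.right))
    raise NotImplementedError("unsupported expression: " + type(node).__name__)


def emit_stmt(stmt, indent):
    """Render assignments and if/else statements; reject everything else."""
    pad = "  " * indent
    if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)):
        return [pad + "%s = %s;" % (stmt.targets[0].id, emit_expr(stmt.value))]
    if isinstance(stmt, ast.If):
        out = [pad + "if (%s) begin" % emit_expr(stmt.test)]
        for s in stmt.body:
            out.extend(emit_stmt(s, indent + 1))
        if stmt.orelse:
            out.append(pad + "end else begin")
            for s in stmt.orelse:
                out.extend(emit_stmt(s, indent + 1))
        out.append(pad + "end")
        return out
    raise NotImplementedError("unsupported statement: " + type(stmt).__name__)


def translate(src):
    """Turn the body of the first function in src into an always block."""
    func = ast.parse(textwrap.dedent(src)).body[0]
    lines = ["always @(*) begin"]
    for stmt in func.body:
        lines.extend(emit_stmt(stmt, 1))
    lines.append("end")
    return "\n".join(lines)


verilog = translate(SRC)
print(verilog)
```

Anything outside the handled subset raises NotImplementedError, which is
the crude version of "this construct cannot be translated".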

[[ Aside: for a clarification of the distinctions between functional-level,
cycle-level, and RTL modeling you can see a brief summary here:
http://morepypy.blogspot.com/2015/03/pydgin-using-rpython-to-generate-fast.html#simulators-designing-hardware-with-software
]]

== Pydgin ==

Pydgin (https://github.com/cornell-brg/pydgin) is a framework for
constructing fast **functional-level** simulators for processors, also
known as Instruction-Set Simulators (ISSs).  Pydgin uses the RPython
translation toolchain to take simple Python descriptions of a processor's
state and instructions and generate a JIT-enabled ISS.  This simulator is
capable of executing compiled ELF binaries (however, your compiler and your
Pydgin ISS must agree on what the target architecture is!).

Pydgin is a great match for RPython because writing a simple processor
simulator is very similar to writing a bytecode interpreter.  Instructions
are fetched, decoded, and then executed; the primary difference being that
instead of bytecodes we are executing binary instructions extracted from a
compiled executable.
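
[[ The fetch/decode/execute structure can be sketched in a few lines of
plain Python. The 16-bit instruction encoding, the opcodes, and the run
function below are invented for illustration and are not Pydgin's; a real
Pydgin ISS would also be written in RPython and annotated for the JIT,
which this sketch leaves out. ]]

```python
# Toy encoding (an assumption for this sketch):
#   bits 15..12 opcode, bits 11..8 destination register,
#   bits 7..0 immediate (LI) or bits 7..4 source register (ADD).
OP_HALT, OP_LI, OP_ADD = 0, 1, 2


def run(program, max_steps=1000):
    """Interpret a list of 16-bit instruction words; return the registers."""
    regs = [0] * 4
    pc = 0
    for _ in range(max_steps):
        inst = program[pc]                 # fetch
        opcode = (inst >> 12) & 0xF        # decode
        rd = (inst >> 8) & 0xF
        if opcode == OP_HALT:              # execute
            break
        elif opcode == OP_LI:
            regs[rd] = inst & 0xFF
        elif opcode == OP_ADD:
            rs = (inst >> 4) & 0xF
            regs[rd] = (regs[rd] + regs[rs]) & 0xFFFFFFFF
        else:
            raise ValueError("unknown opcode %d at pc=%d" % (opcode, pc))
        pc += 1
    return regs


# li r1, 5 ; li r2, 7 ; add r1, r2 ; halt
final_regs = run([0x1105, 0x1207, 0x2120, 0x0000])
print(final_regs)
```

The loop has the same shape as a bytecode interpreter's dispatch loop,
which is why the RPython toolchain fits this kind of model so well.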

However, this does not necessarily work for arbitrary hardware. Processors
specifically just happen to behave a lot like a language VM. I'm not sure
this approach could be used to create, say, a fast functional-level model
of a router or a specialized vision-processing accelerator. It certainly
wouldn't make sense for something simple like a ripple-carry adder (bad
example, but I think you catch what I mean).

== Using the RPython Toolchain for HDL Generation ==

PyMTL certainly shares the same challenge Henry mentioned in terms of
translator/converter/compiler robustness: we only support a subset of
Python which is in many ways more restrictive than RPython. I've thought a
little bit about using the RPython translation toolchain to make Verilog
generation easier; its type analysis is certainly more sophisticated than
my simple translator.  To make this possible, support for fixed-bitwidth
types would be needed for the type annotator, and in our case we would
really need support for Python properties because we use these quite a bit.

However, I think the RPython translation toolchain wouldn't really address
the biggest challenge here, which is lowering the behavioral logic into a
representation translatable into synthesizable Verilog.  I've actually
found the type annotation to be manageable (although my type inference
for local temporaries really could/should be improved); the hard part for
me has been transforming the AST into a representation that maps to valid
Verilog.  This often involves unrolling lists of ports into individual
ports, expanding attribute accesses into temporary wires and assignments,
determining which signals should be marked as type wire or reg, and
creating wire lists so that indexed accesses map to the correct
input/output ports.
More importantly, it would be really nice to know when it simply **is not
possible** to convert a Python construct into valid Verilog. My analysis is
currently not great at this.
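
[[ One way to approach the "is this convertible at all?" question is a
separate pass that only looks for constructs known to be out of bounds. The
sketch below uses ast.NodeVisitor; the class name, the check function, and
the particular rule set (non-range for loops, while loops, try blocks) are
assumptions chosen for illustration, not the rules PyMTL or MyHDL actually
apply. ]]

```python
import ast
import textwrap


class TranslatabilityChecker(ast.NodeVisitor):
    """Collect (line, reason) pairs for constructs this sketch rejects."""

    def __init__(self):
        self.problems = []

    def visit_For(self, node):
        it = node.iter
        is_range = (isinstance(it, ast.Call)
                    and isinstance(it.func, ast.Name)
                    and it.func.id == "range")
        if not is_range:
            self.problems.append((node.lineno, "for loop over a non-range iterable"))
        self.generic_visit(node)  # keep checking the loop body

    def visit_While(self, node):
        self.problems.append((node.lineno, "while loop (trip count not static)"))
        self.generic_visit(node)

    def visit_Try(self, node):
        self.problems.append((node.lineno, "exception handling"))
        self.generic_visit(node)


def check(src):
    """Return the list of problems found in a block of Python source."""
    checker = TranslatabilityChecker()
    checker.visit(ast.parse(textwrap.dedent(src)))
    return checker.problems


OK_SRC = """
def copy4(a, out):
    for i in range(4):
        out[i] = a[i]
"""

BAD_SRC = """
def drain(items, out):
    for x in items:
        while x:
            x = x - 1
"""

print(check(OK_SRC))
for lineno, reason in check(BAD_SRC):
    print("line %d: %s" % (lineno, reason))
```

A deny-list like this only catches what it was told about; the harder
problem described above is proving that everything left over really does
map to valid Verilog.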

I'm not sure the intermediate representation used by the RPython toolchain
would be a very good mapping for Verilog, so I'm not entirely sure there is
much else than the annotator we could really leverage. That combined with
the relative complexity of the RPython toolchain makes it seem like it
might not be the right way to go.  The toolchain needs to support much more
complex Python code than we could ever hope to convert into Verilog, which
both slows down translation and potentially complicates maintainability.
However, I'm not a compiler expert so I could be (and hopefully I am!)
wrong here.

== How Can RPython Help RTL Designers? ==

That being said, I think there are some neat opportunities for using
RPython to speed up hardware simulation. Pydgin was certainly one approach,
but like I mentioned earlier it only works well for models that behave like
an interpreter. However, both MyHDL and PyMTL use Python for hardware
simulation, and both have shown rather good speedups when using PyPy. I
think the addition of a few features could potentially improve our
simulation performance on PyPy even more:

=== Backend support for fixed bit-width datatypes ===

RPython already has this for 32-bit integers, but in RTL we are dealing
with arbitrary bitwidths. I currently model these in Python using a special
Bits class which handles all shifting, slicing, bitmasking, and conversions
when interacting with Python ints.  I suspect these operations are pretty
expensive.  Maybe PyPy already does a good job of optimizing these, but it
seems like there might be an opportunity here.
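
[[ For readers who have not seen such a type, here is a stripped-down
sketch of a fixed-bitwidth value class. ToyBits is a hypothetical stand-in
and is not PyMTL's Bits implementation; the half-open [lo:hi] slice
convention is an assumption. It does show where the per-operation masking
and object allocation come from. ]]

```python
class ToyBits(object):
    """Unsigned value that always wraps to a fixed number of bits."""

    def __init__(self, nbits, value=0):
        self.nbits = nbits
        self._mask = (1 << nbits) - 1
        self._uint = int(value) & self._mask   # truncate on construction

    def __int__(self):
        return self._uint

    def __add__(self, other):
        # Every arithmetic op allocates a new object and re-applies the mask.
        return ToyBits(self.nbits, self._uint + int(other))

    def __lshift__(self, amount):
        return ToyBits(self.nbits, self._uint << int(amount))

    def __getitem__(self, idx):
        if isinstance(idx, slice):
            # Half-open bit slice [lo:hi); explicit start and stop required.
            lo, hi = idx.start, idx.stop
            return ToyBits(hi - lo, self._uint >> lo)
        return ToyBits(1, self._uint >> idx)   # single bit

    def __repr__(self):
        return "ToyBits(%d, 0x%x)" % (self.nbits, self._uint)


wrapped = ToyBits(8, 250) + 10        # 260 wraps around in 8 bits
upper = ToyBits(8, 0xA5)[4:8]         # upper nibble
low_bit = ToyBits(8, 0xA5)[0]         # least significant bit
shifted = ToyBits(4, 0b1001) << 1     # top bit falls off
print(wrapped, upper, low_bit, shifted)
```

Each operation here costs a Python-level call, a mask, and a fresh object,
which is the overhead the paragraph above suspects is expensive.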

=== Support for tracing loop-less logic ===

RTL models have very data-dependent control flow; however, this control
flow very rarely uses loops and instead consists of large if/else trees.
If/else trees model muxes which are common in hardware, whereas loops are
only sometimes synthesizable into real hardware and are therefore used less
frequently (loops in HDL descriptions are generally only synthesizable if
they represent logic that will ultimately be unrolled in a hardware
implementation).
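
[[ A small illustration of both points, using hypothetical function names:
an if/else tree that describes a 4-input mux, and a fixed-trip-count loop
next to the unrolled logic it stands for. ]]

```python
def mux4(sel, in0, in1, in2, in3):
    """if/else tree: describes a 4-to-1 multiplexer, no loop involved."""
    if sel == 0:
        return in0
    elif sel == 1:
        return in1
    elif sel == 2:
        return in2
    else:
        return in3


def parity_loop(bits):
    """Loop with a static trip count of 4, so it can be unrolled."""
    p = 0
    for i in range(4):
        p ^= bits[i]
    return p


def parity_unrolled(bits):
    """What the loop above amounts to in hardware: a chain of XOR gates."""
    return bits[0] ^ bits[1] ^ bits[2] ^ bits[3]


# Exhaustively confirm that the loop and its unrolled form agree.
all_inputs = [[(n >> i) & 1 for i in range(4)] for n in range(16)]
agree = all(parity_loop(b) == parity_unrolled(b) for b in all_inputs)
print(agree, mux4(2, 10, 11, 12, 13))
```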

In addition, PyMTL models combinational logic with many small functions and
an event queue: the order and frequency in which these small functions are
placed on the queue to execute is extremely data-dependent.  I suspect the
PyPy tracer has a very hard time optimizing this logic. However, we know
ahead of time these small functions will be executed very frequently and
should be JIT'd; we just don't know the order in which they will execute
since this will likely change each iteration through the simulator.
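
[[ The event-queue style can be sketched as follows. Signal, Simulator,
and the two gate functions are hypothetical and far simpler than PyMTL's
simulator, but they show why the sequence of function calls depends on the
data: a function is queued only when one of its inputs actually changes. ]]

```python
from collections import deque


class Signal(object):
    def __init__(self, value=0):
        self.value = value
        self.listeners = []   # combinational functions that read this signal


class Simulator(object):
    def __init__(self):
        self.queue = deque()
        self.calls = []       # names of functions executed, in order

    def write(self, sig, value):
        """Update a signal; schedule its readers only if the value changed."""
        if sig.value != value:
            sig.value = value
            for fn in sig.listeners:
                if fn not in self.queue:
                    self.queue.append(fn)

    def settle(self):
        """Run queued functions until no signal changes any more."""
        while self.queue:
            fn = self.queue.popleft()
            self.calls.append(fn.__name__)
            fn()


sim = Simulator()
a, b, c, d = Signal(), Signal(), Signal(), Signal()


def and_gate():
    sim.write(c, a.value & b.value)


def inverter():
    sim.write(d, 1 - c.value)


a.listeners.append(and_gate)
b.listeners.append(and_gate)
c.listeners.append(inverter)

# Initial evaluation of every block, then two input changes.
sim.queue.extend([and_gate, inverter])
sim.settle()
del sim.calls[:]

sim.write(a, 1)
sim.settle()
trace_after_a = list(sim.calls)   # c stays 0, so the inverter never runs
del sim.calls[:]

sim.write(b, 1)
sim.settle()
trace_after_b = list(sim.calls)   # c flips to 1, so the inverter runs too
print(trace_after_a, trace_after_b, d.value)
```

The same two-gate design produces different call sequences for different
inputs; with hundreds of such functions there is no stable hot loop for a
tracing JIT to latch onto.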

For these reasons, we think RTL models are very challenging for PyPy to
optimize, and we have data that supports this.  If you look at Figure 14 of
our PyMTL paper
(http://csl.cornell.edu/~cbatten/pdfs/lockhart-pymtl-micro2014.pdf), you'll
see that PyPy gives us a 25x speedup on a functional-level model of
a simple mesh network (~1 primary logic function), a 12x speedup on a
cycle-level model (~3*64 primary logic functions), and only a 6x
improvement on an RTL model (~14*64 logic functions).  A 6x performance
improvement is nothing to sneeze at, but it seems like a more detailed
model should have an opportunity to get **higher** speedups than
less-detailed models because there are more operations to optimize.

=== Hooks for guaranteeing JITing of specific functions ===

To address the problems mentioned above, it would be really helpful to be
able to mark specific functions as JIT'able for PyPy.  We know these
will be executed very frequently, so we don't even need/want a visit
counter: just JIT it as soon as you see it.  Or maybe set the trace
boundaries to be the entry and exit of the decorated function.
Unfortunately, this is more of a method-based approach and I don't think
it's easy to implement in RPython/PyPy.
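
[[ Purely to show what such a hook might look like from the user's side:
the always_jit decorator below is hypothetical and is NOT an existing PyPy
or RPython API. In this sketch it only tags and records the function; the
interesting part (having the JIT act on the tag) is exactly what does not
exist. ]]

```python
# Registry of functions the user has asked to be compiled eagerly.
# Hypothetical: nothing in PyPy reads this list.
HOT_FUNCTIONS = []


def always_jit(func):
    """Hypothetical marker decorator: tag func as 'compile on first call'."""
    func._always_jit = True
    HOT_FUNCTIONS.append(func)
    return func          # the function itself is returned unchanged


@always_jit
def comb_logic(a, b, sel):
    # A small combinational block that the simulator calls very often.
    return a if sel else b


print([f.__name__ for f in HOT_FUNCTIONS], comb_logic(1, 2, 0))
```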

= TL;DR =

RPython is great for creating fast simulators for hardware models that act
like an interpreter (e.g. processors). I have doubts about how useful it
can be for generating Verilog for a Python HDL; although I hope I'm wrong
here and someone smarter than me can figure it out.  However, I think both
MyHDL and PyMTL simulations could greatly benefit from certain PyPy
optimizations.

Derek

On Mon, Mar 23, 2015 at 9:02 AM, Sarah Mount <mount.sarah at gmail.com> wrote:

> Hi,
>
> On Mon, Mar 23, 2015 at 1:00 PM, Henry Gomersall <heng at cantab.net> wrote:
> > On 23/03/15 12:50, Sarah Mount wrote:
> > <snip>
> >>>
> >>> Well, potentially, but the big win is in being allowed a broader
> >>> range of convertible constructs. For example, there is currently no
> >>> way to handle general iterables (only loops of the form
> >>> `for i in range(N):` are allowed). Clearly, this is very restrictive
> >>> for writing nice, expressive code.
> >>>
> >>> Stepping back a minute. The ultimate goal IMO would be a tool that
> >>> takes a MyHDL instance block (that is, that represents the function
> >>> of a hardware block), along with the associated static namespace, and
> >>> converts into something that downstream tools can understand (VHDL or
> >>> Verilog), with as much expressive power in the code as makes sense
> >>> given the target restrictions.
> >>>
> >>
> >> Hmm. So, what I understand from this is that your current front-end
> >> implements a very small subset of Python, you have noticed that
> >> RPython implements a slightly larger subset of Python, so you want to
> >> replace your front-end and maybe some internals with the RPython
> >> equivalents? I'm not sure how well this will work. For one thing,
> >> RPython is intended to be translated to a native format (i.e. you take
> >> a description of an interpreter or VM in RPython and after a very long
> >> compile you get a native executable that is your interpreter, with any
> >> RPython internals, such as JITs and GCs, included). I'm not sure if
> >> this is a win for you or not, because I'm not sure if an RPython
> >> front-end can really be made to fit a MyHDL back-end, without just
> >> re-writing the whole thing as an interpreter.
> >
> >
> > So, the thinking initially, which I still think might be the route to
> > go, is to tap in to the flow object space. Having a really good
> > representation of the flow graph would be hugely useful in generating
> > the relevant HDL code. If it's possible to also tap into some of the
> > annotation code, this might be useful, but then again it might not :)
> >
>
> I don't know enough about the internals of RPython etc. to speak to
> this. It sounds like it would be more useful to you to try and extract
> the modules and packages you need and move them over to MyHDL (if your
> licenses match suitably) rather than using the full toolchain.
>
> Perhaps I have the wrong end of the stick though.
>
> Sarah
>
> --
> Dr. Sarah Mount, Senior Lecturer, University of Wolverhampton
> website:  http://www.snim2.org/
> twitter: @snim2