March 30, 2016

Synchronizing Verilog, Python and C

by Andrey Filippov

The Elphel NC393, like all the previous camera models, relies on the close cooperation of the FPGA, programmed in Verilog HDL, and the software that runs on a general-purpose CPU. Just as the FPGA manufacturers increase the speed and density of their devices, so do the Elphel cameras. The FPGA code consists of hundreds of files and tens of thousands of lines of code, and it is constantly modified during the lifetime of the product, both by us and by our users, to adapt the cameras to their applications. In most cases, unless it is just a bug fix or a minor improvement of previously implemented functionality, the software (and multiple layers of it) needs to be aware of the changes. This is both the power and the challenge of such hybrid systems, and the synchronization of the changes is an important issue.

Verilog parameters

The Verilog code of the camera consists of parameterized modules. We try to use parameters and Verilog generate constructs in most cases, but `define macros and `ifdef conditional directives are still used to switch some global options (like synthesis vs. compilation, various debug levels). The Eclipse-based VDT that we use for the FPGA development is aware of the parameters, and when the code instantiates a parameterized module that has parameter-dependent port widths, VDT verifies that the instance ports match the signals connected to them and warns the developer if they do not. Many parameters are routed through the levels of the hierarchy so the deeper instances can be controlled from a single header file, making it obvious which parameters influence which modules' operation. Some parameters are specified directly, while some have to be calculated – this is the case for the register address decoders of the same module instantiated for different channels. Such channels have the same relative address maps but different base addresses. Most of the camera parameters (not counting the trivial ones where the module instance parameters are defined by the nature of the code) are contained in a single x393_parameters.vh header file. There are more than six hundred of them there, and most influence the software API.

Development cycle

When implementing some new camera FPGA functionality, we start with the simulation – always. Sometimes very small changes can be applied to the code, synthesized and tested in the actual hardware, but bypassing the simulation step this way almost never works. So far all the simulation we use consists of plain old Verilog test benches (such as this or that) – not even SystemVerilog. Most likely the ideal way to simulate CPU+FPGA devices would be to use a software programming language to model the CPU side of the SoC and keep Verilog (or VHDL for those who prefer it) for the FPGA. Something like cocotb may work, especially as we are already manually translating Verilog into Python, but we are not there yet.

Translating Verilog to Python

So the next step is, as I just mentioned, manual translation of the Verilog tasks and functions used in simulation to Python code that can run on the actual hardware. The result does not look particularly Pythonic as I try to follow the already tested Verilog code, but it is OK. Not all the translation is manual – we use an import_verilog_parameters.py module to “understand” the parameters defined in Verilog files (including simple arithmetic and logical operations used to generate derivative parameters/localparams in the Verilog code), get the values from the same source and so reduce the possibility of accidentally using old software with a modified FPGA implementation. As the parameters become known to the program only at run time, PyDev (running, btw, in the same Eclipse IDE as the VDT – just as a different “perspective”) cannot catch misspelled parameter names. So the program has an option to modify itself and generate pre-defines for each of the parameters. Only the top part of the vrlg module is human-generated, everything under line 120 is automatically generated (and has to be re-generated only after adding new parameters to the Verilog source).
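To give a feel for what “understanding” the parameters involves, here is a minimal sketch of reading parameter definitions and evaluating simple derived expressions. It is not the actual import_verilog_parameters.py: the sample header text, the parameter names and the function names are all invented for illustration, and only plain `parameter NAME = expression` lines without ranges, types or division are handled.

```python
import re

# Hypothetical fragment in the style of a Verilog parameter header;
# all names and values are made up for illustration.
SAMPLE_VH = """
    parameter BASE_ADDR      = 'h400,      // hypothetical base address
    parameter CHN_INC        = 'h40,       // hypothetical per-channel step
    parameter NUM_CHN        = 4,
    parameter LAST_CHN_ADDR  = BASE_ADDR + CHN_INC * (NUM_CHN - 1),
    parameter ADDR_MASK      = 32'hffff_fff0
"""

# NAME = expression, where the expression ends at a comma, semicolon,
# comment start or end of line (so "/" is not supported in this sketch).
_PARAM_RE = re.compile(r"\b(?:parameter|localparam)\s+(\w+)\s*=\s*([^,;/\n]+)")
# Verilog based literals such as 'h400 or 32'hffff_fff0.
_NUMBER_RE = re.compile(r"(?:\d+\s*)?'([hdbo])\s*([0-9a-fA-F_]+)")
_RADIX = {"h": 16, "d": 10, "b": 2, "o": 8}


def _literal_to_python(match):
    """Convert a Verilog based literal to a decimal string."""
    digits = match.group(2).replace("_", "")
    return str(int(digits, _RADIX[match.group(1)]))


def parse_parameters(text):
    """Return {name: int} for the parameter definitions found in text."""
    values = {}
    for name, expr in _PARAM_RE.findall(text):
        expr = _NUMBER_RE.sub(_literal_to_python, expr.strip())
        # Expressions may refer to parameters defined earlier in the file.
        # eval() is acceptable here only because the source is trusted.
        values[name] = int(eval(expr, {"__builtins__": {}}, dict(values)))
    return values


params = parse_parameters(SAMPLE_VH)
print(hex(params["LAST_CHN_ADDR"]))
```

Because derived values are computed from the same text the FPGA tools read, changing one base parameter in the header changes every dependent value on the software side as well.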

Hardware testing with Python programs

When the Verilog code is manually translated (or while new parts of the code are being translated or developed from scratch) it is possible to operate the actual camera. The top module is still called test_mcntrl as it started with DDR3 memory calibration using the Levenberg-Marquardt algorithm (luckily it needs to run just once – it takes the camera 10 minutes to do the full calibration this way).

This program keeps track of the Verilog parameters and macros, exposes all the functions (those with names not beginning with the underscore character), extracts docstrings from the code and combines them with the generated list of the function parameters and their default values, and provides search/help for the functions with regexp (a must when there are hundreds of such functions). The following code was run in the camera:

And the same one in the PyDev console window of the Eclipse IDE – “simulated” means that the program could not detect the FPGA and so it is not running on the target hardware:

The Python program was also used for the initial development of the AHCI SATA controller (before it was possible to add it as a Linux kernel platform driver), but the number of parameters there is much smaller, and most of the addresses are defined by the AHCI standard.
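The function discovery and regexp help described in this section can be sketched with the standard inspect and re modules. This is only an illustration of the idea, not the test_mcntrl code: the Commands class, its methods and the helper names are hypothetical.

```python
import inspect
import re


class Commands:
    """Hypothetical command set; the method names are invented."""

    def read_status(self, address, verbose=1):
        """Read a status register at the given address."""
        return 0

    def write_control(self, address, data=0):
        """Write data to a control register."""
        return None

    def _helper(self):
        """Not exposed: the name begins with an underscore."""


def collect_commands(obj):
    """Map public method names to (signature, docstring) pairs."""
    commands = {}
    for name, func in inspect.getmembers(obj, callable):
        if name.startswith("_"):
            continue  # private helpers and dunder attributes are skipped
        commands[name] = (str(inspect.signature(func)),
                          inspect.getdoc(func) or "")
    return commands


def search_help(commands, pattern):
    """Return help lines for commands whose names match the regexp."""
    regex = re.compile(pattern)
    return ["%s%s - %s" % (name, sig, doc)
            for name, (sig, doc) in sorted(commands.items())
            if regex.search(name)]


for line in search_help(collect_commands(Commands()), r"^read"):
    print(line)
```

The parameter list with defaults comes from the function signature itself, so the help text cannot drift away from the code it describes.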

Synchronizing parameters with the kernel drivers

The next step is to update/redesign/develop the Linux kernel drivers to support the camera functionality. Having learned the lessons from the previous camera models (where the software was growing with the hardware incrementally), we are trying to minimize manual intervention in the process of synchronizing different layers of code (including the “hardware” one). The previous camera interface to the FPGA consisted of hand-crafted files such as x353.h. It started from x313.h (for the NC313 – our first camera based on an Axis CPU and a Xilinx FPGA – the same was used in the NC323 that scanned many billions of book pages), was modified for the NC333 and later for our previous NC353 used in car-mounted panoramic cameras that captured most of the world’s roads.

Each time the files were modified to accommodate the new hardware, it was a challenge to add extra bits to the memory controller addresses, image frame widths and heights (they are now all 16 bits wide – enough for multi-gigapixel sensors). With Python modules already knowing all the current values of the Verilog parameters that define the software interface, it was natural to generate the C files needed to interface the hardware in the same environment.

Implementation of the register access in the FPGA

The memory-mapped registers in the camera share the same access mechanism – they use the MAXIGP0 (CPU master, general purpose, channel 0) AXI port available in the SoC, generously mapped there to 1/4 of the whole 32-bit address range (0x40000000..0x7fffffff). While logically all the locations are 32 bits wide, some use just 1 byte or even no data at all – any write to such an address causes a defined action.

Internally the commands are distributed to the target modules over a tree of byte-parallel buses that tolerate register insertion; at the endpoints they are converted to the parallel format by cmd_deser.v instances. The status data from the modules (sent by status_generate.v) is routed as messages (also in byte-parallel format to reduce the required FPGA routing resources) to a single block memory that can be read over the AXI by the CPU with zero delay. The status generation by the subsystems is individually programmed to be either on demand (in response to a write operation by the CPU) or automatic, when the register data changes. While this write and read mechanism is common, the nature of the registers and data may be very different, as the project combines many modules designed at different times for different purposes. All the memory-mapped locations in the design fall into 3 categories:

  • Read-only registers that allow reading status from the various modules, DMA pointers and other small data items.
  • Read/write registers – the ones where the result of writing does not depend on any context. The full write register address range has a shadow memory block in parallel, so reading from such an address returns the data that was last written there.
  • Write-only registers – all other registers, where the write action depends on the context. Some modules include large tables exposed through a pair of address/data locations in the address map; many others have independent bit fields, each with a corresponding “set” bit, so internal values are modified only for the selected fields.
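The behaviour of the three categories can be illustrated with a toy software model. This is only a sketch of the semantics listed above, not a description of the FPGA implementation: the class, the addresses and the bit layout of the write-only example (a 2-bit value in bits [1:0] with its “set” bit in bit 2) are assumptions made up for the example.

```python
class RegisterFileModel:
    """Toy model of the three register categories (illustrative only)."""

    def __init__(self):
        self.status = {}   # read-only: filled by "hardware" status messages
        self.shadow = {}   # read/write: remembers the last word written
        self.mode = 0      # internal 2-bit value behind a write-only register

    def hw_status_message(self, address, value):
        """Model a module delivering a new status word."""
        self.status[address] = value & 0xFFFFFFFF

    def read_status(self, address):
        """Read-only register: returns the most recent status word."""
        return self.status.get(address, 0)

    def write_rw(self, address, value):
        """Read/write register: the shadow memory keeps a copy."""
        self.shadow[address] = value & 0xFFFFFFFF

    def read_rw(self, address):
        """Reading returns whatever was last written to that address."""
        return self.shadow.get(address, 0)

    def write_mode(self, word):
        """Write-only register: bits [1:0] apply only if 'set' bit 2 is 1."""
        if word & 0x4:
            self.mode = word & 0x3


regs = RegisterFileModel()
regs.write_mode(0x4 | 0x2)   # 'set' bit present: mode becomes 2
regs.write_mode(0x1)         # no 'set' bit: mode stays unchanged
print(regs.mode)
```

The “set” bit is what lets one write change a single field while leaving the others untouched, without the software having to read the register back first.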

Register access as C11 anonymous members

All the registers in the design are 32 bits wide and are aligned to 4-byte boundaries, even though not all of them use all the bits. Another common feature of the register model used is that some modules exist in multiple instances with evenly spaced base addresses, and some of them have a 2-level hierarchy (channel and sub-channel), where the address is the sum of the category base address, the relative register address and a linear combination of the two indices.
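The address arithmetic described here can be written out in a few lines. In this sketch the function name, the argument names and the numeric values in the usage line are hypothetical, and it assumes that base addresses and increments are counted in 32-bit words, so the result is multiplied by 4 and offset by the start of the MAXIGP0 range (0x40000000).

```python
MAXIGP0_BASE = 0x40000000  # start of the MAXIGP0 range
MAXIGP0_LAST = 0x7FFFFFFF  # end of the MAXIGP0 range


def register_address(base, rel, chn=0, sub=0, chn_inc=0, sub_inc=0):
    """Byte address of a register.

    base, rel, chn_inc and sub_inc are assumed to be in 32-bit words;
    chn and sub are the channel and sub-channel indices.
    """
    word = base + rel + chn * chn_inc + sub * sub_inc
    address = MAXIGP0_BASE + 4 * word
    if not MAXIGP0_BASE <= address <= MAXIGP0_LAST:
        raise ValueError("address 0x%x is outside the mapped range" % address)
    return address


# Hypothetical numbers: category base 0x400, register 2, channel 1, sub-channel 3.
print(hex(register_address(0x400, 2, chn=1, sub=3, chn_inc=0x40, sub_inc=0x10)))
```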

An individual C typedef is generated for each set of registers that have different meanings of the bit fields – this way it is possible to benefit from the compiler type checking. All the types used fit into 32 bits, and as in many cases the same hardware register can accept alternative values for individual bit fields, we use unions of anonymous (to make access expressions shorter) bit-field structures.

Here is a generated example of such typedef code (full source):

Some member names in the example above are commented out (like /*tbl_mode*/ in lines 398, 408 and 420). This is done because some bit fields (in this case bits [29:28]) have the same meaning in all alternative structures, and auto-generating complex union/structure combinations to create valid C code with each member having a unique name would produce rather clumsy code. Instead the script makes sure that same-named members really designate the same bit fields, and then makes them anonymous while preserving the names for a human reader. The last member (u32 d32:32;) is added to each union, making it possible to address each of them as an unsigned long variable without casting.
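The naming rule for repeated bit fields can be sketched as a small generator. This is not the project's generator code: the function name, the type name and the field names addr and data are invented (only tbl_mode at bits [29:28] comes from the text above), and the sketch assumes that the first declared bit field occupies the least significant bits and that a repeated name is kept in the first structure and commented out in the later ones.

```python
def generate_union(type_name, alternatives):
    """Return C source for a union of anonymous bit-field structures.

    alternatives is a list of field lists, one per structure; each field
    is a (name, lsb, width) tuple.
    """
    first_use = {}  # name -> (lsb, width) where the name was first declared
    lines = ["typedef union {"]
    for fields in alternatives:
        lines.append("    struct {")
        next_bit = 0
        for name, lsb, width in sorted(fields, key=lambda f: f[1]):
            if lsb < next_bit or lsb + width > 32:
                raise ValueError("field %s overlaps or does not fit" % name)
            if lsb > next_bit:  # unnamed padding for the unused bits
                lines.append("        u32 :%d;" % (lsb - next_bit))
            if name not in first_use:
                first_use[name] = (lsb, width)
                lines.append("        u32 %s:%d;" % (name, width))
            elif first_use[name] == (lsb, width):
                # Same bits as before: make the member anonymous, keep
                # the name as a comment for the human reader.
                lines.append("        u32 /*%s*/ :%d;" % (name, width))
            else:
                raise ValueError("%s designates different bits" % name)
            next_bit = lsb + width
        lines.append("    };")
    lines.append("    u32 d32:32;")  # whole-word access without casting
    lines.append("} %s;" % type_name)
    return "\n".join(lines)


print(generate_union("x393_example_t", [
    [("addr", 0, 24), ("tbl_mode", 28, 2)],
    [("data", 0, 8), ("tbl_mode", 28, 2)],
]))
```

Checking that a repeated name always covers the same bits is what makes it safe to drop the name from the later structures: whichever alternative the code uses, the remaining named member reads the same bits.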

And this is a snippet of the generator code that produced the typedef above:

The vrlg.* values used above are in turn read from the x393_parameters.vh Verilog file:

Auto-generated files also include x393.h; it provides other constant definitions (like valid values for the bit fields) – lines 301..303 – and function declarations to access registers. The names of the functions for read-only and write-only registers are derived from the symbolic address names by converting them to lower case; the ones that deal with read/write registers have set_ and get_ prefixes attached.
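The naming convention for the access functions can be expressed as a short rule. The sketch below only illustrates that rule: the helper name, the symbolic names, the type names and the exact shape of the C prototypes are assumptions, not copies of the generated x393.h.

```python
def access_declarations(symbol, type_name, kind):
    """Return C prototypes for one register.

    symbol is the symbolic address name, type_name the C type of the
    register value, kind one of 'ro', 'wo' or 'rw'.
    """
    name = symbol.lower()
    if kind == "ro":
        return ["%s %s(void);" % (type_name, name)]
    if kind == "wo":
        return ["void %s(%s d);" % (name, type_name)]
    if kind == "rw":
        return ["void set_%s(%s d);" % (name, type_name),
                "%s get_%s(void);" % (type_name, name)]
    raise ValueError("unknown register kind: %r" % kind)


# Hypothetical symbolic names and types, for illustration only.
for decl in access_declarations("EXAMPLE_CONTROL", "example_ctrl_t", "rw"):
    print(decl)
```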

Register access functions are implemented with readl() and writel(); this is the corresponding section of the x393.c file:

There are two other header files generated from the same data. One (x393_defs.h) is just an alternative way to represent register addresses – instead of the getter and setter functions it defines preprocessor macros:

The last generated file, x393_map.h, uses the preprocessor macro format to provide a full ordered address map of all the available registers for all channels and sub-channels. It is intended to be used just as a reference for developers, not as an actual include file.

Conclusions

The generated code for the Elphel NC393 camera is definitely very hardware-specific; its main purpose is to encapsulate as much as possible of the hardware interface details and so reduce the dependence of the higher layers of software on modifications of the HDL code. Such tasks are common to other projects that involve CPU/FPGA tandems, and a similar approach to organizing the software/hardware interface may be useful there too.

