NC393 progress update: one gigapixel per second (12x faster than NC353)
All the PCBs for the new camera: 10393, 10389 and 10385 are modified to rev “A”, we already received the new boards from the factory and now are waiting for the first production batch to be build. The PCB changes are minor, just moving connectors away from the board edge to simplify mechanical design and improve thermal contact of the heat sink plate to the camera body. Additionally the 10389A got m2 connector instead of the mSATA to accommodate modern SSD.
While waiting for the production we designed a new sensor board (10398) that has exactly the same dimensions, same image sensor format as the current 10338E and so it is compatible with the hardware for the calibrated sensor front ends we use in photogrammetric cameras. The difference is that this MT9F002 is a 14 MPix device and has high-speed serial interface instead of the legacy parallel one. We expect to get the new boards and the sensors next week and will immediately start working with this new hardware.
In preparation for the faster sensors I started to work on the FPGA code to make it ready for the new devices. We planned to use modern sensors with the serial interfaces from the very beginning of the new camera design, so the hardware accommodates up to 8 differential data lanes plus a clock pair in addition to the I²C and several control signals. One obviously required part is the support for Aptina HiSPi (High Speed Serial Pixel) interface that in case of MT9F002 uses 4 differential data lanes, each running at 660 Mbps – in 12-bit mode that corresponds to 220 MPix/s. Until we’ll get the actual sensors I could only simulate receiving of the HiSPi data using the sensor model written ourselves following the interface documentation. I’ll need yet to make sure I understood the documentation correctly and the sensor will produce output similar to what we modeled.
The sensor interface is not the only piece of the code that needed changes, I also had to increase significantly the bandwidth of the FPGA signal processing and to modify the I²C sequencer to support 2-byte register addresses.
Data that FPGA receives from the sensor passes through the several clock domains until it is stored in the system memory as a sequence of compressed JPEG/JP4 frames:
- Sensor data in each channel enters FPGA at a pixel clock rate, and subsequently passes through vignetting correction/scaling module, gamma conversion module and histogram calculation modules. This chain output is buffered before crossing to the memory clock domain.
- Multichannel DDR3 memory controller records sensor data in line-scan order and later retrieves it in overlapping (for JPEG) or non-overlapping (for JP4) square tiles.
- Data tiles retrieved from the external DDR3 memory are sent to the compressor clock domain to be processed with JPEG algorithm. In color JPEG mode compressor bandwidth has to be 1.5 higher than the pixel rate, as for 4:2:0 encoding each 16×16 pixels macroblock generate 6 of the 8×8 image blocks – 4 for Y (intensity) and 2 – for color components. In JP4 mode when the de-mosaic algorithm runs on the host computer the compressor clock rate equals the pixel rate.
- Last clock domain is 150MHz used by the AXI interface that operates in 64-bit parallel mode and transfers the compressed data to the system memory.
Two of these domains used double clock rate for some of the processing stages – histograms calculation in the pixel clock domain and Huffman encoder/bit stuffer in the compressor. In the previous NC353 camera pixel clock rate was 96MHz (192 MHz for double rate) and compressor rate was 80MHz (160MHz for double rate). The sensor/compressor clock rates difference reflects the fact that the sensor data output is not uniform (it pauses during inactive lines) and the compressor can process the frame at a steady rate.
MT9F002 image sensor has the output pixel rate of 220MPix/s with the average (over the full frame) rate of 198MPix/s. Using double rate clocks (440MHz for the sensor channel and 400MHz for the compressor) would be rather difficult on Zynq, so I needed first to eliminate such clocks in the design. It was possible to implement and test this modification with the existing sensor, and now it is done – four of the camera compressors each run at 250 MHz (even on “-1”, or “slow” speed grade silicon) making it total of 1GPix/sec. It does not need to have 4 separate sensors running simultaneously – a single high speed imager can provide data for all 4 compressors, each processing every 4-th frame as each image is processed independently.
At this time the memory controller will be a bottleneck when running all four MT9F002 sensors simultaneously as it currently provides only 1600MB/s bandwidth that may be marginally sufficient for four MT9F002 sensor channels and 4 compressor channels each requiring 200MB/s (bandwidth overhead is just a few percent). I am sure it will be possible to optimize the memory controller code to run at higher rate to match the compressors. We already identified which parts of the memory controller need to be modified to support 1.5x clock increase to the total of 2400MB/s. And as the production NC393 camera will have higher speed grade SoC there will be an extra 20% performance increase for the same code. That will provide bandwidth sufficient not just to run 4 sensors at full speed and compress the output data, but to do some other image manipulation at the same time.
Compared to the previous Elphel NC353 camera the new NC393 prototype already is tested to have 12x higher compressor bandwidth (4 channels instead of one and 250MPix/s instead of 80MPix/s), we plan to have the actual sensor with a full data processing chain results soon.
Leave a Reply