FFT256
by Oleg Dzhimiev
1. [Done] Add the 1/256 scaling after the last stage and check the simulation.
- Added a 1-bit right shift (1/2) after each “butterfly”, which accumulates to 1/256 over the 8 stages. This may amount to early rounding, because low-order bits are dropped at every stage rather than once at the end. With 12 bits of color, the sums over 8 stages can make the final value up to 20 bits wide. Only 16 bits are stored, so the 1-bit right shift per stage keeps the value within 12 bits. Alternatively, a 4-bit right shift can be applied after every 4 stages.
2. [Done] Make a simple simulation with FFT256 and IFFT256.
- Made an FFT256-IFFT256 testbench. The input and the output differ. Overflows in the calculations and the cosine tables need a closer check.
- Corrected a mistake in expanding 8-bit two’s complement values to 18 bits: the upper bits are now filled with bit 7 (the sign bit). The same applies to 16-bit values before the 18×18 multiplication.
3. [In Progress] Optimize BRAM usage. The goal is 5 or 6 BRAMs and 8 MULTs (not sure about the input and output buffers).
- The plan is to use 4 BRAM32X32s and leave the output unbuffered. The “butterfly” output then needs to be delayed by 2 clock cycles, and the input buffer will need specific writes in a specific order.
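The per-stage scaling from item 1 can be illustrated with a small software model. This is only a sketch under stated assumptions: it uses floating-point arithmetic in Python rather than the fixed-point logic of the actual design, the function name `fft_radix2` and its `halve_each_stage` flag are hypothetical, and the truncation caused by a real right shift is not modeled. It shows that halving every butterfly output gives the same result as dividing the finished transform by 256, while keeping the worst-case value (a flat 12-bit input of 4095) within 12 bits instead of 20.

```python
import cmath

def fft_radix2(samples, halve_each_stage=False):
    """Iterative radix-2 decimation-in-time FFT (floating-point model).

    With halve_each_stage=True every butterfly output is multiplied by 1/2,
    which accumulates to 1/N after log2(N) stages.
    Returns (spectrum, peak), where peak is the largest magnitude
    present after any stage.
    """
    n = len(samples)
    stages = n.bit_length() - 1
    assert n >= 2 and (1 << stages) == n, "length must be a power of two"

    # Reorder the input by bit-reversed index.
    data = [0j] * n
    for i, x in enumerate(samples):
        rev = int(format(i, "0{}b".format(stages))[::-1], 2)
        data[rev] = complex(x)

    scale = 0.5 if halve_each_stage else 1.0
    peak = 0.0
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for k in range(half):
                w = cmath.exp(-1j * cmath.pi * k / half)  # twiddle factor
                a = data[start + k]
                b = data[start + k + half] * w
                data[start + k] = (a + b) * scale         # butterfly sum
                data[start + k + half] = (a - b) * scale  # butterfly difference
        peak = max(peak, max(abs(v) for v in data))
        half *= 2
    return data, peak

if __name__ == "__main__":
    flat = [4095] * 256  # worst case for the DC bin with 12-bit input
    _, peak_raw = fft_radix2(flat)
    _, peak_scaled = fft_radix2(flat, halve_each_stage=True)
    print(int(peak_raw).bit_length(), int(peak_scaled).bit_length())
```

In hardware the 1/2 would be a 1-bit arithmetic right shift, so each stage also discards one low-order bit; a bit-accurate model would need integer arithmetic and the actual cosine table widths.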
TODO:
- Optimize BRAM usage.
- Add headers and commit to CVS.
- Write a correlation computation block.
- Set up memory controller for frames read/write.
- Integrate the correlation block to 10359’s firmware.
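The sign-extension fix mentioned in item 2 can be sketched in a few lines. This is an illustrative Python model, not the Verilog from the design; the helper names `sign_extend` and `to_signed` are hypothetical. It treats values as raw bit patterns and shows why filling the upper bits with zeros instead of the sign bit turns a negative 8-bit value into a positive 18-bit one.

```python
def sign_extend(raw, from_bits, to_bits=18):
    """Widen a two's complement bit pattern by replicating its sign bit."""
    low_mask = (1 << from_bits) - 1
    raw &= low_mask
    if raw >> (from_bits - 1) & 1:
        # Negative value: set every bit above the original width.
        raw |= ((1 << to_bits) - 1) & ~low_mask
    return raw

def to_signed(raw, bits):
    """Interpret a bit pattern of the given width as a signed integer."""
    return raw - (1 << bits) if raw & (1 << (bits - 1)) else raw

if __name__ == "__main__":
    a = sign_extend(0xFF, 8)      # 8-bit pattern for -1
    b = sign_extend(0x8000, 16)   # 16-bit pattern for -32768
    print(hex(a), to_signed(a, 18))           # extended correctly
    print(to_signed(0xFF, 18))                # zero-filled: reads as 255
    print(to_signed(a, 18) * to_signed(b, 18))
```

Both operands of an 18×18 signed multiplier need this treatment, otherwise the product is wrong for every negative input.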
Oleg, can you try to make it work at a frequency higher than 96 MHz? That may not be available on the 10359, but the 10359A will have more clocks. There are a lot of registers in the FPGA, so adding them between the stages is always possible. Sometimes it also makes sense to use the registers before/after the multipliers, inside the multiplier blocks, as those have different timing specs than the distributed ones.
Yes, I’ll try. Currently all MULT and BRAM inputs and outputs are registered.