FFT256 DIF
1. [Done] Wrote code for the FFT256 DIF.
- Resources
The FFT is performed in 8 pipelined stages. Each stage is similar to the one described here. So far, each stage uses 2 BRAM ports (A for the 16-bit Re part and B for the 16-bit Im part) and 1 MULT18X18 (for the “butterfly”), for a total of 4 BRAMs + 6 MULT18X18s (MULTs are not used in the last 2 stages). Together with the other logic, 1 FFT256 uses 18% of the FPGA resources.
- Performance time
Each BRAM is shared by 2 stages for writing and by 2 stages for reading (e.g., write: stages 2, 3; read: stages 3, 4). Because the address bus has to serve 4 double accesses, each channel performs a double write/read only once every 8 clock cycles. This results in:
Load time + Computation time + Readout time @ 10 ns Clk ~ 2.5 us + 8 × 10 us + 2.5 us = 85 us =(
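The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes one sample per clock cycle during load and readout, and 128 butterflies per stage with one butterfly per 8-cycle double access. Those assumptions are not stated in the post, but they match its rounded figures (2.5 us and 10 us). The function name `fft256_latency_us` is made up for illustration.

```python
def fft256_latency_us(clk_ns=10, n=256, stages=8, cycles_per_butterfly=8):
    """Rough latency of the fully buffered FFT256, in microseconds."""
    # Assumption: load and readout each move one sample per clock cycle.
    load_us = n * clk_ns / 1000
    # Assumption: each stage runs n/2 butterflies, one per 8-cycle double access.
    stage_us = (n // 2) * cycles_per_butterfly * clk_ns / 1000
    total_us = load_us + stages * stage_us + load_us
    return load_us, stage_us, total_us


load_us, stage_us, total_us = fft256_latency_us()
print(f"load {load_us:.2f} us, per stage {stage_us:.2f} us, total {total_us:.2f} us")
```

Without rounding, the total comes out slightly above the 85 us quoted above, because 2.56 us and 10.24 us are rounded down to 2.5 us and 10 us.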
1a. [Done] Removed 2 MULTs from stages 7 & 8: there the sine and cosine are +/-1 or 0, so no multiplication is needed.
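Why the last two stages need no multiplier can be checked numerically. The sketch below assumes a standard radix-2 DIF schedule, where stage s (counting from 1) works on sub-transforms of length 256 / 2^(s-1). The helper names `stage_twiddles` and `is_trivial` are hypothetical and are not taken from the firmware.

```python
import cmath

N = 256


def stage_twiddles(stage, n=N):
    """Distinct twiddle factors of 1-based `stage` in a radix-2 DIF FFT."""
    m = n >> (stage - 1)  # sub-transform length at this stage
    return [cmath.exp(-2j * cmath.pi * k / m) for k in range(m // 2)]


def is_trivial(w, eps=1e-12):
    """True when both the cosine and the sine part are 0 or +/-1."""
    return all(min(abs(p), abs(abs(p) - 1)) < eps for p in (w.real, w.imag))


for stage in range(1, 9):
    tw = stage_twiddles(stage)
    print(stage, len(tw), all(is_trivial(w) for w in tw))
```

Under this schedule only stages 7 and 8 report all-trivial factors (1 and -j in stage 7, just 1 in stage 8), so their butterflies reduce to additions, subtractions and a Re/Im swap.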
2. [In Progress] FFT256 verification: calculated the coefficients in an OOo Spreadsheet; for the tested sequence the results are almost equal. It would be better to move this check into the testbench.
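One way to move the check out of the spreadsheet is a small software reference model that a testbench could compare against. The following is a sketch of such a model, not the verification actually used here: a floating-point radix-2 DIF FFT checked against a direct DFT. The test sequence and the tolerance are arbitrary assumptions, and a comparison with the 16-bit hardware output would need a much looser tolerance.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT, output in natural order."""
    n = len(x)
    a = list(x)
    half = n // 2
    while half >= 1:
        m = 2 * half  # sub-transform length at this stage
        for start in range(0, n, m):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / m)
                u, v = a[start + k], a[start + k + half]
                a[start + k] = u + v              # upper butterfly output
                a[start + k + half] = (u - v) * w  # lower output, then twiddle
        half //= 2
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        out[bit_reverse(i, bits)] = a[i]  # DIF leaves results bit-reversed
    return out


def dft(x):
    """Direct O(n^2) DFT used as the golden reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


# Arbitrary deterministic integer test sequence (an assumption, not real data).
samples = [((t * 37) % 64) - 32 for t in range(256)]
max_err = max(abs(p - q) for p, q in zip(fft_dif(samples), dft(samples)))
print("max |FFT - DFT| =", max_err)
```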
TODO: (almost the same as before, because initially there was a DIT algorithm and I was writing DIF)
- Write a correlation computation block.
- Make FFT run faster – get away from full buffering between stages.
- Set up memory controller for frames read/write.
- Integrate the correlation block into the 10359’s firmware.