April 10, 2009


by Oleg Dzhimiev

1. [Done] Add 1/256 after the last stage and check the simulation.

  • Added a 1-bit right shift (a division by 2) after each “butterfly”; over the 8 stages this accumulates to 1/256. The downside is that low-order bits may be rounded off early. With 12 bits of color, the sums over 8 stages can make the final value up to 20 bits wide, while only 16 bits are stored, so the 1-bit right shift per stage keeps the value within 12 bits. Alternatively, a 4-bit right shift can be applied after every 4 stages.
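The bit growth described above can be checked with a small Python sketch. It is only an illustration, not the FPGA code: it follows the DC path of a radix-2 FFT, where every twiddle factor is 1 and each “butterfly” is a plain sum, and the function name `dc_path` and its parameters are made up for this example.

```python
def dc_path(samples, shift_every, shift_bits):
    """Sum the samples pairwise, stage by stage, as on the DC path of a
    radix-2 FFT (all twiddle factors equal to 1). After every `shift_every`
    stages the values are right-shifted by `shift_bits`.

    Returns (final value, widest stored value in magnitude bits)."""
    values = list(samples)
    stage = 0
    widest = 0
    while len(values) > 1:
        # One stage: add neighbouring pairs, halving the number of values.
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
        stage += 1
        if stage % shift_every == 0:
            values = [v >> shift_bits for v in values]
        # Width is measured on what would be stored after this stage.
        widest = max(widest, max(v.bit_length() for v in values))
    return values[0], widest


full_scale = [4095] * 256  # 256 samples at the 12-bit maximum

print(dc_path(full_scale, 1, 0))  # no scaling:           (1048320, 20)
print(dc_path(full_scale, 1, 1))  # 1 bit per stage:      (4095, 12)
print(dc_path(full_scale, 4, 4))  # 4 bits every 4 stages: (4095, 15)
```

In this worst case both scaling schemes end at the same 1/256-scaled result; the 4-bit variant stores up to 15 magnitude bits between shifts (and the sum briefly reaches 16 bits just before each shift), so it keeps more precision at the cost of wider intermediate values.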

2. [Done] Make a simple simulation with FFT256 and IFFT256.

  • Made an FFT256-IFFT256 testbench. The input and the output differ, so overflows in the calculations and the cosine tables need to be checked more carefully.
  • Corrected a mistake in expanding 8-bit two’s complement values to 18 bits (the upper bits are now simply filled with bit 7, the sign bit). The same applies to 16-bit values before the 18×18 multiplication.
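The sign-extension fix can be illustrated in Python on raw bit patterns. This is a sketch under the assumption that values are handled as unsigned bit patterns of a fixed width; the helper names `sign_extend` and `to_signed` are hypothetical and do not come from the firmware.

```python
def sign_extend(value, from_bits, to_bits):
    """Widen a two's complement bit pattern by copying its sign bit
    (bit from_bits - 1) into all the new upper bits."""
    value &= (1 << from_bits) - 1            # keep only the source bits
    sign = (value >> (from_bits - 1)) & 1    # top bit of the source value
    if sign:
        upper = ((1 << (to_bits - from_bits)) - 1) << from_bits
        value |= upper                       # fill the upper part with ones
    return value


def to_signed(pattern, bits):
    """Interpret a bit pattern of the given width as a signed integer."""
    if pattern >> (bits - 1):
        return pattern - (1 << bits)
    return pattern


# 0x80 is -128 as an 8-bit value.
wrong = 0x80                        # upper bits left at zero
right = sign_extend(0x80, 8, 18)    # upper bits filled with the sign bit

print(to_signed(wrong, 18))   # 128
print(to_signed(right, 18))   # -128
```

Leaving the upper bits at zero turns every negative input into a positive one at the multiplier, which by itself would be enough to make the FFT256-IFFT256 round trip disagree with the input.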

3. [In Progress] Optimize BRAM usage. The goal is 5 or 6 BRAMs and 8 MULTs (not sure about the input and output buffers).

  • The plan is to use 4 BRAM32X32s and leave the output unbuffered; the “butterfly” output then needs to be delayed by 2 clock cycles. A specific write order to the input buffer will also be needed.


  1. Optimize BRAM usage.
  2. Add headers and commit to CVS.
  3. Write a correlation computation block.
  4. Set up memory controller for frames read/write.
  5. Integrate the correlation block into the 10359’s firmware.

2 responses to “FFT256”

  1. andrey says:

    Oleg, can you try to make it work at a frequency higher than 96 MHz? It may not be available on the 10359, but the 10359A will have more clocks. There are a lot of registers in the FPGA, so adding them between the stages is always possible; sometimes it makes sense to add them before/after the registers inside the multipliers, as those have different timing specs than the distributed ones.

  2. Oleg Dzhimiev says:

    Yes, I’ll try. Currently I have registered all the inputs and outputs of the MULTs and BRAMs.
