Matrix Multiplication with Half the Multiplications

This repository contains the source code for ML hardware architectures that use alternative inner-product algorithms to reach the same performance with fewer multiplier units, raising the theoretical throughput and compute-efficiency limits of ML accelerators. The Free-pipeline Fast Inner Product (FFIP) algorithm and its architecture improve on the Fast Inner-Product (FIP) algorithm proposed by Winograd by raising its clock frequency and throughput. FFIP can be integrated into traditional fixed-point systolic-array ML accelerators either to reach the same throughput with half the multiply-accumulate (MAC) units or to double the systolic array size that fits within a fixed hardware budget. The repository includes a compiler, RTL, a simulation setup, and test scripts for verifying the accelerator's functionality.
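To make the "half the multiplications" idea concrete, here is a minimal software sketch of the baseline Winograd inner-product identity that FIP is built on. It is not the FFIP hardware algorithm and is not taken from the repository; the function names `fip_matmul` and `fip_mult_count` are hypothetical, and the sketch assumes integer inputs and an even inner dimension K. For vectors x and y of even length, each pair of terms x1*y1 + x2*y2 equals (x1 + y2)*(x2 + y1) - x1*x2 - y1*y2, and the two subtracted products depend on only one operand, so they can be computed once per row or column and reused.

```python
def fip_matmul(a, b):
    """Multiply a (M x K) by b (K x N) with Winograd's inner-product identity.

    Illustrative sketch only: assumes integer entries and an even K.
    """
    m, k, n = len(a), len(b), len(b[0])
    if k % 2 or any(len(row) != k for row in a):
        raise ValueError("inner dimension must match and be even")
    half = k // 2

    # Correction terms: one per row of a and one per column of b,
    # shared by every output element in that row or column.
    alpha = [sum(row[2 * j] * row[2 * j + 1] for j in range(half)) for row in a]
    beta = [sum(b[2 * j][c] * b[2 * j + 1][c] for j in range(half))
            for c in range(n)]

    out = []
    for i in range(m):
        out_row = []
        for c in range(n):
            # Main term: only K/2 multiplications per output element.
            main = sum(
                (a[i][2 * j] + b[2 * j + 1][c]) * (a[i][2 * j + 1] + b[2 * j][c])
                for j in range(half)
            )
            out_row.append(main - alpha[i] - beta[c])
        out.append(out_row)
    return out


def fip_mult_count(m, k, n):
    """Multiplications used by fip_matmul (main terms plus correction terms)."""
    return m * n * (k // 2) + (m + n) * (k // 2)
```

A conventional matrix multiplication uses M*N*K multiplications, while this sketch uses M*N*K/2 for the main terms plus (M + N)*K/2 for the correction terms, so the ratio approaches one half as M and N grow. The trade-off is extra additions, which are cheaper than multipliers in fixed-point hardware.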

https://github.com/trevorpogue/algebraic-nnhw