Abstract: In post-quantum cryptography (PQC) algorithm CRISTALS-Kyber, polynomial multiplication takes up the main part. The number theoretic transform (NTT) can reduce the complexity of polynomial multiplication, so simple and efficient NTT architecture design is very important for the implementation of the whole algorithm. This paper proposes a hardware-friendly two-stage iterative address access algorithm for memory-based NTT/INTT, and designs a serial two-level iterative hardware architecture. In this architecture, when calculating NTT/INTT, half of the intermediate coefficients are provided by the butterfly unit (BFU), which saves BRAM quantity and simplifies the circuit structure. The architecture can also realize the sharing of NTT-INTT data streams, which further simplifies the control logic. In order to complete polynomial multiplication, a BFU is employed to complete the point-wise multiplication (PWM). The architecture was deployed on Xilinx Artix-7, and experimental results show that compared with the existing design, the proposed architecture reduces LUT, FF, BRAM resources by 30%, 23%, and 25%, respectively, and improves area-time product (ATP) performance by 10%-40%.
Keywords: post-quantum cryptography(PQC); Kyber algorithm; number theoretic transform(NTT); polynomial multiplier; memory access