Design of a convolutional neural network accelerator based on Zynq

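MENG Fankai $ ^{1,2} $, ZHANG Feng $ ^{2} $, LI Miao $ ^{2} $, ZHANG Duoli $ ^{1} $

(1. School of Microelectronics, Hefei University of Technology, Hefei 230601, Anhui, China; 2. National ASIC Design Engineering Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)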

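Abstract

To address the high resource overhead and slow execution of convolutional neural networks (CNNs) deployed on embedded platforms, this paper proposes a CNN hardware accelerator that uses Tiny-YOLOv3 as its algorithm model. First, an implementation scheme for the accelerator is designed around the characteristics and requirements of each layer of the Tiny-YOLOv3 network: the weight coefficients are split bit by bit, the convolution accelerator is designed for single-bit weights, and bit-by-bit execution achieves an efficient balance between processing speed and recognition accuracy. Next, the multiply-accumulate operations of the convolution operator are implemented with a lookup-table selection method, and a $ 6\times3\times16 $ three-dimensional accelerator compute array is designed that can complete 288 convolution-window computations in a single cycle. Finally, the performance of the designed accelerator is tested on a chip from the Xilinx Zynq UltraScale+ MPSoC family. Experimental results show that the accelerator delivers 518.4 GOPS of compute at 200 MHz, about 63% higher than existing solutions.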
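The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only one plausible reading: the abstract gives the array shape, the window count, the clock frequency, and the GOPS figure, but not how operations are counted. The assumption that each window is a 3×3 kernel contributing 9 operations per cycle (`OPS_PER_WINDOW`) is ours, chosen because it reproduces the stated number.

```python
# Figures taken from the abstract.
ARRAY_SHAPE = (6, 3, 16)      # 3D compute array dimensions
CLOCK_HZ = 200_000_000        # 200 MHz

# ASSUMPTION (not stated in the abstract): each convolution window is a
# 3x3 kernel and is counted as 9 operations per cycle.
OPS_PER_WINDOW = 3 * 3


def windows_per_cycle(shape):
    """Number of convolution windows the array processes each cycle."""
    total = 1
    for dim in shape:
        total *= dim
    return total


def peak_gops(shape, ops_per_window, clock_hz):
    """Peak throughput in giga-operations per second."""
    return windows_per_cycle(shape) * ops_per_window * clock_hz / 1e9


if __name__ == "__main__":
    print(windows_per_cycle(ARRAY_SHAPE))
    print(peak_gops(ARRAY_SHAPE, OPS_PER_WINDOW, CLOCK_HZ))
```

Under this counting convention, the 6 × 3 × 16 array gives 288 windows per cycle, and 288 × 9 × 200 MHz matches the 518.4 GOPS reported. If the paper counts a multiply-accumulate as two operations, the per-window factor would differ.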

Keywords

convolutional neural network (CNN); Tiny-YOLOv3 network model; hardware acceleration; pipelined array; parallel computing

CLC number: TN47

Document code: A

Article ID: 1003-5060(2025)07-0904-06

Design of a convolutional neural network accelerator based on Zynq

MENG Fankai $ ^{1,2} $, ZHANG Feng $ ^{2} $, LI Miao $ ^{2} $, ZHANG Duoli $ ^{1} $

(1. School of Microelectronics, Hefei University of Technology, Hefei 230601, China; 2. National ASIC Design Engineering Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)

Abstract

To address the high resource overhead and slow execution of convolutional neural networks (CNNs) in embedded deployment, a CNN hardware accelerator that uses Tiny-YOLOv3 as its algorithm model is presented. Based on the characteristics and requirements of each layer of the Tiny-YOLOv3 network, an implementation scheme is proposed that splits the weight coefficients bit by bit, designs the convolution accelerator for single-bit weights, and achieves an efficient balance between processing speed and recognition accuracy through bit-by-bit execution. The multiply-accumulate operations of the convolution operator are implemented with a lookup-table selection method. A $ 6 \times 3 \times 16 $ 3D accelerator compute array is designed that can process 288 convolution windows in a single cycle. Finally, the performance of the CNN accelerator is tested on a Xilinx Zynq UltraScale+ MPSoC series chip. The results show that the designed accelerator delivers 518.4 GOPS at 200 MHz, a performance improvement of about 63% over existing solutions.
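The bit-by-bit weight scheme described in the abstract can be illustrated in software. The following is a minimal sketch, not the authors' hardware design: it assumes 8-bit two's-complement weights (the abstract does not state the bit width), and it models "selection" as choosing either the activation or zero according to one weight bit, so no multiplier is needed in a single-bit pass. The names `weight_bit` and `bit_serial_dot` are hypothetical.

```python
def weight_bit(w, k, n_bits=8):
    """Bit k (LSB = 0) of w in n_bits two's-complement form."""
    return ((w & ((1 << n_bits) - 1)) >> k) & 1


def bit_serial_dot(activations, weights, n_bits=8):
    """Dot product computed one weight bit-plane at a time.

    ASSUMPTION: weights are signed integers that fit in n_bits.
    """
    total = 0
    for k in range(n_bits):
        # Single-bit pass: each weight bit selects the activation or 0,
        # then the selected values are accumulated.
        partial = sum(a if weight_bit(w, k, n_bits) else 0
                      for a, w in zip(activations, weights))
        # In two's complement the top bit carries a negative place value.
        scale = -(1 << k) if k == n_bits - 1 else (1 << k)
        total += scale * partial
    return total


if __name__ == "__main__":
    acts = [3, -1, 4, 1, -5, 9, 2, 6, -5]    # one 3x3 window, flattened
    wts = [1, -2, 3, -4, 5, -6, 7, -8, 9]    # matching 3x3 kernel
    print(bit_serial_dot(acts, wts))
    print(sum(a * w for a, w in zip(acts, wts)))  # reference result
```

Processing fewer bit planes shortens the computation at the cost of weight precision, which is one way to read the speed-versus-accuracy balance the abstract mentions.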

Keywords

convolutional neural network (CNN); Tiny-YOLOv3 network model; hardware acceleration; pipeline array; parallel operation

Received: 2023-05-12

Revised: 2023-06-02

Funding: National Natural Science Foundation of China (61874156); University Synergy Innovation Program of Anhui Province (GXXT-2019-030)