Journal of Hefei University of Technology (Natural Science)


Design of a convolutional neural network accelerator based on Zynq

Journal information

Journal of Hefei University of Technology (Natural Science), July 2025, Vol. 48, No. 7: 904-909

DOI: 10.3969/j.issn.1003-5060.2025.07.007

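Author information

MENG Fankai (孟凡开) $ ^{1,2} $, ZHANG Feng (张峰) $ ^{2} $, LI Miao (李淼) $ ^{2} $, ZHANG Duoli (张多利) $ ^{1} $

(1. School of Microelectronics, Hefei University of Technology, Hefei, Anhui 230601, China; 2. National ASIC Design Engineering Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)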

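Abstract and keywords

Abstract: Embedded deployments of convolutional neural networks (CNNs) suffer from high resource overhead and slow execution. To address these problems, this paper proposes a CNN hardware accelerator that takes Tiny-YOLOv3 as its algorithm model. First, the accelerator implementation scheme is designed around the characteristics and requirements of each layer of the Tiny-YOLOv3 network: the weight coefficients are split bit by bit, the convolution accelerator is designed for single-bit weights, and bit-by-bit execution achieves an efficient balance between processing speed and recognition rate. Next, the multiply-accumulate operations of the convolution operator are implemented with a lookup-table selection method, and a $ 6\times3\times16 $ three-dimensional accelerator computing array is designed that completes 288 convolution window computations in a single cycle. Finally, the performance of the accelerator is tested on a chip of the Xilinx Zynq UltraScale+ MPSoC series. The experimental results show that the accelerator delivers 518.4 GOPS at 200 MHz, about 63% higher performance than existing solutions.

Keywords: convolutional neural network (CNN); Tiny-YOLOv3 network model; hardware acceleration; pipeline array; parallel operation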
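The bit-by-bit weight splitting described in the abstract can be illustrated with a small software model. The abstract does not give the weight width, the number format, or the layout of the lookup table, so the sketch below assumes 8-bit two's-complement weights and models the single-bit "selection" as choosing either the activation or zero. The function name `bit_serial_dot` and its parameters are hypothetical and are not taken from the paper.

```python
def bit_serial_dot(activations, weights, weight_bits=8):
    """Dot product evaluated one weight bit-plane at a time.

    Assumption: weights are signed integers that fit in `weight_bits`
    bits of two's complement. This models the arithmetic only, not the
    hardware pipeline.
    """
    mask = (1 << weight_bits) - 1
    total = 0
    for b in range(weight_bits):
        # Single-bit weights: each one selects the activation or zero,
        # so no multiplier is needed inside a bit plane.
        partial = 0
        for a, w in zip(activations, weights):
            bit = ((w & mask) >> b) & 1
            partial += a if bit else 0
        # In two's complement the most significant bit has negative weight.
        scale = -(1 << b) if b == weight_bits - 1 else (1 << b)
        total += scale * partial
    return total


# One 3x3 convolution window, flattened to 9 taps (illustrative values).
window = [1, 2, 3, 4, 5, 6, 7, 8, 9]
kernel = [-3, 5, 0, 127, -128, 1, -1, 2, 7]
reference = sum(a * w for a, w in zip(window, kernel))
assert bit_serial_dot(window, kernel) == reference
```

Processing fewer bit planes would shorten the loop at the cost of weight precision, which is one way to read the speed versus recognition-rate trade-off mentioned in the abstract.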

Authors

MENG Fankai $ ^{1,2} $, ZHANG Feng $ ^{2} $, LI Miao $ ^{2} $, ZHANG Duoli $ ^{1} $

(1. School of Microelectronics, Hefei University of Technology, Hefei 230601, China; 2. National ASIC Design Engineering Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)

Abstract and Keywords

Abstract: To address the high resource overhead and slow execution of convolutional neural networks (CNNs) in embedded deployments, a CNN hardware accelerator using Tiny-YOLOv3 as its algorithm model is presented. First, a CNN accelerator implementation scheme is designed according to the characteristics and requirements of each layer of the Tiny-YOLOv3 network: the weight coefficients are split bit by bit, the convolution accelerator is designed for single-bit weights, and bit-by-bit execution achieves an efficient balance between processing speed and recognition rate. Next, the multiply-accumulate operations of the convolution operator are implemented with a lookup-table selection method, and a $ 6 \times 3 \times 16 $ 3D accelerator computing array is designed that completes 288 convolution window computations in a single cycle. Finally, the performance of the CNN accelerator is tested on a chip of the Xilinx Zynq UltraScale+ MPSoC series. The results show that the designed accelerator delivers 518.4 GOPS at 200 MHz, a performance improvement of about 63% over existing solutions.

Keywords: convolutional neural network (CNN); Tiny-YOLOv3 network model; hardware acceleration; pipeline array; parallel operation
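The figures quoted in the abstract can be cross-checked with a few lines of integer arithmetic. The sketch below uses only the numbers stated there (array size, clock frequency, reported throughput); the variable names are illustrative. The closing interpretation, that 9 operations per window per cycle correspond to the 9 taps of a 3×3 window with each tap counted as one operation, is an inference and is not stated in the abstract.

```python
# Figures taken from the abstract.
ARRAY_DIMS = (6, 3, 16)          # 3D computing array
CLOCK_HZ = 200 * 10**6           # 200 MHz
REPORTED_OPS = 5184 * 10**8      # 518.4 GOPS, kept as an exact integer

# Number of convolution windows processed per cycle.
windows_per_cycle = 1
for dim in ARRAY_DIMS:
    windows_per_cycle *= dim

# Operations per clock cycle implied by the reported throughput.
ops_per_cycle, remainder = divmod(REPORTED_OPS, CLOCK_HZ)
assert remainder == 0

# Operations attributed to each window in each cycle.
ops_per_window, remainder = divmod(ops_per_cycle, windows_per_cycle)
assert remainder == 0

print(windows_per_cycle, ops_per_cycle, ops_per_window)
```

The product of the array dimensions reproduces the 288 windows per cycle given in the abstract, and the reported throughput divides evenly into a whole number of operations per window.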

Funding

National Natural Science Foundation of China (61874156); Anhui Provincial University Collaborative Innovation Program (GXXT-2019-030)
