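Study of $ PM_{2.5} $ based on a functional k-nearest neighbors classification model

LIU Zhuang, LING Nengxiang

(School of Mathematics, Hefei University of Technology, Hefei 230601, Anhui, China)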

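Abstract

This paper applies functional data analysis, treating each day's 24 h temperature record as an independent curve sample, and on that basis builds a functional k-nearest neighbors (kNN) classification model to classify the day's 24 h average $ PM_{2.5} $ mass concentration. kNN classification models are built with the quadratic, exponential, and triangle kernel functions, and their results are analyzed. The comparison shows that the kNN classification model with the triangle kernel function classifies $ PM_{2.5} $ mass concentration most accurately and most robustly. The Nadaraya-Watson (NW) kernel method is also compared with the kNN classification model, and the results show that the kNN classification model effectively improves classification accuracy.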

Keywords

functional data classification; k-nearest neighbors; kernel function; nonparametric statistics

CLC number: O212.7

Document code: A

Article ID: 1003-5060(2024)07-0967-04

Analysis of $ PM_{2.5} $ based on functional k-nearest neighbors classification model

LIU Zhuang, LING Nengxiang

(School of Mathematics, Hefei University of Technology, Hefei 230601, China)

Abstract

This paper uses functional data analysis, taking the temperature data over the 24 h of each day as an independent curve sample. On this basis, a functional k-nearest neighbors (kNN) classification model is established to classify the average $ PM_{2.5} $ concentration of that day. The quadratic, exponential, and triangle kernel functions are each used to build the kNN classification model, and the results are analyzed. The comparison shows that the kNN classification model with the triangle kernel function is the most accurate and robust in classifying $ PM_{2.5} $ concentration. The kNN classification model is also compared with the Nadaraya-Watson (NW) kernel method, and the results show that the kNN classification model effectively improves classification accuracy.
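As a rough illustration of the kind of classifier described above, the following Python sketch implements a kernel-weighted functional kNN rule on discretized curves. It is a minimal sketch under stated assumptions, not the authors' implementation: the discretized L2 distance, the kernel formulas, the choice of the (k+1)-th nearest distance as the local bandwidth, and all names (`l2_distance`, `KERNELS`, `functional_knn_predict`) are illustrative choices that do not come from the paper.

```python
import numpy as np


def l2_distance(x, curves, dt=1.0):
    """Approximate L2 distance between curve x and each row of curves (Riemann sum)."""
    return np.sqrt(np.sum((curves - x) ** 2, axis=1) * dt)


# Asymmetric kernels supported on [0, 1). The exact forms are assumptions;
# the abstract names the kernels but does not define them.
KERNELS = {
    "triangle": lambda u: np.where(u < 1.0, 1.0 - u, 0.0),
    "quadratic": lambda u: np.where(u < 1.0, 1.0 - u ** 2, 0.0),
    "exponential": lambda u: np.where(u < 1.0, np.exp(-u), 0.0),
}


def functional_knn_predict(x_new, curves, labels, k, kernel="triangle"):
    """Classify one discretized curve with a kernel-weighted kNN rule.

    curves: array of shape (n, m), one observed curve per row
    labels: array of shape (n,), class label of each curve
    k:      number of neighbours, with 1 <= k < n
    Returns the predicted label and the estimated class probabilities,
    ordered as in np.unique(labels).
    """
    curves = np.asarray(curves, dtype=float)
    labels = np.asarray(labels)
    if not 1 <= k < len(curves):
        raise ValueError("k must satisfy 1 <= k < number of curves")

    d = l2_distance(np.asarray(x_new, dtype=float), curves)
    # Local bandwidth: the (k+1)-th smallest distance (an assumed convention),
    # so that the k nearest curves fall strictly inside the kernel support.
    h = np.sort(d)[k]
    if h == 0.0:
        raise ValueError("bandwidth is zero: too many curves coincide with x_new")

    w = KERNELS[kernel](d / h)
    classes = np.unique(labels)
    scores = np.array([w[labels == c].sum() for c in classes])
    total = scores.sum()
    if total == 0.0:
        raise ValueError("all kernel weights are zero (tied distances)")
    return classes[np.argmax(scores)], scores / total
```

With daily temperature curves stored as rows of length 24 and each day labelled by its $ PM_{2.5} $ class, `functional_knn_predict(x_new, curves, labels, k=5)` would return the estimated class of a new day. In practice k would be chosen by a data-driven criterion such as cross-validation, which this sketch leaves out.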

Keywords

functional data classification; k-nearest neighbors (kNN); kernel function; nonparametric statistics

Received: 2020-02-26

Revised: 2020-03-16

Funding: