Air pollution poses significant risks to human health, ecosystems, and socio-economic stability. Accurately predicting PM2.5 concentration and the Air Quality Index (AQI) is crucial for understanding the factors that drive pollution and for devising effective control measures. This paper addresses Question C of the 2023 Huazhong Cup, which concerns air quality prediction, using a multivariate hybrid prediction model, Prophet-XGBoost, that combines the Prophet time series decomposition algorithm with the XGBoost machine learning model. For problem 1, this study applied KNN interpolation to fill missing values and IQR-based screening to remove outliers in the meteorological data of Annex 1 and Annex 2, and then standardised the data. Random Forest Regression (RFR) was then used to select the features related to changes in PM2.5 concentration. The Random Forest model was first trained to determine its decision trees and the optimal number of leaves, and regression analysis was then performed to score the importance of each feature for PM2.5 concentration. The three main features identified were PM10, average temperature, and CO, with importance scores of 0.7742, 0.1075, and 0.0910, respectively. In the model evaluation, a comparison with a multiple linear regression model demonstrated the accuracy of the RFR model, and the Pearson correlation coefficients confirmed that the model was reasonable. For problems 2 and 3, the multi-step model was built by dividing the data into a training set and a test set at a ratio of 8:2 and first training the Prophet time series decomposition algorithm and the XGBoost machine learning model separately on the known data.
The results show that each model has its own advantages and disadvantages. To obtain predictions that capture both the cyclical and seasonal variation of the meteorological features and their unstable, nonlinear behaviour, this study combines the two into the Prophet-XGBoost hybrid prediction model. The evaluation metrics of the combined model improve substantially over those of either single model, which supports the use of the hybrid approach. Finally, the Prophet-XGBoost model was used to predict the PM2.5 concentration and AQI at the times specified in Annex 3, and air quality warning levels were determined from the predictions, providing a useful reference for formulating more effective air quality management strategies.[1]
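To make the preprocessing steps named above more concrete, the following is a minimal sketch of IQR outlier screening, z-score standardisation, and an 8:2 split, using only the Python standard library. It is not the study's actual code. The 1.5 multiplier on the IQR, the choice of a chronological (unshuffled) split, the function names, and the sample values are all assumptions made for illustration, and KNN interpolation is left out for brevity.

```python
from statistics import mean, pstdev, quantiles


def iqr_filter(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumed default)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def standardise(values):
    """Z-score standardisation: subtract the mean, divide by the population std."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant series carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def chronological_split(rows, train_ratio=0.8):
    """Split without shuffling so the test set follows the training set in time."""
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


# Made-up PM2.5-like readings with one obvious spike (not data from the annexes).
pm25 = [35.0, 42.0, 38.0, 40.0, 37.0, 41.0, 39.0, 36.0, 500.0, 43.0]

clean = iqr_filter(pm25)            # the spike falls outside the IQR fences
scaled = standardise(clean)         # zero mean, unit variance
train, test = chronological_split(scaled)
```

In a fuller pipeline the standardisation statistics would normally be computed on the training portion only and then applied to the test portion, so that no information from the test period leaks into training.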
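The abstract does not state how the Prophet and XGBoost outputs are merged into the combined model. One simple possibility, shown here purely as an assumption, is a weighted average whose weight is chosen by least squares on held-out data. The sketch below uses only the Python standard library. The function names and the two prediction lists are hypothetical stand-ins for the outputs of fitted Prophet and XGBoost models.

```python
def blend_weight(y_true, pred_a, pred_b):
    """Least-squares weight w for w*pred_a + (1 - w)*pred_b, clipped to [0, 1]."""
    num = sum((y - b) * (a - b) for y, a, b in zip(y_true, pred_a, pred_b))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    if den == 0:
        # Both models agree everywhere, so any weight gives the same blend.
        return 0.5
    return min(1.0, max(0.0, num / den))


def blend(pred_a, pred_b, w):
    """Combine two forecasts point by point with weight w on the first."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]


def rmse(y_true, y_pred):
    """Root mean squared error between observations and predictions."""
    return (sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


# Made-up validation values (not results from the paper).
y_val = [50.0, 60.0, 55.0, 70.0]
prophet_like = [48.0, 63.0, 52.0, 74.0]   # stand-in for Prophet forecasts
xgb_like = [53.0, 58.0, 57.0, 67.0]       # stand-in for XGBoost forecasts

w = blend_weight(y_val, prophet_like, xgb_like)
combined = blend(prophet_like, xgb_like, w)

print(f"weight on first model: {w:.3f}")
print(f"RMSE first / second / blend: "
      f"{rmse(y_val, prophet_like):.3f} / {rmse(y_val, xgb_like):.3f} / {rmse(y_val, combined):.3f}")
```

On the data used to fit the weight, the blend can never have a higher RMSE than either input, because weights of 0 and 1 reproduce the single models. A fair comparison would therefore fit the weight on one held-out portion and report the error on another.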