This study investigates a multimodal deep learning framework that integrates unmanned aerial vehicle (UAV) multispectral imagery with meteorological data to predict cotton yield. The study analyzes how three architectural factors affect model performance: the CNN feature extraction layers, the depth of the fully connected layers, and the method of integrating meteorological data. Experimental results show that models combining UAV multispectral imagery with weekly meteorological data achieved the best yield prediction accuracy (RMSE = 0.27 t/ha; R² = 0.61); in particular, the models based on AlexNet (Model 9) and CNN2conv (Model 18) were the most accurate. ANOVA results revealed that deeper fully connected layers significantly reduced RMSE, whereas variations in CNN architectural complexity had no statistically significant effect. Furthermore, although the models achieved comparable prediction accuracy (RMSE: 0.27–0.33 t/ha; R²: 0.61–0.69 across test datasets), the spatial distributions of their yield predictions differed considerably (e.g., Model 9 predicted a mean yield of 3.88 t/ha with a range of 2.51–4.89 t/ha, versus 3.74 t/ha and 2.33–4.76 t/ha for Model 18), suggesting that the spatial stability of such models warrants further evaluation. This study underscores the potential of deep learning models that integrate UAV and meteorological data for precision agriculture, offering useful guidance for optimizing spatiotemporal data integration strategies in future research.
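
As a concrete illustration of the fusion pattern described above, the following is a minimal PyTorch sketch of a two-branch model: a small convolutional branch extracts features from multispectral patches, a vector of weekly meteorological features is concatenated with them, and fully connected layers regress per-plot yield. All layer sizes, names (e.g., MultimodalYieldNet), and the choice of late (concatenation) fusion are illustrative assumptions, not the paper's exact architecture.

# Minimal sketch of a two-branch UAV/weather fusion model (PyTorch).
# Layer sizes and names are assumptions for illustration only.
import torch
import torch.nn as nn

class MultimodalYieldNet(nn.Module):
    def __init__(self, n_bands: int = 5, n_weather: int = 12):
        super().__init__()
        # CNN branch: two conv blocks (cf. the CNN2conv variant) over
        # multispectral patches with n_bands input channels.
        self.cnn = nn.Sequential(
            nn.Conv2d(n_bands, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (batch, 64, 1, 1)
        )
        # Fully connected head: image features concatenated with the
        # meteorological vector (late fusion), then regression to yield.
        self.head = nn.Sequential(
            nn.Linear(64 + n_weather, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 1),  # predicted yield in t/ha
        )

    def forward(self, image: torch.Tensor, weather: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(image).flatten(1)           # (batch, 64)
        fused = torch.cat([feats, weather], dim=1)   # concatenation fusion
        return self.head(fused).squeeze(1)

# Example: a batch of 8 five-band 64x64 patches with 12 weekly weather features.
model = MultimodalYieldNet()
yield_pred = model(torch.randn(8, 5, 64, 64), torch.randn(8, 12))

Under this pattern, the depth of self.head is the "fully connected layer depth" factor the ANOVA examines, while swapping self.cnn (e.g., for an AlexNet-style backbone) corresponds to varying CNN architectural complexity.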