Crop yield information plays a pivotal role in ensuring food security. Advances in Earth Observation technology and the availability of historical yield records have promoted the use of machine learning for yield prediction. Significant research effort has gone in this direction, covering different choices of yield determinants and, in particular, different ways of encoding spatial and temporal information. However, these studies are often conducted under differing experimental setups, which complicates comparison between them. In this paper, we present our findings on multiple strategies for encoding spatial-spectral information at the county level (averaging pixel values, pixel sampling, and image histograms), alongside approaches for encoding temporal information, including recurrent neural networks, temporal convolutions, and attention mechanisms. Our numerical results indicate that predicting crop yield from time series data alone can be effective, even without spatial information, and that classical machine learning methods remain competitive in this application. Surface reflectance information emerges as a critical predictor in the absence of weather and spectral indices. While machine learning models typically require an extensive sample size, our findings suggest that reliance on long-term historical data may hinder models' ability to accurately reflect current conditions. This study provides valuable insights into feature and model selection for county-level yield prediction, highlighting the interplay between data structure, model complexity, and predictive performance.
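To make the three county-level spatial-spectral encodings concrete, the following is a minimal illustrative sketch in Python with NumPy. It is not the implementation used in the study. The function names, the (pixels, bands) array layout, the number of samples and bins, the reflectance range of 0 to 1, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def encode_mean(pixels):
    """Average each band over all county pixels -> shape (bands,)."""
    return pixels.mean(axis=0)


def encode_sample(pixels, n_samples, rng):
    """Draw a fixed number of pixels at random -> shape (n_samples, bands).

    Sampling falls back to drawing with replacement when the county
    has fewer pixels than n_samples.
    """
    n_pixels = pixels.shape[0]
    idx = rng.choice(n_pixels, size=n_samples, replace=n_pixels < n_samples)
    return pixels[idx]


def encode_histogram(pixels, n_bins, value_range=(0.0, 1.0)):
    """Per-band histogram of pixel values -> shape (bands, n_bins).

    Counts are divided by the number of pixels, so each row sums to 1
    when all values lie inside value_range (values outside are ignored).
    """
    counts = [
        np.histogram(pixels[:, b], bins=n_bins, range=value_range)[0]
        for b in range(pixels.shape[1])
    ]
    return np.stack(counts).astype(float) / pixels.shape[0]


# Hypothetical county at one time step: 500 cropland pixels, 6 bands,
# with synthetic reflectance values in [0, 1).
rng = np.random.default_rng(0)
pixels = rng.random((500, 6))

mean_feat = encode_mean(pixels)              # shape (6,)
sample_feat = encode_sample(pixels, 32, rng)  # shape (32, 6)
hist_feat = encode_histogram(pixels, 16)      # shape (6, 16)
```

Applying one of these functions at every time step of the growing season yields a sequence of fixed-size feature arrays per county, which is the kind of input a recurrent, temporal-convolution, or attention-based model can consume. The mean encoding discards the within-county distribution of values, while the sampling and histogram encodings retain part of it at the cost of higher dimensionality.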
BibTeXKey: OKS+25