Can machine learning and data science be used to predict the future for growers?
Crop yield prediction is an extremely interesting technical challenge. It is not only one of the most financially impactful decisions a farmer makes each season, but also helps enable management decisions that can maximize harvested yield.
Predicting yield is challenging because it depends on a variety of factors such as weather, genetics of the seed, crop protection application, pest infestation and other factors - some controllable and some not.
In this post, we will introduce three different crop yield prediction problems and highlight the ways in which Climate LLC (Climate) is approaching them from a data science perspective.
Field Productivity: Modeling Potential
Growers need to make crucial planting decisions one year prior to harvest. The investments made in seeds and farm inputs prior to planting heavily influence their operation’s resilience to risk in the coming year, as well as their profitability.
Forward-looking decisions prior to planting include selecting an appropriate seed product, depending on a plethora of different variables. For example, knowing that planting might be delayed by a wet spring would suggest selecting a faster-maturing variety of seed months before planting. However, it’s challenging to make balanced decisions with tradeoffs between profitability and risk, unpredictable weather and commodity price fluctuations.
One of the pivotal Climate machine learning models is the Field Productivity model, which outputs next season’s likely yield levels given specific seed type inputs using all the past years’ data available. This data includes not just the geographical coordinates, weather, and soil information for the field, but also all the seeds used in past years for that particular field, and their associated yield values.
The field productivity model uses this data - along with publicly available datasets - to predict the probability distribution of next year’s yield value for that field. This model is then used as an input for other products and programs like our seed placement models, which help farmers identify ideal hybrids and select the optimum seed choice that will maximize their investment and yields.
Forecast: Anticipating The Season’s Results
Some growers also need to monitor yield estimates as the crop is still growing and as it responds to conditions in its environment, as well as to management practices like fungicide application and irrigation. Understanding near-real-time yield impacts can help farmers anticipate impacts to their bottom line at the end of the season.
While Field Productivity predicts yield for the next season, yield monitoring, or forecasted yield, predicts yield for the current season. The forecast yield prediction model also leverages satellite imaging data, along with location information of the field, the seed product which was planted in any given year and weather data.
The modeling approach for forecast is quite similar to that of hindcast, utilizing machine learning model iteration with a number of statistical approaches that can produce a probability distribution of the yield, which narrows as the season progresses. This model can then be used to track and monitor yield changes in real time thereby improving harvest logistics by enabling farmers to plan when and where to harvest first.
Hindcast: Filling in Past Seasons
Even though digital records for fields are ubiquitous today, yield data from even a few years ago is often scant. To solve this problem of missing historical yield data, satellite images can be used not only to predict yield, but also to extract useful environmental features such as surface temperature and precipitation, which are important factors that affect yield.
The hindcast model is an important stepping stone to filling in data gaps, as it increases the amount of training data available for assessing next year’s field productivity. The hindcast model can also be used to identify the historical health of a field, so that growers are better prepared for a wide variety of environmental factors impacting their yield.
Our hindcast models use methods rooted in machine learning model iteration to infer a single yield value for a given year and a given field. The models use a variety of techniques common to computer vision to predict yield. These yield values are then used to fill in missing yield data values for yield forecasting models.
Currently our hindcast models are being developed for use in the Midwestern United States on fields growing corn, though we are in the process of developing models for additional crops and geographies.
This three-pronged estimation of the past, present, and future yield values both supports and mitigates the risk for the most important (and financially impactful) decisions farmers make each season. Field Productivity, Forecast, and Hindcast are just the tip of the iceberg when it comes to using innovative machine learning techniques to help predict the future for our grower customers. We’re looking forward to the ways in which these efforts to model farm yield will improve farm operations in future seasons.