Predicting Hydroponic Crop Yield from Sensor Data - and Turning It Into Planting Decisions

A self-directed applied machine learning project: predict final crop yield from hydroponics sensor and growth-stage data, find which controllable variables actually drive yield, then chain those predictions to a demand forecast to produce a weekly planting-recommendation matrix.

Problem

Hydroponics is a sensor-rich environment - temperature, humidity, CO₂, nutrient mix, pH, light - yet the planning questions that matter (which crops to start, how nutrient mix affects yield, when to plant for demand) were still answered by intuition rather than the data already being logged.

Outcome

An end-to-end pipeline from raw hourly sensor logs to a weekly planting-recommendation matrix, with feature-importance analysis isolating nitrogen concentration and cumulative light exposure as the two highest-impact controllable yield drivers.

Impact - who used it & what changed

A self-directed applied project demonstrating that yield prediction can be tied to demand forecasting to produce concrete planting decisions; its value is methodological - the modeling and feature-engineering choices - rather than a measured production deployment.

Context

Hydroponics is a data-rich environment. Environmental sensors track temperature, humidity, nutrient levels, and light exposure continuously, and the growing conditions themselves are far more controllable than in soil farming. But logging data and using it are different things, and the data was sitting unused.

This project was about closing that gap end to end: predict crop yield from environmental and operational data, find which of those variables actually move yield, then connect the predictions to demand so the output is an actual decision - what to plant, and how much - rather than a number on a dashboard.

The Problem

Where conditions can be precisely controlled, planning by intuition leaves value on the table. Without a model, basic questions had no evidence-backed answer: Which crops should we grow more of next cycle? How does nutrient mix affect final yield? When should we plant to meet forecasted demand?

Each answer was a good guess - experience-informed, but still a guess. The goal was to replace the guessing with something measurable, and to be honest about which variables are worth acting on.

Data & Inputs

Approach

Three connected stages, deliberately sequenced so the output is a decision, not a prediction.

Yield prediction. Decision trees, random forests, and a shallow neural network to predict final yield weight from environmental and growth-stage features. Feature-importance analysis to find the most impactful controllable variables - the ones worth acting on.
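A minimal sketch of how such a comparison could be set up, not the project's actual code. The data is synthetic, the feature names and the yield relationship are made up for illustration, and scikit-learn's MLPRegressor stands in for the shallow neural network so the example stays in one library.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 300

# Synthetic per-plant features (hypothetical names and ranges, not project data).
X = np.column_stack([
    rng.uniform(100, 250, n),   # nitrogen_ppm
    rng.uniform(10, 20, n),     # dli_mol_m2_day
    rng.uniform(18, 28, n),     # mean_temp_c
    rng.uniform(5.5, 6.5, n),   # ph
])
# Made-up yield relationship plus noise, only so the models have something to fit.
y = 0.4 * X[:, 0] + 6.0 * X[:, 1] - 1.5 * (X[:, 2] - 23) ** 2 + rng.normal(0, 5, n)

models = {
    "decision_tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    # The network needs scaled inputs; the tree models do not.
    "shallow_mlp": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
}

# Same folds for every model so the scores are comparable.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
mae = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
    mae[name] = -scores.mean()  # flip sign: scikit-learn reports negated MAE

for name, value in mae.items():
    print(f"{name}: MAE = {value:.2f}")
```

Scoring every candidate on the same folds and the same metric is the part that matters; with real data spanning several growing cycles, splitting by cycle rather than by row would likely give a more honest estimate.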

Demand forecasting. A time-series model on historical sales by crop type, capturing weekly seasonality and trend.
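As an illustration of the trend part only, here is a small sketch of Holt's linear-trend exponential smoothing written in NumPy. It is an assumption about what a minimal version could look like, not the model used in the project: the function name, the smoothing parameters, and the sales figures are hypothetical, and the seasonal component is left out for brevity.

```python
import numpy as np

def holt_forecast(series, horizon, alpha=0.5, beta=0.3):
    """Forecast `horizon` steps ahead with Holt's linear-trend smoothing.

    alpha smooths the level, beta smooths the trend (both hypothetical defaults).
    """
    y = np.asarray(series, dtype=float)
    if y.size < 2:
        raise ValueError("need at least two observations")

    # Initialise level with the first value and trend with the first difference.
    level, trend = y[0], y[1] - y[0]
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend

    # h-step-ahead forecast extrapolates the last level along the last trend.
    steps = np.arange(1, horizon + 1)
    return level + steps * trend

# Made-up weekly sales (kg) for one crop type.
weekly_sales_kg = [42, 45, 44, 48, 51, 50, 54, 57]
forecast = holt_forecast(weekly_sales_kg, horizon=4)
print(forecast)
```

One forecast per crop type, run over the same horizon, gives the demand side of the planning step.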

Planning integration. Yield predictions combined with the demand forecast to generate planting recommendations - how many plants of each crop type to start each week to meet forecasted demand within the predicted yield range.

Results

Technical Detail

Data characteristics. Environmental sensors logged at hourly resolution across multiple growing cycles. The core modeling challenge: sensors in a hydroponics system are highly correlated. Temperature, humidity, and CO₂ all co-vary with the HVAC system, creating multicollinearity that makes the coefficients of standard linear models unstable and hard to interpret. Tree ensembles predict well despite correlated inputs, which is a large part of why they won out.
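One common way to quantify this kind of correlation is the variance inflation factor (VIF): regress each feature on the others and compute 1 / (1 - R²). The sketch below does that with NumPy on synthetic readings. The shared "HVAC" driver, the sensor scales, and the function name are assumptions made for the example, not measurements from the project.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column: 1 / (1 - R^2) from regressing it on the rest."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Design matrix: intercept plus every other column.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid.var() / target.var()
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

rng = np.random.default_rng(0)
n = 500
hvac = rng.normal(size=n)  # hypothetical shared driver behind three sensors

# Synthetic hourly readings: three sensors follow the HVAC signal, pH does not.
temp_c = 24 + 2.0 * hvac + rng.normal(0, 0.3, n)
humidity_pct = 60 - 5.0 * hvac + rng.normal(0, 1.0, n)
co2_ppm = 800 + 50.0 * hvac + rng.normal(0, 10.0, n)
ph = rng.normal(6.0, 0.2, n)

names = ["temp_c", "humidity_pct", "co2_ppm", "ph"]
vifs = variance_inflation_factors(np.column_stack([temp_c, humidity_pct, co2_ppm, ph]))
for name, vif in zip(names, vifs):
    print(f"{name}: VIF = {vif:.1f}")
```

A VIF near 1 means a feature is close to independent of the others; values well above 5 to 10 are the usual rule-of-thumb warning that linear coefficients for that feature will be unstable.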

Feature engineering.

Model comparison.

Feature-importance findings. Nitrogen concentration and cumulative light exposure (daily light integral, DLI) were the two highest-impact controllable variables. Temperature contributed meaningfully but showed diminishing returns beyond a crop-specific comfort range. The point of this analysis was not just a yield number - it was to identify where intervention would pay off: the model pointed at nutrient dosing and supplemental lighting as the variables most worth controlling.
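A hedged sketch of one way to run this kind of analysis: permutation importance on a held-out split, which measures how much the score drops when a single feature is shuffled. The data is synthetic and deliberately built so that nitrogen and DLI dominate, with a temperature penalty outside a made-up comfort point. The feature names and coefficients are assumptions, not the project's data or its exact method.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 600
feature_names = ["nitrogen_ppm", "dli_mol_m2_day", "mean_temp_c", "ph"]

# Synthetic features (hypothetical ranges, not project data).
X = np.column_stack([
    rng.uniform(100, 250, n),   # nitrogen_ppm
    rng.uniform(10, 20, n),     # dli_mol_m2_day
    rng.uniform(18, 28, n),     # mean_temp_c
    rng.uniform(5.5, 6.5, n),   # ph (has no effect in this toy setup)
])
# Made-up yield: linear in nitrogen and DLI, penalised away from 23 °C.
y = 0.5 * X[:, 0] + 8.0 * X[:, 1] - 1.0 * (X[:, 2] - 23) ** 2 + rng.normal(0, 5, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Shuffle one feature at a time on held-out data and record the drop in R^2.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

ranking = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Permutation importance is chosen here over impurity-based importance because it is computed on held-out data. With strongly correlated sensors it can still split credit between them, so the ranking is best read alongside a correlation check.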

Planning integration. Yield predictions were chained to a time-series demand forecast (exponential smoothing on historical weekly sales by crop type). The output was a weekly recommendation matrix - crop type × recommended planting quantity - sized to meet 4-week-ahead demand within the predicted yield range. That is the step that turns a yield model into a planning tool.
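The arithmetic behind such a matrix can be sketched as forecast demand divided by predicted yield per plant, rounded up. Everything in the example is hypothetical: the crops, the week labels, the gram figures, and the low/expected/high yield columns. It also assumes each row is a planting week whose demand figure is the forecast for four weeks later.

```python
import numpy as np
import pandas as pd

# Hypothetical forecast demand in grams, indexed by planting week;
# each value is assumed to be the demand forecast for four weeks later.
demand_g = pd.DataFrame(
    {"lettuce": [40_000, 42_000], "basil": [12_000, 15_000]},
    index=pd.Index(["W10", "W11"], name="planting_week"),
)

# Hypothetical predicted yield per plant in grams, with a low/high range.
yield_g_per_plant = pd.DataFrame(
    {"low": [200, 50], "expected": [250, 60], "high": [300, 80]},
    index=["lettuce", "basil"],
)

def planting_matrix(demand, yield_per_plant, column="expected"):
    """Plants to start per week and crop: ceil(demand / yield per plant)."""
    per_plant = yield_per_plant.loc[demand.columns, column]
    # Divide each crop column by that crop's yield, then round up to whole plants.
    return np.ceil(demand.div(per_plant, axis="columns")).astype(int)

recommended = planting_matrix(demand_g, yield_g_per_plant, "expected")
conservative = planting_matrix(demand_g, yield_g_per_plant, "low")  # plan for low yield
print(recommended)
print(conservative)
```

Running it once per yield column gives a band of planting quantities rather than a single number, which is one way to keep the recommendation within the predicted yield range.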

Stack

Python, Scikit-learn, XGBoost, TensorFlow, Pandas, NumPy, Matplotlib

