Identifying which production attributes actually drive product quality — and building a tool to predict it.
A leading Irish food manufacturer had invested significantly in collecting detailed data across their production process. For every product batch, they recorded over 150 attributes — raw material specifications, processing parameters, environmental conditions, and time-temperature profiles tracked across multiple stages of production.
They also had something most manufacturers don't: formal quality assessments from trained evaluation panels, scoring finished products across multiple quality dimensions.
The data sat in spreadsheets. The quality team knew their products varied, but they couldn't explain why. Some batches consistently scored well. Others underperformed. The patterns weren't visible in Excel.
The core questions were:
Do the production attributes we're measuring actually correlate with finished product quality?
Which attributes matter most? Which can we stop measuring?
Can we predict quality outcomes from production data alone — before the product reaches evaluation?
Do different product lines behave differently, or is there a universal quality driver?
This wasn't an AI-for-the-sake-of-AI project. It was a commercial question: can we use data we already collect to make better production decisions?
Barry Gough, our COO, led the project with support from our specialist data science research team — including postdoctoral researchers and production ML engineers from our university partnership.
We took a systematic approach combining statistical rigour with modern machine learning:
The raw data covered 350+ product records with 150+ attributes each. Before any modelling, we spent the first week on data quality: auditing for missing values, outliers, and inconsistent coding. We restructured the data into a clean, reusable schema designed to accommodate future production runs.
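A first-pass audit of that kind can be sketched in a few lines, assuming the batch records are loaded into a pandas DataFrame. The table and column names here are illustrative, not the client's actual schema:

```python
import pandas as pd

# Toy batch table; column names are hypothetical, not the client's schema.
df = pd.DataFrame({
    "batch_id": ["B1", "B2", "B3", "B4", "B5", "B6", "B7"],
    "inlet_temp": [71.2, None, 69.8, 70.4, 70.9, 71.0, 70.1],
    "moisture_pct": [12.1, 11.8, 12.3, 12.0, 11.9, 12.2, 40.0],
})

missing = df.isna().sum()                       # missing readings per column
numeric = df.select_dtypes("number")
z = (numeric - numeric.mean()) / numeric.std()  # crude z-score screen
outliers = (z.abs() > 2).sum()                  # candidates for manual review
```

Flagged values go to a human for review rather than being dropped automatically; a "moisture" reading of 40% may be a typo, a unit mix-up, or a genuinely unusual batch.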
A critical step was feature engineering from the time-temperature profiles. Raw sensor curves can't be fed directly into models, so we extracted meaningful features: initial values, rates of change, time to key thresholds, area under the curve, and inflection points. These engineered features captured the process dynamics that matter for quality outcomes.
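As an illustration, the curve features listed above might be extracted like this. This is a minimal sketch in plain Python; the function name, threshold value, and feature names are ours, not the client's:

```python
def curve_features(times, temps, threshold=60.0):
    """Summarise one time-temperature curve into model-ready scalars.

    times, temps: parallel lists of sensor readings, in time order.
    threshold: an illustrative process threshold, not a real spec.
    """
    n = len(times)
    initial = temps[0]
    # Overall rate of change, first reading to last
    rate = (temps[-1] - temps[0]) / (times[-1] - times[0])
    # Time of the first reading at or above the threshold (None if never)
    time_to_threshold = next(
        (t for t, v in zip(times, temps) if v >= threshold), None
    )
    # Area under the curve via the trapezoidal rule
    auc = sum(
        (times[i + 1] - times[i]) * (temps[i] + temps[i + 1]) / 2
        for i in range(n - 1)
    )
    return {
        "initial": initial,
        "rate_of_change": rate,
        "time_to_threshold": time_to_threshold,
        "auc": auc,
    }
```

Each batch's curve then collapses into a handful of comparable numbers that any tabular model can consume.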
We generated comprehensive R² correlation matrices showing how each production attribute — individually and in combination — relates to each quality dimension. This highlighted which measurements actually matter and which add no predictive value.
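For single attributes, the R² in such a matrix is the squared Pearson correlation between the attribute and the quality score; combinations require fitting a multi-variable regression, which we omit here. A hedged sketch of the pairwise case, with made-up attribute and quality-dimension names:

```python
def r_squared(xs, ys):
    """R² of a simple linear fit: the squared Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov * cov / (vx * vy)

def r2_matrix(attributes, quality):
    """Map each attribute to its R² against each quality dimension.

    attributes, quality: dicts of name -> list of per-batch values.
    """
    return {
        attr: {dim: r_squared(col, qcol) for dim, qcol in quality.items()}
        for attr, col in attributes.items()
    }
```

Sorting each quality dimension's column of this matrix immediately surfaces the attributes with predictive value and the ones contributing essentially nothing.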
We also ran reverse analysis: taking the top and bottom quartile products and working backwards to identify what characterises the best versus the worst performers. This often reveals patterns that forward modelling misses.
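The mechanics of that reverse analysis are simple: rank batches by score, take the top and bottom quartiles, and compare attribute means between the two groups. A minimal sketch, with our own illustrative field names:

```python
def quartile_contrast(records, score_key, attr_keys):
    """Compare attribute means between top- and bottom-quartile batches.

    records: list of dicts, one per batch, holding attributes and a score.
    Returns per-attribute top mean, bottom mean, and their difference.
    """
    ranked = sorted(records, key=lambda r: r[score_key])
    q = max(1, len(ranked) // 4)          # quartile size
    bottom, top = ranked[:q], ranked[-q:]

    def mean(rows, key):
        return sum(r[key] for r in rows) / len(rows)

    return {
        k: {
            "top_mean": mean(top, k),
            "bottom_mean": mean(bottom, k),
            "difference": mean(top, k) - mean(bottom, k),
        }
        for k in attr_keys
    }
```

Attributes with a large top-versus-bottom gap are the natural starting points for process investigation, even before any model is trained.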
We evaluated multiple approaches to find what works best for this specific data: gradient boosting (XGBoost), random forests, support vector machines, and explainable boosting machines. We tested both regression (predicting exact scores) and classification (predicting quality grades) to determine which gives the most practical results.
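The bake-off loop itself is straightforward. This sketch compares two of the model families named above using scikit-learn on synthetic data (XGBoost and explainable boosting machines plug into the same loop via their scikit-learn-compatible wrappers); the data and scores here are illustrative, not the client's:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real attribute table and quality scores
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=200)

candidates = {
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean cross-validated R2 = {scores.mean():.3f}")
```

The same loop runs with classifiers and a grade label in place of regressors and a score, which is how the regression-versus-classification question gets settled empirically.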
All validation was performed at the batch level using proper cross-validation — ensuring honest accuracy estimates that reflect real-world performance, not overfitting to the training data.
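"Batch level" means every record from a batch lands in the same fold, so the model is never tested on a batch it partly trained on. A minimal stand-in for scikit-learn's `GroupKFold` (which is what you would use in production) makes the idea concrete:

```python
def group_kfold(batch_ids, n_splits=5):
    """Assign whole batches to folds so no batch spans train and test.

    batch_ids: one batch identifier per record.
    Returns a list of folds, each a list of record indices.
    """
    unique = sorted(set(batch_ids))
    fold_of = {b: i % n_splits for i, b in enumerate(unique)}
    folds = [[] for _ in range(n_splits)]
    for idx, b in enumerate(batch_ids):
        folds[fold_of[b]].append(idx)
    return folds
```

Random row-level splits would leak near-duplicate records across the train/test boundary and inflate accuracy; grouped splits are what make the reported numbers honest.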
We used SHAP analysis to make every prediction explainable: not just what the model predicts, but why.
We packaged everything into clear, actionable outputs: a comprehensive technical report, an executive summary for senior stakeholders, and a working prototype tool that lets the quality team input production attributes and see predicted quality scores with explanations.
Clean Data Schema — The raw spreadsheet data restructured into a format designed for ongoing analysis. Built to accommodate future production data without repeating the preparation work.
R² Correlation Matrices — Comprehensive tables showing which production attributes correlate most strongly with each quality dimension. Revealed that a small subset of attributes drives the majority of quality variation.
Feature Importance Rankings — A ranked list of which attributes matter most. Identified the "golden attributes" that drive prediction and highlighted which data points add little value — potentially saving measurement effort.
Quartile Analysis — Detailed breakdown of what characterises top-performing products versus bottom-performing products. Patterns the quality team hadn't been able to see in spreadsheets.
Predictive Models — Trained machine learning models for predicting quality outcomes per product line and overall. Model artefacts and code included so the client can apply them to new production data.
Working Prototype Tool — An interactive application where the quality team can input production attributes and see predicted quality scores with explanations of which factors are driving the prediction.
Technical Report and Executive Summary — Full documentation covering data used, models developed, evaluation methodology, results, key insights, and recommendations for next steps. Written for both technical and non-technical stakeholders.
The analysis confirmed that a subset of production attributes — significantly fewer than the 150+ being measured — drives the majority of quality variation. The client now knows which measurements matter and which they could potentially reduce or eliminate.
The predictive models achieved meaningful accuracy for quality scoring, giving the production team a tool they can use alongside their existing processes. The quartile analysis revealed patterns in raw material characteristics and processing conditions that the quality team had suspected but couldn't previously evidence.
The engagement was structured as a fixed-price Phase 1 project. An optional Phase 2 was scoped for production tooling and a sampling protocol — designed to be informed by Phase 1's findings about which attributes prove most predictive.
Important context: Data science projects deliver evidence, not guaranteed outcomes. If this analysis had concluded that the data doesn't support reliable quality prediction, that would still be a valuable deliverable — it saves the client from investing further in the wrong direction. We state this upfront in every data science engagement.
Results vary by business. Figures shown were measured or estimated during delivery.
Barry Gough managed day-to-day delivery with 20 years of enterprise technology experience. Our specialist data science team includes postdoctoral researchers and production ML engineers — not junior analysts.
We sell decision-grade evidence, not guaranteed predictions. We state upfront that negative findings — data that doesn't support a hypothesis — are valid deliverables. This builds trust and ensures the client gets genuine insight.
Every output was designed for business use: prototype tools for the quality team, executive summaries for the board, and clean data schemas for ongoing analysis. Not a research paper that sits on a shelf.
A 20-minute call to understand your data and explore whether predictive analytics could help your quality, yield, or operations.
Or email us at hello@deeppurple.ai
Want to understand the process first? See how we work →
COO, Deep Purple AI Consulting
Barry Gough is the COO of Deep Purple AI Consulting. With an MSc in Computer Science from University College Dublin — where machine learning was a core focus of his studies — and over 20 years building production software systems, Barry brings formal ML training and deep hands-on engineering experience to every AI and data analytics engagement.
Barry completed his master's degree at UCD in 2011, studying ML algorithms, statistical modelling, and data-driven systems at a pivotal moment — just as big data techniques were maturing and deep learning was about to transform the industry. At Purpledecks (Deep Purple's predecessor consultancy), he spent nearly a decade progressing from Senior Developer to Head of Operations, leading the technical delivery of enterprise projects that increasingly incorporated machine learning, computer vision, data classification, predictive features, and recommendation engines for commercial clients across Ireland and the UK.
In 2023, as CTO of Reactable AI, Barry architected and built an autonomous AI marketing engine from the ground up — a self-learning system that generates and optimises marketing campaigns across channels. This was one of Ireland's earliest production deployments of autonomous AI agents, requiring him to design systems where AI made real decisions with real consequences.
At Deep Purple, Barry leads all technical delivery — AI system architecture, machine learning model development, data pipeline engineering — and manages a team of PhD-level data scientists. His combination of formal ML education, a decade of incorporating AI into commercial projects, and hands-on experience architecting autonomous AI systems means clients work directly with a technical lead who can make genuine engineering decisions about AI.
Deep Purple AI Consulting (deeppurple.ai) is an AI consultancy and custom software development company based in Ireland. We help established businesses identify where AI can make a real difference, then build the systems to make it happen. Senior-only delivery. Grant-funded where possible. No hype.