Identifying which production attributes actually drive product quality — and building a tool to predict it.
A leading Irish food manufacturer had invested significantly in collecting detailed data across their production process. For every product batch, they recorded over 150 attributes — raw material specifications, processing parameters, environmental conditions, and time-temperature profiles tracked across multiple stages of production.
They also had something most manufacturers don't: formal quality assessments from trained evaluation panels, scoring finished products across multiple quality dimensions.
The data sat in spreadsheets. The quality team knew their products varied, but they couldn't explain why. Some batches consistently scored well. Others underperformed. The patterns weren't visible in Excel.
The core questions were:
Do the production attributes we're measuring actually correlate with finished product quality?
Which attributes matter most? Which can we stop measuring?
Can we predict quality outcomes from production data alone — before the product reaches evaluation?
Do different product lines behave differently, or is there a universal quality driver?
This wasn't an AI-for-the-sake-of-AI project. It was a commercial question: can we use data we already collect to make better production decisions?
Barry Gough, our COO, led the project with support from our specialist data science research team — including postdoctoral researchers and production ML engineers from our university partnership.
We took a systematic approach combining statistical rigour with modern machine learning:
The raw data covered 350+ product records with 150+ attributes each. Before any modelling, we spent the first week on data quality: auditing for missing values, outliers, and inconsistent coding. We restructured the data into a clean, reusable schema designed to accommodate future production runs.
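A first-pass audit of that kind can be sketched in a few lines, assuming the batch records are loaded into a pandas DataFrame. The table and column names here are illustrative, not the client's actual schema:

```python
import pandas as pd

# Toy batch table; column names are hypothetical, not the client's schema.
df = pd.DataFrame({
    "batch_id": ["B1", "B2", "B3", "B4", "B5", "B6", "B7"],
    "inlet_temp": [71.2, None, 69.8, 70.4, 70.9, 71.0, 70.1],
    "moisture_pct": [12.1, 11.8, 12.3, 12.0, 11.9, 12.2, 40.0],
})

missing = df.isna().sum()                       # missing readings per column
numeric = df.select_dtypes("number")
z = (numeric - numeric.mean()) / numeric.std()  # crude z-score screen
outliers = (z.abs() > 2).sum()                  # candidates for manual review
```

Flagged values go to a human for review rather than being dropped automatically; a "moisture" reading of 40% may be a typo, a unit mix-up, or a genuinely unusual batch.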
A critical step was feature engineering from the time-temperature profiles. Raw sensor curves can't be fed directly into models, so we extracted meaningful features: initial values, rates of change, time to key thresholds, area under the curve, and inflection points. These engineered features captured the process dynamics that matter for quality outcomes.
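As an illustration, the curve features listed above might be extracted like this. This is a minimal sketch in plain Python; the function name, threshold value, and feature names are ours, not the client's:

```python
def curve_features(times, temps, threshold=60.0):
    """Summarise one time-temperature curve into model-ready scalars.

    times, temps: parallel lists of sensor readings, in time order.
    threshold: an illustrative process threshold, not a real spec.
    """
    n = len(times)
    initial = temps[0]
    # Overall rate of change, first reading to last
    rate = (temps[-1] - temps[0]) / (times[-1] - times[0])
    # Time of the first reading at or above the threshold (None if never)
    time_to_threshold = next(
        (t for t, v in zip(times, temps) if v >= threshold), None
    )
    # Area under the curve via the trapezoidal rule
    auc = sum(
        (times[i + 1] - times[i]) * (temps[i] + temps[i + 1]) / 2
        for i in range(n - 1)
    )
    return {
        "initial": initial,
        "rate_of_change": rate,
        "time_to_threshold": time_to_threshold,
        "auc": auc,
    }
```

Each batch's curve then collapses into a handful of comparable numbers that any tabular model can consume.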
We generated comprehensive R² correlation matrices showing how each production attribute — individually and in combination — relates to each quality dimension. This highlighted which measurements actually matter and which add no predictive value.
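For single attributes, the R² in such a matrix is the squared Pearson correlation between the attribute and the quality score; combinations require fitting a multi-variable regression, which we omit here. A hedged sketch of the pairwise case, with made-up attribute and quality-dimension names:

```python
def r_squared(xs, ys):
    """R² of a simple linear fit: the squared Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov * cov / (vx * vy)

def r2_matrix(attributes, quality):
    """Map each attribute to its R² against each quality dimension.

    attributes, quality: dicts of name -> list of per-batch values.
    """
    return {
        attr: {dim: r_squared(col, qcol) for dim, qcol in quality.items()}
        for attr, col in attributes.items()
    }
```

Sorting each quality dimension's column of this matrix immediately surfaces the attributes with predictive value and the ones contributing essentially nothing.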
We also ran reverse analysis: taking the top and bottom quartile products and working backwards to identify what characterises the best versus the worst performers. This often reveals patterns that forward modelling misses.
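The mechanics of that reverse analysis are simple: rank batches by score, take the top and bottom quartiles, and compare attribute means between the two groups. A minimal sketch, with our own illustrative field names:

```python
def quartile_contrast(records, score_key, attr_keys):
    """Compare attribute means between top- and bottom-quartile batches.

    records: list of dicts, one per batch, holding attributes and a score.
    Returns per-attribute top mean, bottom mean, and their difference.
    """
    ranked = sorted(records, key=lambda r: r[score_key])
    q = max(1, len(ranked) // 4)          # quartile size
    bottom, top = ranked[:q], ranked[-q:]

    def mean(rows, key):
        return sum(r[key] for r in rows) / len(rows)

    return {
        k: {
            "top_mean": mean(top, k),
            "bottom_mean": mean(bottom, k),
            "difference": mean(top, k) - mean(bottom, k),
        }
        for k in attr_keys
    }
```

Attributes with a large top-versus-bottom gap are the natural starting points for process investigation, even before any model is trained.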
We evaluated multiple approaches to find what works best for this specific data: gradient boosting (XGBoost), random forests, support vector machines, and explainable boosting machines. We tested both regression (predicting exact scores) and classification (predicting quality grades) to determine which gives the most practical results.
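The bake-off loop itself is straightforward. This sketch compares two of the model families named above using scikit-learn on synthetic data (XGBoost and explainable boosting machines plug into the same loop via their scikit-learn-compatible wrappers); the data and scores here are illustrative, not the client's:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real attribute table and quality scores
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=200)

candidates = {
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean cross-validated R2 = {scores.mean():.3f}")
```

The same loop runs with classifiers and a grade label in place of regressors and a score, which is how the regression-versus-classification question gets settled empirically.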
All validation was performed at the batch level using proper cross-validation — ensuring honest accuracy estimates that reflect real-world performance, not overfitting to the training data.
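"Batch level" means every record from a batch lands in the same fold, so the model is never tested on a batch it partly trained on. A minimal stand-in for scikit-learn's `GroupKFold` (which is what you would use in production) makes the idea concrete:

```python
def group_kfold(batch_ids, n_splits=5):
    """Assign whole batches to folds so no batch spans train and test.

    batch_ids: one batch identifier per record.
    Returns a list of folds, each a list of record indices.
    """
    unique = sorted(set(batch_ids))
    fold_of = {b: i % n_splits for i, b in enumerate(unique)}
    folds = [[] for _ in range(n_splits)]
    for idx, b in enumerate(batch_ids):
        folds[fold_of[b]].append(idx)
    return folds
```

Random row-level splits would leak near-duplicate records across the train/test boundary and inflate accuracy; grouped splits are what make the reported numbers honest.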
We used SHAP analysis to make every prediction explainable: not just what the model predicts, but why.
We packaged everything into clear, actionable outputs: a comprehensive technical report, an executive summary for senior stakeholders, and a working prototype tool that lets the quality team input production attributes and see predicted quality scores with explanations.
Clean Data Schema — The raw spreadsheet data restructured into a format designed for ongoing analysis. Built to accommodate future production data without repeating the preparation work.
R² Correlation Matrices — Comprehensive tables showing which production attributes correlate most strongly with each quality dimension. Revealed that a small subset of attributes drives the majority of quality variation.
Feature Importance Rankings — A ranked list of which attributes matter most. Identified the "golden attributes" that drive prediction and highlighted which data points add little value — potentially saving measurement effort.
Quartile Analysis — Detailed breakdown of what characterises top-performing products versus bottom-performing products. Patterns the quality team hadn't been able to see in spreadsheets.
Predictive Models — Trained machine learning models for predicting quality outcomes per product line and overall. Model artefacts and code included so the client can apply them to new production data.
Working Prototype Tool — An interactive application where the quality team can input production attributes and see predicted quality scores with explanations of which factors are driving the prediction.
Technical Report and Executive Summary — Full documentation covering data used, models developed, evaluation methodology, results, key insights, and recommendations for next steps. Written for both technical and non-technical stakeholders.
The analysis confirmed that a subset of production attributes — significantly fewer than the 150+ being measured — drives the majority of quality variation. The client now knows which measurements matter and which they could potentially reduce or eliminate.
The predictive models achieved meaningful accuracy for quality scoring, giving the production team a tool they can use alongside their existing processes. The quartile analysis revealed patterns in raw material characteristics and processing conditions that the quality team had suspected but couldn't previously evidence.
The engagement was structured as a fixed-price Phase 1 project. An optional Phase 2 was scoped for production tooling and a sampling protocol — designed to be informed by Phase 1's findings about which attributes prove most predictive.
Important context: Data science projects deliver evidence, not guaranteed outcomes. If this analysis had concluded that the data doesn't support reliable quality prediction, that would still be a valuable deliverable — it saves the client from investing further in the wrong direction. We state this upfront in every data science engagement.
Results vary by business. Figures shown were measured or estimated during delivery.
Barry Gough managed day-to-day delivery with 20 years of enterprise technology experience. Our specialist data science team includes postdoctoral researchers and production ML engineers — not junior analysts.
We sell decision-grade evidence, not guaranteed predictions. We state upfront that negative findings — data that doesn't support a hypothesis — are valid deliverables. This builds trust and ensures the client gets genuine insight.
Every output was designed for business use: prototype tools for the quality team, executive summaries for the board, and clean data schemas for ongoing analysis. Not a research paper that sits on a shelf.
A 20-minute call to understand your data and explore whether predictive analytics could help your quality, yield, or operations.
Or email us at hello@deeppurple.ai
Want to understand the process first? See how we work →
COO, Deep Purple AI Consulting
Barry Gough is the COO of Deep Purple AI Consulting. With an MSc in Computer Science from University College Dublin — where machine learning was a core focus of his studies — and over 20 years building production software systems, Barry brings formal ML training and deep hands-on engineering experience to every AI and data analytics engagement.
Barry completed his master's degree at UCD in 2011, studying ML algorithms, statistical modelling, and data-driven systems at a pivotal moment — just as big data techniques were maturing and deep learning was about to transform the industry. At Purpledecks (Deep Purple's predecessor consultancy), he spent nearly a decade progressing from Senior Developer to Head of Operations, leading the technical delivery of enterprise projects that increasingly incorporated machine learning, computer vision, data classification, predictive features, and recommendation engines for commercial clients across Ireland and the UK.
In 2023, as CTO of Reactable AI, Barry architected and built an autonomous AI marketing engine from the ground up — a self-learning system that generates and optimises marketing campaigns across channels. This was one of Ireland's earliest production deployments of autonomous AI agents, requiring him to design systems where AI made real decisions with real consequences.
At Deep Purple, Barry leads all technical delivery — AI system architecture, machine learning model development, data pipeline engineering — and manages a team of PhD-level data scientists. His combination of formal ML education, a decade of incorporating AI into commercial projects, and hands-on experience architecting autonomous AI systems means clients work directly with a technical lead who can make genuine engineering decisions about AI.
Deep Purple AI Consulting (deeppurple.ai) is an AI consultancy and custom software development company based in Ireland. We help established businesses identify where AI can make a real difference, then build the systems to make it happen. Senior-only delivery. Grant-funded where possible. No hype.