57 Manufacturing Analytics: Predictive Maintenance and Quality
flowchart LR
A[Sensors<br/>vibration, temp,<br/>current, oil] --> B[Baseline<br/>learned per asset]
B --> C[Live trend<br/>vs threshold]
C --> D{Threshold<br/>crossed?}
D -->|Healthy| E[Continue<br/>monitoring]
D -->|Warning| F[Schedule<br/>inspection]
D -->|Alert| G[Intervene<br/>before next shift]
style E fill:#E6F4EA,stroke:#137333
style F fill:#FFF7E6,stroke:#F4B400
style G fill:#FCE8E6,stroke:#D93025
57.1 Why Manufacturing Analytics Matters
An unplanned line stoppage in a high-mix automotive plant costs more in two hours than analytics costs for the year.
Manufacturing is the function where analytics began — Walter Shewhart’s 1924 control chart predates spreadsheets, dashboards, and most modern statistics. The discipline now reaches from raw-material receipt to dealer warranty, instrumented at every step by sensors, MES (Manufacturing Execution System) logs, ERP transactions, and increasingly by IIoT (Industrial IoT) telemetry. The BI analyst on a manufacturing desk works with shop-floor managers who think in cycles, takt, and first-pass yield, and the dashboards must speak that vocabulary back at them.
For a BI analyst, manufacturing clusters into three jobs. Predictive maintenance analytics answers which machine will fail, when, and what should we do before it does? — vibration and temperature trends, time-to-failure modelling, maintenance-cost optimisation. Quality analytics answers are we making it right the first time, and where are defects coming from? — SPC charts, defect-Pareto, first-pass yield, scrap and rework. Production analytics answers are we running the plant at the rate the demand needs and the equipment can sustain? — Overall Equipment Effectiveness (OEE), takt-time conformance, downtime root-cause, throughput. Douglas C. Montgomery (2019) is the standard reference for statistical quality control — the visualisation idioms (X-bar, R, p-, c-charts) at the heart of every quality dashboard come from this lineage. R. Keith Mobley (2002) frames predictive maintenance not as a technology choice but as a shift from time-based to condition-based intervention, a shift the BI dashboard makes legible to plant managers.
Three rules separate manufacturing dashboards from every other kind:
- Shop-floor first. The primary screen is a 55-inch andon display visible from the line, not a laptop in an office. Fonts large, colours stark, refresh in seconds.
- Loss is the headline currency. Every minute of downtime, every scrap unit, every rework hour translates to rupees on the plant loss tile.
- Action lives at the workstation. A dashboard that needs a manager to act is too slow. The dashboard surfaces information that operators, supervisors, and SMED teams can act on within the shift.
57.2 Predictive Maintenance Analytics
R. Keith Mobley (2002) distinguishes three maintenance philosophies — reactive (run to failure), preventive (time-based), and predictive (condition-based). Predictive maintenance uses sensor data and models to intervene only when failure is genuinely approaching, capturing the cost saving of preventive without the asset-utilisation loss of fixed schedules.
| Strategy | Trigger | Cost profile | Visualisation lens |
|---|---|---|---|
| Reactive | Wait for failure. | Lowest planned cost, highest unplanned cost; line stoppage is the norm. | Failure-event log; downtime Pareto. |
| Preventive | Fixed time or cycle interval. | Predictable cost; over-services healthy assets. | Compliance dashboard; PM completion rate. |
| Predictive (PdM) | Condition signals from sensors. | Lowest total cost when implemented well; needs analytics infrastructure. | Sensor trend chart with thresholds; remaining useful life curves. |
A condition-monitoring dashboard tracks vibration, temperature, current draw, oil-debris counts, and other sensor signals against learned baselines. The standard view is a small-multiples line chart, one panel per sensor, with shaded reference bands for healthy, warning, and alert. Crossing the warning band schedules an inspection; crossing the alert band schedules an intervention before the next shift.
The signals themselves matter — vibration spectra and temperature gradients carry early warning of bearing wear, motor degradation, and lubrication failure that the operator’s ear and hand cannot detect until the failure mode is well advanced.
A Remaining Useful Life (RUL) model converts the sensor history into an estimated number of operating hours before failure. The visualisation is a fan chart per asset — central RUL estimate plus 80 percent and 95 percent prediction intervals shaded outward from it. As the asset degrades, the fan narrows and the central estimate falls. The chart is the input to the maintenance-planning conversation: which assets need parts ordered now, which can wait, which can be deferred to the next planned shutdown.
A predictive-maintenance dashboard that triggers an unnecessary intervention costs nearly as much as missing a failure — both produce downtime, both burn parts and labour. Always pair PdM accuracy with false-positive rate and cost per intervention. R. Keith Mobley (2002) emphasises that the business case for PdM stands or falls on this trade-off, and the dashboard is what makes it visible.
57.3 Quality Analytics: SPC, Defects, and First-Pass Yield
Quality analytics descends directly from Shewhart’s 1924 control chart and Deming’s lifelong championing of statistical thinking on the shop floor. The core idea is simple — distinguish common-cause variation (the natural noise of any process) from special-cause variation (a real shift that demands action). The visualisation is the statistical process control (SPC) chart, and it remains the most influential single chart in the history of industrial analytics.
Different chart types fit different data types (Douglas C. Montgomery, 2019):
- X-bar and R chart — sample mean and range over time for continuous measurements (dimensions, weights). Detects shifts in central tendency and dispersion separately.
- p-chart — proportion of defective units in samples of varying size. Used for pass/fail attribute data.
- c-chart — count of defects per unit (e.g., scratches per panel). For Poisson-distributed defect counts.
- CUSUM and EWMA — cumulative-sum and exponentially-weighted-moving-average charts. More sensitive to small persistent shifts than the standard Shewhart chart.
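Of these, only the EWMA statistic is not a direct plot of the sample data. For reference, it is the recursion z_t = λ·x_t + (1 − λ)·z_{t−1}, with z_0 set to the process target and λ typically chosen between 0.05 and 0.25; the smaller the λ, the longer the chart's memory and the more sensitive it is to small sustained shifts.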
The standard rendering shows the data series, the centre line, the control limits (UCL and LCL at ±3σ), and the ±1σ and ±2σ zone boundaries the Western Electric rules use to flag out-of-control patterns. A point above UCL or below LCL is the loud alarm; a run of consecutive points on one side of centre (seven or eight, depending on the rule set) is the quiet alarm that often precedes the loud one.
When a process is producing defects, the next question is which defects? The Pareto chart introduced in Chapter 11 has its highest-leverage application here: defects sorted descending with a cumulative-percentage line. Pareto analysis routinely shows that 80 percent of defects come from 20 percent of root causes; the dashboard names them and ranks remediation effort.
A second view, the defect-by-station heatmap, locates defects in the production line. Rows = workstation, columns = defect type, cell colour = count. Concentrations identify the workstation that is the source — sometimes the obvious one, often surprisingly upstream.
First-pass yield (FPY) is the percentage of units that pass through a station correctly the first time, without rework or scrap. Rolled throughput yield (RTY) is the product of FPYs across all stations — and the difference between station-level FPY and end-to-end RTY is often startling. A line with five stations each at 95 percent FPY has an RTY of 77 percent; the same line at 99 percent each has 95 percent. The dashboard view is a small-multiples bar of FPY per station with the rolled product called out as a card.
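The rolled product is a one-line DAX measure. A minimal sketch, assuming the production_events table built in the hands-on exercise later in this chapter and treating any rejected unit as a first-pass failure (a real FPY measure would also flag reworked units):
FPY =
DIVIDE(
    SUM(production_events[GoodCount]),
    SUM(production_events[GoodCount]) + SUM(production_events[RejectCount])
)
RTY =
PRODUCTX(
    VALUES(production_events[Station]),
    [FPY]
)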
A process can be in statistical control (stable, predictable variation) yet not capable (its spread does not fit inside the spec limits). Douglas C. Montgomery (2019) emphasises that the Cp and Cpk indices answer the capability question separately from the SPC chart’s control question. A capable, in-control process has Cpk ≥ 1.33 by convention. The dashboard should show both views side by side; either alone misleads.
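For reference, Cp = (USL − LSL) / 6σ asks whether the process spread could fit inside the spec limits at all, while Cpk = min((USL − μ) / 3σ, (μ − LSL) / 3σ) also penalises off-centre running; Cpk ≥ 1.33 means the nearer spec limit sits at least four standard deviations away from the process mean.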
57.4 Production Analytics and OEE
Production analytics rolls up the equipment, quality, and labour numbers into a single decision-driving headline: Overall Equipment Effectiveness (OEE).
OEE is the product of three rates:
- Availability = Run Time / Planned Production Time. Captures unplanned downtime.
- Performance = (Ideal Cycle Time × Total Count) / Run Time. Captures speed loss against design rate.
- Quality = Good Count / Total Count. Captures rework and scrap loss.
World-class OEE in discrete manufacturing is around 85 percent; many real plants run at 50-65 percent. The visualisation is a stacked bar showing the loss decomposition — Available Time → Planned Downtime → Unplanned Downtime → Speed Loss → Quality Loss → Effective Output. Each block translates to rupees through the cost model; the chart is the canonical six big losses view in Total Productive Maintenance literature.
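A quick illustrative calculation, with made-up numbers: a line planned for 480 minutes loses 60 minutes to unplanned stops (Availability = 420 / 480 = 0.875), produces 700 units against an ideal cycle of 30 seconds (Performance = 700 × 0.5 / 420 = 0.833), and ships 665 good units (Quality = 665 / 700 = 0.95), giving OEE = 0.875 × 0.833 × 0.95 ≈ 0.69; the roughly 31 percent gap is exactly what the loss-decomposition stack itemises.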
flowchart LR
A[Total Calendar Time] --> B[Planned<br/>Production Time]
B --> C[Run Time]
C --> D[Net Operating<br/>Time]
D --> E[Fully Productive<br/>Time]
B -.->|Planned<br/>downtime| F[Loss]
C -.->|Unplanned<br/>downtime| G[Availability loss]
D -.->|Speed loss| H[Performance loss]
E -.->|Defects| I[Quality loss]
style E fill:#E6F4EA,stroke:#137333
style F fill:#E8F0FE,stroke:#1A73E8
style G fill:#FCE8E6,stroke:#D93025
style H fill:#FFF7E6,stroke:#F4B400
style I fill:#F1E8FE,stroke:#673AB7
The downtime Pareto sorts unplanned-downtime causes by total minutes lost, with a cumulative line. The chart only works if the operators code downtime correctly at the workstation — machine fault, changeover, material starvation, quality stop, operator break. The dashboard, run on an andon display, shows the day’s running total against the same day-of-week last quarter, with the top three causes highlighted.
Lean manufacturing’s traditional displays — the andon board, the takt-time signal, the line-balance chart — are themselves data visualisations, and modern BI tools render them digitally without losing the discipline. The takt-time chart shows actual cycle time per workstation against the calculated takt rate (available time divided by demand); workstations above the takt line are the bottleneck. A line-balance bar chart with workstations on the x-axis and cycle time on the y-axis lets the supervisor see, in seconds, where to load- or unload-balance to smooth flow.
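Takt itself is a one-line calculation: a shift with 430 available minutes and a demand of 215 units has a takt of 2.0 minutes per unit, so any station averaging more than 2.0 minutes of cycle time is, by definition, the constraint on that shift.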
57.5 Common Pitfalls
- SPC charts without out-of-control rules. A chart with only the centre line and ±3σ misses 80 percent of meaningful signals. Configure Western Electric or Nelson rules.
- OEE reported as a single number. Without the three-component decomposition, the headline number cannot be acted on.
- Predictive maintenance without false-positive accounting. Unnecessary interventions cost as much as missed failures; show both error types.
- Quality dashboards in absolute counts only. A spike in defect counts may just reflect higher production volume; report rates and counts together.
- Capability without control. A capable but out-of-control process is dangerously misread.
- Downtime Pareto without coding discipline. Other as the top category means operators are not coding causes; fix the input before reading the output.
- Andon displays at desktop fonts. A 14-point label is invisible from the line. Design for the viewing distance, not the laptop.
- Dashboards that don’t reset by shift. Manufacturing decisions are shift-by-shift; daily totals hide the shift handover where most issues land.
- Sensor dashboards without baselines. A vibration value of 4.2 mm/s tells the audience nothing without the asset’s normal range.
- Forgetting the operator. The plant manager sees the dashboard, but the operator at the station is the one who can act. Build operator-facing tiles, not just management ones.
57.6 Illustrative Cases
Yuvijen Forge Components Ltd. predictive maintenance pilot. Plant analytics team instruments the eight largest forging presses with vibration and temperature sensors. A Power BI dashboard with a fan-chart RUL view replaces the fixed quarterly preventive-maintenance schedule on those assets. Within nine months, unplanned press downtime drops from 142 hours to 47 hours, parts-spend on those assets falls 18 percent (because services happen only when needed), and the maintenance team heads off three near-miss bearing failures that the previous schedule would not have caught.
Yuvijen Forge SPC programme on a critical dimension. Quality team rebuilds the X-bar and R chart for crankshaft journal-diameter machining as a live Tableau dashboard, replacing a paper chart updated four hours after each shift. Western Electric rules flag a slow drift on the second shift two days before the spec breach would have generated scrap. Tool wear is identified, the tool changed mid-shift, and a ₹1.2 crore scrap event is avoided. The chart becomes the template for SPC across 11 critical dimensions.
Yuvijen Forge OEE turnaround. Operations team replaces a monthly OEE report with a real-time Power BI dashboard on three andon displays. Plant OEE rises from 56 to 71 percent within two quarters — most of the gain is from changeover time (now visible as the largest downtime category) and small-stop accumulation (only visible once minutes were aggregated correctly). The CFO sees the equivalent of one extra production line of capacity without additional capex.
57.7 Hands-On Exercise: Build a Plant OEE and SPC Dashboard
Aim. Build a three-page manufacturing-analytics dashboard in Power BI that ties OEE, SPC, and predictive-maintenance trends together, with shop-floor refresh cadence and andon-display layouts. Tableau equivalents are noted.
Scenario. You are the BI lead in operations at Yuvijen Forge Components Ltd. The Plant Director has asked for a dashboard that runs live on andon displays at three lines, with a supervisor dashboard for the shift-change huddle and a Director summary for the weekly steering meeting.
Deliverable. A three-page Power BI report — OEE, SPC, PdM — with a 1-minute refresh, andon-display layout, shift-reset logic, and operator/supervisor/director RLS roles.
57.7.1 Step 1 — Load and model the data
Use Get Data in Power BI to load five MES/IIoT extracts:
- production_events.csv: Timestamp, Line, Station, EventType (Run, Stop, ChangeOver, Quality), DownReason, GoodCount, RejectCount.
- cycle_times.csv: Timestamp, Line, Station, CycleSeconds, IdealCycle.
- quality_measurements.csv: Timestamp, Line, Station, Measurement, Spec_LSL, Spec_USL.
- defects.csv: Timestamp, Line, Station, DefectCode, Count.
- sensor_telemetry.csv: Timestamp, AssetID, Sensor, Value (long format; vibration, temp, current).
Build a DimDate calendar and mark it as the model's date table. Build a DimAsset table with AssetID, Line, Station, IdealCycleSeconds, BaselineMin, BaselineMax (per sensor). Build a DimShift table mapping wall-clock hours to shift (A 06:00-14:00, B 14:00-22:00, C 22:00-06:00).
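One way to wire the shift mapping into the model is a calculated column on production_events. A minimal sketch, hard-coding the shift boundaries above for readability (the column name and the choice not to join to DimShift are illustrative, not the only option):
ShiftCode =
VAR EventHour = HOUR ( production_events[Timestamp] )
RETURN
    SWITCH (
        TRUE (),
        EventHour >= 6  && EventHour < 14, "A",
        EventHour >= 14 && EventHour < 22, "B",
        "C"    // 22:00-06:00 wraps midnight, so everything else is the C shift
    )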
57.7.2 Step 2 — Page 1: OEE
Build four visuals.
OEE big number. A card showing current-shift OEE in 60-point font, with prior-shift comparison delta beside it. This is the andon-display headline.
Loss decomposition stack. Stacked bar of Planned Time → Planned Downtime → Unplanned Downtime → Speed Loss → Quality Loss → Effective Output for the current shift, with rupee labels on each block.
Downtime Pareto. Sorted bar of unplanned-downtime causes for the current shift, descending by minutes lost, with cumulative-percent line. Top three highlighted.
Takt-time line balance. Bar chart with Station on x-axis, mean cycle time on y-axis, with a horizontal reference line at calculated takt time. Stations above the takt line flagged red.
DAX measures:
Availability =
DIVIDE(
[RunTimeMinutes],
[PlannedProductionTimeMinutes]
)
Performance =
DIVIDE(
    ( SUM(production_events[GoodCount]) + SUM(production_events[RejectCount]) ) * AVERAGE(DimAsset[IdealCycleSeconds]) / 60,
    [RunTimeMinutes]
)
Quality =
DIVIDE(
SUM(production_events[GoodCount]),
SUM(production_events[GoodCount]) + SUM(production_events[RejectCount])
)
OEE = [Availability] * [Performance] * [Quality]
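The Availability measure above references [RunTimeMinutes] and [PlannedProductionTimeMinutes] without defining them. A minimal sketch, assuming a DurationMinutes column has been derived in Power Query (for example, the minutes between each event's Timestamp and the next event on the same station) and that planned downtime is excluded upstream:
RunTimeMinutes =
CALCULATE (
    SUM ( production_events[DurationMinutes] ),
    production_events[EventType] = "Run"
)
PlannedProductionTimeMinutes =
SUM ( production_events[DurationMinutes] )    // all logged event time; refine if planned downtime is logged as its own EventType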
Tableau alternative: card visual; stacked bar; sorted bar with cumulative line; bar with reference line.
57.7.3 Step 3 — Page 2: SPC
Build three visuals.
X-bar chart. Line chart of sample-mean per measurement-group over time, with centre line, UCL, LCL, and ±1σ reference lines. Western Electric rules implemented as DAX flags that conditionally colour out-of-control points red.
R chart. Companion line chart of sample range, with its own UCL, LCL, and centre line.
Cpk and Cp tile plus capability histogram. A card showing current Cp and Cpk, alongside a histogram of measurements with overlaid spec limits and process-capability bell curve. Cpk under 1.33 flagged red.
DAX measures:
XBar = AVERAGE(quality_measurements[Measurement])
Sigma_Subgroup =
STDEV.S(quality_measurements[Measurement])
SubgroupSize =
COUNTROWS(quality_measurements)    // observations in the current sampling subgroup
UCL = [XBar] + 3 * [Sigma_Subgroup] / SQRT([SubgroupSize])
LCL = [XBar] - 3 * [Sigma_Subgroup] / SQRT([SubgroupSize])
// For the chart, evaluate the centre line, UCL, and LCL over the full baseline period
// (for example, inside CALCULATE with REMOVEFILTERS on the time axis) so they plot as constant lines.
WE_Rule_1_OOC =
IF(
OR([XBar] > [UCL], [XBar] < [LCL]),
1, 0
)
Cpk =
MIN(
DIVIDE(AVERAGE(quality_measurements[Spec_USL]) - [XBar], 3 * [Sigma_Subgroup]),
DIVIDE([XBar] - AVERAGE(quality_measurements[Spec_LSL]), 3 * [Sigma_Subgroup])
)
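Rule 1 above only catches single points beyond the control limits; the run-based rules need ordered history. A sketch of the run rule as a calculated column, assuming a hypothetical SPC_Subgroups summary table with a sequential SubgroupID, the subgroup mean in XBarValue, and the centre line in CentreLine (a run length of seven is used here for illustration; Western Electric proper uses eight):
WE_Rule_Run_OOC =
VAR CurrentID = SPC_Subgroups[SubgroupID]
VAR Window =
    FILTER (
        SPC_Subgroups,
        SPC_Subgroups[SubgroupID] > CurrentID - 7
            && SPC_Subgroups[SubgroupID] <= CurrentID
    )
RETURN
    IF (
        COUNTROWS ( Window ) = 7
            && (
                COUNTROWS ( FILTER ( Window, SPC_Subgroups[XBarValue] > SPC_Subgroups[CentreLine] ) ) = 7
                    || COUNTROWS ( FILTER ( Window, SPC_Subgroups[XBarValue] < SPC_Subgroups[CentreLine] ) ) = 7
            ),
        1, 0
    )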
Tableau alternative: line with reference lines; calculated field for OOC flag; histogram with computed reference distribution overlaid.
57.7.4 Step 4 — Page 3: Predictive maintenance
Build three visuals.
Asset health heatmap. Matrix with AssetID on rows, Sensor on columns, cell colour = current value vs baseline (green within healthy band, amber in warning, red in alert).
Sensor-trend small-multiples. Line chart small-multiples, one panel per asset, with shaded reference bands for healthy / warning / alert. Each panel shows the last 7 days of vibration, temperature, and current.
RUL fan chart. Per-asset line chart with central RUL estimate plus 80 percent and 95 percent prediction-interval shaded fans. Sortable list ranks assets by shortest RUL.
Tableau alternative: heatmap native; small-multiples via Trellis with reference bands; fan chart via dual-axis with calculated bounds.
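A sketch of the heatmap's cell logic, assuming the DimAsset baseline columns from Step 1 are held at asset-and-sensor grain and defining the warning band, purely for illustration, as anything within 10 percent of the baseline limits (conditional formatting then maps the band to green, amber, or red):
Current_Sensor_Value =
VAR LatestTime = MAX ( sensor_telemetry[Timestamp] )
RETURN
    CALCULATE (
        AVERAGE ( sensor_telemetry[Value] ),
        sensor_telemetry[Timestamp] = LatestTime
    )
Health_Band =
VAR V  = [Current_Sensor_Value]
VAR Lo = MIN ( DimAsset[BaselineMin] )
VAR Hi = MAX ( DimAsset[BaselineMax] )
RETURN
    SWITCH (
        TRUE (),
        V >= Lo && V <= Hi, "Healthy",
        V >= Lo * 0.9 && V <= Hi * 1.1, "Warning",    // illustrative 10 percent warning band
        "Alert"
    )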
57.7.5 Step 5 — Andon-display layout
Build a separate Andon layout for each line, designed for a 55-inch display 4-6 metres from the operator. Three tiles only:
- OEE big number with shift trend.
- Downtime Pareto top three causes for the shift.
- Critical-dimension SPC chart with most recent point.
Font sizes: 60 pt for the headline, 28 pt for chart axes, 24 pt for table rows. Refresh: 60 seconds. Test the layout from the actual line distance before publishing.
57.7.6 Step 6 — Shift-reset logic
Build a DAX measure that returns the active shift from the current time and the DimShift table. All current-shift tiles use this filter rather than the day filter, so totals reset at shift change. The previous-shift comparison comes from a calculated table that holds the previous shift's value at the moment the current shift starts. Without this logic, the dashboard mixes shift signals and the C-shift handover loses its accountability anchor.
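A minimal sketch of that logic, hard-coding the DimShift boundaries from Step 1 for readability (a production version would read them from DimShift so a shift-pattern change does not require editing DAX):
ActiveShift =
VAR CurrentHour = HOUR ( NOW () )
RETURN
    SWITCH (
        TRUE (),
        CurrentHour >= 6  && CurrentHour < 14, "A",
        CurrentHour >= 14 && CurrentHour < 22, "B",
        "C"
    )
Current_Shift_OEE =
VAR Shift = [ActiveShift]
RETURN
    CALCULATE (
        [OEE],
        FILTER ( ALL ( production_events[ShiftCode] ), production_events[ShiftCode] = Shift )
    )    // in production, also restrict to the current production date so yesterday's same-code shift is excluded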
57.7.7 Step 7 — Operator, supervisor, director RLS and audit
Implement Power BI RLS:
- Operator role. Sees only their own line; SPC and OEE tiles only.
- Supervisor role. Sees all lines in their plant; full OEE, SPC, PdM pages.
- Plant Director. Sees the plant rollup plus the weekly summary.
- Network Operations. Sees all plants for cross-plant benchmarking.
Audit log retention is mandatory for traceability when a quality investigation reaches back through the dashboard timeline.
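The Operator role's line restriction reduces to a single RLS filter expression on the Line dimension. A sketch, assuming a hypothetical UserLineMap table that maps each operator's sign-in to their line:
DimAsset[Line] =
    LOOKUPVALUE (
        UserLineMap[Line],
        UserLineMap[UserEmail], USERPRINCIPALNAME ()
    )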
Manufacturing analytics is the original home of three idioms this book has reused throughout. SPC charts (introduced here, applied as run charts in Chapter 51 operations and Chapter 55 healthcare) remain the canonical signal-vs-noise visualisation. Pareto charts (Chapter 11, Chapter 51, Chapter 53, Chapter 54) descend from Joseph Juran’s quality-management work and live in the manufacturing world more than anywhere else. Loss-decomposition stacks (this chapter, Chapter 50 financial waterfalls) trace lineage to Toyota Production System’s seven wastes visualisation. The mobile and andon-display patterns of Chapter 47 take their cue from manufacturing’s earliest visual displays — the Toyota andon cord predates business intelligence by half a century.
Power BI three-page plant dashboard with andon and supervisor layouts (yuvijen-forge-plant.pbix), Tableau equivalent (yuvijen-forge-plant.twbx), workshop dataset (yuvijen-forge-plant-data.xlsx), andon-display build (yuvijen-forge-andon.pbix), and a screen recording of the dashboard tour (yuvijen-forge-plant-walkthrough.mp4) will be embedded here.
Summary
| Concept | Description |
|---|---|
| Manufacturing-Dashboard Contract | |
| Shop-Floor First | Primary screen is a 55-inch andon display visible from the line, not a laptop |
| Loss in Rupees | Every minute of downtime, every scrap unit, every rework hour translates to rupees |
| Action at the Workstation | Dashboard surfaces information operators and supervisors can act on within the shift |
| Three Manufacturing Jobs | |
| Predictive Maintenance | Which machine will fail, when, and what should we do before it does? |
| Quality Analytics | Are we making it right the first time, and where are defects coming from? |
| Production Analytics | Are we running at the rate demand needs and equipment can sustain? |
| Maintenance Strategies | |
| Reactive Maintenance | Wait for failure; lowest planned cost, highest unplanned cost |
| Preventive Maintenance | Fixed time or cycle interval; predictable cost but over-services healthy assets |
| Predictive Maintenance Strategy | Condition-based; lowest total cost when implemented well |
| Predictive Maintenance Tools | |
| Condition-Monitoring Dashboard | Tracks vibration, temperature, current, oil-debris against learned baselines |
| Sensor Trend Bands | Small-multiples line chart with healthy, warning, alert reference bands per sensor |
| Remaining Useful Life | Estimated operating hours before failure from sensor history |
| RUL Fan Chart | Central RUL estimate with 80 and 95 percent prediction-interval fans |
| False-Positive Cost | Unnecessary interventions cost as much as missed failures; show both error types |
| SPC Chart Family | |
| X-Bar and R Chart | Sample mean and range over time; centre line, UCL, LCL for continuous data |
| p-Chart and c-Chart | Proportion-defective and count-of-defects charts for attribute data |
| CUSUM and EWMA | Cumulative-sum and exponentially-weighted-moving-average for small persistent shifts |
| Western Electric Rules | Out-of-control rules that flag patterns the standard 3-sigma test misses |
| Defect and Yield Tools | |
| Defect Pareto | Defects sorted descending with cumulative-percent line; 80/20 of root causes |
| Defect-by-Station Heatmap | Workstation by defect type with cell colour as count; locates the source |
| First-Pass Yield | Percent of units passing through a station correctly the first time |
| Rolled Throughput Yield | Product of FPYs across all stations; often startlingly lower than any single station's FPY |
| Capability Cp and Cpk | Process-capability indices answering whether spread sits inside spec limits |
| Capability Versus Control | Capable but out-of-control is dangerously misread; show both views together |
| OEE and Production | |
| Overall Equipment Effectiveness | Product of Availability, Performance, and Quality rates |
| Availability | Run Time over Planned Production Time; captures unplanned downtime |
| Performance | Ideal-cycle output over Run Time; captures speed loss against design rate |
| Quality (OEE component) | Good Count over Total Count; captures rework and scrap loss |
| Loss-Decomposition Stack | Stacked bar walking total time through six big losses to effective output |
| Downtime Pareto | Sorted bar of unplanned-downtime causes by minutes lost with cumulative line |
| Takt-Time Chart | Actual cycle per station against takt rate; bottleneck visible above the line |
| Line-Balance Chart | Workstation cycle-time bar chart for load- and unload-balancing |
| Andon Display | Toyota-era visual signal that predates BI by half a century; modern tools digitise it |
| Common Pitfalls | |
| Pitfall: SPC Without OOC Rules | Centre line and 3-sigma alone miss 80 percent of meaningful signals |
| Pitfall: Single-Number OEE | Headline OEE without three-component decomposition cannot be acted on |
| Pitfall: PdM Without False-Positives | Show false-positive rate and cost per intervention alongside accuracy |
| Pitfall: Counts Without Rates | Defect counts without rates conflate volume change with quality change |
| Pitfall: Capability Without Control | A capable but out-of-control process is dangerously misread |
| Pitfall: Other-Heavy Pareto | Other as top category means operators are not coding causes — fix input first |
| Pitfall: Desktop-Font Andon | 14-point labels are invisible from the line; design for viewing distance |
| Pitfall: No Shift Reset | Daily totals hide shift handover where most issues land |
| Pitfall: Sensors Without Baseline | Sensor values mean nothing without the asset's normal range |
| Pitfall: No Operator-Facing Tiles | Plant manager sees dashboards but operator is the one who can act |
| Hands-On Plant Dashboard | |
| Page 1 — OEE | OEE big number, loss-decomposition stack, downtime Pareto, takt-time line balance |
| Page 2 — SPC | X-bar and R charts with Western Electric flags, capability histogram with Cp and Cpk tile |
| Page 3 — PdM | Asset-health heatmap, sensor-trend small-multiples, RUL fan chart |
| Andon Layout | Three-tile 55-inch display layout with 60-pt headline, 28-pt axes, 60-second refresh |
| Shift-Reset Logic | DAX active-shift filter so all current-shift tiles reset at shift-change |