57 Manufacturing Analytics: Predictive Maintenance and Quality
flowchart LR
A[Sensors<br/>vibration, temp,<br/>current, oil] --> B[Baseline<br/>learned per asset]
B --> C[Live trend<br/>vs threshold]
C --> D{Threshold<br/>crossed?}
D -->|Healthy| E[Continue<br/>monitoring]
D -->|Warning| F[Schedule<br/>inspection]
D -->|Alert| G[Intervene<br/>before next shift]
style E fill:#E6F4EA,stroke:#137333
style F fill:#FFF7E6,stroke:#F4B400
style G fill:#FCE8E6,stroke:#D93025
57.1 Why Manufacturing Analytics Matters
An unplanned line stoppage in a high-mix automotive plant costs more in two hours than analytics costs for the year.
Manufacturing is the function where analytics began — Walter Shewhart’s 1924 control chart predates spreadsheets, dashboards, and most modern statistics. The discipline now reaches from raw-material receipt to dealer warranty, instrumented at every step by sensors, MES (Manufacturing Execution System) logs, ERP transactions, and increasingly by IIoT (Industrial IoT) telemetry. The BI analyst on a manufacturing desk works with shop-floor managers who think in cycles, takt, and first-pass yield, and the dashboards must speak that vocabulary back at them.
For a BI analyst, manufacturing clusters into three jobs. Predictive maintenance analytics answers which machine will fail, when, and what should we do before it does? — vibration and temperature trends, time-to-failure modelling, maintenance-cost optimisation. Quality analytics answers are we making it right the first time, and where are defects coming from? — SPC charts, defect-Pareto, first-pass yield, scrap and rework. Production analytics answers are we running the plant at the rate the demand needs and the equipment can sustain? — Overall Equipment Effectiveness (OEE), takt-time conformance, downtime root-cause, throughput. Douglas C. Montgomery (2019) is the standard reference for statistical quality control — the visualisation idioms (X-bar, R, p-, c-charts) at the heart of every quality dashboard come from this lineage. R. Keith Mobley (2002) frames predictive maintenance not as a technology choice but as a shift from time-based to condition-based intervention, a shift the BI dashboard makes legible to plant managers.
Three rules separate manufacturing dashboards from every other kind:
- Shop-floor first. The primary screen is a 55-inch andon display visible from the line, not a laptop in an office. Fonts large, colours stark, refresh in seconds.
- Loss is the headline currency. Every minute of downtime, every scrap unit, every rework hour translates to rupees on the plant loss tile.
- Action lives at the workstation. A dashboard that needs a manager to act is too slow. The dashboard surfaces information that operators, supervisors, and SMED teams can act on within the shift.
57.2 Predictive Maintenance Analytics
R. Keith Mobley (2002) distinguishes three maintenance philosophies — reactive (run to failure), preventive (time-based), and predictive (condition-based). Predictive maintenance uses sensor data and models to intervene only when failure is genuinely approaching, capturing the cost saving of preventive without the asset-utilisation loss of fixed schedules.
| Strategy | Trigger | Cost profile | Visualisation lens |
|---|---|---|---|
| Reactive | Wait for failure. | Lowest planned cost, highest unplanned cost; line stoppage is the norm. | Failure-event log; downtime Pareto. |
| Preventive | Fixed time or cycle interval. | Predictable cost; over-services healthy assets. | Compliance dashboard; PM completion rate. |
| Predictive (PdM) | Condition signals from sensors. | Lowest total cost when implemented well; needs analytics infrastructure. | Sensor trend chart with thresholds; remaining useful life curves. |
A condition-monitoring dashboard tracks vibration, temperature, current draw, oil-debris counts, and other sensor signals against learned baselines. The standard view is a small-multiples line chart, one panel per sensor, with shaded reference bands for healthy, warning, and alert. Crossing the warning band schedules an inspection; crossing the alert band schedules an intervention before the next shift.
The signals themselves matter — vibration spectra and temperature gradients carry early warning of bearing wear, motor degradation, and lubrication failure that the operator’s ear and hand cannot detect until the failure mode is well advanced.
A Remaining Useful Life (RUL) model converts the sensor history into an estimated number of operating hours before failure. The visualisation is a fan chart per asset — central RUL estimate plus 80 percent and 95 percent prediction intervals shaded outward from it. As the asset degrades, the fan narrows and the central estimate falls. The chart is the input to the maintenance-planning conversation: which assets need parts ordered now, which can wait, which can be deferred to the next planned shutdown.
A predictive-maintenance dashboard that triggers an unnecessary intervention costs nearly as much as missing a failure — both produce downtime, both burn parts and labour. Always pair PdM accuracy with false-positive rate and cost per intervention. R. Keith Mobley (2002) emphasises that the business case for PdM stands or falls on this trade-off, and the dashboard is what makes it visible.
57.3 Quality Analytics: SPC, Defects, and First-Pass Yield
Quality analytics descends directly from Shewhart’s 1924 control chart and Deming’s lifelong championing of statistical thinking on the shop floor. The core idea is simple — distinguish common-cause variation (the natural noise of any process) from special-cause variation (a real shift that demands action). The visualisation is the statistical process control (SPC) chart, and it remains the most influential single chart in the history of industrial analytics.
Different chart types fit different data types (Douglas C. Montgomery, 2019):
- X-bar and R chart — sample mean and range over time for continuous measurements (dimensions, weights). Detects shifts in central tendency and dispersion separately.
- p-chart — proportion of defective units in samples of varying size. Used for pass/fail attribute data.
- c-chart — count of defects per unit (e.g., scratches per panel). For Poisson-distributed defect counts.
- CUSUM and EWMA — cumulative-sum and exponentially-weighted-moving-average charts. More sensitive to small persistent shifts than the standard Shewhart chart.
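Of these, only the EWMA statistic is not a direct plot of the sample data. For reference, it is the recursion z_t = λ·x_t + (1 − λ)·z_{t−1}, with z_0 set to the process target and λ typically chosen between 0.05 and 0.25; the smaller the λ, the longer the chart's memory and the more sensitive it is to small sustained shifts.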
The standard rendering shows the data series, the centre line, the control limits (UCL and LCL at ±3σ), and the ±1σ and ±2σ zone boundaries the Western Electric rules use to flag out-of-control patterns. A point above UCL or below LCL is the loud alarm; a run of consecutive points on one side of centre (seven or eight, depending on the rule set) is the quiet alarm that often precedes the loud one.
When a process is producing defects, the next question is which defects? The Pareto chart introduced in Chapter 11 has its highest-leverage application here: defects sorted descending with a cumulative-percentage line. Pareto analysis routinely shows that 80 percent of defects come from 20 percent of root causes; the dashboard names them and ranks remediation effort.
A second view, the defect-by-station heatmap, locates defects in the production line. Rows = workstation, columns = defect type, cell colour = count. Concentrations identify the workstation that is the source — sometimes the obvious one, often surprisingly upstream.
First-pass yield (FPY) is the percentage of units that pass through a station correctly the first time, without rework or scrap. Rolled throughput yield (RTY) is the product of FPYs across all stations — and the difference between station-level FPY and end-to-end RTY is often startling. A line with five stations each at 95 percent FPY has an RTY of 77 percent; the same line at 99 percent each has 95 percent. The dashboard view is a small-multiples bar of FPY per station with the rolled product called out as a card.
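The rolled product is a one-line DAX measure. A minimal sketch, assuming the production_events table built in the hands-on exercise later in this chapter and treating any rejected unit as a first-pass failure (a real FPY measure would also flag reworked units):
FPY =
DIVIDE(
    SUM(production_events[GoodCount]),
    SUM(production_events[GoodCount]) + SUM(production_events[RejectCount])
)
RTY =
PRODUCTX(
    VALUES(production_events[Station]),
    [FPY]
)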
A process can be in statistical control (stable, predictable variation) yet not capable (its spread does not fit inside the spec limits). Douglas C. Montgomery (2019) emphasises that the Cp and Cpk indices answer the capability question separately from the SPC chart’s control question. A capable, in-control process has Cpk ≥ 1.33 by convention. The dashboard should show both views side by side; either alone misleads.
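For reference, Cp = (USL − LSL) / 6σ asks whether the process spread could fit inside the spec limits at all, while Cpk = min((USL − μ) / 3σ, (μ − LSL) / 3σ) also penalises off-centre running; Cpk ≥ 1.33 means the nearer spec limit sits at least four standard deviations away from the process mean.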
57.4 Production Analytics and OEE
Production analytics rolls up the equipment, quality, and labour numbers into a single decision-driving headline: Overall Equipment Effectiveness (OEE).
OEE is the product of three rates:
- Availability = Run Time / Planned Production Time. Captures unplanned downtime.
- Performance = (Ideal Cycle Time × Total Count) / Run Time. Captures speed loss against design rate.
- Quality = Good Count / Total Count. Captures rework and scrap loss.
World-class OEE in discrete manufacturing is around 85 percent; many real plants run at 50-65 percent. The visualisation is a stacked bar showing the loss decomposition — Available Time → Planned Downtime → Unplanned Downtime → Speed Loss → Quality Loss → Effective Output. Each block translates to rupees through the cost model; the chart is the canonical six big losses view in Total Productive Maintenance literature.
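A quick illustrative calculation, with made-up numbers: a line planned for 480 minutes loses 60 minutes to unplanned stops (Availability = 420 / 480 = 0.875), produces 700 units against an ideal cycle of 30 seconds (Performance = 700 × 0.5 / 420 = 0.833), and ships 665 good units (Quality = 665 / 700 = 0.95), giving OEE = 0.875 × 0.833 × 0.95 ≈ 0.69; the roughly 31 percent gap is exactly what the loss-decomposition stack itemises.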
flowchart LR
A[Total Calendar Time] --> B[Planned<br/>Production Time]
B --> C[Run Time]
C --> D[Net Operating<br/>Time]
D --> E[Fully Productive<br/>Time]
B -.->|Planned<br/>downtime| F[Loss]
C -.->|Unplanned<br/>downtime| G[Availability loss]
D -.->|Speed loss| H[Performance loss]
E -.->|Defects| I[Quality loss]
style E fill:#E6F4EA,stroke:#137333
style F fill:#E8F0FE,stroke:#1A73E8
style G fill:#FCE8E6,stroke:#D93025
style H fill:#FFF7E6,stroke:#F4B400
style I fill:#F1E8FE,stroke:#673AB7
The downtime Pareto sorts unplanned-downtime causes by total minutes lost, with a cumulative line. The chart only works if the operators code downtime correctly at the workstation — machine fault, changeover, material starvation, quality stop, operator break. The dashboard, run on an andon display, shows the day’s running total against the same day-of-week last quarter, with the top three causes highlighted.
Lean manufacturing’s traditional displays — the andon board, the takt-time signal, the line-balance chart — are themselves data visualisations, and modern BI tools render them digitally without losing the discipline. The takt-time chart shows actual cycle time per workstation against the calculated takt rate (available time divided by demand); workstations above the takt line are the bottleneck. A line-balance bar chart with workstations on the x-axis and cycle time on the y-axis lets the supervisor see, in seconds, where to load- or unload-balance to smooth flow.
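Takt itself is a one-line calculation: a shift with 430 available minutes and a demand of 215 units has a takt of 2.0 minutes per unit, so any station averaging more than 2.0 minutes of cycle time is, by definition, the constraint on that shift.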
57.5 Common Pitfalls
- SPC charts without out-of-control rules. A chart with only the centre line and ±3σ misses 80 percent of meaningful signals. Configure Western Electric or Nelson rules.
- OEE reported as a single number. Without the three-component decomposition, the headline number cannot be acted on.
- Predictive maintenance without false-positive accounting. Unnecessary interventions cost as much as missed failures; show both error types.
- Quality dashboards in absolute counts only. A spike in defect counts may just reflect higher production volume; report rates and counts together.
- Capability without control. A capable but out-of-control process is dangerously misread.
- Downtime Pareto without coding discipline. Other as the top category means operators are not coding causes; fix the input before reading the output.
- Andon displays at desktop fonts. A 14-point label is invisible from the line. Design for the viewing distance, not the laptop.
- Dashboards that don’t reset by shift. Manufacturing decisions are shift-by-shift; daily totals hide the shift handover where most issues land.
- Sensor dashboards without baselines. A vibration value of 4.2 mm/s tells the audience nothing without the asset’s normal range.
- Forgetting the operator. The plant manager sees the dashboard, but the operator at the station is the one who can act. Build operator-facing tiles, not just management ones.
57.6 Illustrative Cases
Yuvijen Forge Components Ltd. predictive maintenance pilot. Plant analytics team instruments the eight largest forging presses with vibration and temperature sensors. A Power BI dashboard with a fan-chart RUL view replaces the fixed quarterly preventive-maintenance schedule on those assets. Within nine months, unplanned press downtime drops from 142 hours to 47 hours, parts-spend on those assets falls 18 percent (because services happen only when needed), and the maintenance team heads off three near-miss bearing failures that the previous schedule would not have caught.
Yuvijen Forge SPC programme on a critical dimension. Quality team rebuilds the X-bar and R chart for crankshaft journal-diameter machining as a live Tableau dashboard, replacing a paper chart updated four hours after each shift. Western Electric rules flag a slow drift on the second shift two days before the spec breach would have generated scrap. Tool wear is identified, the tool changed mid-shift, and a ₹1.2 crore scrap event is avoided. The chart becomes the template for SPC across 11 critical dimensions.
Yuvijen Forge OEE turnaround. Operations team replaces a monthly OEE report with a real-time Power BI dashboard on three andon displays. Plant OEE rises from 56 to 71 percent within two quarters — most of the gain is from changeover time (now visible as the largest downtime category) and small-stop accumulation (only visible once minutes were aggregated correctly). The CFO sees the equivalent of one extra production line of capacity without additional capex.
57.7 Hands-On Exercise: Build a Plant OEE and SPC Dashboard
Aim. Build a three-page manufacturing-analytics dashboard in Power BI that ties OEE, SPC, and predictive-maintenance trends together, with shop-floor refresh cadence and andon-display layouts. Tableau equivalents are noted.
Scenario. You are the BI lead in operations at Yuvijen Forge Components Ltd. The Plant Director has asked for a dashboard that runs live on andon displays at three lines, with a supervisor dashboard for the shift-change huddle and a Director summary for the weekly steering meeting.
Deliverable. A three-page Power BI report — OEE, SPC, PdM — with a 1-minute refresh, andon-display layout, shift-reset logic, and operator/supervisor/director RLS roles.
57.7.1 Step 1 — Load and model the data
Use Get Data in Power BI to load five MES/IIoT extracts:
- production_events.csv: Timestamp, Line, Station, EventType (Run, Stop, ChangeOver, Quality), DownReason, GoodCount, RejectCount.
- cycle_times.csv: Timestamp, Line, Station, CycleSeconds, IdealCycle.
- quality_measurements.csv: Timestamp, Line, Station, Measurement, Spec_LSL, Spec_USL.
- defects.csv: Timestamp, Line, Station, DefectCode, Count.
- sensor_telemetry.csv: Timestamp, AssetID, Sensor, Value (long format; vibration, temp, current).
Build a DimDate calendar and mark it as the model's date table. Build a DimAsset table with AssetID, Line, Station, IdealCycleSeconds, BaselineMin, BaselineMax (per sensor). Build a DimShift table mapping wall-clock hours to shift (A 06:00-14:00, B 14:00-22:00, C 22:00-06:00).
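One way to wire the shift mapping into the model is a calculated column on production_events. A minimal sketch, hard-coding the shift boundaries above for readability (the column name and the choice not to join to DimShift are illustrative, not the only option):
ShiftCode =
VAR EventHour = HOUR ( production_events[Timestamp] )
RETURN
    SWITCH (
        TRUE (),
        EventHour >= 6  && EventHour < 14, "A",
        EventHour >= 14 && EventHour < 22, "B",
        "C"    // 22:00-06:00 wraps midnight, so everything else is the C shift
    )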
57.7.2 Step 2 — Page 1: OEE
Build four visuals.
OEE big number. A card showing current-shift OEE in 60-point font, with prior-shift comparison delta beside it. This is the andon-display headline.
Loss decomposition stack. Stacked bar of Planned Time → Planned Downtime → Unplanned Downtime → Speed Loss → Quality Loss → Effective Output for the current shift, with rupee labels on each block.
Downtime Pareto. Sorted bar of unplanned-downtime causes for the current shift, descending by minutes lost, with cumulative-percent line. Top three highlighted.
Takt-time line balance. Bar chart with Station on x-axis, mean cycle time on y-axis, with a horizontal reference line at calculated takt time. Stations above the takt line flagged red.
DAX measures:
Availability =
DIVIDE(
[RunTimeMinutes],
[PlannedProductionTimeMinutes]
)
Performance =
DIVIDE(
    ( SUM(production_events[GoodCount]) + SUM(production_events[RejectCount]) ) * AVERAGE(DimAsset[IdealCycleSeconds]) / 60,
    [RunTimeMinutes]
)
Quality =
DIVIDE(
SUM(production_events[GoodCount]),
SUM(production_events[GoodCount]) + SUM(production_events[RejectCount])
)
OEE = [Availability] * [Performance] * [Quality]
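The Availability measure above references [RunTimeMinutes] and [PlannedProductionTimeMinutes] without defining them. A minimal sketch, assuming a DurationMinutes column has been derived in Power Query (for example, the minutes between each event's Timestamp and the next event on the same station) and that planned downtime is excluded upstream:
RunTimeMinutes =
CALCULATE (
    SUM ( production_events[DurationMinutes] ),
    production_events[EventType] = "Run"
)
PlannedProductionTimeMinutes =
SUM ( production_events[DurationMinutes] )    // all logged event time; refine if planned downtime is logged as its own EventType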
Tableau alternative: card visual; stacked bar; sorted bar with cumulative line; bar with reference line.
57.7.3 Step 3 — Page 2: SPC
Build three visuals.
X-bar chart. Line chart of sample-mean per measurement-group over time, with centre line, UCL, LCL, and ±1σ reference lines. Western Electric rules implemented as DAX flags that conditionally colour out-of-control points red.
R chart. Companion line chart of sample range, with its own UCL, LCL, and centre line.
Cpk and Cp tile plus capability histogram. A card showing current Cp and Cpk, alongside a histogram of measurements with overlaid spec limits and process-capability bell curve. Cpk under 1.33 flagged red.
DAX measures:
XBar = AVERAGE(quality_measurements[Measurement])
Sigma_Subgroup =
STDEV.S(quality_measurements[Measurement])
SubgroupSize =
COUNTROWS(quality_measurements)    // observations in the current sampling subgroup
UCL = [XBar] + 3 * [Sigma_Subgroup] / SQRT([SubgroupSize])
LCL = [XBar] - 3 * [Sigma_Subgroup] / SQRT([SubgroupSize])
// For the chart, evaluate the centre line, UCL, and LCL over the full baseline period
// (for example, inside CALCULATE with REMOVEFILTERS on the time axis) so they plot as constant lines.
WE_Rule_1_OOC =
IF(
OR([XBar] > [UCL], [XBar] < [LCL]),
1, 0
)
Cpk =
MIN(
DIVIDE(AVERAGE(quality_measurements[Spec_USL]) - [XBar], 3 * [Sigma_Subgroup]),
DIVIDE([XBar] - AVERAGE(quality_measurements[Spec_LSL]), 3 * [Sigma_Subgroup])
)
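Rule 1 above only catches single points beyond the control limits; the run-based rules need ordered history. A sketch of the run rule as a calculated column, assuming a hypothetical SPC_Subgroups summary table with a sequential SubgroupID, the subgroup mean in XBarValue, and the centre line in CentreLine (a run length of seven is used here for illustration; Western Electric proper uses eight):
WE_Rule_Run_OOC =
VAR CurrentID = SPC_Subgroups[SubgroupID]
VAR Window =
    FILTER (
        SPC_Subgroups,
        SPC_Subgroups[SubgroupID] > CurrentID - 7
            && SPC_Subgroups[SubgroupID] <= CurrentID
    )
RETURN
    IF (
        COUNTROWS ( Window ) = 7
            && (
                COUNTROWS ( FILTER ( Window, SPC_Subgroups[XBarValue] > SPC_Subgroups[CentreLine] ) ) = 7
                    || COUNTROWS ( FILTER ( Window, SPC_Subgroups[XBarValue] < SPC_Subgroups[CentreLine] ) ) = 7
            ),
        1, 0
    )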
Tableau alternative: line with reference lines; calculated field for OOC flag; histogram with computed reference distribution overlaid.
57.7.4 Step 4 — Page 3: Predictive maintenance
Build three visuals.
Asset health heatmap. Matrix with AssetID on rows, Sensor on columns, cell colour = current value vs baseline (green within healthy band, amber in warning, red in alert).
Sensor-trend small-multiples. Line chart small-multiples, one panel per asset, with shaded reference bands for healthy / warning / alert. Each panel shows the last 7 days of vibration, temperature, and current.
RUL fan chart. Per-asset line chart with central RUL estimate plus 80 percent and 95 percent prediction-interval shaded fans. Sortable list ranks assets by shortest RUL.
Tableau alternative: heatmap native; small-multiples via Trellis with reference bands; fan chart via dual-axis with calculated bounds.
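A sketch of the heatmap's cell logic, assuming the DimAsset baseline columns from Step 1 are held at asset-and-sensor grain and defining the warning band, purely for illustration, as anything within 10 percent of the baseline limits (conditional formatting then maps the band to green, amber, or red):
Current_Sensor_Value =
VAR LatestTime = MAX ( sensor_telemetry[Timestamp] )
RETURN
    CALCULATE (
        AVERAGE ( sensor_telemetry[Value] ),
        sensor_telemetry[Timestamp] = LatestTime
    )
Health_Band =
VAR V  = [Current_Sensor_Value]
VAR Lo = MIN ( DimAsset[BaselineMin] )
VAR Hi = MAX ( DimAsset[BaselineMax] )
RETURN
    SWITCH (
        TRUE (),
        V >= Lo && V <= Hi, "Healthy",
        V >= Lo * 0.9 && V <= Hi * 1.1, "Warning",    // illustrative 10 percent warning band
        "Alert"
    )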
57.7.5 Step 5 — Andon-display layout
Build a separate Andon layout for each line, designed for a 55-inch display 4-6 metres from the operator. Three tiles only:
- OEE big number with shift trend.
- Downtime Pareto top three causes for the shift.
- Critical-dimension SPC chart with most recent point.
Font sizes: 60 pt for the headline, 28 pt for chart axes, 24 pt for table rows. Refresh: 60 seconds. Test the layout from the actual line distance before publishing.
57.7.6 Step 6 — Shift-reset logic
Build a DAX measure that returns the active shift from the current time and the DimShift table. All current-shift tiles use this filter rather than the day filter, so totals reset at shift change. The previous-shift comparison comes from a calculated table that holds the previous shift's value at the moment the current shift starts. Without this logic, the dashboard mixes shift signals and the C-shift handover loses its accountability anchor.
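A minimal sketch of that logic, hard-coding the DimShift boundaries from Step 1 for readability (a production version would read them from DimShift so a shift-pattern change does not require editing DAX):
ActiveShift =
VAR CurrentHour = HOUR ( NOW () )
RETURN
    SWITCH (
        TRUE (),
        CurrentHour >= 6  && CurrentHour < 14, "A",
        CurrentHour >= 14 && CurrentHour < 22, "B",
        "C"
    )
Current_Shift_OEE =
VAR Shift = [ActiveShift]
RETURN
    CALCULATE (
        [OEE],
        FILTER ( ALL ( production_events[ShiftCode] ), production_events[ShiftCode] = Shift )
    )    // in production, also restrict to the current production date so yesterday's same-code shift is excluded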
57.7.7 Step 7 — Operator, supervisor, director RLS and audit
Implement Power BI RLS:
- Operator role. Sees only their own line; SPC and OEE tiles only.
- Supervisor role. Sees all lines in their plant; full OEE, SPC, PdM pages.
- Plant Director. Sees the plant rollup plus the weekly summary.
- Network Operations. Sees all plants for cross-plant benchmarking.
Audit log retention is mandatory for traceability when a quality investigation reaches back through the dashboard timeline.
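The Operator role's line restriction reduces to a single RLS filter expression on the Line dimension. A sketch, assuming a hypothetical UserLineMap table that maps each operator's sign-in to their line:
DimAsset[Line] =
    LOOKUPVALUE (
        UserLineMap[Line],
        UserLineMap[UserEmail], USERPRINCIPALNAME ()
    )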
Manufacturing analytics is the original home of three idioms this book has reused throughout. SPC charts (introduced here, applied as run charts in Chapter 51 operations and Chapter 55 healthcare) remain the canonical signal-vs-noise visualisation. Pareto charts (Chapter 11, Chapter 51, Chapter 53, Chapter 54) descend from Joseph Juran’s quality-management work and live in the manufacturing world more than anywhere else. Loss-decomposition stacks (this chapter, Chapter 50 financial waterfalls) trace lineage to Toyota Production System’s seven wastes visualisation. The mobile and andon-display patterns of Chapter 47 take their cue from manufacturing’s earliest visual displays — the Toyota andon cord predates business intelligence by half a century.
Power BI three-page plant dashboard with andon and supervisor layouts (yuvijen-forge-plant.pbix), Tableau equivalent (yuvijen-forge-plant.twbx), workshop dataset (yuvijen-forge-plant-data.xlsx), andon-display build (yuvijen-forge-andon.pbix), and a screen recording of the dashboard tour (yuvijen-forge-plant-walkthrough.mp4) will be embedded here.
Summary
| Concept | Description |
|---|---|
| Manufacturing-Dashboard Contract | |
| Shop-Floor First | Primary screen is a 55-inch andon display visible from the line, not a laptop |
| Loss in Rupees | Every minute of downtime, every scrap unit, every rework hour translates to rupees |
| Action at the Workstation | Dashboard surfaces information operators and supervisors can act on within the shift |
| Three Manufacturing Jobs | |
| Predictive Maintenance | Which machine will fail, when, and what should we do before it does? |
| Quality Analytics | Are we making it right the first time, and where are defects coming from? |
| Production Analytics | Are we running at the rate demand needs and equipment can sustain? |
| Maintenance Strategies | |
| Reactive Maintenance | Wait for failure; lowest planned cost, highest unplanned cost |
| Preventive Maintenance | Fixed time or cycle interval; predictable cost but over-services healthy assets |
| Predictive Maintenance Strategy | Condition-based; lowest total cost when implemented well |
| Predictive Maintenance Tools | |
| Condition-Monitoring Dashboard | Tracks vibration, temperature, current, oil-debris against learned baselines |
| Sensor Trend Bands | Small-multiples line chart with healthy, warning, alert reference bands per sensor |
| Remaining Useful Life | Estimated operating hours before failure from sensor history |
| RUL Fan Chart | Central RUL estimate with 80 and 95 percent prediction-interval fans |
| False-Positive Cost | Unnecessary interventions cost as much as missed failures; show both error types |
| SPC Chart Family | |
| X-Bar and R Chart | Sample mean and range over time; centre line, UCL, LCL for continuous data |
| p-Chart and c-Chart | Proportion-defective and count-of-defects charts for attribute data |
| CUSUM and EWMA | Cumulative-sum and exponentially-weighted-moving-average for small persistent shifts |
| Western Electric Rules | Out-of-control rules that flag patterns the standard 3-sigma test misses |
| Defect and Yield Tools | |
| Defect Pareto | Defects sorted descending with cumulative-percent line; 80/20 of root causes |
| Defect-by-Station Heatmap | Workstation by defect type with cell colour as count; locates the source |
| First-Pass Yield | Percent of units passing through a station correctly the first time |
| Rolled Throughput Yield | Product of FPYs across all stations; often startlingly lower than any single station's FPY |
| Capability Cp and Cpk | Process-capability indices answering whether spread sits inside spec limits |
| Capability Versus Control | Capable but out-of-control is dangerously misread; show both views together |
| OEE and Production | |
| Overall Equipment Effectiveness | Product of Availability, Performance, and Quality rates |
| Availability | Run Time over Planned Production Time; captures unplanned downtime |
| Performance | Ideal-cycle output over Run Time; captures speed loss against design rate |
| Quality (OEE component) | Good Count over Total Count; captures rework and scrap loss |
| Loss-Decomposition Stack | Stacked bar walking total time through six big losses to effective output |
| Downtime Pareto | Sorted bar of unplanned-downtime causes by minutes lost with cumulative line |
| Takt-Time Chart | Actual cycle per station against takt rate; bottleneck visible above the line |
| Line-Balance Chart | Workstation cycle-time bar chart for load- and unload-balancing |
| Andon Display | Toyota-era visual signal that predates BI by half a century; modern tools digitise it |
| Common Pitfalls | |
| Pitfall: SPC Without OOC Rules | Centre line and 3-sigma alone miss 80 percent of meaningful signals |
| Pitfall: Single-Number OEE | Headline OEE without three-component decomposition cannot be acted on |
| Pitfall: PdM Without False-Positives | Show false-positive rate and cost per intervention alongside accuracy |
| Pitfall: Counts Without Rates | Defect counts without rates conflate volume change with quality change |
| Pitfall: Capability Without Control | A capable but out-of-control process is dangerously misread |
| Pitfall: Other-Heavy Pareto | Other as top category means operators are not coding causes — fix input first |
| Pitfall: Desktop-Font Andon | 14-point labels are invisible from the line; design for viewing distance |
| Pitfall: No Shift Reset | Daily totals hide shift handover where most issues land |
| Pitfall: Sensors Without Baseline | Sensor values mean nothing without the asset's normal range |
| Pitfall: No Operator-Facing Tiles | Plant manager sees dashboards but operator is the one who can act |
| Hands-On Plant Dashboard | |
| Page 1 — OEE | OEE big number, loss-decomposition stack, downtime Pareto, takt-time line balance |
| Page 2 — SPC | X-bar and R charts with Western Electric flags, capability histogram with Cp and Cpk tile |
| Page 3 — PdM | Asset-health heatmap, sensor-trend small-multiples, RUL fan chart |
| Andon Layout | Three-tile 55-inch display layout with 60-pt headline, 28-pt axes, 60-second refresh |
| Shift-Reset Logic | DAX active-shift filter so all current-shift tiles reset at shift-change |