Case Study
Beam Pump Failure Prediction
ML models trained on historian & SCADA signals to predict rod‑lift failures early and recommend proactive workovers—delivered as real‑time alerts and operator decision support in Ignition.
Oil & Gas
Predictive Maintenance
Python / ML
Ignition
OPC UA · MQTT
PI / Canary
Executive Snapshot
Objective
Anticipate beam‑pump (rod‑lift) failures days in advance to reduce unplanned downtime, avoid deferred production, and prioritize maintenance.
My Role
IT/OT Solutions Architect & hands‑on ML developer. Led 3‑person team; built data pipelines, feature set, models, and Ignition UI/alerts.
Duration
~6–8 months (pilot → production)
Scope
100–300 wells • multi‑year history • near real‑time scoring (5–15 min)
Architecture
PLC → OPC UA/MQTT → Ignition Gateway → Historian (PI/Canary) → Python models → Alerts/CMMS
scikit‑learn / XGBoost
Time‑series features
Anomaly detection
RBAC · TLS
MLOps (retrain)
Outcome
- Achieved ~70%+ precision on failure predictions in pilot; alert lead time typically 24–72 hours.
- Reduced unplanned downtime on covered wells; improved maintenance prioritization and crew dispatching.
- Operator UI in Ignition with risk score, top drivers (feature importance), and recommended actions.
Exact KPI deltas can be plugged in once we confirm production numbers for your site(s).
Problem
Beam‑pump failures (rod/tubing wear, pump‑off conditions, motor faults) caused frequent unplanned downtime and deferred production. Signals indicating degradation (e.g., amperage asymmetry, strokes‑per‑minute drift, pump‑card shape changes) were buried across SCADA tags and historian data—difficult for operators to monitor proactively at scale.
Solution
- Built pipelines to aggregate PLC/SCADA data via OPC UA and MQTT into Ignition and the historian (PI/Canary).
- Engineered time‑series features from motor current, SPM, runtime, fluid level proxies, temperatures, alarms, and pump‑card stats.
- Trained classification models (e.g., XGBoost / Random Forest) and an anomaly model for early deviation detection (see the feature/training sketch after this list).
- Deployed a Python scoring service in the Ignition Gateway; surfaced risk scores & explanations to operators (see the scoring sketch after this list).
- Integrated alert routing (email/SMS) and optional CMMS ticket creation for high‑risk wells.
- Hardened with Purdue‑aligned zones, TLS, mutual certs, and RBAC; all changes auditable.
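For illustration, a minimal sketch of the feature-engineering and training steps, assuming per-well telemetry has already been pulled from the historian into a timestamp-indexed pandas DataFrame. Column names, window lengths, and the labeling rule are placeholders, not the production code.

```python
# Sketch only: assumes a per-well DataFrame indexed by timestamp with raw SCADA
# columns such as motor_amps, spm, runtime_min, tubing_temp (names are illustrative).
import pandas as pd
from xgboost import XGBClassifier
from sklearn.ensemble import IsolationForest

WINDOWS = ["1h", "6h", "24h"]  # rolling windows used for drift/variability features

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Rolling-window statistics approximating the degradation signals
    (amperage changes, SPM drift, runtime shifts) described above."""
    feats = pd.DataFrame(index=df.index)
    for col in ["motor_amps", "spm", "runtime_min", "tubing_temp"]:
        for w in WINDOWS:
            roll = df[col].rolling(w)
            feats[f"{col}_mean_{w}"] = roll.mean()
            feats[f"{col}_std_{w}"] = roll.std()
        # drift: short-window mean relative to long-window mean
        feats[f"{col}_drift"] = (
            df[col].rolling("1h").mean() - df[col].rolling("24h").mean()
        )
    return feats.dropna()

# X, y assembled per well: y = 1 if a failure/workover occurred within the
# agreed prediction window (e.g., 72 h) after the sample, else 0.
def train_models(X: pd.DataFrame, y: pd.Series):
    clf = XGBClassifier(
        n_estimators=300, max_depth=5, learning_rate=0.05,
        scale_pos_weight=(len(y) - y.sum()) / max(y.sum(), 1),  # class imbalance
        eval_metric="logloss",
    )
    clf.fit(X, y)
    # unsupervised detector for wells/conditions with few labeled failures
    iso = IsolationForest(n_estimators=200, contamination=0.02, random_state=42)
    iso.fit(X[y == 0])  # fit on "healthy" periods only
    return clf, iso
```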
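The scoring side can be sketched the same way. The model path, alert threshold, and the simple importance-weighted "top drivers" ranking below are assumptions for illustration; the production service read engineered features from the Gateway/historian rather than a file.

```python
# Sketch only: scores the latest feature vector for one well and returns a risk
# score plus the top contributing features. Paths and thresholds are illustrative.
import json
import joblib
import pandas as pd

MODEL_PATH = "models/rod_lift_xgb.joblib"   # hypothetical artifact path
ALERT_THRESHOLD = 0.7                       # tuned per site during shadow mode

def score_well(features: pd.DataFrame) -> dict:
    """Return risk score, alert flag, and a simple 'top drivers' explanation
    for the most recent row of engineered features."""
    clf = joblib.load(MODEL_PATH)
    latest = features.iloc[[-1]]                       # most recent sample
    risk = float(clf.predict_proba(latest)[0, 1])

    # crude driver ranking: global feature importance weighted by how far the
    # latest value sits from its recent mean (in standard deviations)
    importance = pd.Series(clf.feature_importances_, index=features.columns)
    z = ((latest.iloc[0] - features.mean()) / features.std().replace(0, 1)).abs()
    drivers = (importance * z).sort_values(ascending=False).head(3)

    return {
        "risk_score": round(risk, 3),
        "alert": risk >= ALERT_THRESHOLD,
        "top_drivers": [{"feature": k, "weight": round(float(v), 3)}
                        for k, v in drivers.items()],
    }

if __name__ == "__main__":
    feats = pd.read_parquet("well_1234_features.parquet")  # placeholder input
    print(json.dumps(score_well(feats), indent=2))
```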
Architecture
Standards: ISA‑95. Security: segmented ICS zones, strict firewall allow‑lists, TLS, cert‑based MQTT, RBAC.
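As a sketch of the certificate-based MQTT leg of this pipeline (assuming the paho-mqtt 2.x client), the subscriber below authenticates with a client certificate over TLS before handing telemetry to the ingestion/feature pipeline. Broker address, topic layout, and certificate paths are placeholders, not the production values.

```python
# Sketch only (paho-mqtt 2.x): certificate-authenticated MQTT subscriber that
# forwards well telemetry into the ingestion pipeline. Broker, topic, and cert
# paths are placeholders.
import json
import paho.mqtt.client as mqtt

BROKER = "mqtt.example.local"               # hypothetical broker
TOPIC = "field/+/beam_pump/telemetry"       # hypothetical topic layout

def on_message(client, userdata, msg):
    payload = json.loads(msg.payload)
    well_id = msg.topic.split("/")[1]
    # hand off to buffering / historian write / feature pipeline
    print(well_id, payload.get("motor_amps"), payload.get("spm"))

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.tls_set(                              # mutual TLS: broker CA + client cert
    ca_certs="certs/ca.pem",
    certfile="certs/edge-client.pem",
    keyfile="certs/edge-client.key",
)
client.on_message = on_message
client.connect(BROKER, 8883)                 # TLS port
client.subscribe(TOPIC, qos=1)
client.loop_forever()
```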
KPIs & Evidence
| Metric | Definition | Pilot Result |
|---|---|---|
| Prediction Precision | Share of alerts that preceded a true failure/workover within the time window. | ~70%+ (site‑specific) |
| Lead Time | Avg. time between first alert and failure event. | ~24–72 hours |
| Downtime Reduction | Change in unplanned downtime hours on covered wells. | Plug production value here (e.g., 10–20%) |
| Deferred Production Avoided | Estimated barrels avoided due to proactive intervention. | Range TBD by site economics |
We can swap in your actual metrics once you confirm them; structure and copy are ready.
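For transparency, a sketch of how the Prediction Precision and Lead Time figures can be computed from alert and failure/workover logs. The DataFrame layout and the 72-hour window are assumptions that mirror the definitions in the table.

```python
# Sketch only: computes alert precision and mean lead time against a failure log.
# Assumes two DataFrames with well_id and datetime columns; the 72 h window
# mirrors the definition in the KPI table.
import pandas as pd

WINDOW = pd.Timedelta(hours=72)

def precision_and_lead_time(alerts: pd.DataFrame, failures: pd.DataFrame):
    """alerts: columns [well_id, alert_time]; failures: columns [well_id, failure_time]."""
    true_alerts, lead_times = 0, []
    for _, a in alerts.iterrows():
        f = failures[(failures.well_id == a.well_id) &
                     (failures.failure_time >= a.alert_time) &
                     (failures.failure_time <= a.alert_time + WINDOW)]
        if not f.empty:
            true_alerts += 1
            lead_times.append(f.failure_time.min() - a.alert_time)
    precision = true_alerts / len(alerts) if len(alerts) else float("nan")
    mean_lead = pd.Series(lead_times).mean() if lead_times else pd.NaT
    return precision, mean_lead

# Interpretation: precision ~0.7 means roughly 7 of 10 alerts preceded a real
# event within 72 h; mean_lead is the average head start operators received.
```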
- Data audit & label strategy (failure taxonomy, windows, leakage checks).
- Feature engineering & model selection; backtest on rolling windows.
- Shadow‑mode scoring vs. operator notes/workover logs; threshold tuning.
- Production rollout with alerting, CMMS hook, and monthly retrain job.
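The rolling-window backtest in the second step can look like the sketch below. Fold sizes, the 0.7 alert threshold, and the model settings are illustrative; the point is that every fold trains only on data that precedes its scoring window, so pilot metrics are not inflated by leakage.

```python
# Sketch only: expanding-window backtest where each fold trains strictly on the
# past and scores the following period; split sizes are illustrative.
import pandas as pd
from sklearn.metrics import precision_score
from xgboost import XGBClassifier

def rolling_backtest(X: pd.DataFrame, y: pd.Series, n_folds: int = 6,
                     test_days: int = 30) -> pd.Series:
    """X/y indexed by timestamp across all wells; returns precision per fold."""
    end = X.index.max()
    results = {}
    for k in range(n_folds, 0, -1):
        test_start = end - pd.Timedelta(days=test_days * k)
        test_end = test_start + pd.Timedelta(days=test_days)
        train_mask = X.index < test_start
        test_mask = (X.index >= test_start) & (X.index < test_end)
        if y[train_mask].sum() == 0 or test_mask.sum() == 0:
            continue  # skip folds with no labeled failures to learn from
        clf = XGBClassifier(n_estimators=300, max_depth=5, learning_rate=0.05)
        clf.fit(X[train_mask], y[train_mask])
        preds = (clf.predict_proba(X[test_mask])[:, 1] >= 0.7).astype(int)
        results[test_start.date()] = precision_score(
            y[test_mask], preds, zero_division=0)
    return pd.Series(results, name="precision")
```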
Team & Roles
- You: Architecture, ML design, Ignition integration, stakeholder alignment.
- Developers (3): Pipelines, model training, dashboards, alerting.
- Operations/Production Eng: Label curation, thresholds, action playbooks.
Lessons Learned
- Labels matter: agreed failure taxonomy & time windows avoid data leakage.
- Explainability builds trust: feature importances and trend panels in HMI.
- Iterative thresholds: start conservative, tighten with feedback.