
Cloud inference costs roughly $0.003 per 1,000 predictions on a standard ML inference endpoint. For a manufacturing facility monitoring 50 assets at 50ms inference cycles, that is 86.4 million predictions per day — approximately $259 per day in cloud inference costs alone, before storage, egress, and data transfer fees. But the cost argument is actually secondary. The primary reasons to choose edge inference for predictive maintenance are latency, connectivity reliability, and security policy, not per-prediction economics.
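The per-prediction arithmetic can be reproduced directly. A minimal sketch, using the quoted figures ($0.003 per 1,000 predictions, 50 assets, one prediction per 50ms cycle) rather than measured values:

```python
# Cloud inference cost estimate for continuous monitoring,
# using the figures quoted in the text.
ASSETS = 50
CYCLE_S = 0.050            # 50 ms inference cycle per asset
COST_PER_1K_USD = 0.003    # cloud endpoint price per 1,000 predictions
SECONDS_PER_DAY = 86_400

predictions_per_day = ASSETS * (1 / CYCLE_S) * SECONDS_PER_DAY
daily_cost_usd = predictions_per_day / 1_000 * COST_PER_1K_USD

print(f"{predictions_per_day:,.0f} predictions/day")  # 86,400,000 predictions/day
print(f"${daily_cost_usd:.2f}/day")                   # $259.20/day
```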
The Latency Gap Between OT and Cloud
Operational technology networks in manufacturing facilities are not designed for low-latency cloud connectivity. The OT network is typically isolated from the internet by a demilitarized zone (DMZ) with firewall rules that the plant security team controls. Traffic from the OT network to a cloud ML endpoint passes through the DMZ, crosses the corporate IT network, exits to the internet, reaches the cloud, executes inference, and returns — a round-trip that is rarely under 80ms and often exceeds 200ms when the corporate network is congested.
EdgeRun's edge gateway executes the anomaly detection model locally at 50ms cycles, with end-to-end latency (sensor sample to anomaly score) under 100ms. For a bearing failure that generates a sharp acoustic emission transient lasting 0.5 seconds, a 200ms cloud round-trip means the event may not produce an alert until it has already passed. Edge inference processes the event in real time against the local model. The difference is not academic when the failure mode produces sharp, short-duration transients rather than gradual degradation trends.
Connectivity Reliability in Industrial Environments
Cloud inference requires reliable network connectivity. Manufacturing environments are not always reliable network environments. Welding operations generate electromagnetic interference that disrupts Wi-Fi. Crane movements can cause cable strain events on Ethernet runs. Plant-wide power events can disrupt network switches. A cloud-dependent monitoring system that loses connectivity stops generating anomaly scores — exactly when an equipment problem causing power quality issues might be developing.
Edge inference runs regardless of cloud connectivity status. The EdgeRun gateway stores anomaly history locally and syncs to the cloud dashboard when connectivity is restored. This means the 7-day rolling anomaly trend has no gaps: once connectivity returns, the dashboard backfills the hours it missed. For a system that is supposed to warn before failures occur, availability during OT network disruptions is a functional requirement, not a feature.
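The store-and-forward pattern behind this behavior can be sketched in a few lines. This is illustrative only, not EdgeRun's actual implementation: scores are always written locally first, and a sync pass drains the backlog oldest-first whenever an upload succeeds.

```python
import time
from collections import deque

class AnomalyBuffer:
    """Local-first anomaly history with opportunistic cloud sync."""

    def __init__(self, maxlen=120_960):   # ~7 days of 5-second summaries
        self.pending = deque(maxlen=maxlen)

    def record(self, asset_id, score):
        # Recording never touches the network, so it succeeds during outages.
        self.pending.append({"asset": asset_id, "score": score, "ts": time.time()})

    def sync(self, upload):
        # Drain oldest-first; stop (and keep the rest) on the first failure.
        while self.pending:
            if not upload(self.pending[0]):
                return False
            self.pending.popleft()
        return True
```

During an outage every `upload` call fails, `sync` returns False, and nothing is lost; once connectivity returns, the backlog drains in order and the cloud trend backfills.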
OT Security Policy and Data Sovereignty
Many manufacturers' OT security policies prohibit continuous transmission of operational data to external cloud endpoints. The operational data from a manufacturing plant — machine cycle rates, production throughput, equipment load profiles — is competitively sensitive. A plant running at 95% capacity on a particular production line is sharing information about its output capability with whoever operates the cloud endpoint that receives its sensor data. Some customers' legal and security teams will not approve this, regardless of what the data processing agreement says.
Edge inference means that raw sensor waveforms never leave the factory network. The only data transmitted to the EdgeRun cloud dashboard is the anomaly score, the alert status, and the asset health index — derived metrics that do not expose raw operational parameters. The raw vibration waveform stays on the edge gateway, available for local forensic analysis but not transmitted continuously. This architecture satisfies the OT security policies we have encountered at regulated facilities and those with competitive sensitivity about their manufacturing operations.
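The data-minimization boundary can be illustrated with a hypothetical payload. Field names and values here are assumptions for illustration, not EdgeRun's actual telemetry schema:

```python
import json

# One second of raw vibration samples at 25.6 kHz: stays on the gateway.
raw_waveform = [0.0] * 25_600

# What crosses the OT boundary: derived metrics only, a few hundred bytes.
cloud_payload = json.dumps({
    "asset_id": "pump-07",      # hypothetical asset identifier
    "anomaly_score": 0.12,
    "alert": False,
    "health_index": 97,
})

# The payload carries no raw samples and no operational parameters.
assert "waveform" not in cloud_payload
print(len(cloud_payload), "bytes sent vs", len(raw_waveform) * 2, "bytes retained locally")
```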
Model Update Frequency: Edge Changes the Economics
A common argument for cloud inference is that model updates are easier to manage centrally. This is true. It is also less important than it sounds. Anomaly detection models for condition monitoring do not need frequent retraining. The variational autoencoder trained during the 14-day baseline calibration period is valid for the asset's lifetime unless the operating conditions change substantially (new load profile, equipment modification, replacement with a different component). Retraining happens on a trigger — maintenance event, operating condition change, false positive investigation — not on a schedule.
EdgeRun's OTA (over-the-air) model update mechanism pushes signed model artifacts to the edge gateway during scheduled maintenance windows. The update process is atomic — the new model is staged and validated before the active model is replaced. If validation fails, the gateway rolls back to the previous model automatically. This happens once every several months per asset, not continuously. The operational complexity of managing model updates at the edge is manageable at that frequency.
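The stage-validate-swap pattern described above can be sketched as follows. Paths and the caller-supplied `validate` check are assumptions; signature verification and the actual inference runtime are out of scope here:

```python
import os
import shutil

def apply_model_update(staged, active, backup, validate):
    """Replace `active` with `staged` only if validation passes.

    The active model is never touched until the staged artifact has
    been validated, and a rollback copy is kept before the swap.
    """
    if not validate(staged):
        os.remove(staged)          # reject the artifact; active model untouched
        return False
    shutil.copy2(active, backup)   # keep a rollback copy of the current model
    os.replace(staged, active)     # atomic rename on POSIX filesystems
    return True
```

A failed validation leaves the previous model serving, and a crash before the final `os.replace` also leaves the active model intact, which is what makes the update atomic from the gateway's point of view.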
Bandwidth and Data Volume
A single vibration sensor sampling at 25.6 kHz produces approximately 50 KB of raw data per second. For a plant with 50 sensors, continuous transmission of raw waveforms to a cloud inference endpoint requires 2.5 MB/s of sustained OT-to-cloud bandwidth. Many manufacturing OT networks are not designed for that sustained load, particularly when the OT network is separated from the IT network by a single firewall appliance that handles all inter-zone traffic.
Compressing the raw waveform before transmission helps but does not eliminate the problem. Vibration data compresses modestly because it contains high-frequency content that is difficult to compress without losing the frequency components of interest. A 50% compression ratio (achievable with lossless algorithms) halves the bandwidth requirement to 1.25 MB/s — still a significant sustained load on a network that was designed for PLC command traffic and occasional historian uploads.
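The bandwidth figures work out as follows. The 16-bit sample size is an assumption consistent with the ~50 KB/s per-sensor figure above:

```python
SAMPLE_RATE_HZ = 25_600
BYTES_PER_SAMPLE = 2        # 16-bit ADC samples (assumed)
SENSORS = 50
LOSSLESS_RATIO = 0.5        # ~50% lossless compression, per the text

per_sensor_kb_s = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE / 1_000    # 51.2 KB/s
plant_mb_s = SENSORS * per_sensor_kb_s / 1_000                 # 2.56 MB/s raw
compressed_mb_s = plant_mb_s * LOSSLESS_RATIO                  # 1.28 MB/s

print(f"{plant_mb_s:.2f} MB/s raw, {compressed_mb_s:.2f} MB/s compressed")
```

The exact numbers land slightly above the rounded 2.5 MB/s and 1.25 MB/s in the text because 25.6 kHz at 2 bytes per sample is 51.2 KB/s per sensor, not 50.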
Where Cloud Inference Makes Sense
Cloud inference is the right choice when the sensor count is low, the network is reliable, the plant security team approves external data transmission, and the failure modes produce gradual degradation rather than sharp transient events. It is also the right choice when the primary value is fleet-level analytics across many facilities — comparing bearing degradation rates across five plants in a fleet requires centralizing the data, and cloud is the natural home for that workload.
The architecture that many large manufacturing companies end up with is a hybrid: edge inference for real-time anomaly detection and alert generation, cloud aggregation for fleet analytics, predictive model benchmarking across sites, and long-term trend storage. The edge layer provides the reliability and latency characteristics needed for operational alerting; the cloud layer provides the data aggregation and analytics depth needed for long-term maintenance program optimization.
Total Cost of Ownership: The Honest Comparison
Edge inference hardware has an upfront cost that cloud inference does not. The EdgeRun ER-200 gateway costs $1,200 per unit and monitors up to 16 sensors per gateway. For a 50-sensor installation, four gateways are needed — $4,800 in hardware capital cost. The equivalent cloud inference subscription for 50 assets at continuous monitoring rates runs approximately $280/month — meaning the hardware pays for itself in raw inference cost terms in roughly 17 months, before accounting for network engineering costs and bandwidth fees.
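The break-even arithmetic, reproduced with the quoted figures (note that 50 sensors at 16 per gateway rounds up to four units):

```python
import math

GATEWAY_COST_USD = 1_200
SENSORS_PER_GATEWAY = 16
SENSORS = 50
CLOUD_MONTHLY_USD = 280

gateways = math.ceil(SENSORS / SENSORS_PER_GATEWAY)    # 4 gateways (3 would cover only 48 sensors)
hardware_usd = gateways * GATEWAY_COST_USD             # $4,800 capital cost
break_even_months = hardware_usd / CLOUD_MONTHLY_USD   # ~17.1 months

print(f"{gateways} gateways, ${hardware_usd:,}, break-even in {break_even_months:.1f} months")
```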
That break-even calculation ignores the operational value of reliability: a cloud inference system that drops connectivity for 6 hours during a critical bearing failure event has a very different cost profile than the raw subscription arithmetic suggests. The correct total cost of ownership comparison accounts for the expected cost of missed detections during connectivity outages, which is difficult to quantify but is not zero.
See the edge architecture in your facility
Book a demo to walk through deployment architecture for your specific OT environment.
Request a Demo