On-Device vs. Cloud AI in the Field: A Systems Architecture Playbook for Latency, Privacy, and ROI
A deep dive into building AI copilots for field teams with the right split between on-device, edge, and cloud. We quantify latency budgets, privacy risk, cost models, failure modes, and present a reference architecture and rollout blueprint that actually works in the field.
Field technicians don’t care which GPU rendered a vector or where an embedding lives—they care whether the next step arrives before their hand reaches the panel, and whether close-out finishes before they shut the van door. This playbook explains how to architect the right split of on-device, edge, and cloud AI for real-world field operations—so latency drops, privacy strengthens, and the ROI is provable.
TL;DR
Design around latency budgets, not model fads. Voice UX needs ≤200 ms round-trip; AR object overlays need 16–33 ms per frame; proof-of-work OCR can tolerate 1–5 s in the background.
Process sensitive data on-device by default; offload only what benefits from fleet learning or heavy models.
Engineer failure-tolerant paths (offline-first, state machines, resumable queues) so jobs don’t die when the cell tower does.
Measure ROI in callbacks avoided and minutes saved (not token counts). The split pays for itself when first-time-fix climbs and close-out shrinks.
The Decision Triangle: Latency, Privacy, Cost
Every AI request in the field hits three constraints:
Latency: How fast must the response be to keep the tech “heads-up”?
Privacy/Regulatory: Can this data leave the device? If yes, must retention be zero?
Cost/Footprint: What compute, network, and battery budgets do we have?
Architecture rule: Place each capability at the lowest tier that meets its latency and privacy requirements without blowing cost.
Table 1 — Typical SLOs & Placement
| Capability | Target SLO | Default Placement | Notes |
| --- | --- | --- | --- |
| Wake word + VAD | < 30 ms | On-device | Keeps UX snappy & private |
| Command parsing (NLU) | 80–150 ms | On-device → Edge fallback | Distill + quantize small model |
| Step guidance TTS | 80–150 ms | On-device (cached) | Cache prompts/voices |
| AR object alignment | 16–33 ms/frame | On-device GPU/NPU | Prefer device NN APIs |
| Document OCR (proof) | 1–5 s (async) | On-device → Edge | Batch + opportunistic upload |
| Long-form summarization | 1–4 s | Edge/Cloud | Redact at source; offer no-retention modes |
| Retrieval over KB | 150–500 ms | Edge (with local cache) | Push hot docs to device |
| Fleet analytics & model training | Minutes → hours | Cloud | De-identified aggregates only |
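The placement rule above can be sketched as a small decision function. The tier names, round-trip latencies, and per-tier compute costs below are illustrative assumptions echoing Table 1, not a normative API:

```python
from dataclasses import dataclass

# Ordered lowest to highest tier; properties are illustrative placeholders.
TIERS = ["on_device", "edge", "cloud"]
TIER_RTT_MS = {"on_device": 0, "edge": 60, "cloud": 250}         # added network round-trip
TIER_EGRESS = {"on_device": False, "edge": True, "cloud": True}  # does raw data leave the device?

@dataclass
class Capability:
    name: str
    slo_ms: float          # latency budget for this step
    data_can_leave: bool   # privacy: may raw inputs egress the device?

def place(cap: Capability, compute_ms: dict) -> str:
    """Return the lowest tier that meets both latency and privacy constraints."""
    for tier in TIERS:
        if TIER_EGRESS[tier] and not cap.data_can_leave:
            continue  # privacy rules this tier out
        if TIER_RTT_MS[tier] + compute_ms[tier] <= cap.slo_ms:
            return tier
    raise ValueError(f"no tier satisfies the SLO for {cap.name}")

# Wake word must stay local and is cheap on-device.
wake = place(Capability("wake_word", 30, False),
             {"on_device": 10, "edge": 5, "cloud": 5})
# Summarization is too heavy for the device model but fits at the edge.
summ = place(Capability("summarization", 4000, True),
             {"on_device": 10_000, "edge": 800, "cloud": 700})
print(wake, summ)
```

The key design choice is ordering the tiers and returning the first feasible one, which encodes "lowest tier that meets its latency and privacy requirements" directly.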
Latency Budgets You Can Actually Hit
Voice: the unforgiving path
Wake → Intent → Response audio must feel instantaneous.
Budget: Wake (10–20 ms) + NLU (≤100 ms) + TTS (≤80 ms) = ≤200 ms end-to-end.
Tactics
Distill a small NLU (e.g., 30–100M params) with intents/slots aligned to your job grammar.
Pre-compile TTS prompts for static guidance; cache audio on device.
Use audio ring buffers and half-duplex control to avoid barge-in chaos.
AR: the perception trap
Pose + alignment need 30–60 FPS for comfort.
Offload only heavy recognition; keep pose tracking local (ARKit/ARCore/NNAPI/Metal).
Tactics
Use multi-rate pipelines: 60 FPS pose → 10 FPS detect → 1 FPS heavy classify.
Quantize models; prefer device GPU/NPUs.
Fallback: degrade gracefully to static overlays and photo annotations.
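The multi-rate pipeline above can be sketched as a frame scheduler. The 60/10/1 FPS cadence comes from the tactic listed here; the stage names are placeholders for real pose, detection, and classification calls:

```python
# Cadence in 60 FPS camera frames: pose every frame, detect every 6th, classify every 60th.
POSE_EVERY, DETECT_EVERY, CLASSIFY_EVERY = 1, 6, 60

def schedule(frame_idx: int) -> list:
    """Return which pipeline stages run on this frame of a 60 FPS stream."""
    stages = []
    if frame_idx % POSE_EVERY == 0:
        stages.append("pose")       # local tracking, every frame (~60 FPS)
    if frame_idx % DETECT_EVERY == 0:
        stages.append("detect")     # object detection (~10 FPS)
    if frame_idx % CLASSIFY_EVERY == 0:
        stages.append("classify")   # heavy classification (~1 FPS)
    return stages

print(schedule(0), schedule(1), schedule(6))
```

Decoupling the rates this way keeps the pose loop (the comfort-critical path) immune to slow detection or classification stages.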
Privacy First: Data Minimization by Design
Default to on-device for audio and images; ship no-retention modes server-side.
Redaction at source: serials okay; customer PII masked before leaving device.
Immutable audit: hash of media + timestamp + step id; store proofs against job/asset IDs.
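The immutable-audit bullet can be made concrete with a minimal proof record: a content hash plus timestamp and step/job identifiers, with no raw media or PII in the record itself. The field names and ID formats below are hypothetical:

```python
import hashlib
import time

def proof_record(media_bytes: bytes, step_id: str, job_id: str) -> dict:
    """Audit entry: content hash + timestamp + step id, stored against the job."""
    return {
        "job_id": job_id,
        "step_id": step_id,
        "captured_at": int(time.time()),                          # capture timestamp
        "media_sha256": hashlib.sha256(media_bytes).hexdigest(),  # tamper-evident hash
    }

# The photo bytes are redacted locally before hashing and (optional) upload.
rec = proof_record(b"<redacted-photo-bytes>", step_id="torque-check-3", job_id="J-1042")
print(rec["media_sha256"])
```

Because only the hash egresses, the server can later verify an uploaded asset matches the original capture without ever needing the raw media at audit time.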
Require 2–3 annotated photos + readings; perform local redaction.
Success: Proof bundle completeness ≥ 95%.
Phase 3 (Weeks 7–9): AR & RAG Edge
Pilot overlays on 1–2 confusing components; enable hot-doc caches.
Success: callbacks ↓ ≥ 25% vs. baseline crew.
Phase 4 (Weeks 10–13): Observability & A/B
Model registry, cohort testing, weekly failure triage.
Success: Stable deltas, reproducible across crews.
The Executive POV: Why This Pays
Risk (Trust & Privacy). Sending raw field media to generic clouds is a non-starter for many enterprise buyers. On-device first wins deals by default: sensitive data stays local; only derived signals or redacted assets egress.
Speed (Behavior Change). Sub-second guidance (p95 ≤ 200 ms) shifts technician behavior immediately. When the “right way” is also the fastest way, adoption sticks without mandates.
Scale (Smart Use of Cloud). Edge/cloud is used surgically—for fleet learning, summaries, and cross-site search—not for every keystroke. This lowers variable costs and reduces blast radius.
Return (Hard Dollars). Callback reduction is the cash engine; close-out acceleration and faster warranty approvals compound the benefit. The result is higher gross margin with no additional headcount.
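The hard-dollar claim can be made auditable with a simple model. Every input below is an illustrative assumption (job volume, callback rate, truck-roll cost, loaded labor rate), not a benchmark from the field data in this article:

```python
def monthly_roi(jobs: int, callback_rate: float, callback_cut: float,
                truck_roll_cost: float, closeout_min_saved: float,
                loaded_rate_per_min: float) -> float:
    """Dollars saved per month: avoided callbacks plus faster close-outs."""
    callbacks_avoided = jobs * callback_rate * callback_cut
    closeout_savings = jobs * closeout_min_saved * loaded_rate_per_min
    return callbacks_avoided * truck_roll_cost + closeout_savings

# Hypothetical fleet: 2,000 jobs/mo, 12% callback rate, 25% reduction,
# $250 per truck roll, 8 close-out minutes saved at $1.20/min loaded labor.
savings = monthly_roi(2000, 0.12, 0.25, 250.0, 8.0, 1.20)
print(round(savings, 2))
```

Pre-registering a model like this before the pilot, then filling in measured values, is what turns "the split pays for itself" into a number finance will accept.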
Bottom line: Place intelligence as close to the work as possible. Promote only what benefits from the crowd.
Appendix A — Quick Architecture Checklist
Voice pipeline p95 ≤ 200 ms; TTS cached. Wake/VAD/NLU/TTS tuned for sub-second round-trip; cache hot prompts and synthesis.
Local proof store (encrypted); immutable hashes. Photo/video/readings written to an encrypted store; generate content hashes for audit trails.
RAG gateway with device hot-doc cache. Retrieve only the smallest, relevant chunks; cache job-type manuals/SOPs locally for offline.
Offline-first state machine + resumable queues. Deterministic job state transitions; queue uploads, retries with backoff when connectivity returns.
Observability: step SLOs, offline rate, failure taxonomy. Emit per-step latency/accuracy metrics, % time offline, and standardized failure codes.
A/B cohorts; callback and close-out KPIs. Cohort flags by crew/region; pre-register metrics; track callbacks ↓, close-out mins ↓, warranty approvals ↑.
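The offline-first item in the checklist above can be sketched as a resumable queue: items persist until acknowledged, failures are re-queued, and retries back off exponentially. This is a minimal in-memory sketch; a real implementation would persist the queue to encrypted on-device storage:

```python
import random
from collections import deque

class ResumableQueue:
    """Offline-first upload queue: items persist until acked, retries back off."""

    def __init__(self, max_backoff_s: float = 300.0):
        self.pending = deque()
        self.max_backoff_s = max_backoff_s

    def enqueue(self, item: dict) -> None:
        self.pending.append({"payload": item, "attempts": 0})

    def flush(self, send) -> int:
        """Attempt every pending item once; re-queue failures with attempt count."""
        sent = 0
        for _ in range(len(self.pending)):
            entry = self.pending.popleft()
            if send(entry["payload"]):
                sent += 1
            else:
                entry["attempts"] += 1
                self.pending.append(entry)  # retry on the next flush
        return sent

    def backoff_s(self, attempts: int) -> float:
        """Exponential backoff with jitter, capped at max_backoff_s."""
        return min(self.max_backoff_s, (2 ** attempts) + random.random())

q = ResumableQueue()
q.enqueue({"proof": "photo-1"})
q.enqueue({"proof": "photo-2"})
offline_sent = q.flush(lambda payload: False)  # no connectivity: nothing lost
online_sent = q.flush(lambda payload: True)    # tower back: both drain
print(offline_sent, online_sent)
```

The important property is that a failed flush changes nothing except the attempt counter, so the job state machine never blocks on connectivity.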
Conclusion & Next Steps
Reducing callbacks and accelerating close-outs is a systems problem, not a single feature bet. The pattern that survives contact with the field is consistent:
Latency: Sub-200 ms voice loops change behavior.
Clarity: AR/visual confirmations remove ambiguity where it actually occurs.
Proof by default: Evidence is captured as a byproduct of doing the work.
Privacy by design: On-device first, zero-retention paths, auditable flows.
Observability: SLOs and failure taxonomy make improvement compounding.
If you adopt one thing this quarter, start with Voice → Guidance → Proof on a single, well-scoped job type. Baseline, instrument, A/B, and iterate weekly. The 25–35% callback reduction isn’t a moonshot—it’s a byproduct of removing friction and ambiguity where techs feel it most.
Call to action: If you’d like a reference flow, SLO template, or a pilot plan tailored to your fleet, reach out—let’s make first-time fix the norm, not the exception.