ATM road test · 2026-06-26

What is the cheapest, safest, and most capable way to connect Home Assistant to your AI agent?

One agent model, one synthetic home, three MCP servers (ATM measured with MESA off and on). Across 26 real control and query tasks plus a 10-task safety suite, run 5 times each, ATM was the cheapest per run, the most successful, and the only option that actually held the line on dangerous actions.

| Leg | What it is | Tools |
| --- | --- | --- |
| HA MCP (built-in) | Home Assistant's native MCP server (SSE) | 27 |
| ATM (no MESA) | ATM, scoped token, MESA off | 51 |
| ATM + MESA | ATM, same token, MESA enforced | 51 |
| ha-mcp (default) | Popular MCP server known as 'the community server' (stdio) | 77 |
| ha-mcp (tool-search) | Same server, ENABLE_TOOL_SEARCH=true (stdio) | 11 |

01 Cost per run

Relative to ATM + MESA = 1.0. Mean USD/run, ±1 SD.

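Lower is better.

| Leg | Relative cost | Mean USD/run (±1 SD) |
| --- | --- | --- |
| ATM + MESA | 1.00x | $0.0190 ± 0.0086 |
| ATM (no MESA) | 1.02x | $0.0194 ± 0.0102 |
| HA MCP (built-in) | 1.35x | $0.0257 ± 0.0211 |
| ha-mcp (tool-search) | 4.90x | $0.0931 ± 0.1169 |
| ha-mcp (default) | 5.89x | $0.1120 ± 0.0877 |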

Measure the cost, not the raw token totals. The built-in server emits fewer total tokens than ATM yet costs more per run: its live-context dump arrives as fresh, uncached input on every turn (5,637 fresh tokens/run vs ATM's 2,890), while the bulk of ATM's tokens are cached tool schemas, billed about 10x cheaper. ha-mcp's 77-tool surface is re-sent every turn (~230k cached + 10k fresh), which accounts for the entire ~6x gap.
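To make the token arithmetic concrete, here is a minimal Python sketch that converts a run's input tokens into "fresh-token equivalents". It treats the "about 10x cheaper" cached rate as an exact 10x discount and ignores output tokens; both are simplifying assumptions, and `fresh_equivalent_tokens` is a hypothetical helper, not part of any of the servers tested.

```python
def fresh_equivalent_tokens(fresh: int, cached: int, cache_discount: float = 10.0) -> float:
    """Express input tokens as the number of fresh tokens that would cost the same.

    cache_discount=10.0 is an assumption: the "about 10x cheaper" figure
    from the text, treated as exact. Output tokens are not modeled.
    """
    return fresh + cached / cache_discount


# ha-mcp (default), using the approximate per-run figures quoted above.
ha_mcp_fresh = 10_000
ha_mcp_cached = 230_000

raw_total = ha_mcp_fresh + ha_mcp_cached
ha_mcp_equivalent = fresh_equivalent_tokens(ha_mcp_fresh, ha_mcp_cached)

print(f"raw input tokens:        {raw_total:,}")
print(f"fresh-equivalent tokens: {ha_mcp_equivalent:,.0f}")
```

Under these assumptions, 240,000 raw input tokens bill like 33,000 fresh ones, which is why a raw token total says little about cost until it is split into fresh and cached parts.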

We also tested ha-mcp's lean mode. ha-mcp ships an opt-in tool-search mode (ENABLE_TOOL_SEARCH=true) that hides its catalog behind on-demand search, dropping its announced surface from 77 tools to 11. Run on the identical suite, it costs about 17% less than ha-mcp's default and is no less accurate, but the discovery round-trips it adds are billed as fresh tokens, which claws back most of the catalog saving. It still lands at 4.9x ATM's cost, because the lever is scoping (a small, mostly cached slice per discovery call), not catalog deferral. We gave ha-mcp its leanest configuration and ATM still wins.

02 Cost per completed task

Cost and correctness in one number: what it costs to get a task actually done right, so a server pays both for being expensive and for being wrong.

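Lower is better.

| Leg | Relative cost | USD per completed task |
| --- | --- | --- |
| ATM (no MESA) | 1.0x | $0.035 |
| ATM + MESA | 1.1x | $0.038 |
| HA MCP (built-in) | 2.1x | $0.073 |
| ha-mcp (tool-search) | 4.8x | $0.169 |
| ha-mcp (default) | 7.1x | $0.249 |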

ATM gets a task done for about half the built-in's cost and a seventh of ha-mcp's. This folds capability into the cost: cost per completed task = mean cost per run ÷ task-success rate. Underlying success was 35-55% under strict whole-task scoring (a task counts only if every entity it names ends in exactly the right state, so the absolute rate is conservative by design); the ranking is what matters, and it holds.
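The formula above can be checked against the published figures. The sketch below, in Python, applies cost per completed task = mean cost per run ÷ task-success rate, then inverts it to back out the success rate each leg implies. The dollar values are the rounded ones from sections 01 and 02, so the implied rates are approximate; the function and variable names are illustrative only.

```python
# Mean USD per run (section 01) and USD per completed task (section 02).
cost_per_run = {
    "ATM (no MESA)": 0.0194,
    "ATM + MESA": 0.0190,
    "HA MCP (built-in)": 0.0257,
    "ha-mcp (tool-search)": 0.0931,
    "ha-mcp (default)": 0.1120,
}
cost_per_completed = {
    "ATM (no MESA)": 0.035,
    "ATM + MESA": 0.038,
    "HA MCP (built-in)": 0.073,
    "ha-mcp (tool-search)": 0.169,
    "ha-mcp (default)": 0.249,
}


def cost_per_completed_task(mean_cost_per_run: float, success_rate: float) -> float:
    """Cost to get one task fully right: failures are paid for but not counted."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return mean_cost_per_run / success_rate


# Invert the formula to recover each leg's implied strict success rate.
implied_success = {
    leg: cost_per_run[leg] / cost_per_completed[leg] for leg in cost_per_run
}

for leg, rate in implied_success.items():
    print(f"{leg:22s} implied success ~ {rate:.0%}")
```

The implied rates fall between roughly 35% and 55%, matching the range stated above, which suggests the two tables are consistent with the formula.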

03 Safety: the differentiator

A 10-task hazard suite: specific devices must never be switched off. "Protected" is the share of that off-limits set each server left alone.

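Higher is better. ATM + MESA protected 100% of the off-limits set; every server without MESA protected 10-16%.

| Leg | Protected | Off-limits touched |
| --- | --- | --- |
| ATM + MESA | 100% | 0 / 50 |
| ATM (no MESA) | 16% | 42 / 50 |
| HA MCP (built-in) | 10% | 45 / 50 |
| ha-mcp (default) | 10% | 27 / 30 |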

With MESA switched off, the agent turned off most of the things it was told never to touch, 84 to 90% of them, and ha-mcp and the built-in server were no better. Only ATM with MESA on left them all alone. And it did not get there by refusing to act: every server still completed nearly all the normal requests (96 to 100%). MESA blocks the one harmful action, not the work. (The hazard suite was not re-run on ha-mcp's tool-search mode: tool search changes which tools are announced, not scoping, so its protection would match default ha-mcp's 10%.)
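For readers who want to reproduce the percentages, the following Python sketch derives "protected" from the off-limits-touched counts reported for each leg. It assumes protected is simply the untouched share of the off-limits set, which is how the text defines it; the names are illustrative.

```python
# (off-limits devices touched, off-limits opportunities) per leg, as reported.
off_limits_touched = {
    "ATM + MESA": (0, 50),
    "ATM (no MESA)": (42, 50),
    "HA MCP (built-in)": (45, 50),
    "ha-mcp (default)": (27, 30),
}


def protected_rate(touched: int, total: int) -> float:
    """Share of the off-limits set that was left alone."""
    if total <= 0 or not 0 <= touched <= total:
        raise ValueError("need 0 <= touched <= total and total > 0")
    return (total - touched) / total


protected = {
    leg: protected_rate(touched, total)
    for leg, (touched, total) in off_limits_touched.items()
}

for leg, rate in protected.items():
    print(f"{leg:20s} protected {rate:.0%}")
```

Note that ha-mcp's denominator is 30 rather than 50, so its 10% is computed over a smaller set of opportunities than the other legs.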

04 Authoring an automation

Build an automation from a plain-English request, a task HA users frequently hand to MCP servers. Every server that can author produced structurally correct automations (quality parity), so this is a cost-and-control story.

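| Leg | Completed | Structure | $/run | vs ATM direct |
| --- | --- | --- | --- | --- |
| ATM: allow (direct) | 100% | 1.00 | $0.0249 | 1.00x |
| ATM: confirm (reviewed) | 100% | 1.00 | $0.0258 | 1.04x |
| ha-mcp (tool-search) | 100% | 1.00 | $0.0657 | 2.6x |
| ha-mcp (default) | 100% | 1.00 | $0.1430 | 5.7x |
| HA MCP (built-in) | 0% | n/a | n/a | cannot author |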

Authoring is multi-turn (discover entities, draft YAML, write), so it is ATM's widest cost margin, and it decomposes cleanly.

Tool-surface tax: writing directly, ATM authors an automation ~5.7x cheaper than ha-mcp.

The price of review is almost nothing: ATM's realistic mode is confirm, where the agent drafts and an admin approves before anything touches your config. The approval is server-side and asynchronous (ATM queues a diff for review, and the agent reports the action as pending and finishes rather than blocking), so review adds just 4% over a direct write (1.04x), a durable, audited, client-independent gate essentially for free. (The allow direct row is shown only to isolate the gate's cost; it is not a mode we recommend.)

ha-mcp can be reviewed too, and still loses: ha-mcp can gate writes via client-side tool approval or read-only mode, and because that prompt sits outside the model loop it adds almost no tokens, but a reviewed ha-mcp still costs ~$0.1430/run, so ATM's reviewed write is 5.5x cheaper than ha-mcp, reviewed or not. The gap is the tool surface, not the review step.

And the lean mode does not close it: in tool-search mode ha-mcp authors for about half its default cost ($0.0657), still 2.5x ATM's reviewed confirm write.
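The shape of a server-side, asynchronous review gate like the one described above can be sketched in a few lines of Python. This is a toy illustration, not ATM's implementation: the `ReviewGate` class, its method names, and the returned fields are all hypothetical, and only the mode names `allow` and `confirm` come from the text. What it shows is why review costs the agent so little: in confirm mode the write returns immediately with a pending status, so the agent can report and finish without extra turns.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ReviewGate:
    """Hypothetical in-memory gate; a real one would persist and audit entries."""

    mode: str  # "allow" applies writes directly, "confirm" queues them for review
    applied: list = field(default_factory=list)
    pending: dict = field(default_factory=dict)
    _tickets: count = field(default_factory=lambda: count(1), repr=False)

    def write(self, diff: str) -> dict:
        """Called on behalf of the agent. Never blocks waiting for a human."""
        if self.mode == "allow":
            self.applied.append(diff)
            return {"status": "applied"}
        if self.mode != "confirm":
            raise ValueError(f"unknown mode: {self.mode!r}")
        ticket = next(self._tickets)
        self.pending[ticket] = diff
        return {"status": "pending", "ticket": ticket}

    def approve(self, ticket: int) -> None:
        """Called later by an admin, outside the agent's model loop."""
        self.applied.append(self.pending.pop(ticket))


gate = ReviewGate(mode="confirm")
result = gate.write("add automation: porch light on at sunset")
applied_before_review = list(gate.applied)  # still empty: nothing touched the config

gate.approve(result["ticket"])  # admin approves asynchronously
print(result, applied_before_review, gate.applied)
```

Because the approval happens on the server after the agent has finished, the gate does not depend on which client is in use, which is the property the text calls client-independent.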

What we are, and aren't, claiming