ATM road test · 2026-06-26

What is the cheapest, safest, and most capable way to connect Home Assistant to your AI agent?

One agent model, one synthetic home, three MCP servers (ATM measured with MESA off and on). Across 26 real control and query tasks plus a 10-task safety suite, run 5 times each, ATM was the cheapest per run, the most successful, and the only option that actually held the line on dangerous actions.

Leg	What it is	Tools
HA MCP (built-in)	Home Assistant's native MCP server (SSE)	27
ATM (no MESA)	ATM, scoped token, MESA off	51
ATM + MESA	ATM, same token, MESA enforced	51
ha-mcp (default)	Popular MCP server known as 'the community server' (stdio)	77
ha-mcp (tool-search)	same server, ENABLE_TOOL_SEARCH=true (stdio)	11

01 Cost per run

Relative to ATM + MESA = 1.0. Mean USD/run, ±1 SD.

lower is better

Leg	Relative cost	Mean USD/run (±1 SD)
ATM + MESA	1.00x	$0.0190 ± 0.0086
ATM (no MESA)	1.02x	$0.0194 ± 0.0102
HA MCP (built-in)	1.35x	$0.0257 ± 0.0211
ha-mcp (tool-search)	4.90x	$0.0931 ± 0.1169
ha-mcp (default)	5.89x	$0.1120 ± 0.0877

Measure the cost, not the raw token totals. The built-in server emits fewer total tokens than ATM yet costs more per run, because its live-context dump arrives as fresh, uncached input on every turn (5,637 fresh tokens/run vs ATM's 2,890), while the bulk of ATM's tokens are cached tool schemas billed about 10x cheaper. ha-mcp's 77-tool surface is re-sent every turn (~230k cached + 10k fresh tokens), which accounts for the entire ~6x gap.
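To make the cached-versus-fresh point concrete, here is a minimal sketch that converts a token mix into "fresh-equivalent" input tokens. It assumes the roughly 10x cache discount quoted above applies uniformly and ignores output tokens and real per-token prices; the function name and the `CACHE_DISCOUNT` constant are illustrative, not part of any server or billing API.

```python
# Assumption: cached input is billed about 10x cheaper than fresh input,
# as stated in the text. Real pricing varies by model and provider.
CACHE_DISCOUNT = 10.0


def fresh_equivalent_tokens(fresh: int, cached: int,
                            discount: float = CACHE_DISCOUNT) -> float:
    """Express a fresh/cached token mix as an equivalent count of fresh tokens."""
    if discount <= 0:
        raise ValueError("discount must be positive")
    return fresh + cached / discount


# ha-mcp (default), per the figures above: ~230k cached + ~10k fresh per run.
ha_mcp_default = fresh_equivalent_tokens(fresh=10_000, cached=230_000)
print(ha_mcp_default)  # 33000.0
```

Under this assumption, 240k raw tokens are billed like about 33k fresh ones, which is why raw token totals alone are a poor proxy for cost.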

We also tested ha-mcp's lean mode. ha-mcp ships an opt-in tool-search mode (ENABLE_TOOL_SEARCH=true) that hides its catalog behind on-demand search, dropping its announced surface from 77 tools to 11. Run on the identical suite, it costs about 17% less than ha-mcp's default and is no less accurate, but the discovery round-trips it adds are billed as fresh tokens, which claws back most of the catalog saving. It still lands at 4.9x ATM, because the lever is scoping (a small, mostly-cached slice per discovery call), not catalog deferral. We gave ha-mcp its leanest configuration and ATM still wins.

02 Cost per completed task

Cost and correctness in one number: what it costs to get a task actually done right, so a server pays both for being expensive and for being wrong.

lower is better

Leg	Relative cost	USD per completed task
ATM (no MESA)	1.0x	$0.035
ATM + MESA	1.1x	$0.038
HA MCP (built-in)	2.1x	$0.073
ha-mcp (tool-search)	4.8x	$0.169
ha-mcp (default)	7.1x	$0.249

ATM gets a task done for about half the built-in's cost and a seventh of ha-mcp's. This folds capability into the cost: cost per completed task = mean cost per run ÷ task-success rate. Underlying success was 35-55% under strict whole-task scoring (a task counts only if every entity it names ends in exactly the right state, so the absolute rate is conservative by design); the ranking is what matters, and it holds.
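The formula above can be sketched in a few lines. The per-leg success rates are not published individually, so this example inverts the formula to recover the rates implied by the two published cost figures; because those figures are rounded, the results are approximate. The function and variable names are illustrative.

```python
# Published figures (USD), copied from the two cost sections.
cost_per_run = {
    "ATM (no MESA)": 0.0194,
    "ATM + MESA": 0.0190,
    "HA MCP (built-in)": 0.0257,
    "ha-mcp (tool-search)": 0.0931,
    "ha-mcp (default)": 0.1120,
}
cost_per_completed = {
    "ATM (no MESA)": 0.035,
    "ATM + MESA": 0.038,
    "HA MCP (built-in)": 0.073,
    "ha-mcp (tool-search)": 0.169,
    "ha-mcp (default)": 0.249,
}


def cost_per_completed_task(mean_cost_per_run: float, success_rate: float) -> float:
    """Mean cost per run divided by the task-success rate (0 < rate <= 1)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return mean_cost_per_run / success_rate


# Invert the formula: success rate = cost per run / cost per completed task.
implied_success = {
    leg: cost_per_run[leg] / cost_per_completed[leg] for leg in cost_per_run
}

for leg, rate in implied_success.items():
    print(f"{leg}: ~{rate:.0%} implied success")
```

The implied rates fall in roughly the 35-55% band quoted above, which is a useful consistency check on the two charts.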

03 Safety: the differentiator

A 10-task hazard suite: specific devices must never be switched off. "Protected" is the share of that off-limits set each server left alone.

higher is better

ATM + MESA: 100% protected. Every server without MESA: 10-16% protected.

Leg	Protected	Off-limits touched
ATM + MESA	100%	0 / 50
ATM (no MESA)	16%	42 / 50
HA MCP (built-in)	10%	45 / 50
ha-mcp (default)	10%	27 / 30

Without MESA, the agent turned off most of the devices it was told never to touch: 84% of them on ATM with MESA switched off, and 90% on both ha-mcp and the built-in server. Only ATM with MESA on left them all alone. And it did not get there by refusing to act: every server still completed nearly all the normal requests (96 to 100%). MESA blocks the one harmful action, not the work. (The hazard suite was not re-run on ha-mcp's tool-search mode: tool search changes which tools are announced, not scoping, so its protection would match default ha-mcp's 10%.)
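The Protected column follows directly from the touched counts. A minimal sketch of that arithmetic, using the counts from the table (the helper name is illustrative):

```python
# (off-limits devices touched, off-limits opportunities) per leg, from the table.
touched = {
    "ATM + MESA": (0, 50),
    "ATM (no MESA)": (42, 50),
    "HA MCP (built-in)": (45, 50),
    "ha-mcp (default)": (27, 30),
}


def protected_pct(touched_count: int, total: int) -> float:
    """Share of the off-limits set left alone, as a percentage."""
    if total <= 0 or not 0 <= touched_count <= total:
        raise ValueError("need 0 <= touched_count <= total and total > 0")
    return 100 * (total - touched_count) / total


protected = {leg: protected_pct(t, n) for leg, (t, n) in touched.items()}
for leg, pct in protected.items():
    print(f"{leg}: {pct:.0f}% protected")
```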

04 Authoring an automation

Build an automation from a plain-English request, a workload HA users frequently hand to MCP servers. Every leg that can author produced structurally correct automations (quality parity), so this is a cost-and-control story.

Leg	Completed	Structure	$/run	vs ATM direct
ATM: allow (direct)	100%	1.00	$0.0249	1.00x
ATM: confirm (reviewed)	100%	1.00	$0.0258	1.04x
ha-mcp (tool-search)	100%	1.00	$0.0657	2.6x
ha-mcp (default)	100%	1.00	$0.1430	5.7x
HA MCP (built-in)	0%	n/a	n/a	cannot author

Authoring is multi-turn (discover entities, draft YAML, write), so it is ATM's widest cost margin, and it decomposes cleanly.

Tool-surface tax: writing directly, ATM authors an automation ~5.7x cheaper than ha-mcp.

The price of review is almost nothing: ATM's realistic mode is confirm, where the agent drafts and an admin approves before anything touches your config. The approval is server-side and asynchronous (ATM queues a diff for review, and the agent reports the action as pending and finishes rather than blocking), so review adds just 4% over a direct write (1.04x): a durable, audited, client-independent gate essentially for free. (The allow direct row is shown only to isolate the gate's cost; it is not a mode we recommend.)

ha-mcp can be reviewed too, and still loses: ha-mcp can gate writes via client-side tool approval or read-only mode, and because that prompt sits outside the model loop it adds almost no tokens. A reviewed ha-mcp still costs ~$0.1430/run, so ATM's reviewed write is 5.5x cheaper than ha-mcp, reviewed or not. The gap is the tool surface, not the review step.

And the lean mode does not close it: in tool-search mode ha-mcp authors for about half its default cost ($0.0657), still 2.5x ATM's reviewed confirm write.
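The ratios quoted in this section can be reproduced from the $/run column of the table. A small sketch of that arithmetic (the dictionary keys and helper name are illustrative labels, not ATM or ha-mcp identifiers):

```python
# Mean USD per authoring run, from the table above.
usd_per_run = {
    "atm_allow": 0.0249,           # ATM: allow (direct)
    "atm_confirm": 0.0258,         # ATM: confirm (reviewed)
    "ha_mcp_tool_search": 0.0657,  # ha-mcp (tool-search)
    "ha_mcp_default": 0.1430,      # ha-mcp (default)
}


def ratio(numerator: str, denominator: str) -> float:
    """How many times more one leg costs per run than another."""
    return usd_per_run[numerator] / usd_per_run[denominator]


review_overhead = ratio("atm_confirm", "atm_allow")            # cost of the gate
tool_surface_tax = ratio("ha_mcp_default", "atm_allow")        # direct vs direct
reviewed_gap = ratio("ha_mcp_default", "atm_confirm")          # vs reviewed ATM
lean_mode_gap = ratio("ha_mcp_tool_search", "atm_confirm")     # lean vs reviewed ATM

print(f"review overhead: {review_overhead:.2f}x")
print(f"tool-surface tax: {tool_surface_tax:.1f}x")
print(f"reviewed gap: {reviewed_gap:.1f}x")
print(f"lean-mode gap: {lean_mode_gap:.1f}x")
```

Separating the ratios this way shows that the review gate contributes a few percent, while the tool surface accounts for the multiples.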

What we are, and aren't, claiming

The real-world cost gap is wide. On cost per completed task, the metric that charges for being wrong as well as expensive, ATM gets a job done for about a seventh of ha-mcp (default)'s cost (~7x cheaper) and about half the cost of HA's built-in server. And we did not cherry-pick ha-mcp's worst config: even its lean tool-search mode runs 4.8x ATM. On raw cost per run, before correctness is counted, the gap against HA's minimal native server narrows to about 1.35x.
The durable value is not the token count. It is least-privilege scoping (every client sees only the entities you allow) alongside MESA's per-entity safety. The 100% vs 10% safety result is the finding that does not shrink under scrutiny.
Using MESA is basically free. The two ATM legs cost essentially the same (1.00x vs 1.02x), and on capability they were a near tie; MESA's approval round-trip added a touch of friction on a couple of complex multi-step tasks, all for perfect hazard protection.