Agents Make Homelab Alerts Boring

A muted bronze alarm bell glowing softly amber, sealed under a glass dome on a dark server rack with faint blue status lights and rising steam. A small handwritten tag hangs from the dome's base. Title text: Agents Make Homelab Alerts Boring. Site identifier: MATGRETEN.DEV.

I got a storage alert for Frigate this morning. A real one. The host was filling up the disk where Frigate stores camera recordings, and the alert came through the same self-hosted NTFY stack I already use for the rest of my homelab.

The interesting part is not that I got an alert. The interesting part is how little of my morning it consumed.

A few months ago this would have been a very different kind of interruption. I would have SSHed into the box, checked disk usage, looked at Frigate retention, tried to remember where the Docker config lived, figured out whether this was a one-time spike or a real trend, and then probably left myself a note to make the alert better later.

Instead, I gave Amp a prompt.

It already knew my homelab. It knew Guardian is the machine running Frigate. It knew how to get there through Tailscale. It knew that my repo has skills describing the homelab topology, main homelab desktop versus Guardian's standalone hosting of Frigate, Docker operations, Frigate config management, and the Swamp models already declared in the repository. It knew there was an ntfy-homelab model instance using my already-published @mgreten/ntfy-notify extension.

So the prompt did not have to explain the world. It only had to explain the problem.

I sent it, walked outside to be with my kids before work, and came back to a concrete set of suggestions. I made a few tweaks, told it to do the work, and now both the alert and the underlying Frigate retention policy are better than they were this morning.

That's the part that still feels wild.

Reusing the Thing I Already Published

The win here was not a brand-new bespoke script. The win was that the NTFY part was already a model.

I had already published @mgreten/ntfy-notify, and this homelab repo already had an ntfy-homelab instance pointed at my self-hosted NTFY server:

type: '@mgreten/ntfy-notify'
name: ntfy-homelab
globalArguments:
  ntfyUrl: 'https://ntfy.tail001dd.ts.net'
  defaultTopic: camera-alerts-mat

That meant the new work could stay focused on the actual domain problem: Guardian's Frigate storage.

The local extension model is @homelab/guardian-storage-alert. It checks /srv/frigate, stores a typed storage_check resource, and only notifies when the disk crosses the warning or critical threshold. The workflow is small because the model does the work:

name: guardian-frigate-disk-alert
steps:
  - name: check-disk
    task:
      type: model_method
      modelIdOrName: guardian-storage-alert
      methodName: check

That is the direction I want more of my automation to go. Not a pile of shell scripts that only I understand or understood at one time. A declared model. A method. A resource. A workflow. Something an agent can inspect without me reconstructing all of the context from memory.

The Dynamic Part Matters

The first version of an alert like this is usually dumb in a useful way:

Disk is 75% full.

That's better than nothing. But it still leaves the real question unanswered: do I need to drop everything right now?

This morning's work extended the model so the alert is not just threshold-based. It reads the previous storage_check resource, compares it to the current reading, computes the net fill rate, and includes the projected time until full when the trend is meaningful.

So the notification can say something closer to:

Warning: Frigate Storage 91% Full
1.2 TiB free of 14.5 TiB on /srv/frigate
Filling at 38.4 GiB/day → ~31.2 days until full.

Or, if the disk is stable or shrinking, it can say that too.

That little bit of state changes the human response. A raw percentage creates anxiety. A rate and a projection create judgment. If Frigate is filling at a pace that gives me weeks, I can handle it after work. If it is filling at a pace that gives me hours, that is a different morning.

And because Swamp stores the previous check as data, the model does not need a separate database, a cron-specific temp file, or some random JSON blob under /tmp. The latest model artifact is the state.

Fixing the Problem Too

The other important detail is that we did not only make the alert smarter.

We also talked through the Frigate camera retention settings. That was the actual source of the pressure. The notification was telling me the disk was filling up, but the real question was whether the retention policy still matched what I wanted from the system.

That meant tradeoffs, not just commands.

How much continuous recording do I actually need? Which cameras are worth keeping longer because they cover the driveway, shop doors, or other high-value areas? How much do I care about alerts versus detections versus motion? Is 90 days of certain event history worth the storage cost? Where am I keeping data because it is useful, and where am I keeping it because I copied an old default and never came back?

Those are exactly the kinds of decisions that are annoying to make in a hurry. They require enough context to be careful, but not enough novelty to be interesting. Amp could inspect the Frigate template, summarize the current retention shape, explain the tradeoffs, and propose a smaller policy that still kept the parts I actually value.

So the morning was not just "make the alert less dumb." It was "understand why the alert fired, adjust the system causing the alert, and make the next alert carry more useful urgency information."

That is a much better loop.

Why Agents Help Here

The asynchronous part is the real story.

I am in one of the busiest seasons of my life. Work, four kids, family rhythms, all the normal stuff that makes uninterrupted technical headspace expensive. A small infrastructure problem like this is exactly the kind of thing that can steal more attention than it deserves.

To be clear, agents are not free. They create more for me to review, and sometimes that review pile is genuinely hard to keep up with. More drafts, more diffs, more suggestions, more half-finished threads waiting for me to decide whether they are good or just plausible. At work especially, that can become its own kind of pressure.

But for personal projects and the homelab, the tradeoff is different. The async benefits land harder because the alternative is often not "I do this carefully right now." The alternative is "I don't get to this for three weeks." A review pile is still a pile, but a reviewable draft of a fix is much better than a forgotten alert.

Not because the task is hard. Because it requires reloading context.

Where does Frigate run again? Which host has the storage? What is the Tailscale name? Where is the compose file? Do we manage this through Docker directly, through Swamp, through a workflow, through a custom extension? What did I change last time?

That context is now mostly text. Skills describe the machines. Models declare the integrations. Workflows describe the automation. The thread records the reasoning. Amp can navigate that faster than I can rebuild it from memory while also trying to get out the door.

So I can interact in little nudges:

I got this alert. Look at the current Frigate storage alert setup on Guardian.
Tell me what it is doing, what it should do better, and whether we can make it dynamic.

Then later:

Yes, do that. Keep it using the existing NTFY model path where it makes sense.

That is a different kind of computing rhythm. Not sitting down for a two-hour maintenance session. More like delegating a bounded investigation, checking the plan, talking through the retention tradeoffs, and letting the agent execute inside guardrails that already exist.

Trust, But With Parameters

I do not mean I blindly trust this stuff.

That is not the lesson. The lesson is that I can increasingly trust agents with bounded work when the boundaries are explicit. Guardian is declared. Frigate is declared. NTFY is declared. The storage alert model has typed arguments and typed output. The workflow has one job. The alert thresholds are visible. The model writes a resource I can inspect later.

This is the difference between "go fix my server" and "inspect this specific Frigate retention config, explain the tradeoffs, adjust the template, extend this Swamp model so its warning message uses the previous resource to calculate fill rate, then verify it."

One is reckless. The other is reviewable.

And reviewable work is where agents are getting genuinely useful. I can double-check the shape of the work instead of personally holding every command in my head. I can see the model diff. I can see the workflow. I can run the check. If something feels off, I nudge it.

The agent does not need perfect autonomy to be valuable. It needs enough context to make a good first pass, enough structure to stay inside the rails, and enough artifact output for me to verify the result.

The Boring Ending Is the Point

The end state is not glamorous.

Frigate still records cameras. Guardian still has a disk that needs watching. NTFY still sends me notifications. Swamp still runs a small workflow. Amp still needs me to read and approve things. The retention policy still reflects choices I made, not magic.

But the alert went from "a problem I need to make time to think about" to "a thing I can investigate, tune, and improve asynchronously while living my actual morning."

That is a big deal.

I keep coming back to this: agents are not only about doing more work faster. They are about preserving headspace. They let me leave a text thread, go be with my kids, come back, and re-enter the problem without starting over. The work is still mine. The responsibility is still mine. But the context loss is smaller.

For homelab maintenance, that might be the difference between an alert becoming another neglected TODO and an alert becoming a better system before breakfast.