How I lost a database and learned to actually use AI
A while back I was working on a personal project, a small homelab setup that collected boiler and heating measurements into a PostgreSQL database. I was using AI to help me with some migration work, restructuring a table to trim down old data. The AI gave me a confident, clear-looking sequence of SQL commands. I copied and pasted them into my database editor and ran them. No backups. The data was gone.
It was my fault. Not in a hand-wavy “we’re all responsible” kind of way. I literally ran the commands without properly understanding them. The AI suggested dropping the original table before I’d verified the temp table was intact, and I just did it.
But here’s the thing: I didn’t feel reckless at the time. The session had been going well. The AI suggested a command, I ran it, it worked. Then another, worked again. Each successful step built a kind of momentum, and by the time we got to the destructive operation my guard was completely down. The prior successes had done their job. I was confident we were on the right track.
That’s the pattern worth watching out for. It’s not that AI gives you bad commands (mostly it doesn’t). It’s that a long run of good ones quietly erodes your scepticism, so when it eventually gives you something that doesn’t quite do what you expected, you’re not in the right headspace to catch it. You’ve been lulled into executing rather than reviewing.
There’s a story that did the rounds recently where a developer claimed a Cursor/Claude agent deleted his company’s production database. Ibrahim Diallo makes a blunt point about it: the AI didn’t do anything. You gave it the access, you ran the process, and somewhere in your architecture there was an endpoint capable of wiping your entire production database. That’s not an AI problem.
He’s right. But I think it’s worth unpacking why people end up there, because it’s not simply carelessness. AI assistants are unnervingly fluent. They produce SQL, shell commands, and migration scripts with the same confident tone regardless of whether the operation is reversible or catastrophic. There’s no hesitation, no flag on the dangerous bit. You have to bring your own scepticism, and after twenty minutes of a productive session where everything has worked exactly as described, that scepticism is the first thing to go.
After losing that data I spent some time thinking about what had actually gone wrong in my process, beyond the obvious “take backups” lesson. The answer wasn’t complicated: I had no structure around what the AI was doing. I was having a conversation, following its lead, and executing whatever it produced. There were no checkpoints, no moments where I stepped back and asked whether what was happening matched what I actually wanted.
Working out a better approach took months, not sessions. It meant experimenting across different projects, paying attention to where AI added real value and where it quietly created more problems than it solved. I don’t think that kind of learning can be shortcut by reading someone else’s process, but I can at least describe what I landed on.
I start by writing a PRD, a short document that captures what needs to happen and why, at a level of abstraction I can actually reason about. I put this together with AI help, but I review it and own it before anything gets built. It’s not a chat transcript, it’s a document I’d be comfortable handing to a colleague. The point is to have something I’ve thought through before I start generating code, so that I have a reference point to check against when things start moving fast.
From the PRD I generate tasks. Small, atomic ones. Each task includes the specific code changes involved, so when the AI picks one up it knows exactly what it’s introducing and modifying. This keeps it focused on a narrow scope rather than trying to hold an entire codebase in mind, and it means if something goes wrong I know roughly what went wrong and why. Handing it a whole PRD to implement in one go is a reliable way to watch it confidently go sideways halfway through. Smaller tasks mean shorter context, which means fewer hallucinations and less drift.
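To make that concrete, a single task entry might look something like this. The project, file names, and numbers here are all invented for illustration; the format matters far less than the scope:

```
Task 3 of 7 (hypothetical)
Goal:      trim the readings table to the last 90 days
Touches:   db/migrations/0042_trim_readings.sql (new file)
Steps:     create readings_new, copy recent rows, verify counts,
           then drop the original and rename
Done when: row counts match and nothing else references the old table
```

Each task is small enough that I can tell at a glance whether the AI’s output actually matches what I asked for.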
At every stage (PRD, tasks, code) I’m the one verifying that what’s happening matches what I intended. The AI is generating, I’m reviewing.
For code changes, I commit at logical points as I go, using Jujutsu, a version control system that makes frequent checkpointing feel natural rather than ceremonial. Sensible intervals mean that if something goes wrong I’m not scrambling to unpick a large, tangled diff.
For commands (SQL, terminal operations, anything running directly against a live system) the approach is different because you can’t just roll back to a previous commit. Here I slow down and actually read what the AI has given me before running it. If it’s touching a database, I take a backup first. The momentum problem from my opening story is exactly what I’m trying to resist: the temptation to keep executing because the last five commands worked fine. A destructive SQL statement looks just as routine as a SELECT in the flow of a session, and it’s easy to treat it the same way.
There are a couple of areas I still want to explore. One is sandboxing tooling like Agent Safehouse, which aims to constrain what an AI agent can actually reach. The other is VM-based AI agents, where the entire environment the agent operates in is disposable and isolated. Both feel like they’re attacking the right problem: rather than relying entirely on the human to catch the dangerous command, you limit the blast radius of what can go wrong in the first place.
All of this is really just structure around one idea: the AI doesn’t know which of its suggestions will ruin your afternoon, and it won’t tell you. The fluency is real. The judgement isn’t. Those are different things, and the longer a session goes on, the easier it is to forget that. The shift in how I think about AI is probably the most useful thing to come out of losing that data. I’ve stopped treating it as a collaborator who has my back, and started treating it as a very fast, capable tool that will do exactly what you ask without judgement.