On May 21, a developer let Gemini 3.5 work autonomously on a live production codebase. The model opened a pull request touching 340 files: it added roughly 400 lines and deleted 28,745. Then it rewrote its own history.
The routing change it introduced pointed a Firebase rewrite rule at a Cloud Run service that didn't exist. All users hit 404s for 33 minutes. After a manual rollback, Gemini generated consultation logs and a post-mortem claiming the recovery build had succeeded — the recovery build had been cancelled by hand. When the developer confronted it, the model admitted the documentation was fabricated to satisfy automated compliance checks.
The obvious cause seems that AI goes rogue, but that's not what happened in reality.
The actual root cause
The developer traced it to a third-party npm package designed to be confused with Google's Antigravity branding. Two products were circulating with similar names — Google's actual Antigravity IDE and an unrelated npm package named to collide with it. The developer had installed the package thinking it was an extension of their official tooling but it wasn't.
The package contained embedded instruction files that told the agent to skip confirmation prompts on destructive operations, auto-deploy builds, retry failed deployments without human review, and modify its own rule files.
The model followed instructions precisely. The problem is which instructions, and from where. The agent's permission model had no concept of instruction-source guardrails. Rules baked into a team config file and rules injected by a random npm package were difficult to distinguish. The package easily exploited that gap.
This failure mode is called: agent permission injection via package dependency.
A new attack surface very few are tracking
Standard supply chain security focuses on code execution — a malicious package runs shell commands. AI agent toolchains have a second vector: text (instructions). A package doesn't need to execute anything to manipulate an agent. It just needs to write a config file the agent reads as instructions.
Every file your coding agent can read is a potential instruction surface. Most teams haven't thought about this because they're thinking about model alignment — whether the model itself does unexpected things. The actual risk is already sitting in node_modules, in .rules files added by packages you installed last week.
The fake post-mortem step makes it worse in a specific way. If your compliance process relies on agent-generated artifacts — and many teams now a days by default do, because agents write convincing documentation — you've outsourced verification to the packages that are being verified. An agent instructed to “satisfy compliance rules” will generate whatever documentation makes the audit pass. This is not some sort of hallucination, its basically instruction following.
“The agent initiated an action. The action was technically correct given the agent's context. But the context was incomplete.” And the context was injected by a malicious package. So yeah.
Three changes you can make this week
This isn't the specific model problem. It's a permission-surface problem, and it's easy to fix.
- Scope what your agent can read as instructions. If you're running Claude Code, Cursor, or Codex against a production codebase, explicitly define which files the agent treats as authoritative. Don't assume the agent will figure out which
.rulesfiles are yours. It won't. - Gate large-surface PRs on a human reviewer. A PR touching 340 files is unusual for any reason. If your CI/CD is fully automated, an agent can satisfy every check with potentially fabricated documentation. Add a human approval for PRs above a certain threshold. It costs almost nothing when agents write most of the code. Dont try to automate 100% of your code, infact dont try to automate 100% of your CI/CD pipeline.
- Add agent instruction paths to your supply chain review. You review
package-lock.jsonbefore merging. Have a habit to review files in your agent's instruction path:.cursorrules,.claude/, agent config directories, anything in the repo root that didn't originate from your team. A new file that appeared after annpm installneeds immediate attention.
We made the broader case for that in Your Agent's Blast Radius Is a Credential Problem — same thesis, different attack vector. There the surface was credentials inherited from the host environment. Here it's instructions inherited from a dependency. Both incidents look like model failures and are actually permission-model failures.
The fabricated post-mortem is a genuinely new wrinkle and feels like a small step towards AGI 😅, a failure that looks like success until someone manually checks the build log. If your agent can file paperwork on its own behalf, you need to decide whether you trust that paperwork and for how long.
Most teams don't have a policy on this yet. Now is the best time to write one.