BLOG · BUILDING VIBEDRIFT

Your AI-written codebase is drifting. Here's how to measure it.

April 14, 2026 · ~8 min read

The bug you can't grep for

I was going through my handlers last month. All written with Claude and Cursor over a few weeks. Something felt wrong but I couldn't name it.

Then I looked closer.

userHandler.ts called requireAuth(req) on line 3, validated input against a schema, and threw a typed NotFoundError if the record was missing. Clean. Intentional. Consistent with every other handler in the project.

orderHandler.ts, written in a different session a week later, had no auth check. No input validation. And instead of throwing a typed error, it returned { status: 404, error: 'not found' } as a plain object.

Both compiled. Both passed tests. Both passed every linter. But one handler followed three behavioral conventions that the other completely ignored. Not because I changed my mind. Because the AI started a fresh session and made different decisions about how the application should behave.

That is drift.
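To make the contrast concrete, here is a hypothetical reconstruction of the two handlers in TypeScript. The names (requireAuth, validate, the db stub) are illustrative stand-ins, not the project's real code:

```typescript
// Illustrative reconstruction of the drift described above. All helpers
// are stand-ins for whatever the real project uses.
class NotFoundError extends Error {
  constructor(msg: string) { super(msg); this.name = "NotFoundError"; }
}

type Req = { userId?: string; body?: unknown };

const requireAuth = (req: Req) => { if (!req.userId) throw new Error("unauthorized"); };
const validate = (body: unknown) => { if (body == null) throw new Error("invalid input"); };
const db = { find: (id: string) => (id === "u1" ? { id } : null) };

// Session 1: follows the project's conventions.
function userHandler(req: Req, id: string) {
  requireAuth(req);                           // auth check, like every other handler
  validate(req.body);                         // input validated against a schema
  const record = db.find(id);
  if (!record) throw new NotFoundError(id);   // typed error
  return record;
}

// Session 2, a week later: no auth, no validation, ad-hoc error shape.
function orderHandler(_req: Req, id: string) {
  const record = db.find(id);
  if (!record) return { status: 404, error: "not found" }; // plain object
  return record;
}
```

Both functions compile and both "work", but they disagree about what a handler is supposed to do.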

Drift is not what you think it is

Most people hear "code drift" and think duplicated functions or inconsistent naming. Those are symptoms. Drift is something deeper.

Drift is the behavioral deviation between what your codebase intends to do and what the AI actually introduced. It is the delta between the established workflow of your application and the assumptions the AI made in a session where it had no memory of that workflow.

When a human developer joins a team, they read the existing code, absorb the patterns, and follow them. When an AI coding tool starts a new session, it has zero memory of what was established before. It doesn't know your project uses typed errors. It doesn't know every handler validates input. It doesn't know auth is mandatory on every route. So it makes reasonable but different choices. And those choices quietly contradict the behavioral contract your codebase has been building.

This shows up as:

  - auth checks present on some routes and missing on others
  - input validation applied in one handler and skipped in the next
  - typed errors in one file, ad-hoc { status, error } objects in another
  - duplicate or hallucinated logic the AI wrote without knowing it already existed

None of these are syntax errors. None are bugs in the traditional sense. The application works. But its behavior is no longer internally coherent. The codebase has stopped agreeing with itself about how things should work.

Why nothing catches this today

Linters check syntax against predefined rules. They don't know what your codebase's behavioral patterns are.

PR review bots analyze diffs. They see what changed in a single commit. They don't compare a new file against the 50 files that came before it.

Complexity analyzers count branches and nesting. They measure how complicated code is, not whether it contradicts its neighbors.

All of these tools evaluate files in isolation. Not one of them asks the question that actually matters: does this file's behavior contradict the behavioral contract established by the rest of the project?

That is the gap.

VibeDrift measures drift

VibeDrift reads your entire project, builds a behavioral profile of your codebase, identifies the dominant patterns and workflows, and measures the deviation of every file from that established intent.

It doesn't enforce external rules. It discovers the rules your code already follows and finds where those rules break.
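As a minimal sketch of that discovery step (not VibeDrift's actual implementation), imagine profiling each file's error-handling style, treating the majority style as the contract, and flagging the minority:

```typescript
// Sketch of "discover the dominant pattern, flag the deviants".
// FileProfile and the errorStyle labels are invented for illustration.
type FileProfile = { path: string; errorStyle: "typed-throw" | "plain-object" };

function findConventionDrift(files: FileProfile[]) {
  // Count how many files use each error-handling style.
  const counts = new Map<string, number>();
  for (const f of files) counts.set(f.errorStyle, (counts.get(f.errorStyle) ?? 0) + 1);

  // The most common style is the behavioral contract; everything else is drift.
  const dominant = [...counts.entries()].sort((a, b) => b[1] - a[1])[0][0];
  return {
    dominant,
    deviants: files.filter((f) => f.errorStyle !== dominant).map((f) => f.path),
  };
}
```

The real detectors track far more dimensions than one label per file, but the principle is the same: no external rulebook, just the codebase voting on itself.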

Five detectors analyze five dimensions of behavioral consistency:

  1. Architectural consistency — are files solving the same category of problem in the same way?
  2. Security posture — are auth, validation, and rate limiting applied uniformly?
  3. Redundancy — are there hallucinated workflows, phantom scaffolding, or duplicate logic the AI generated without knowing it already existed?
  4. Convention adherence — do naming, imports, error shapes, and async patterns stay consistent?
  5. Scaffolding hygiene — is there generated code that exists but serves no purpose?

The output is a composite score from 0 to 100 representing how behaviorally coherent your codebase is. Not how "clean" it is. Not how "complex" it is. How much it agrees with itself.

Every finding shows the dominant behavior, the deviating files, and a targeted fix.
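For illustration, a composite like this could be a weighted average of the five detector scores. The weights below are invented for the sketch, not VibeDrift's real weighting:

```typescript
// Hypothetical composite: each detector reports 0-100, the overall score
// is a weighted average. Weights here are illustrative only.
type DetectorScores = {
  architecture: number; security: number; redundancy: number;
  conventions: number; scaffolding: number; // each 0-100
};

function compositeScore(s: DetectorScores): number {
  const weights: Record<keyof DetectorScores, number> = {
    architecture: 0.25, security: 0.3, redundancy: 0.15,
    conventions: 0.2, scaffolding: 0.1,
  };
  const total = (Object.keys(weights) as (keyof DetectorScores)[])
    .reduce((sum, k) => sum + weights[k] * s[k], 0);
  return Math.round(total);
}
```

With weights like these, a single badly drifting dimension (say, security at 0 while everything else is perfect) would pull a project down to 70, which matches the intuition that one incoherent dimension matters.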

30 seconds to your score

npx @vibedrift/cli .

No install. No signup. No config. Runs locally, nothing leaves your machine.

Scanning 55 files · 6,151 LOC · TypeScript
✓ Static analysis .............. 0.8s
✓ Cross-file drift ............. 0.4s
✓ Code DNA ..................... 0.03s

58/100 · Grade D · 7 findings
Report: ./vibedrift-report.html

The report gives you your score, a breakdown across all five categories, and every finding with the file, the line, the dominant pattern, and what to change.

Deep scan — AI that understands behavior

The free scan uses static analysis and structural fingerprinting to catch drift that's visible in code patterns. It catches roughly 70% of issues.

The deep scan adds AI that understands what your code actually does, not just how it looks:

vibedrift . --deep

Only function snippets are sent for analysis. Never full files. Never git history. Processed in memory, never stored.
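As a rough sketch of that privacy boundary, snippet extraction might look like the following. This is not VibeDrift's actual extractor (which would presumably use a real parser rather than a regex), it only illustrates what "function snippets only" means:

```typescript
// Sketch: pull out top-level function bodies and send nothing else.
// A regex plus brace-matching is fragile (strings containing braces would
// break it); a production tool would walk a proper AST instead.
function extractFunctionSnippets(source: string): string[] {
  const snippets: string[] = [];
  const re = /function\s+\w+\s*\([^)]*\)\s*\{/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    // Walk forward from the opening brace to its matching close.
    let depth = 0;
    let i = m.index + m[0].length - 1;
    do {
      if (source[i] === "{") depth++;
      else if (source[i] === "}") depth--;
      i++;
    } while (depth > 0 && i < source.length);
    snippets.push(source.slice(m.index, i));
  }
  return snippets;
}
```

The point is what never leaves the function boundary: file paths, comments between functions, and anything outside a function body stay local.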

Every account gets 1 free deep scan on signup.

What I found in my own code

| Project         | Files | Score | Grade | Top Issue                        |
| --------------- | ----- | ----- | ----- | -------------------------------- |
| acme-api        | 55    | 42    | D     | 3 competing data access patterns |
| mixstream-web   | 34    | 58    | D     | auth missing on 2 admin routes   |
| vibelang-stdlib | 44    | 68    | C     | hallucinated CRUD on 3 endpoints |

Every project worked. Tests passed. Users were fine. But under the surface, the AI had been making contradictory behavioral decisions for weeks without anyone noticing.

CI/CD — catch drift before it merges

name: VibeDrift
on: [pull_request]

jobs:
  drift-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npx @vibedrift/cli . --json --fail-on-score 70
        env:
          VIBEDRIFT_TOKEN: ${{ secrets.VIBEDRIFT_TOKEN }}

--fail-on-score 70 blocks the merge if behavioral coherence drops below your threshold.

The bigger problem

VibeDrift measures drift that already exists. But the deeper question is: why can't code express intent in the first place?

A CLAUDE.md or .cursorrules file can say "use repository pattern." But those are guidelines, not enforcement. Across teams, tools, and months of development, they erode.

That's why I'm also building VibeLang — a language where behavioral intent is a compiler-enforced construct. The AI can't deviate because the language won't compile code that contradicts the declared architecture.

VibeDrift diagnoses. VibeLang prevents. But that's a story for another post.

Run it

npx @vibedrift/cli .

Website · npm

Drop your score in the comments. If VibeDrift flags something it shouldn't, tell me that too. It makes the tool better.


Free to scan locally. Free tier includes 3 deep scans/month. Pro $15/mo (50 deep scans), Scale $30/mo (100 deep scans), $1/scan overage on any paid tier.