ai-spec is an AI-driven development orchestrator — from a single requirement to fully reviewed, test-covered, spec-aligned code, in minutes.
Every AI coding tool faces the same structural limitations. ai-spec is designed to address all of them.
A fully automated 10-step pipeline from idea to reviewed, scored, production-ready code.
Scans routes, schemas, dependencies, middleware, and the project constitution. Every prompt is grounded in your actual codebase — not a generic template.
Generates a human-readable Markdown spec and decomposes it into ordered tasks: data → service → api → view → route → test. One AI call, complete output.
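The layer ordering above can be sketched as a simple stable sort. This is an illustrative sketch, not ai-spec's actual code; the `Task` shape is an assumption.

```typescript
// Hypothetical task shape; the real decomposition output may differ.
type Layer = "data" | "service" | "api" | "view" | "route" | "test";

interface Task {
  file: string;
  layer: Layer;
}

const LAYER_ORDER: Layer[] = ["data", "service", "api", "view", "route", "test"];

function orderTasks(tasks: Task[]): Task[] {
  // Stable sort: tasks within the same layer keep their original order.
  return [...tasks].sort(
    (a, b) => LAYER_ORDER.indexOf(a.layer) - LAYER_ORDER.indexOf(b.layer)
  );
}
```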
AI polishes the spec and shows a colored diff. You approve, reject, or request changes. Multiple rounds supported — no code is written until you say so.
Extracts a SpecDSL JSON — models, endpoints, behaviors — from the spec. Validated against 9 schema rules. The single source of truth for codegen, tests, and exports.
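To make the contract concrete, here is a hypothetical SpecDSL shape with a two-rule validator sketch. The field names and the actual nine schema rules are assumptions for illustration only.

```typescript
// Hypothetical SpecDSL shape; real field names may differ.
interface SpecDSL {
  models: { name: string; fields: Record<string, string> }[];
  endpoints: { method: string; path: string; returns: string }[];
  behaviors: string[];
}

// Validator sketch showing the kind of structural checks a 9-rule
// schema validation might perform (paths well-formed, models resolvable).
function validateDSL(dsl: SpecDSL): string[] {
  const errors: string[] = [];
  const modelNames = new Set(dsl.models.map((m) => m.name));
  for (const ep of dsl.endpoints) {
    if (!ep.path.startsWith("/")) errors.push(`bad path: ${ep.path}`);
    if (!modelNames.has(ep.returns)) errors.push(`unknown model: ${ep.returns}`);
  }
  return errors;
}
```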
Generates file-by-file in dependency order. Each completed file's exports are cached and injected into subsequent prompts — eliminating cross-task hallucinations.
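A minimal sketch of the export-cache idea, assuming a prompt-per-file loop (the prompt wording and data shapes here are invented for illustration):

```typescript
// Sketch: completed files' exports are cached and injected into the
// prompt for each subsequent file, so the model cannot invent imports.
interface GenTask { file: string }

function buildPrompt(task: GenTask, exportCache: Map<string, string[]>): string {
  const known = [...exportCache.entries()]
    .map(([file, names]) => `${file}: ${names.join(", ")}`)
    .join("\n");
  return `Generate ${task.file}.\n` +
    `Already-generated exports (use these, do not invent others):\n${known}`;
}

// Simulated pipeline step: an earlier file has contributed its exports.
const cache = new Map<string, string[]>();
cache.set("src/api/task.ts", ["fetchTasks", "createTask"]);
const prompt = buildPrompt({ file: "src/stores/taskStore.ts" }, cache);
```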
Runs npm test / lint / tsc, parses errors by file, and sends targeted AI fixes with DSL context. Errors are repaired in dependency order, so upstream fixes resolve downstream failures in fewer cycles.
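Parsing errors by file might look like the following sketch, assuming `tsc`'s common `path(line,col): error TSxxxx: message` diagnostic format (the grouping logic is illustrative, not ai-spec's actual parser):

```typescript
// Sketch: group tsc-style diagnostics by file so each file gets one
// targeted fix prompt.
function groupErrorsByFile(output: string): Map<string, string[]> {
  const byFile = new Map<string, string[]>();
  const re = /^(.+?)\((\d+),(\d+)\): error (TS\d+): (.+)$/;
  for (const line of output.split("\n")) {
    const m = re.exec(line.trim());
    if (!m) continue; // skip non-diagnostic lines
    const [, file, , , code, msg] = m;
    if (!byFile.has(file)) byFile.set(file, []);
    byFile.get(file)!.push(`${code}: ${msg}`);
  }
  return byFile;
}
```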
Pass 1: architecture & spec compliance. Pass 2: implementation correctness & edge cases. Pass 3: blast radius, complexity score, breaking change risk.
Scores on 4 dimensions: compliance (30%) + DSL coverage (25%) + compile (20%) + review (25%). Linked to prompt hash — tracks quality over time with zero AI calls.
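The weighted score reduces to one line of arithmetic. A minimal sketch, assuming each sub-score is already normalized to 0–100 (the input shape is invented; the real sub-scores come from the pipeline's own measurements):

```typescript
interface HarnessInputs {
  compliance: number;  // 0–100, weight 30%
  dslCoverage: number; // 0–100, weight 25%
  compile: number;     // 0–100, weight 20%
  review: number;      // 0–100, weight 25%
}

// Deterministic weighted sum; no AI call involved.
function harnessScore(s: HarnessInputs): number {
  return 0.30 * s.compliance + 0.25 * s.dslCoverage + 0.20 * s.compile + 0.25 * s.review;
}
```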
Every feature addresses a real pain point in AI-assisted development.
Self-evolving knowledge base (§1–§9) that auto-injects into every prompt. Scans routes, middleware, schema, and conventions on init. Grows smarter with every review via §9 lesson accumulation.
ai-spec init

Human-readable Markdown spec for engineers to review and align on. Machine-readable SpecDSL JSON for tools to consume. Both versioned, both auditable. Codegen, tests, and exports all share one contract.
Spec + DSL

DSL Gap Loop: detects sparse contracts before codegen and triggers targeted spec enrichment. Review→DSL Loop: structural review issues feed back into the contract — so the next run starts cleaner.
Self-correcting

Record real AI responses on first run. Replay them deterministically in subsequent runs — zero API calls, zero cost. Iterate on pipeline logic and UI without burning tokens.
ai-spec create --vcr-record

Human review happens at the right moment: after the spec is clear and the DSL is valid, but before any code is written. Abort means zero disk residue. Proceed means every step has a verified contract to follow.
[Gate] checkpoint

Every successful import fix is appended to a ledger. On the next codegen run, a "DO NOT REPEAT" section is automatically injected into prompts — preventing the same hallucination from ever recurring.
v0.54+ zero-cost learning

Every run gets a unique RunId. Before any file is written, the original content is snapshotted. One command restores your entire repo to pre-run state — precise to the file, precise to the run.
ai-spec restore <runId>

Gemini, Claude, OpenAI, DeepSeek, Qwen, GLM, MiniMax, Doubao, MiMo. Mix and match: use one model for spec generation, another for codegen. Per-run provider override supported.
--provider --codegen-provider

The only pipeline that wires your backend and frontend together — automatically.
After frontend generation, the cross-stack verifier scans every API call in the frontend code and checks it against the backend DSL. Phantom routes (hallucinated endpoints), method mismatches, and string-concatenated paths are all detected and reported before you push.
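The core of such a check is a set comparison between detected frontend calls and the backend contract. A simplified sketch (the real verifier also handles string-concatenated paths, which this version does not attempt):

```typescript
// Sketch: flag frontend API calls with no matching backend endpoint
// (phantom route) or a matching path but wrong HTTP method.
interface Call { method: string; path: string }

function verifyCalls(frontendCalls: Call[], backendEndpoints: Call[]): string[] {
  const issues: string[] = [];
  for (const call of frontendCalls) {
    const samePath = backendEndpoints.filter((e) => e.path === call.path);
    if (samePath.length === 0) {
      issues.push(`phantom route: ${call.method} ${call.path}`);
    } else if (!samePath.some((e) => e.method === call.method)) {
      issues.push(`method mismatch: ${call.method} ${call.path}`);
    }
  }
  return issues;
}
```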
The SpecDSL isn't just for codegen — it powers your entire development workflow.
DSL → production-ready YAML or JSON. Plug directly into Postman, Swagger UI, or any SDK generator.
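The mapping into an OpenAPI `paths` object can be sketched as below; real output would also carry schemas, parameters, and response definitions, and the endpoint shape here is an assumption:

```typescript
// Sketch: DSL endpoints -> OpenAPI 3 paths object.
interface Endpoint { method: string; path: string; summary: string }

function toOpenApiPaths(
  endpoints: Endpoint[]
): Record<string, Record<string, { summary: string }>> {
  const paths: Record<string, Record<string, { summary: string }>> = {};
  for (const ep of endpoints) {
    paths[ep.path] ??= {};
    // OpenAPI keys operations by lowercase HTTP method.
    paths[ep.path][ep.method.toLowerCase()] = { summary: ep.summary };
  }
  return paths;
}
```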
DSL → Express mock server + MSW handlers + Vite proxy config. Frontend development without waiting for the backend.
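Emitting MSW handlers from the DSL is essentially code generation as string templating. A sketch assuming the MSW v2 `http`/`HttpResponse` API, with placeholder handler bodies (the real generator would fill in mock data from the DSL models):

```typescript
// Sketch: generate MSW v2 handler source text from DSL endpoints.
interface Endpoint { method: string; path: string }

function toMswHandlers(endpoints: Endpoint[]): string {
  const handlers = endpoints
    .map(
      (e) =>
        `  http.${e.method.toLowerCase()}('${e.path}', () => HttpResponse.json({})),`
    )
    .join("\n");
  return `import { http, HttpResponse } from 'msw';\n\nexport const handlers = [\n${handlers}\n];\n`;
}
```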
DSL → typed interfaces, request/response types, and API endpoint constants. Shared across frontend and backend.
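Deriving shared endpoint constants is a small transform over the DSL. A sketch under an assumed naming convention (camelCase endpoint names mapped to SCREAMING_SNAKE_CASE keys; the real generator's convention may differ):

```typescript
// Sketch: DSL endpoints -> a constants map shared by frontend and backend.
interface Endpoint { name: string; path: string }

function toEndpointConstants(endpoints: Endpoint[]): Record<string, string> {
  const out: Record<string, string> = {};
  for (const e of endpoints) {
    // listTasks -> LIST_TASKS (illustrative key convention).
    out[e.name.replace(/([a-z])([A-Z])/g, "$1_$2").toUpperCase()] = e.path;
  }
  return out;
}
```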
Generate a static HTML quality dashboard. Track harness scores, compliance rates, and review trends across all runs.
Every step is visible, every decision is auditable. No black box — you see exactly what's happening, what scored how, and what was fixed automatically.
[1/10] Loading project context...
       Constitution : ✔ found (§1–§9)
       Tech stack   : vue · vite · pinia
[2/10] Generating spec with glm/glm-4.5...
       ✔ Spec generated   ✔ 8 tasks
[3.4/10] Spec quality assessment...
       Coverage [██████████████████░░] 9/10
       Clarity  [████████████████░░░░] 8/10
[Gate] Approval Gate — awaiting decision
       ✔ Approved — continuing...
[DSL]  Extracting structured contract...
       ✔ DSL valid — Models: 3  Endpoints: 7
[6/10] Code generation (8 files)...
       ✔ service · src/api/task.ts
       ✔ api     · src/stores/taskStore.ts
       ✔ view    · src/views/TaskList.vue
       ████████████████████ 100%
[8/10] ⚠ 3 errors — auto-fixing cycle 1...
       ✔ All errors resolved in 1 cycle
[9/10] 3-pass code review...
       Pass 1 ✔ Architecture aligned
       Pass 2 ✔ Implementation correct
       Score [████████████████░░░░] 8.2/10
[10/10] Harness Self-Evaluation...
       Total [██████████████████░░] 92/100
       ✔ 2 lessons → constitution §9
RunId: 20260409-143022-a7f2
ai-spec turns code generation quality into data — comparable, trackable, and improvable over time.
Track quality across all runs. See if your pipeline is improving.
Every stage is timed and logged to .ai-spec-logs/<runId>.json.
The harness score is deterministic — no AI calls after generation completes.
Don't like the result? One command restores all modified files to their pre-run state.
Use any combination of providers. Mix a reasoning model for spec generation with a fast model for codegen.
Install globally, set your API key, register a repo, and start shipping.