Skip to content
Greeto

AI web development · June 29, 2026 · 8 min read · Updated June 29, 2026

Building Websites With Claude Code: How a One-Engineer Studio Ships Production B2B Sites

By Tal Gerafi, Founder & Website Engineer

In short

Building production websites with Claude Code works when you treat it as a supervised system, not a magic prompt. At Greeto we run one repeatable loop — research, plan, build, review, ship — where a senior engineer directs the work, AI does the heavy lifting, and nothing reaches a client without human review. The leverage comes from structure: a clear CLAUDE.md, focused skills, and a roster of subagents that each own one step, not one giant chat trying to do everything at once.

AI web developmentClaude CodeSpec-driven development

Anyone can generate a website now. Type a sentence, get a page. What is not a commodity is a site that is fast, correct, on-brand, and still working a year after launch. That gap — between "generated something" and "shipped something you can stand behind" — is exactly where a supervised AI workflow earns its keep.

This guide is the version we actually use at Greeto, a one-engineer studio that builds B2B and SaaS sites with Claude Code. It is not a list of clever prompts. It is the operating system behind the work: how we structure the project so a coding agent can move fast without going off the rails, and where a human stays firmly in the loop.

What Claude Code actually is (and what it isn't)

Claude Code is a terminal-based coding agent: it reads your repository, runs commands, edits files, and verifies its own work in a loop — not a chat window that hands you snippets to paste. That distinction matters. An agent that can read the real codebase and run the real build is the difference between AI-assisted web development and autocomplete.

What it is not is an autonomous replacement for an engineer. Left unsupervised on a real project, any agent will eventually make a confident, wrong change. The job is to give it enough structure that its good output is reliable and its bad output is caught early. Everything below is about that structure.

The loop we run: research → plan → build → review → ship

Every meaningful piece of work goes through the same five steps, in order:

  1. Research — read the existing code, the brand, the live analytics, and the request before touching anything. Skipping this is the single biggest cause of confident-but-wrong work.
  2. Plan — write the plan down as a short spec before generating code. This is spec-driven development: the spec is the artifact you review, not 600 lines of diff after the fact.
  3. Build — the agent executes the approved plan, keeping the build green at every step.
  4. Review — a human reads the diff and a separate agent stress-tests it. Nothing skips this.
  5. Ship — deploy, then confirm with ground truth: the build passes, a screenshot looks right, the page is actually live.

The order is the discipline. Most failures we see in other people's agentic workflows come from collapsing research and planning into "just build it" — what people now call vibe coding. That is fine for a throwaway prototype and a poor fit for a client's production site.

CLAUDE.md, skills, hooks, and subagents: what goes where

The most common question we get is where each piece of Claude Code configuration belongs. They are not interchangeable. The simple rule: CLAUDE.md is always-on context, skills are on-demand procedures, hooks are guarantees, and subagents are isolation.

PieceWhat it isUse it forDon't use it for
CLAUDE.mdAlways-loaded project memoryThe few rules that apply to every task — stack, conventions, hard constraintsLong procedures you only need sometimes (bloats every request)
SkillsOn-demand instructions loaded when relevantA repeatable procedure ("how we build a page", "how we run SEO")Facts that must always be true (use CLAUDE.md)
HooksCode the harness runs on an eventHard guarantees — run the formatter on save, block a bad commandJudgement calls; hooks are deterministic, not smart
SubagentsA fresh agent with its own context windowIsolating a big or noisy task so it doesn't pollute the main threadQuick edits where the overhead isn't worth it

The mistake almost everyone makes is putting everything in CLAUDE.md. It feels safe, but it loads on every request and quietly eats the context window. Keep CLAUDE.md short and let skills carry the detail — this is progressive disclosure, and it is the single highest-leverage habit for a large build.

The team: a roster of subagents, not one giant prompt

Greeto runs as a small team of subagents, each owning one step of the build: one fetches an existing site, one sets the look, one plans content, one builds, one does SEO, one runs QA, one launches. Each gets a clean context window and a narrow job.

Why split it up? Because a single agent trying to hold the whole project in its head degrades as the context fills. A focused subagent that only has to "build this one page" produces sharper work than one juggling the brand, the analytics, and the deployment all at once. It is the same reason a real studio has roles.

A concrete, honest example: the research behind this very site's growth plan was run as a workflow that fanned out ten specialist agents in parallel — demand research, competitor teardown, an AEO/GEO playbook, the architecture — then an eleventh agent red-teamed the result before a human ever read it. No single prompt could have held all of that coherently. The structure is the product.

Context engineering: keeping a big build coherent

Context engineering is the practical craft of deciding what the agent sees, and when. On a real codebase the failure mode is not "the model isn't smart enough" — it's "the model is looking at the wrong 5% of the repo." Our defaults:

  • Keep CLAUDE.md lean; push detail into skills that load only when needed.
  • Use subagents to keep noisy work (a 200-file search, a long log) out of the main thread.
  • Connect the agent to live tools through the Model Context Protocol (MCP) — Search Console, analytics, the browser — so it reasons over real data instead of guesses.
  • Re-state the goal before long steps; a model that drifts is usually a model that lost the goal three turns ago.

Spinner verbs and agent UX

A small thing we are weirdly known for: customizing Claude Code's "spinner verbs" — the little status words it shows while working. It sounds trivial, but agent UX is how a team trusts a long-running build, and it's a surprisingly under-served corner of the ecosystem. We wrote the team playbook for it separately: Claude Code spinner verbs. Small ergonomics compound when you live in the tool every day.

Spec-driven development vs vibe coding

The biggest reliability decision is whether you write a spec first. Here is the honest trade-off:

Spec-driven developmentVibe coding
You reviewThe plan, before code existsThe diff, after the fact
Best forProduction work, client sites, anything you maintainPrototypes, throwaways, exploration
Speed (small task)Slightly slower to startFastest
Speed (real feature)Faster overall — less reworkSlower — rework piles up "at hour three"
Failure modeOver-planning a trivial changeConfident, wrong code nobody caught

For client websites we default to spec-driven development every time. The spec is cheap; the rework from a wrong direction is not. We keep a fuller write-up here: spec-driven development with Claude Code.

What it costs and what breaks (honestly)

Two things people rarely publish:

It costs real tokens. A deep research or build pass that fans out many agents is not free — it trades money for quality and speed. We spend it deliberately on the work that benefits (research, broad reviews) and stay lean on the work that doesn't (a one-line copy fix doesn't need a committee).

It breaks in predictable ways. The classic failure is the "hour three" drift: a long unsupervised session where small wrong assumptions compound until the output quietly stops matching the goal. The fix is not a better model — it's shorter loops, a written plan to anchor against, and a human reading the diff before it ships. Every confident hallucination we've caught was caught by review, not by the model noticing on its own. That is why human-in-the-loop review is a gate in our process, never an optional step.

How we keep AI work safe

One rule sits above all the others: nothing reaches a client without a human reviewing it. The engineer who signs the work reads the change. A separate agent is often pointed at the diff specifically to argue against it — to find what's wrong before a person does. Speed is the point of the system; accountability is the point of the human. You need both, and they are not in tension when the structure is right.

FAQ

How do Claude Code subagents work, and when should you use them?

A subagent is a fresh agent with its own context window that runs a focused task and returns a result. Use one when a task is big or noisy enough that doing it in the main thread would crowd out everything else — a wide codebase search, a long build, an independent review. For a quick edit, the overhead isn't worth it; just do it inline.

What goes in CLAUDE.md versus skills versus hooks?

CLAUDE.md holds the few rules true for every task. Skills hold procedures you only need sometimes and that load on demand. Hooks are deterministic code the harness runs on an event for a hard guarantee (formatting, blocking a dangerous command). If a rule must always be enforced and isn't a judgement call, a hook beats a sentence in CLAUDE.md.

Is spec-driven development worth it for small changes?

For a genuinely small change, no — write the change. The value of spec-driven development shows up on real features, where reviewing a short plan first prevents the expensive rework of discovering, after 600 lines of diff, that the direction was wrong.

Can you build an entire website with Claude Code?

Yes, and we do — but "build a whole site" is not one prompt. It's the research → plan → build → review → ship loop run many times across the look, the content, the SEO, and the launch, with a human directing and reviewing throughout. The agent is the muscle; the structure and the sign-off are what make it shippable.

Glossary terms in this guide