The Brief
- What it is: A detailed case study of how a three-person YC startup called Relayboard shipped a full production SaaS app in 17 days using AI coding agents, primarily GPT-5 Codex and Claude Opus 4.7, as core development team members.
- Who it’s for: Technical founders, CTOs, and developer teams at early-stage startups who want to understand how agentic AI workflows can replace or augment traditional engineering bandwidth at seed and pre-seed stages.
- Key takeaways: Over 40,000 lines of code were AI-generated across a Next.js 15 frontend, tRPC/Node backend, Temporal workers, and Terraform infra; humans primarily wrote specs and reviewed diffs. Specialized agents with narrow interfaces outperformed monolithic AI prompting approaches.
- Pricing/Cost: Total AI agent costs stayed below the equivalent salary cost of one senior engineer for the same period, achieved through prompt caching, tool-use APIs, and using Gemini 3.1 Flash Lite for low-complexity boilerplate and refactoring tasks.
- Bottom line: For resource-constrained YC startups, orchestrating multiple AI coding agents as first-class team members, not glorified autocomplete, is a proven, repeatable path to hitting demo-day velocity expectations with fewer than three engineers.
Why AI Coding Agents Matter Inside a YC Startup in 2026
A three-person YC startup pushed a working SaaS product to paying users in 17 days, with fewer than 900 human-written lines of code. Everything else, over 40k lines across frontend, backend, infra, and tests, was generated and iterated by AI coding agents orchestrated around GPT-5 Codex and Claude Opus 4.7.
Based on community reports and YC batch surveys, that cadence is no longer an outlier. Reported figures from YC W24 and W25 founder discussions suggest that a majority of teams use AI assistance for the bulk of their first production code, and a meaningful minority describe agentic workflows as “primary developers” rather than helpers. For a subset of teams, the most senior “engineer” in the room is now an orchestration layer coordinating multiple models plus CI tools.
The constraints are familiar: two technical founders, one doing product and GTM, the other nominally “CTO” but spending half the batch on fundraising and customer calls. Hiring is slow, equity is expensive, and burn is non-negotiable. Yet expectations around velocity are higher than they’ve ever been. Demo-day-ready means:
- Polished, responsive frontend with real users and stateful auth
- Non-trivial backend logic with integrations (Stripe, Slack, email, etc.)
- Reasonable test coverage and basic observability
- CI/CD that can keep up with daily or hourly pushes
The gap between what two humans can code manually and what investors expect by week 4 is wide. That is the gap AI coding agents are filling when used as first-class citizens in the stack rather than glorified autocomplete.
This article walks through how one YC startup (call it “Relayboard”) built and shipped a production-grade full-stack app using AI agents as core team members. The focus is not aspirational demos, but the specific architectures, prompts, tools, and trade-offs that actually held up under real traffic and paying customers.
The Relayboard team started from a cold repository and a product spec for “a shared ops dashboard for B2B teams,” integrating calendar, tickets, and lightweight automation. Four weeks later they had:
- A Next.js 15 / React 19 frontend with Tailwind CSS
- A tRPC + Node backend with Prisma + Postgres
- Background workers on Temporal for long-running automations
- Stripe billing, Slack and Google Calendar integrations
- End-to-end tests with Playwright; API tests with Jest
- Infra on AWS (ECS + RDS + CloudFront) provisioned via Terraform
Human engineers primarily wrote specs, reviewed diffs, and resolved ambiguous product trade-offs. GPT-5 Codex (source) and Claude Opus 4.7 (source) handled nearly all implementation. Gemini 3.1 Flash Lite filled in as a fast, low-cost agent for boilerplate and refactoring. Prompt caching and tool-use APIs kept latency manageable and costs below what a single senior engineer would have cost for the same period.
If you are trying to understand how far you can push AI agents inside your own startupโand where the sharp edges still areโthis is the pattern worth dissecting.
For a closer look at the tools and patterns covered here, see our analysis in How to Use OpenAI Codex in ChatGPT for Full-Stack Development Projects, which covers the practical implementation details and trade-offs relevant to engineering teams shipping production AI systems.
Inside the Architecture: How They Shipped a Full-Stack App Using AI Agents
Relayboard’s core insight was simple: treat each AI model as a specialized contributor with a narrow, well-defined interface, not as a single omniscient “coder.” The system architecture looked less like one big chatbot and more like a micro-team:
- Spec Agent – converts product requirements into technical design artifacts
- Frontend Agent – owns React/Next.js UI implementation
- Backend Agent – owns API, data models, and business logic
- Infra Agent – owns Terraform, Docker, GitHub Actions
- Test Agent – generates and maintains tests
- Refactor/Docs Agent – handles cleanup, comments, and docs
Each of these agents used different models and temperature settings, with a central orchestrator deciding which agent to call and with what context. Structurally, the orchestrator looked closer to a workflow engine than a conventional chat UI.
Model choices and roles
The team standardized on three primary models, all publicly available via API as of 2026 (source):
- GPT-5 Codex ($1.25/$10 per M tokens, 400k context, released 2025-09-23) – primary code-generation engine; strong on multi-file edits and tool use
- Claude Opus 4.7 ($5/$25 per M tokens, 1M context, released 2026-04-16) – long-context reasoning for specs, architecture, and refactors
- Gemini 3.1 Flash Lite ($0.25/$1.50 per M tokens, 1M context) – fast, cheap agent for repetitive transformations and small diffs
Rough division of labor:
- Spec Agent, Refactor/Docs Agent – Claude Opus 4.7
- Frontend/Backend/Infra Agents – GPT-5 Codex
- Test Agent + mechanical changes (renames, lint fixes, comments) – Gemini 3.1 Flash Lite
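This division of labor is easy to encode as a routing table in the orchestrator. A minimal sketch follows; the model ID strings and temperature values are illustrative assumptions, not Relayboard's published config:

```typescript
// Illustrative routing table mapping agent roles to models and sampling
// settings. Model IDs and temperatures are assumed, not taken from
// Relayboard's actual configuration.
type AgentRole = "spec" | "frontend" | "backend" | "infra" | "tests" | "refactor";

interface AgentConfig {
  model: string;
  temperature: number;
}

const ROUTING: Record<AgentRole, AgentConfig> = {
  spec:     { model: "claude-opus-4.7", temperature: 0.7 },       // long-context design work
  refactor: { model: "claude-opus-4.7", temperature: 0.3 },
  frontend: { model: "gpt-5-codex", temperature: 0.2 },           // deterministic code edits
  backend:  { model: "gpt-5-codex", temperature: 0.2 },
  infra:    { model: "gpt-5-codex", temperature: 0.1 },
  tests:    { model: "gemini-3.1-flash-lite", temperature: 0.2 }, // cheap mechanical work
};

function pickModel(role: AgentRole): string {
  return ROUTING[role].model;
}
```

Keeping this table in config rather than scattered across prompts is what later makes model pinning and swaps cheap.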
Context-window sizes actually mattered. Claude Opus 4.7, with its 1M-token context window, could ingest:
- Full routes map
- Database schema (Prisma)
- Key backend services
- Selected frontend pages
That allowed the Spec Agent to suggest consistent architecture decisions across the stack, avoiding the traditional “agents don’t know what other agents did yesterday” problem.
Repository-aware agents via tools
Instead of pasting files into prompts, the orchestrator exposed the codebase and infra as tools. A simplified tool schema for GPT-5 Codex looked like:
```json
{
  "tools": [
    {
      "name": "read_file",
      "description": "Read file contents from the repo",
      "parameters": {
        "type": "object",
        "properties": { "path": { "type": "string" } },
        "required": ["path"]
      }
    },
    {
      "name": "write_file",
      "description": "Create or overwrite a file",
      "parameters": {
        "type": "object",
        "properties": {
          "path": { "type": "string" },
          "content": { "type": "string" }
        },
        "required": ["path", "content"]
      }
    },
    {
      "name": "list_files",
      "description": "List files under a directory",
      "parameters": {
        "type": "object",
        "properties": { "path": { "type": "string" } },
        "required": ["path"]
      }
    },
    {
      "name": "run_tests",
      "description": "Run test suite or subset and return results",
      "parameters": {
        "type": "object",
        "properties": { "scope": { "type": "string" } },
        "required": ["scope"]
      }
    }
  ]
}
```
Function-calling allowed GPT-5 Codex to inspect the current state of the repo, plan changes, and iteratively apply patches. The orchestrator enforced guardrails: no writes outside src/, infra/, tests/, and no tool calls that could access secrets or live AWS accounts without human confirmation.
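A write-path guardrail like the one described can be a few lines in the orchestrator. The sketch below assumes the allowed directories from the article and a normalization step to catch path traversal; the exact check Relayboard used is not published:

```typescript
// Minimal sketch of the write guardrail: reject any write_file call whose
// path escapes the allowed directories. The directory list comes from the
// article; the normalization logic is an assumption.
import * as path from "path";

const ALLOWED_WRITE_ROOTS = ["src", "infra", "tests"];

function isWriteAllowed(rawPath: string): boolean {
  // Normalize to catch traversal tricks like "src/../.env".
  const normalized = path.posix.normalize(rawPath);
  if (normalized.startsWith("..") || path.posix.isAbsolute(normalized)) {
    return false;
  }
  const root = normalized.split("/")[0];
  return ALLOWED_WRITE_ROOTS.includes(root);
}
```

Tool calls touching secrets or live AWS accounts went through a separate human-confirmation path rather than this allowlist.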
Prompt scaffolding: system vs developer prompts
The stability of agents came from careful layering of system and developer prompts:
- System prompt – global, model-specific behavior: style, constraints, safety
- Developer prompt – per-agent role, stack details, and project conventions
- User prompt – the specific task (“Implement customer billing page with these fields…”)
An excerpt from the Backend Agentโs developer prompt:
```
You are the Backend Agent for the Relayboard app.

Stack:
- Node 22, TypeScript
- tRPC for API layer
- Prisma for Postgres schema and access
- Zod for input validation
- Redis for caching

Conventions:
- All endpoints must be tRPC procedures under src/server/routers
- All DB access goes through Prisma client
- Validation in Zod schemas colocated with routers
- Prefer pure functions; avoid side effects in request handlers

Rules:
- Before writing code, call list_files and read_file to inspect existing patterns.
- Reuse existing utility functions and types where possible.
- After changes, call run_tests with scope="api" and fix any failing tests.

Output:
- Use only the provided tools to modify files.
- Do not invent new libraries without explicit instruction.
```
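Assembling the three layers into a single request is mechanical. A sketch in an OpenAI-style chat-message shape; the exact wire format and role names Relayboard used are assumptions:

```typescript
// Three-layer prompt assembly: system (global), developer (per-agent role),
// user (the specific task). The message shape mirrors common chat-completion
// APIs; treat it as illustrative rather than a specific provider's format.
interface ChatMessage {
  role: "system" | "developer" | "user";
  content: string;
}

function buildMessages(
  systemPrompt: string,    // global, model-specific behavior
  developerPrompt: string, // per-agent role and project conventions
  task: string             // the specific feature request
): ChatMessage[] {
  return [
    { role: "system", content: systemPrompt },
    { role: "developer", content: developerPrompt },
    { role: "user", content: task },
  ];
}
```

Because the system and developer layers are stable per agent, they are exactly the prefix that prompt caching (below) can reuse across tasks.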
Spec Agent prompts encouraged more explicit chain-of-thought reasoning, but that reasoning was kept out of the logs founders might skim. The orchestrator requested the rationale in a structured JSON field (for internal use) and a concise “plan” summary for humans. This separation kept long reasoning out of git history and PRs while leaving it available for debugging agent decisions.
For a closer look at the tools and patterns covered here, see our analysis in The Complete Google AI Stack 2026: 50+ Tools, Cloud Next Keynote Breakdown, and How They Compare to OpenAI, Anthropic & Microsoft, which covers the practical implementation details and trade-offs relevant to engineering teams shipping production AI systems.
Prompt caching and latency
Long system and developer prompts can dominate context and costs. Relayboard used server-side prompt templates with caching:
- System + developer prompts registered once per agent per model version
- Only the user prompt and recent tool-call state varied per task
- OpenAI’s and Anthropic’s prompt-caching features were used wherever available, cutting prompt billing by an estimated 30–40% based on the team’s internal logs
Latency for a multi-step feature (e.g., “add recurring billing with prorations”) typically ran 90–180 seconds end to end based on the team’s telemetry: design, code edits, tests, refactor. That was acceptable because orchestrations ran asynchronously; founders reviewed diffs after the fact, similar to PR reviews from a remote teammate.
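A back-of-the-envelope model shows where the reported savings come from. The sketch below assumes cached prompt tokens bill at 10% of the normal input rate (actual provider discounts vary); all numbers are illustrative:

```typescript
// Illustrative prompt-cost model. Assumes cached tokens bill at 10% of the
// input rate (cachedDiscount); real provider discounts differ.
function promptCost(
  promptTokens: number,
  cacheHitRatio: number,     // fraction of prompt tokens served from cache
  inputPricePerMTok: number, // USD per 1M input tokens
  cachedDiscount = 0.1       // assumed: cached tokens cost 10% of full price
): number {
  const cached = promptTokens * cacheHitRatio;
  const fresh = promptTokens - cached;
  return ((fresh + cached * cachedDiscount) / 1_000_000) * inputPricePerMTok;
}

// Example: 100M prompt tokens at GPT-5 Codex input pricing ($1.25/M).
const noCache = promptCost(100_000_000, 0, 1.25);     // $125
const withCache = promptCost(100_000_000, 0.4, 1.25); // $80 at a 40% hit rate
// Savings of (125 - 80) / 125 = 36%, in line with the reported 30-40% range.
```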
Implementation Walkthrough: From Spec to Production Deployment
To make this concrete, consider a single feature the team shipped entirely via agents: “Add a billing settings page where admins can upgrade plans, view invoices, and manage payment methods. Use Stripe. Respect existing role-based access control. Ensure end-to-end tests pass.”
Step 1: Product spec to technical design
The human founder wrote a 1.5-page Notion doc with:
- User stories (admin, member, billing manager)
- Wireframe screenshots
- Stripe object fields that should appear in UI
- Non-goals for the first iteration
The Spec Agent (Claude Opus 4.7) used a RAG layer to pull:
- Existing RBAC policy docs
- Database schema: users, teams, subscriptions
- Stripe integration code already used for initial checkout
It then generated a technical design doc stored in `docs/billing-design-v1.md`:
- New endpoints: `team.billing.getPortalUrl`, `team.billing.getInvoices`
- DB schema changes: additional Stripe customer metadata
- Required UI components in `src/app/settings/billing`
- Error states and loading behaviors
Humans skimmed and lightly edited this design, then marked it approved in Notion. This approval triggered the orchestrator to create a “feature workflow” consisting of four tasks: Backend, Frontend, Tests, and Infra (a small update to the webhook URL).
Step 2: Backend implementation with GPT-5 Codex
The Backend Agent received:
- Link to the approved design doc
- Paths to relevant routers and Prisma schema files
- Instructions to avoid new abstractions unless necessary
The agent’s chain looked like:
- Call `list_files` on `src/server/routers` to locate existing team-related endpoints
- Call `read_file` on the `team.ts` router and auth middleware
- Draft new tRPC procedures using the Stripe SDK already instantiated in a shared `stripe.ts`
- Call `write_file` to add the new procedures, keeping changes minimal and localized
- Call `run_tests` with scope `"api"`
- On failure, repeat until tests pass or the retry limit is hit (usually 2–3 attempts)
New endpoints were fully implemented without human keypresses. Human review focused on Stripe usage correctness and ensuring no sensitive data leaked to the client.
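The implement-then-test loop at the heart of that chain can be sketched in a few lines. The `applyPatch` and `runTests` callbacks stand in for the real `write_file`/`run_tests` tool calls; the retry budget mirrors the 2–3 attempts mentioned above:

```typescript
// Sketch of the retry loop: apply a patch, run tests, feed failures back
// into the next attempt, and escalate to a human when the budget runs out.
// applyPatch/runTests are placeholders for the real tool-call plumbing.
interface TestResult {
  passed: boolean;
  failures: string[];
}

function implementWithRetries(
  applyPatch: (feedback: string[]) => void, // agent writes or fixes code
  runTests: () => TestResult,               // e.g. run_tests with scope "api"
  maxAttempts = 3
): boolean {
  let feedback: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    applyPatch(feedback);
    const result = runTests();
    if (result.passed) return true;
    feedback = result.failures; // failures become context for the next try
  }
  return false; // escalate to a human after the retry budget is spent
}
```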
Step 3: Frontend implementation with contract-first approach
Next, the Frontend Agent (GPT-5 Codex) acted only after the Backend Agent registered its API contract in a JSON schema document automatically generated by a small utility:
```json
{
  "team.billing.getInvoices": {
    "input": { "teamId": "string" },
    "output": [
      {
        "id": "string",
        "amount": "number",
        "currency": "string",
        "status": "string",
        "createdAt": "string"
      }
    ]
  },
  "team.billing.getPortalUrl": {
    "input": { "teamId": "string", "returnUrl": "string" },
    "output": { "url": "string" }
  }
}
```
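A contract like this is cheap to enforce at runtime too. The checker below only handles the flat string/number fields used in this contract; it is an illustration of the contract-first handoff, not a full schema validator (a real setup would likely use Zod, which is already in the stack):

```typescript
// Minimal shape check a frontend agent or CI step could run against a
// sample API response. Handles only flat string/number fields; a real
// implementation would use a proper schema validator.
type FieldType = "string" | "number";

function matchesShape(
  value: Record<string, unknown>,
  shape: Record<string, FieldType>
): boolean {
  return Object.entries(shape).every(
    ([key, type]) => typeof value[key] === type
  );
}

const invoiceShape: Record<string, FieldType> = {
  id: "string",
  amount: "number",
  currency: "string",
  status: "string",
  createdAt: "string",
};

const sampleInvoice = {
  id: "in_123",
  amount: 4900,
  currency: "usd",
  status: "paid",
  createdAt: "2026-05-01T00:00:00Z",
};
```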
The Frontend Agentโs developer prompt required:
- Use Tailwind and existing design tokens
- Use tRPC hooks like `trpc.team.billing.getInvoices.useQuery`
- Handle loading, error, and empty states explicitly
- No inline styling; use existing component primitives
Its orchestration flow:
- Read `src/app/settings/layout.tsx` to integrate the new “Billing” tab
- Create `src/app/settings/billing/page.tsx` with a basic skeleton
- Integrate tRPC hooks for data fetching
- Wire up the “Manage subscription” button to the Stripe billing portal URL
The first run overscoped the UI, adding plan upgrade/downgrade flows that weren’t in scope. The orchestrator caught this by comparing the implementation diff against the approved design doc via a small “scope checker” agent running on Gemini 3.1 Flash Lite. That agent flagged out-of-scope elements, and the orchestrator prompted the Frontend Agent to remove them in a second pass.
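Relayboard's scope checker was itself an LLM pass, but the contract it implements is simple: given the elements a diff adds and the approved design text, flag anything the doc never mentions. A keyword-level stand-in:

```typescript
// Toy stand-in for the scope-checker pass: flag UI elements added in a diff
// that the approved design doc never mentions. The real checker ran on
// Gemini 3.1 Flash Lite; this only illustrates the input/output contract.
function findOutOfScope(addedElements: string[], designDoc: string): string[] {
  const doc = designDoc.toLowerCase();
  return addedElements.filter((el) => !doc.includes(el.toLowerCase()));
}
```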
Step 4: Tests and regression protection
The Test Agent used Gemini 3.1 Flash Lite for speed and cost. Its prompt emphasized:
- Use existing test utilities; no new patterns without reason
- Focus on RBAC, happy-path billing flows, and key regression points
- Target ~80% route coverage for new endpoints
It generated:
- Jest tests for `team.billing.getInvoices` and `getPortalUrl`
- A Playwright test that:
  - Logs in as admin
  - Navigates to settings → billing
  - Checks the invoice list renders
  - Asserts “Manage subscription” opens a Stripe-hosted page in a new tab
Human review mostly tweaked test flakiness around Stripe’s sandbox behavior. Over time, the team added heuristics to the Test Agent to avoid relying on external network calls where mocks already existed.
Step 5: Infra and deployment
The Infra Agent used GPT-5 Codex with a Terraform-focused prompt and a narrower toolset that only accessed infra/ and GitHub Actions definitions. For this feature, it:
- Updated environment variable definitions for new Stripe webhook URLs
- Modified ECS task definitions to include extra secrets
- Updated the `staging` and `production` deployment workflows in GitHub Actions
Every infra change required human approval before merge, enforced by a protected-branch rule and a GitHub label `needs-human-infra-review` added automatically by the orchestrator whenever the Infra Agent touched `infra/`.
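The auto-labeling rule amounts to a path-prefix check in the orchestrator. The label name comes from the article; how Relayboard actually detected infra changes is an assumption:

```typescript
// Sketch of the auto-labeling rule: agent PRs touching infra/ get the label
// that the protected-branch rule keys off. The prefix check is assumed.
function labelsForChangedFiles(changedPaths: string[]): string[] {
  const labels: string[] = [];
  if (changedPaths.some((p) => p.startsWith("infra/"))) {
    labels.push("needs-human-infra-review");
  }
  return labels;
}
```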
Step 6: Human review and production rollout
Founders reviewed the agent-generated PRs like they would review contributions from a junior engineer:
- Scan design → backend → frontend → tests for coherence
- Spot-check edge cases (RBAC, error handling, observability)
- Trigger canary deployment to 10% of workspaces for 24 hours
Error rates and latency were monitored via Datadog dashboards that the Infra Agent had initially scaffolded and humans later refined. Once metrics stayed stable under real usage, the feature rolled out to 100% and became part of the standard product.
End-to-end calendar time: 2.5 days from initial spec to full production rollout. Net human time: roughly 4 hours of review and small changes.
For a closer look at the tools and patterns covered here, see our analysis in Case Study: How a SaaS Startup Cut Development Time by 60% Using OpenAI Codex, which covers the practical implementation details and trade-offs relevant to engineering teams shipping production AI systems.
Benchmarks, Costs, and Trade-offs vs Traditional Teams
The obvious question is whether this is actually better than hiring one or two more engineers. Relayboard tracked detailed metrics across their four-week build to compare:
- Agent-assisted workflow (their real approach)
- Counterfactual: a manual implementation trajectory based on founder historical output
Velocity and scope delivered
Over 28 days, the team logged:
- Approx. 40k lines of code added (excluding generated type files)
- ~600 commits, 70% initiated by agents
- 96 merged PRs, 68 of which originated entirely from agents
For comparison, the founder-CTO’s past output on a similar stack was ~400–600 lines of production code per day under optimal conditions. Accounting for context switching, investor meetings, and customer calls, realistic manual output would have been closer to 10–15k lines in the same period, with a narrower feature set.
The effective throughput increase was roughly 3x, but with caveats: more time on review, more time debugging subtle issues, and a heavy up-front investment in the agent orchestration layer. Those 40k lines also included more churn: agents refactoring their own output, removing dead code, and iterating on tests.
Cost model: API vs headcount
API costs for the month, simplified using verified 2026 pricing (source):
| Category | Model | Tokens (approx.) | Cost per 1M tokens (input/output) | Total Cost (USD) |
|---|---|---|---|---|
| Spec + architecture | Claude Opus 4.7 | 80M | $5 / $25 | ~$1,000 |
| Code generation | GPT-5 Codex | 220M | $1.25 / $10 | ~$900 |
| Tests + refactors | Gemini 3.1 Flash Lite | 150M | $0.25 / $1.50 | ~$80 |
| Prompt caching savings | Mixed | −100M (avoided) | n/a | ~−$500 |
| Total | n/a | ~350M net | n/a | ~$1,500 |
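The table's line items can be roughly reconstructed from the per-token prices. The input/output split is not reported, so the 70/30 split below is an assumption chosen to land in the same ballpark as the published, rounded figures:

```typescript
// Rough reconstruction of the cost table. The 70/30 input/output token
// split is assumed, not reported; results land near the rounded figures.
function apiCost(
  totalTokens: number,
  inputPricePerM: number,  // USD per 1M input tokens
  outputPricePerM: number, // USD per 1M output tokens
  inputShare = 0.7         // assumed fraction of tokens that were input
): number {
  const inputTokens = totalTokens * inputShare;
  const outputTokens = totalTokens - inputTokens;
  return (inputTokens / 1e6) * inputPricePerM + (outputTokens / 1e6) * outputPricePerM;
}

const opusCost = apiCost(80e6, 5, 25);       // $880, table rounds to ~$1,000
const codexCost = apiCost(220e6, 1.25, 10);  // $852.50, table rounds to ~$900
const flashCost = apiCost(150e6, 0.25, 1.5); // $93.75, table shows ~$80
```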
All-in, API bills landed in the low single-digit thousands for the month. Add one-time engineering time to build the orchestrator (roughly two human-weeks) and ongoing maintenance (a few hours per week).
By contrast, hiring a single senior full-stack engineer in SF would have run $18k–$25k per month in cash comp during YC, plus equity. Contracting out the build at market rates would have been north of $50k–$80k for a comparable scope and polish.
Quality and bug profile
Quality was not โautomatically handled.โ Bugs fell into three main classes:
- Misaligned business logic – agents interpreted ambiguous specs too literally
- Integration edge cases – especially around third-party APIs and webhook retries
- Type drift – TypeScript types slowly diverged from reality when agents refactored code in pieces
Relayboard tracked defect density during the first production month:
| Source | Bugs per 1k LOC (first 30 days) | Notes |
|---|---|---|
| Agent-authored code | ~0.9 | Higher share of minor UX/API mismatch issues |
| Human-authored code | ~0.6 | More complex but fewer cosmetic issues |
The gap closed over time as the team hardened prompts, especially around schema changes and TypeScript types. A “schema guardian” agent (Claude Sonnet 4.6) was added later, whose only job was to compare any proposed schema diff against existing usage and suggest migration/test updates before merge.
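The core of such a guardian check is cross-referencing changed schema entities against code that touches them. A toy version, assuming Prisma's convention that model `Subscription` is exposed as `prisma.subscription` (the real agent read files through repo tools; here contents are passed in directly):

```typescript
// Toy schema-guardian check: given Prisma model names touched by a schema
// diff, list source files that reference those models through the Prisma
// client and therefore need migration/test review.
function clientAccessor(model: string): string {
  // Prisma exposes model "Subscription" as prisma.subscription.
  return `prisma.${model[0].toLowerCase()}${model.slice(1)}`;
}

function filesNeedingReview(
  changedModels: string[],
  files: Record<string, string> // path -> file contents
): string[] {
  return Object.entries(files)
    .filter(([, contents]) =>
      changedModels.some((model) => contents.includes(clientAccessor(model)))
    )
    .map(([filePath]) => filePath);
}
```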
When agents failed badly
There were concrete failure modes:
- Overfitting to local patterns – agents copied early suboptimal decisions, making later refactors painful
- Non-idempotent infra changes – Terraform edits that broke `terraform plan` until humans intervened
- Hidden coupling – agents leaked assumptions across boundaries (e.g., relying on particular error-message strings for control flow)
Agent workflows were explicitly disabled for:
- Security-sensitive flows (auth, encryption, key management)
- Data migrations that could destroy or corrupt production data
- Anything with regulatory impact (GDPR deletion, audit logging)
In those areas, the team used agents only as pair programmers, suggesting code in an IDE or reviewing human-written drafts, but never with direct write access to the repo.
Latency vs. human pairing
Compared to a human junior engineer, agent round-trips were:
- Slower on a single change (minutes vs. seconds) due to tool-calls and tests
- Faster on bulk edits (e.g., rename a core type across 120 files)
- Much faster on boilerplate-heavy tasks (forms, DTOs, simple CRUD)
Actual developer experience looked like this:
- Founder writes a spec at 11pm
- Orchestrator kicks off multi-agent workflow overnight
- By morning, 1–3 PRs exist, passing tests, waiting for review
Instead of “live” human pairing, Relayboard leaned into asynchronous collaboration with the agents, very similar to collaborating across time zones.
What This Means for Early-Stage Product Strategy
The Relayboard story is not a one-off curiosity; by 2026, YC’s internal tooling already assumes teams will be AI-heavy by default. The question for a new startup is not “should they use AI coding agents?” but “how aggressively should they treat agents as core team members versus glorified autocomplete?”
When this approach makes sense
Agent-centric full-stack development is especially viable when:
- Your product is CRUD-heavy SaaS with clear domain models and workflows
- Your stack is conventional: React, Node, Rails, Django, Go REST, etc.
- You can articulate UX and behavior clearly in text and simple diagrams
- You’re willing to treat prompts and orchestration as first-class infra
It is less attractive when:
- You’re pushing the boundary on systems-level performance (custom databases, zero-copy networking)
- Your product surface area is small but correctness requirements are extreme (e.g., medical software, financial trading engines)
- Your senior engineers already ship at a very high velocity and resist additional abstraction layers
For many YC startups building internal tools, dashboards, and SaaS workflows, the upside dominates. For teams building a new kernel or on-chain protocol, agents are better kept in an assistive role.
Organizational implications
Treating models as first-class contributors forces changes to how you run engineering:
- Specs over tickets – you write fewer JIRA tickets and more rich product docs with examples
- Prompts as code – agent prompts live in the repo, versioned, reviewed, and tested
- Git hygiene – agents can drown you in PRs unless you design batching and scoping carefully
- Metrics on agents – track agent success rates, revert rates, and bug attribution explicitly
Relayboard instrumented their orchestrator to emit metrics to Datadog:
- Success vs. failure per agent type
- Average number of tool-calls and retries per task
- Time from spec creation to PR ready
This made it possible to debug not only the app but also the “engineering team” made of agents. They iterated on prompts the same way they tuned database indices or cache policies.
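Aggregating those metrics per agent is straightforward once the orchestrator logs a record per task. A sketch with field names mirroring the bullets above; the Datadog emission itself is out of scope and would hang off these aggregates:

```typescript
// Sketch of per-agent metric aggregation over the orchestrator's task log.
interface TaskRecord {
  agent: string;
  succeeded: boolean;
  toolCalls: number;
  retries: number;
}

function successRate(records: TaskRecord[], agent: string): number {
  const mine = records.filter((r) => r.agent === agent);
  if (mine.length === 0) return 0;
  return mine.filter((r) => r.succeeded).length / mine.length;
}

function avgToolCalls(records: TaskRecord[], agent: string): number {
  const mine = records.filter((r) => r.agent === agent);
  if (mine.length === 0) return 0;
  return mine.reduce((sum, r) => sum + r.toolCalls, 0) / mine.length;
}
```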
Designing your own agent stack
A minimal viable agent stack for a new YC team in 2026 might look like:
- Start with one orchestrator service that:
- Knows how to call GPT-5 Codex (or the newer GPT-5.1-Codex / GPT-5.1-Codex-Max), Claude Opus 4.7, and Gemini 3.1 Flash Lite
- Implements repo tools (`read_file`, `write_file`, `list_files`, `run_tests`)
- Persists task state and logs to a Postgres table
- Define 2โ3 agents to start:
- One for backend, one for frontend, one for tests
- Each with a clear developer prompt and stack conventions
- Wire into GitHub:
- Agents open PRs under a bot account
- Require one human review before merge
- Scope your first features tightly:
- CRUD page, simple form, or dashboard with read-only data
- Avoid multi-tenant auth or billing as first agent tasks
- Iterate on metrics:
- Track how often humans have to rewrite agent code
- Adjust prompts, temperatures, and model choices accordingly
A simple orchestrator loop in pseudo-TypeScript:
```typescript
type AgentName = "frontend" | "backend" | "tests";

async function runTask(agent: AgentName, request: TaskRequest) {
  const config = getAgentConfig(agent); // model, system prompt, tools
  const messages = buildMessages(config, request);

  const response = await callLLM(config.model, {
    messages,
    tools: config.tools,
    tool_choice: "auto",
  });

  // Apply tool calls (file edits, test runs) until the agent is done.
  await handleToolCallsAndIterations(response, config, request);
  await persistTaskResult(request.id, response);
}
```
Founders do not need a full “agent platform” to benefit. A 300–500 line orchestrator plus a handful of prompts is enough to turn a good LLM into a reliable teammate on the repo.
Risks, governance, and future direction
Several risks deserve explicit handling:
- Data leakage – avoid sending secrets, production data, or PII to external APIs; use anonymization and test data
- Model drift – when new model versions ship (GPT-5.1, GPT-5.2, GPT-5.3-Codex, Claude Opus 4.7, etc.), re-run a regression suite on your prompts
- Vendor risk – avoid hard-coding everything around one model; keep interfaces thin and swappable
Relayboard mitigated model drift by pinning model versions in config and running nightly synthetic tasks as health checks. When a provider announced a deprecation or new default, the team tested new versions behind a feature flag on the orchestrator before rolling out.
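Version pinning plus a promote/hold decision from the nightly synthetic runs can be expressed in config. The version strings and the 2-point tolerance below are illustrative assumptions, not taken from Relayboard's setup:

```typescript
// Illustration of model pinning and a promotion gate driven by nightly
// synthetic-task pass rates. Versions and thresholds are assumed.
interface ModelPin {
  agent: string;
  pinned: string;     // version currently serving traffic
  candidate?: string; // version under evaluation behind the feature flag
}

const pins: ModelPin[] = [
  { agent: "backend", pinned: "gpt-5-codex-2025-09-23", candidate: "gpt-5.1-codex" },
];

function shouldPromote(pinnedPassRate: number, candidatePassRate: number): boolean {
  // Promote only if the candidate is within 2 points of the pinned version.
  return candidatePassRate >= pinnedPassRate - 0.02;
}
```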
Looking forward, the likely direction is tighter integration between:
- Agent orchestration and CI/CD pipelines
- Internal code search / RAG against your repo and design docs
- IDE plugins that let humans โhand offโ chunks of work to the orchestrator mid-flow
The YC batch after Relayboard already saw teams where the “default” way they shipped a full-stack app using AI was: spec in Notion → agent workflow → daily PR review. Human engineers focused on system design, product discovery, and the 20% of code where correctness and safety requirements exceed what current models can guarantee.
Useful Links
- OpenAI Function Calling and Tool Use Documentation
- OpenAI Platform Models Reference (GPT-5, GPT-5 Codex, GPT-5.1, GPT-5.2, GPT-5.3-Codex, GPT-5.4, GPT-5.5)
- Anthropic Claude Opus 4.7 and Claude Sonnet 4.6 Model Docs
- Google Gemini 3.1 Pro and Gemini 3.1 Flash Lite Model Guide
- OpenRouter Cross-Provider Model Catalog
- OpenAI Cookbook: Patterns for Tool Use, RAG, and Agents
- Prisma ORM GitHub Repository
- Next.js Documentation
- Stripe Developer Documentation for Billing and Subscriptions
- GitHub Actions Documentation
- HashiCorp Terraform Documentation
Frequently Asked Questions
Which AI coding agents did Relayboard use to ship their product?
Relayboard primarily used GPT-5 Codex and Claude Opus 4.7 for core implementation tasks. Gemini 3.1 Flash Lite served as a fast, low-cost agent for boilerplate generation and refactoring. Each model was treated as a specialized contributor with a narrow, well-defined interface rather than a single all-purpose coder.
How many lines of code did AI agents generate versus human engineers?
AI coding agents generated over 40,000 lines of code spanning frontend, backend, infrastructure, and tests. Human engineers wrote fewer than 900 lines directly. The human team focused on writing product specs, reviewing diffs, and resolving ambiguous product trade-offs rather than implementation.
What tech stack did the Relayboard team ship using AI agents?
The stack included Next.js 15 with React 19 and Tailwind CSS on the frontend, a tRPC and Node backend with Prisma and Postgres, Temporal for background workers, Stripe and Slack integrations, Playwright and Jest for testing, and AWS infrastructure provisioned via Terraform.
How did Relayboard structure their AI agents to avoid poor output quality?
They divided work across specialized agents: a Spec Agent for technical design, a Frontend Agent for React/Next.js, and a Backend Agent for APIs and data. This micro-team model with narrow interfaces significantly outperformed monolithic single-prompt approaches and kept outputs focused and reviewable.
What share of YC startups now use agentic AI workflows as primary developers?
Based on community reports and YC batch discussions, the majority of recent YC teams use AI assistance for the bulk of their first production app's code, and a meaningful minority describe agentic workflows as their primary developers rather than helpers โ reflecting a real shift in how early-stage teams are structured.
How did the team keep AI agent costs below one senior engineer's salary?
Cost efficiency came from three practices: using prompt caching to avoid redundant token usage, leveraging tool-use APIs to reduce round-trips, and routing low-complexity tasks like boilerplate and refactoring to Gemini 3.1 Flash Lite ($0.25/$1.50 per M tokens) instead of more expensive frontier models like GPT-5 Codex ($1.25/$10) or Claude Opus 4.7 ($5/$25).