[Figure: Agent Harness Architecture diagram]
10 Real-World Coding Projects with ChatGPT + Codex — Part 5: Harnessing Agent Infrastructure for Long-Running AI Tasks

Welcome to Part 5 of the ChatGPT Coding Masterclass series. This module dives deep into building long-running AI-powered coding projects using the cutting-edge GPT-5.3-Codex, focusing on the critical concept of the Agent Harness — the operating system for AI agents.


Table of Contents

  1. Introduction: Why Agent Harnesses Matter
  2. Theoretical Deep-Dive: Agent Harness Engineering
     • What is an Agent Harness?
     • Context Engineering Fundamentals
     • Architectural Constraints & Entropy Management
     • The 3-Agent Pattern: Planner, Generator, Evaluator
  3. Step-by-Step Implementation Guide
     • Setting up the Codex CLI environment
     • Creating your first agent harness project
     • Implementing AGENTS.md for Progressive Disclosure
     • Developing the Planner, Generator, Evaluator agents
     • Integrating Playwright MCP for automated QA
  4. OpenAI Agents SDK and Testing Harnesses
     • SDK Architecture Overview
     • Using the SDK to build and orchestrate agents
     • Writing test harnesses to verify harness logic
  5. Codex CLI Usage Deep Dive
     • Installation & configuration commands
     • Common commands for harness lifecycle management
     • Debugging and log inspection commands
  6. IDE Integration: VS Code, Cursor, Windsurf
     • Installing GPT-5.3 plugins
     • Setting up multi-agent workflows in IDE
     • Live debugging and interactive agent tuning
  7. Advanced Prompt Templates for Agent Harnesses
     • 5 ready-to-use templates with placeholders
  8. Concrete Code Examples
     • Rust: Agent harness core orchestration
     • Python: Planner-Generator-Evaluator loop
     • TypeScript: Playwright MCP integration
  9. Pro Tips & Edge Cases
     • Common pitfalls in context inflation
     • Handling agent hallucinations and verification loops
     • Efficient entropy management for multi-agent systems

Introduction: Why Agent Harnesses Matter

In 2026, the frontier of AI-assisted software engineering has evolved beyond simple question-answering or code generation. The new paradigm revolves around long-lived, autonomous AI agents capable of managing complex workflows, debugging, and continuous improvement cycles without human intervention.

This is where Agent Harnesses come in — the vital infrastructure layer that:

  • Constrains AI models to prevent runaway generation
  • Informs agents by carefully engineering context windows
  • Verifies and corrects agent outputs through structured feedback loops

Think of the Agent Harness as the operating system for AI agents: GPT-5.3-Codex is the CPU, the conversation context is RAM, and the harness is the OS managing task scheduling, memory, and security.


Theoretical Deep-Dive: Agent Harness Engineering

What is an Agent Harness?

An Agent Harness is a software infrastructure wrapping one or more AI models (agents) to:

  • Manage context injection and progressive disclosure of knowledge
  • Enforce architectural constraints (e.g., token limits, API rate limits)
  • Implement entropy management to reduce randomness and improve determinism
  • Orchestrate multi-agent pipelines with Planner-Generator-Evaluator patterns
  • Maintain state persistence for long-running tasks

This lets agents operate like software processes, each with defined roles, inputs, and outputs.


Context Engineering Fundamentals

The power of GPT-5.3 lies in its enormous context window (100k+ tokens), but how you fill that context determines agent effectiveness.

Agent Harnesses use context engineering to:

  • Start with minimal seed context (e.g., AGENTS.md) acting as a map to deeper documentation or code
  • Apply progressive disclosure, where agents are only given small, relevant context slices at a time, reducing noise and hallucination
  • Include metadata annotations to tag context chunks with versioning, timestamps, or semantic roles (e.g., spec, test cases, bug reports)
  • Define context refresh policies to update or prune context as the agent progresses

This approach is vital for long-running agents managing evolving codebases or specs.
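Progressive disclosure is easy to picture in code. The sketch below is illustrative Python, not part of any shipped SDK: the `ContextSlice` type, the keyword-matching relevance test, and the character budget are all assumptions standing in for a real retrieval layer.

```python
from dataclasses import dataclass

@dataclass
class ContextSlice:
    """One chunk of context with metadata annotations."""
    path: str   # source document the chunk came from
    role: str   # semantic role, e.g. "spec", "test", "bug-report"
    text: str

def disclose(slices, query_terms, budget_chars=2000):
    """Return only the slices relevant to the current task, within a size budget.

    Relevance here is naive keyword overlap; a real harness would use
    embeddings or the AGENTS.md map to decide what to surface.
    """
    selected, used = [], 0
    for s in slices:
        if any(term.lower() in s.text.lower() for term in query_terms):
            if used + len(s.text) > budget_chars:
                break  # refresh policy: stop before overflowing the budget
            selected.append(s)
            used += len(s.text)
    return selected

slices = [
    ContextSlice("docs/specs/login.md", "spec", "Users log in with email and password."),
    ContextSlice("docs/specs/billing.md", "spec", "Invoices are issued monthly."),
]
relevant = disclose(slices, ["login", "password"])
print([s.path for s in relevant])  # only the login spec is disclosed
```

The same pattern scales up: swap the keyword test for a vector search and the character budget for a tokenizer-based count, and the interface stays the same.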


Architectural Constraints & Entropy Management

Architectural constraints are rules baked into the harness to:

  • Limit token usage per agent call
  • Enforce maximum call frequency (rate limits, cooldowns)
  • Define strict input/output schemas (via JSON schemas or protobufs)
  • Restrict allowed APIs or file system access
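These constraints can be enforced with a thin validation layer in front of every agent call. The Python sketch below is an illustration under stated assumptions: the token budget, the `status`/`payload` output keys, and the hand-rolled checks stand in for whatever JSON Schema or protobuf enforcement a real harness would use.

```python
import json

# Assumed harness limits and output schema for the example.
MAX_TOKENS_PER_CALL = 4000
REQUIRED_OUTPUT_KEYS = {"status", "payload"}

def check_call(est_tokens: int) -> None:
    """Reject a call that would exceed the harness token budget."""
    if est_tokens > MAX_TOKENS_PER_CALL:
        raise ValueError(f"call needs {est_tokens} tokens, budget is {MAX_TOKENS_PER_CALL}")

def check_output(raw: str) -> dict:
    """Enforce a strict output schema: valid JSON with the expected keys."""
    data = json.loads(raw)  # raises on malformed output
    missing = REQUIRED_OUTPUT_KEYS - data.keys()
    if missing:
        raise ValueError(f"agent output missing keys: {sorted(missing)}")
    return data

check_call(1200)  # within budget, passes silently
out = check_output('{"status": "ok", "payload": {"files": []}}')
print(out["status"])
```

Because both checks raise before any downstream agent sees the data, a violation stops the pipeline at the harness boundary rather than propagating bad state.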

Entropy management means controlling randomness during generation (via temperature and top_p) and validating outputs:

  • Use low temperature (0.0 – 0.3) for deterministic tasks like code generation
  • Introduce stochasticity only during exploratory phases (e.g., brainstorming)
  • Implement multi-step verification loops to catch hallucinations or errors
  • Use Evaluators with Playwright MCP (Model Context Protocol) to run real tests and provide feedback

Together, these ensure safe, reliable, and auditable AI agent operation.


The 3-Agent Pattern: Planner, Generator, Evaluator

A powerful architectural pattern in agent harnesses is the Planner-Generator-Evaluator (PGE) triad:

| Agent Role | Responsibilities | Outputs |
| --- | --- | --- |
| Planner | Expands high-level specs into detailed tasks or sprints | Sprint Contracts (task breakdowns) |
| Generator | Creates code or artifacts based on sprint contracts | Code, configs, tests |
| Evaluator | Runs tests, reproduces bugs, validates output, generates QA reports | Validation results, bug repro videos, PR suggestions |

This triad runs in a closed feedback loop, enabling autonomous bug reproduction & fixing.


Use the PGE pattern to achieve full agent autonomy and continuous integration within your harness.
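The triad and its feedback loop can be sketched as plain Python. The three callables below are stubs standing in for model-backed agents, and `MAX_ROUNDS` is an assumed architectural constraint; nothing here is SDK API.

```python
MAX_ROUNDS = 3  # architectural constraint: bound the feedback loop

def run_pge(planner, generator, evaluator, spec):
    """Planner -> Generator -> Evaluator loop, feeding failures back to the generator."""
    tasks = planner(spec)
    feedback = None
    for round_no in range(1, MAX_ROUNDS + 1):
        artifact = generator(tasks, feedback)
        verdict = evaluator(artifact)
        if verdict["passed"]:
            return {"rounds": round_no, "artifact": artifact}
        feedback = verdict["logs"]  # structured feedback closes the loop
    raise RuntimeError("evaluator never passed within the round budget")

# Stub agents: the generator only succeeds once it has seen evaluator feedback.
planner = lambda spec: [f"implement: {spec}"]
generator = lambda tasks, fb: "fixed code" if fb else "buggy code"
evaluator = lambda art: {"passed": art == "fixed code", "logs": "assertion failed"}

result = run_pge(planner, generator, evaluator, "user login")
print(result["rounds"])  # converges on round 2
```

The round cap matters: without it, a Generator that never satisfies the Evaluator would loop forever, which is exactly the runaway behavior the harness exists to prevent.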


Step-by-Step Implementation Guide

1. Setting up the Codex CLI environment

The Rust-based Codex CLI is your command-line gateway to GPT-5.3-Codex and agent harness orchestration.

# Install Codex CLI via Cargo (Rust's package manager)
cargo install codex-cli

# Verify installation
codex --version
# Expected output:
# codex-cli 1.2.0 (GPT-5.3-Codex support)

Configure your OpenAI credentials:

export OPENAI_API_KEY="sk-xxxxxx-your-token-xxxxxx"
codex configure
# Follow interactive prompt to confirm setup

2. Creating your first agent harness project

# Initialize a new harness project scaffold
codex harness init my-ai-project

cd my-ai-project

# Directory structure overview:
# ├── agents/
# │   ├── planner.agent
# │   ├── generator.agent
# │   └── evaluator.agent
# ├── AGENTS.md
# ├── harness.toml
# └── tests/

3. Implementing AGENTS.md for Progressive Disclosure

AGENTS.md acts as the context map: it is injected as the agents' base context and points them to the source-of-truth documents:

# AGENTS.md - Agent Context Map

## Overview
This file maps agents to their authoritative sources.

- Planner: `docs/specs/feature_spec.md`
- Generator: `src/`
- Evaluator: `tests/e2e/`

## References
- Design Doc: `docs/design_doc.md`
- Bug Reports: `logs/bug_reports.md`
- Coding Standards: `docs/coding_standards.md`

In harness.toml, configure progressive disclosure:

[context]
base_context = ["AGENTS.md"]
progressive_disclosure = true
max_context_tokens = 8000

4. Developing the Planner, Generator, Evaluator agents

Each agent is implemented as a .agent file with prompt templates and role-specific logic.

Example planner.agent snippet:

# Planner Agent Prompt

You are the planning agent. Your job is to break down the following feature specification into sprint contracts.
Use the AGENTS.md map to find detailed specs.

Feature Spec:
{feature_spec}

Output format:
- Sprint Name
- Tasks (with descriptions and acceptance criteria)
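Since the template uses `{placeholder}` fields, it maps directly onto Python's `str.format`. The loader below is illustrative; how the Codex CLI actually renders `.agent` files is not specified here.

```python
# A trimmed copy of the planner template above, with the same placeholder.
PLANNER_TEMPLATE = """\
You are the planning agent. Break the following feature specification into sprint contracts.

Feature Spec:
{feature_spec}
"""

def render_prompt(template: str, **fields) -> str:
    """Fill {placeholder} fields; raises KeyError if a field is missing."""
    return template.format(**fields)

prompt = render_prompt(PLANNER_TEMPLATE, feature_spec="Build a user login system")
print(prompt.splitlines()[-1])
```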

5. Integrating Playwright MCP for automated QA

The Evaluator uses Playwright MCP to run integration tests and validate code outputs.

Example integration in evaluator.agent:

import { PlaywrightMCP } from 'openai-agents-sdk';

async function runEvaluation() {
  const mcp = new PlaywrightMCP();

  const testResult = await mcp.runTest('tests/e2e/my_feature.spec.ts');

  if (!testResult.passed) {
    return {
      status: 'fail',
      logs: testResult.logs,
      screenshot: testResult.screenshotPath,
    };
  }
  return { status: 'pass' };
}

OpenAI Agents SDK and Testing Harnesses

SDK Architecture Overview

The OpenAI Agents SDK provides:

  • Agent classes to build and manage agent roles
  • Context management utilities for progressive disclosure
  • Orchestration tools to run multi-agent pipelines asynchronously
  • Testing harnesses to simulate agent inputs and verify outputs

Using the SDK to build and orchestrate agents

Example Rust snippet using the SDK:

use openai_agents_sdk::{Agent, Harness};

#[tokio::main]
async fn main() {
    let planner = Agent::new("planner.agent").await.unwrap();
    let generator = Agent::new("generator.agent").await.unwrap();
    let evaluator = Agent::new("evaluator.agent").await.unwrap();

    let mut harness = Harness::new(vec![planner, generator, evaluator]);

    let feature_spec = std::fs::read_to_string("docs/specs/feature_spec.md").unwrap();

    // Run the planner
    let sprint_contracts = harness.run_agent("planner", feature_spec).await.unwrap();

    // Pass contracts to generator
    let generated_code = harness.run_agent("generator", sprint_contracts).await.unwrap();

    // Evaluate generated code
    let eval_result = harness.run_agent("evaluator", generated_code).await.unwrap();

    println!("Evaluation Status: {:?}", eval_result);
}

Writing test harnesses to verify harness logic

Use the SDK’s test harness utilities to simulate agents and verify state transitions:

#[cfg(test)]
mod tests {
    use super::*;
    use openai_agents_sdk::test_harness::*;

    #[tokio::test]
    async fn test_planner_generates_sprints() {
        let planner = Agent::new("planner.agent").await.unwrap();
        let harness = TestHarness::new(planner);

        let feature_spec = "Build a user login system";

        let output = harness.run(feature_spec).await.unwrap();

        assert!(output.contains("Sprint 1"));
        assert!(output.contains("Task"));
    }
}

Codex CLI Usage Deep Dive

Installation & Configuration Commands

# Install globally
cargo install codex-cli --locked

# Setup API key
codex configure

# View current config
codex config show

Common commands for harness lifecycle management

# Initialize a new harness project
codex harness init my-project

# Run a specific agent with input file
codex agent run -a planner.agent -i docs/specs/feature_spec.md

# Start full harness run
codex harness run

# View harness logs
codex logs show -p my-project

# Update harness dependencies
codex harness update

Debugging and log inspection commands

# Enable verbose logging for harness
codex harness run --verbose

# Export harness logs to file
codex logs export -p my-project -o logs/output.log

# Inspect agent context window
codex context inspect -a planner.agent

IDE Integration: VS Code, Cursor, Windsurf

Installing GPT-5.3 Plugins

  • VS Code Marketplace: OpenAI GPT-5.3 Codex plugin
  • Cursor IDE: Native GPT-5.3 integration pre-bundled
  • Windsurf: Install windsurf-gpt extension

Setting up multi-agent workflows in IDE

  1. Open your harness project directory
  2. Load .agent files in the IDE workspace
  3. Use IDE commands to trigger agent runs (Ctrl+Shift+P > Codex: Run Agent)
  4. Visualize agent outputs and logs in sidebars
  5. Configure environment variables for API keys

Live debugging and interactive agent tuning

  • Use IDE breakpoint features on agent prompts
  • Modify prompt templates on-the-fly
  • Trigger evaluations and view Playwright MCP test results inline
  • Use inline annotations for agent suggestions


Leverage IDE-integrated GPT-5.3 models to iteratively refine agent behavior and harness policies.


Advanced Prompt Templates for Agent Harnesses

Below are 5 advanced prompt templates for harness agents, ready to copy-paste and customize.


1. Planner Agent: Sprint Contract Generation

You are a Planner Agent tasked with decomposing the following feature specification into actionable sprint contracts. Reference the AGENTS.md for detailed requirements.

Feature Spec:
{feature_spec}

Output JSON format:

{
  "sprints": [
    {
      "name": "Sprint 1",
      "tasks": [
        {
          "title": "Task title",
          "description": "Task detailed description",
          "acceptance_criteria": ["criterion 1", "criterion 2"]
        }
      ]
    }
  ]
}
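Before handing sprint contracts to the Generator, the harness can validate the Planner's JSON against this shape. The hand-rolled checks below are an illustrative stand-in for a proper JSON Schema validator.

```python
import json

def validate_sprints(raw: str) -> dict:
    """Check the planner's output matches the sprint-contract shape above."""
    data = json.loads(raw)
    for sprint in data["sprints"]:
        assert isinstance(sprint["name"], str)
        for task in sprint["tasks"]:
            assert isinstance(task["title"], str)
            assert isinstance(task["description"], str)
            assert isinstance(task["acceptance_criteria"], list)
    return data

sample = json.dumps({
    "sprints": [{
        "name": "Sprint 1",
        "tasks": [{
            "title": "Login form",
            "description": "Render the login form",
            "acceptance_criteria": ["form submits", "errors shown"],
        }],
    }]
})
contracts = validate_sprints(sample)
print(len(contracts["sprints"]))  # 1
```

Rejecting malformed plans at this boundary is the same verification-loop idea as the Evaluator, applied one stage earlier.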

2. Generator Agent: Code Scaffold Creation

You are a Generator Agent. Based on the sprint contract below, generate code scaffolds, configuration files, and initial test cases.

Sprint Contract:
{sprint_contract}

Ensure adherence to company coding standards per AGENTS.md.

Output: File tree structure with code snippets, in markdown.

3. Evaluator Agent: Automated Test Execution & Report

You are an Evaluator Agent responsible for running automated tests on generated code. Use Playwright MCP to execute tests and capture logs.

Test Suite:
{test_suite_files}

Output JSON with:
- status: pass/fail
- error_logs: string
- screenshot_path: string (if fail)
- remediation_suggestions: string

4. Bug Reproducer Agent

You are a Bug Reproducer Agent. Given a bug report, write steps and code to reproduce the bug reliably.

Bug Report:
{bug_report}

Output: Step-by-step reproduction instructions and minimal code snippet.

5. PR Generator Agent

You are a PR Generator Agent. Based on the fix applied, generate a detailed pull request description including issue references, change summary, and test coverage.

Fix Summary:
{fix_summary}

Output: Markdown formatted PR description.

Concrete Code Examples

1. Rust: Agent Harness Core Orchestration

use openai_agents_sdk::{Agent, Harness, Context};

#[tokio::main]
async fn main() {
    // Load agents
    let planner = Agent::load("agents/planner.agent").await.unwrap();
    let generator = Agent::load("agents/generator.agent").await.unwrap();
    let evaluator = Agent::load("agents/evaluator.agent").await.unwrap();

    // Initialize harness with agents
    let mut harness = Harness::new(vec![planner, generator, evaluator]);

    // Load initial context map
    let context_map = std::fs::read_to_string("AGENTS.md").unwrap();

    // Provide feature specification
    let feature_spec = std::fs::read_to_string("docs/specs/new_feature.md").unwrap();

    // Inject context with progressive disclosure
    let context = Context::new()
        .with_base_context(context_map)
        .with_input(feature_spec);

    // Run planner to get sprint contracts
    let sprint_contracts = harness.run_agent("planner", context.clone()).await.unwrap();

    // Run generator to produce code artifacts
    let code_output = harness.run_agent("generator", sprint_contracts).await.unwrap();

    // Run evaluator to validate code
    let evaluation = harness.run_agent("evaluator", code_output).await.unwrap();

    println!("Evaluation Result: {:?}", evaluation);
}

2. Python: Planner-Generator-Evaluator Loop

from openai_agents_sdk import Agent, Harness

async def main():
    planner = await Agent.load('agents/planner.agent')
    generator = await Agent.load('agents/generator.agent')
    evaluator = await Agent.load('agents/evaluator.agent')

    harness = Harness([planner, generator, evaluator])

    with open('AGENTS.md') as f:
        base_context = f.read()

    with open('docs/specs/feature_spec.md') as f:
        feature_spec = f.read()

    # Run planner
    sprint_contracts = await harness.run_agent('planner', base_context + '\n' + feature_spec)

    # Run generator
    code_output = await harness.run_agent('generator', sprint_contracts)

    # Run evaluator
    eval_result = await harness.run_agent('evaluator', code_output)

    print('Evaluation:', eval_result)

if __name__ == '__main__':
    import asyncio
    asyncio.run(main())

3. TypeScript: Playwright MCP Integration for Evaluator Agent

import { PlaywrightMCP } from 'openai-agents-sdk';

export async function evaluateCode(testSuitePath: string) {
    const mcp = new PlaywrightMCP();

    const result = await mcp.runTest(testSuitePath);

    if (!result.passed) {
        return {
            status: 'fail',
            errorLogs: result.logs,
            screenshot: result.screenshotPath,
            remediation: 'Check failing assertions and fix implementation',
        };
    }

    return { status: 'pass' };
}

Pro Tips & Edge Cases

1. Managing Context Inflation

  • Problem: Agents receive too much context → increased token usage and hallucinations
  • Solution: Use AGENTS.md as a context map, enable progressive disclosure to feed only relevant chunks
  • Tip: Implement context pruning policies to discard stale or irrelevant data periodically.
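A pruning policy can be as simple as age plus a pin flag. The sketch below is illustrative Python; the `step`/`pinned` chunk fields and the thresholds are assumptions, not an SDK API.

```python
def prune(chunks, current_step, max_age=5, budget_chars=4000):
    """Keep recent or pinned chunks, drop the rest, and respect a size budget."""
    kept, used = [], 0
    # newest first, so fresh context survives when the budget is tight
    for c in sorted(chunks, key=lambda c: c["step"], reverse=True):
        stale = current_step - c["step"] > max_age and not c.get("pinned")
        if stale or used + len(c["text"]) > budget_chars:
            continue
        kept.append(c)
        used += len(c["text"])
    return kept

chunks = [
    {"step": 1, "text": "old spec discussion", "pinned": False},
    {"step": 2, "text": "AGENTS.md map", "pinned": True},
    {"step": 9, "text": "latest evaluator logs", "pinned": False},
]
pruned = prune(chunks, current_step=10)
print([c["text"] for c in pruned])  # old discussion dropped, pinned map kept
```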

2. Handling Agent Hallucinations & Verification Loops

  • Problem: Agents generate plausible but incorrect code or specs
  • Solution: Use the Evaluator with Playwright MCP to run real tests and validate outputs
  • Tip: Automate bug reproduction → fix → validate → PR creation cycle to close the loop.

3. Efficient Entropy Management

  • Problem: Too high randomness causes flaky agents; too low causes repetitive outputs
  • Solution: Tune temperature and top_p per agent role: low for Generator/Evaluator, moderate for Planner
  • Tip: Use multi-pass generation with different seeds for exploratory planning phases.
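The per-role tuning and seeded multi-pass idea can be captured in a few lines. The numbers in `SAMPLING` are illustrative defaults rather than values mandated by any API, and the stub planner merely shuffles a task list to show how fixed seeds make exploratory passes reproducible.

```python
import random

# Per-role sampling settings reflecting the guidance above (assumed defaults).
SAMPLING = {
    "planner":   {"temperature": 0.7, "top_p": 0.9},   # moderate: exploratory
    "generator": {"temperature": 0.1, "top_p": 1.0},   # low: deterministic code
    "evaluator": {"temperature": 0.0, "top_p": 1.0},   # fully deterministic
}

def multi_pass(plan_fn, spec, seeds=(1, 2, 3)):
    """Run the exploratory planner once per seed and collect the candidate plans."""
    plans = []
    for seed in seeds:
        rng = random.Random(seed)  # seeded, so each pass is reproducible
        plans.append(plan_fn(spec, rng))
    return plans

# Stub planner: the ordering of candidate tasks depends on the per-pass RNG.
def stub_planner(spec, rng):
    tasks = ["design schema", "write handler", "add tests"]
    rng.shuffle(tasks)
    return tasks

passes = multi_pass(stub_planner, "user login")
print(len(passes), SAMPLING["generator"]["temperature"])
```

Running the same seeds again reproduces the same plans exactly, which keeps even the "exploratory" phase auditable.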


By mastering Agent Harness Engineering, you unlock the full power of GPT-5.3-Codex for complex, autonomous software projects, transforming AI from a writing assistant into a full software engineering partner.

More on this