ChatGPT Coding Masterclass Part 5: 10 Real-World Coding Projects with ChatGPT + Codex

March 26, 2026

10 Real-World Coding Projects with ChatGPT + Codex — Part 5: Harnessing Agent Infrastructure for Long-Running AI Tasks

Welcome to Part 5 of the ChatGPT Coding Masterclass series. This module covers building long-running AI-powered coding projects with GPT-5.3-Codex, centered on one critical concept: the Agent Harness, the operating system for AI agents.

Table of Contents

Introduction: Why Agent Harnesses Matter
Theoretical Deep-Dive: Agent Harness Engineering
  What is an Agent Harness?
  Context Engineering Fundamentals
  Architectural Constraints & Entropy Management
  The 3-Agent Pattern: Planner, Generator, Evaluator
Step-by-Step Implementation Guide
  Setting up the Codex CLI environment
  Creating your first agent harness project
  Implementing AGENTS.md for Progressive Disclosure
  Developing the Planner, Generator, Evaluator agents
  Integrating Playwright MCP for automated QA
OpenAI Agents SDK and Testing Harnesses
  SDK Architecture Overview
  Using the SDK to build and orchestrate agents
  Writing test harnesses to verify harness logic
Codex CLI Usage Deep Dive
  Installation & configuration commands
  Common commands for harness lifecycle management
  Debugging and log inspection commands
IDE Integration: VS Code, Cursor, Windsurf
  Installing GPT-5.3 plugins
  Setting up multi-agent workflows in IDE
  Live debugging and interactive agent tuning
Advanced Prompt Templates for Agent Harnesses
  5 ready-to-use templates with placeholders
Concrete Code Examples
  Rust: Agent harness core orchestration
  Python: Planner-Generator-Evaluator loop
  TypeScript: Playwright MCP integration
Pro Tips & Edge Cases
  Common pitfalls in context inflation
  Handling agent hallucinations and verification loops
  Efficient entropy management for multi-agent systems

Introduction: Why Agent Harnesses Matter

In 2026, the frontier of AI-assisted software engineering has evolved beyond simple question-answering or code generation. The new paradigm revolves around long-lived, autonomous AI agents capable of managing complex workflows, debugging, and continuous improvement cycles without human intervention.

This is where Agent Harnesses come in — the vital infrastructure layer that:

Constrains AI models to prevent runaway generation
Informs agents by carefully engineering context windows
Verifies and corrects agent outputs through structured feedback loops

Think of the Agent Harness as the operating system for AI agents: GPT-5.3-Codex is the CPU, the conversation context is RAM, and the harness is the OS managing task scheduling, memory, and security.

Theoretical Deep-Dive: Agent Harness Engineering

What is an Agent Harness?

An Agent Harness is a software infrastructure wrapping one or more AI models (agents) to:

Manage context injection and progressive disclosure of knowledge
Enforce architectural constraints (e.g., token limits, API rate limits)
Implement entropy management to reduce randomness and improve determinism
Orchestrate multi-agent pipelines with Planner-Generator-Evaluator patterns
Maintain state persistence for long-running tasks

This lets agents operate like software processes, each with defined roles, inputs, and outputs.
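To make the state-persistence point concrete, here is a minimal sketch of checkpointing for a long-running task, using only the Python standard library. The names `save_state`, `load_state`, and `run_tasks`, and the `completed_tasks` key, are hypothetical and introduced only for illustration; a real harness would call an agent where the placeholder comment sits.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Write the checkpoint atomically so a crash never leaves a partial file."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.replace(tmp_name, path)  # atomic rename over the old checkpoint


def load_state(path: Path) -> dict:
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if not path.exists():
        return {"completed_tasks": []}
    return json.loads(path.read_text(encoding="utf-8"))


def run_tasks(tasks: list[str], path: Path) -> dict:
    """Run each task once, skipping any that an earlier run already finished."""
    state = load_state(path)
    for task in tasks:
        if task in state["completed_tasks"]:
            continue
        # Placeholder: a real harness would invoke an agent for this task here.
        state["completed_tasks"].append(task)
        save_state(path, state)  # checkpoint after every completed task
    return state


with tempfile.TemporaryDirectory() as workdir:
    checkpoint = Path(workdir) / "harness_state.json"
    run_tasks(["plan", "generate"], checkpoint)
    # Simulated restart: "plan" and "generate" are skipped, "evaluate" runs.
    resumed = run_tasks(["plan", "generate", "evaluate"], checkpoint)
    print(resumed["completed_tasks"])  # ['plan', 'generate', 'evaluate']
```

Checkpointing after each task, rather than at the end, is what lets a restarted run pick up where the previous one stopped.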

Context Engineering Fundamentals

The power of GPT-5.3 lies in its enormous context window (100k+ tokens), but how you fill that context determines agent effectiveness.

Agent Harnesses use context engineering to:

Start with minimal seed context (e.g., AGENTS.md) acting as a map to deeper documentation or code
Apply progressive disclosure, where agents are only given small, relevant context slices at a time, reducing noise and hallucination
Include metadata annotations to tag context chunks with versioning, timestamps, or semantic roles (e.g., spec, test cases, bug reports)
Define context refresh policies to update or prune context as the agent progresses

This approach is vital for long-running agents managing evolving codebases or specs.
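The selection logic described above can be sketched as follows. This is an illustration under stated assumptions, not the harness's actual mechanism: `ContextChunk`, `select_context`, and the word-count token estimate are hypothetical stand-ins (a real system would use the model's tokenizer).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextChunk:
    """One tagged slice of context (hypothetical structure for illustration)."""
    role: str      # semantic role, e.g. "spec", "test", "bug_report"
    version: int   # higher means newer
    text: str


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def select_context(seed: str, chunks: list[ContextChunk],
                   wanted_roles: set[str], max_tokens: int) -> list[str]:
    """Start from the seed map, then disclose only relevant chunks that fit."""
    selected = [seed]
    used = estimate_tokens(seed)
    relevant = [c for c in chunks if c.role in wanted_roles]
    # Newest chunks first, so stale material is the first to be left out.
    for chunk in sorted(relevant, key=lambda c: c.version, reverse=True):
        cost = estimate_tokens(chunk.text)
        if used + cost <= max_tokens:
            selected.append(chunk.text)
            used += cost
    return selected


chunks = [
    ContextChunk("spec", 2, "Login must support email and password"),
    ContextChunk("spec", 1, "Login supports email only"),
    ContextChunk("bug_report", 1, "Crash on empty password field"),
]
context = select_context("AGENTS.md map", chunks, {"spec"}, max_tokens=10)
print(context)  # ['AGENTS.md map', 'Login must support email and password']
```

Here the older spec version and the unrelated bug report never reach the agent: the first is dropped by the token budget, the second by the role filter.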

Architectural Constraints & Entropy Management

Architectural constraints are rules baked into the harness to:

Limit token usage per agent call
Enforce maximum call frequency (rate limits, cooldowns)
Define strict input/output schemas (via JSON schemas or protobufs)
Restrict allowed APIs or file system access

Entropy management covers two things: controlling randomness during generation (temperature, top_p) and validating what the agent produces:

Use low temperature (0.0 – 0.3) for deterministic tasks like code generation
Introduce stochasticity only during exploratory phases (e.g., brainstorming)
Implement multi-step verification loops to catch hallucinations or errors
Use Evaluators with Playwright MCP (Model Context Protocol) to run real tests and provide feedback

Together, these ensure safe, reliable, and auditable AI agent operation.
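A minimal sketch of two of these controls, per-role sampling settings and strict output validation, is shown below. The `SAMPLING` values, the `REQUIRED_FIELDS` schema, and `validate_output` are assumptions chosen for illustration; they are not part of any published harness API.

```python
import json

# Hypothetical per-role sampling settings: near-deterministic for code and
# evaluation, looser for exploratory planning.
SAMPLING = {
    "planner": {"temperature": 0.7, "top_p": 0.95},
    "generator": {"temperature": 0.1, "top_p": 1.0},
    "evaluator": {"temperature": 0.0, "top_p": 1.0},
}

# Hypothetical output schema: field name -> required Python type.
REQUIRED_FIELDS = {"status": str, "logs": str}


def validate_output(raw: str) -> dict:
    """Parse an agent reply and reject anything that breaks the schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON
    if not isinstance(data, dict):
        raise ValueError("agent output must be a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"field {field!r} is missing or not {expected_type.__name__}")
    if data["status"] not in {"pass", "fail"}:
        raise ValueError("status must be 'pass' or 'fail'")
    return data


report = validate_output('{"status": "fail", "logs": "1 assertion failed"}')
print(report["status"], SAMPLING["generator"]["temperature"])
```

Rejecting malformed output at the boundary means downstream agents only ever see data that matches the contract.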

The 3-Agent Pattern: Planner, Generator, Evaluator

A powerful architectural pattern in agent harnesses is the Planner-Generator-Evaluator (PGE) triad:

Agent Role	Responsibilities	Outputs
Planner	Expands high-level specs into detailed tasks or sprints	Sprint Contracts (task breakdowns)
Generator	Creates code or artifacts based on sprint contracts	Code, configs, tests
Evaluator	Runs tests, reproduces bugs, validates output, generates QA reports	Validation results, bug repro videos, PR suggestions

This triad runs in a closed feedback loop, enabling autonomous bug reproduction & fixing.

Use the PGE pattern to achieve full agent autonomy and continuous integration within your harness.
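The closed feedback loop can be sketched with plain functions standing in for the three agents. Everything here is hypothetical scaffolding: `run_pge` and the stub `plan`, `generate`, and `evaluate` functions do not call a model; they only show how evaluator feedback flows back into the generator until a task passes or the retry budget runs out.

```python
from typing import Callable

Planner = Callable[[str], list[str]]           # spec -> tasks
Generator = Callable[[str, str], str]          # (task, feedback) -> artifact
Evaluator = Callable[[str], tuple[bool, str]]  # artifact -> (passed, feedback)


def run_pge(spec: str, plan: Planner, generate: Generator,
            evaluate: Evaluator, max_attempts: int = 3) -> dict[str, str]:
    """Plan once, then generate and evaluate each task until it passes."""
    results: dict[str, str] = {}
    for task in plan(spec):
        feedback = ""
        for _ in range(max_attempts):
            artifact = generate(task, feedback)
            passed, feedback = evaluate(artifact)
            if passed:
                results[task] = artifact
                break
        else:
            # Retry budget exhausted: stop instead of looping forever.
            raise RuntimeError(f"task {task!r} failed after {max_attempts} attempts")
    return results


# Stub agents: the evaluator only accepts upper-case artifacts, and the
# generator only produces them once it has received feedback.
def plan(spec: str) -> list[str]:
    return [f"{spec}: step {i}" for i in (1, 2)]


def generate(task: str, feedback: str) -> str:
    return task.upper() if feedback else task


def evaluate(artifact: str) -> tuple[bool, str]:
    return artifact.isupper(), "use upper case"


print(run_pge("login", plan, generate, evaluate))
```

The `max_attempts` bound matters: without it, a generator that never satisfies the evaluator would spin indefinitely.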

Step-by-Step Implementation Guide

1. Setting up the Codex CLI environment

The Rust-based Codex CLI is your command-line gateway to GPT-5.3-Codex and agent harness orchestration.

# Install Codex CLI via Cargo (Rust's package manager)
cargo install codex-cli

# Verify installation
codex --version
# Expected output:
# codex-cli 1.2.0 (GPT-5.3-Codex support)

Configure your OpenAI credentials:

export OPENAI_API_KEY="sk-xxxxxx-your-token-xxxxxx"
codex configure
# Follow interactive prompt to confirm setup

2. Creating your first agent harness project

# Initialize a new harness project scaffold
codex harness init my-ai-project

cd my-ai-project

# Directory structure overview:
# ├── agents/
# │   ├── planner.agent
# │   ├── generator.agent
# │   └── evaluator.agent
# ├── AGENTS.md
# ├── harness.toml
# └── tests/

3. Implementing AGENTS.md for Progressive Disclosure

AGENTS.md acts as a context map. It is injected into the harness context so that agents can look up the source-of-truth documents they need:

# AGENTS.md - Agent Context Map

## Overview
This file maps agents to their authoritative sources.

- Planner: `docs/specs/feature_spec.md`
- Generator: `src/`
- Evaluator: `tests/e2e/`

## References
- Design Doc: `docs/design_doc.md`
- Bug Reports: `logs/bug_reports.md`
- Coding Standards: `docs/coding_standards.md`

In harness.toml, configure progressive disclosure:

[context]
base_context = ["AGENTS.md"]
progressive_disclosure = true
max_context_tokens = 8000

4. Developing the Planner, Generator, Evaluator agents

Each agent is implemented as a .agent file with prompt templates and role-specific logic.

Example planner.agent snippet:

# Planner Agent Prompt

You are the planning agent. Your job is to break down the following feature specification into sprint contracts.
Use the AGENTS.md map to find detailed specs.

Feature Spec:
{feature_spec}

Output format:
- Sprint Name
- Tasks (with descriptions and acceptance criteria)

5. Integrating Playwright MCP for automated QA

The Evaluator uses Playwright MCP to run integration tests and validate code outputs.

Example integration in evaluator.agent:

import { PlaywrightMCP } from 'openai-agents-sdk';

async function runEvaluation() {
  const mcp = new PlaywrightMCP();

  const testResult = await mcp.runTest('tests/e2e/my_feature.spec.ts');

  if (!testResult.passed) {
    return {
      status: 'fail',
      logs: testResult.logs,
      screenshot: testResult.screenshotPath,
    };
  }
  return { status: 'pass' };
}

OpenAI Agents SDK and Testing Harnesses

SDK Architecture Overview

The OpenAI Agents SDK provides:

Agent classes to build and manage agent roles
Context management utilities for progressive disclosure
Orchestration tools to run multi-agent pipelines asynchronously
Testing harnesses to simulate agent inputs and verify outputs

Using the SDK to build and orchestrate agents

Example Rust snippet using the SDK:

use openai_agents_sdk::{Agent, Harness};

#[tokio::main]
async fn main() {
    let planner = Agent::new("planner.agent").await.unwrap();
    let generator = Agent::new("generator.agent").await.unwrap();
    let evaluator = Agent::new("evaluator.agent").await.unwrap();

    let mut harness = Harness::new(vec![planner, generator, evaluator]);

    let feature_spec = std::fs::read_to_string("docs/specs/feature_spec.md").unwrap();

    // Run the planner
    let sprint_contracts = harness.run_agent("planner", feature_spec).await.unwrap();

    // Pass contracts to generator
    let generated_code = harness.run_agent("generator", sprint_contracts).await.unwrap();

    // Evaluate generated code
    let eval_result = harness.run_agent("evaluator", generated_code).await.unwrap();

    println!("Evaluation Status: {:?}", eval_result);
}

Writing test harnesses to verify harness logic

Use the SDK’s test harness utilities to simulate agents and verify state transitions:

#[cfg(test)]
mod tests {
    use super::*;
    use openai_agents_sdk::test_harness::*;

    #[tokio::test]
    async fn test_planner_generates_sprints() {
        let planner = Agent::new("planner.agent").await.unwrap();
        let harness = TestHarness::new(planner);

        let feature_spec = "Build a user login system";

        let output = harness.run(feature_spec).await.unwrap();

        assert!(output.contains("Sprint 1"));
        assert!(output.contains("Task"));
    }
}

Codex CLI Usage Deep Dive

Installation & Configuration Commands

# Install globally
cargo install codex-cli --locked

# Setup API key
codex configure

# View current config
codex config show

Common commands for harness lifecycle management

# Initialize a new harness project
codex harness init my-project

# Run a specific agent with input file
codex agent run -a planner.agent -i docs/specs/feature_spec.md

# Start full harness run
codex harness run

# View harness logs
codex logs show -p my-project

# Update harness dependencies
codex harness update

Debugging and log inspection commands

# Enable verbose logging for harness
codex harness run --verbose

# Export harness logs to file
codex logs export -p my-project -o logs/output.log

# Inspect agent context window
codex context inspect -a planner.agent

IDE Integration: VS Code, Cursor, Windsurf

Installing GPT-5.3 Plugins

VS Code Marketplace: OpenAI GPT-5.3 Codex plugin
Cursor IDE: Native GPT-5.3 integration pre-bundled
Windsurf: Install windsurf-gpt extension

Setting up multi-agent workflows in IDE

Open your harness project directory
Load .agent files in the IDE workspace
Use IDE commands to trigger agent runs (Ctrl+Shift+P > Codex: Run Agent)
Visualize agent outputs and logs in sidebars
Configure environment variables for API keys

Live debugging and interactive agent tuning

Use IDE breakpoint features on agent prompts
Modify prompt templates on-the-fly
Trigger evaluations and view Playwright MCP test results inline
Use inline annotations for agent suggestions

Leverage IDE-integrated GPT-5.3 models to iteratively refine agent behavior and harness policies.

Advanced Prompt Templates for Agent Harnesses

Below are 5 advanced prompt templates for harness agents, ready to copy-paste and customize.

1. Planner Agent: Sprint Contract Generation

You are a Planner Agent tasked with decomposing the following feature specification into actionable sprint contracts. Reference the AGENTS.md for detailed requirements.

Feature Spec:
{feature_spec}

Output JSON format:

{
  "sprints": [
    {
      "name": "Sprint 1",
      "tasks": [
        {
          "title": "Task title",
          "description": "Task detailed description",
          "acceptance_criteria": ["criterion 1", "criterion 2"]
        }
      ]
    }
  ]
}
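One practical detail when filling in a template like this: it mixes a `{feature_spec}` placeholder with literal JSON braces, so Python's `str.format` would fail on it. The sketch below substitutes only named placeholders and leaves other braces alone. The `render` helper and its placeholder pattern are assumptions made for this example.

```python
import re

# Matches {name} where name is lower-case letters and underscores only, so
# JSON braces such as {"sprints": ...} are never treated as placeholders.
PLACEHOLDER = re.compile(r"\{([a-z_]+)\}")


def render(template: str, values: dict[str, str]) -> str:
    """Fill named placeholders; fail loudly if a value is missing."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {{{name}}}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


template = 'Feature Spec:\n{feature_spec}\n\nOutput JSON format:\n{"sprints": []}'
prompt = render(template, {"feature_spec": "Build a user login system"})
print(prompt)
```

Failing on a missing value, rather than leaving `{feature_spec}` in the prompt, prevents an agent from being asked to plan against an empty spec.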

2. Generator Agent: Code Scaffold Creation

You are a Generator Agent. Based on the sprint contract below, generate code scaffolds, configuration files, and initial test cases.

Sprint Contract:
{sprint_contract}

Ensure adherence to company coding standards per AGENTS.md.

Output: File tree structure with code snippets, in markdown.

3. Evaluator Agent: Automated Test Execution & Report

You are an Evaluator Agent responsible for running automated tests on generated code. Use Playwright MCP to execute tests and capture logs.

Test Suite:
{test_suite_files}

Output JSON with:
- status: pass/fail
- error_logs: string
- screenshot_path: string (if fail)
- remediation_suggestions: string

4. Bug Reproducer Agent

You are a Bug Reproducer Agent. Given a bug report, write steps and code to reproduce the bug reliably.

Bug Report:
{bug_report}

Output: Step-by-step reproduction instructions and minimal code snippet.

5. PR Generator Agent

You are a PR Generator Agent. Based on the fix applied, generate a detailed pull request description including issue references, change summary, and test coverage.

Fix Summary:
{fix_summary}

Output: Markdown formatted PR description.

Concrete Code Examples

1. Rust: Agent Harness Core Orchestration

use openai_agents_sdk::{Agent, Harness, Context};

#[tokio::main]
async fn main() {
    // Load agents
    let planner = Agent::load("agents/planner.agent").await.unwrap();
    let generator = Agent::load("agents/generator.agent").await.unwrap();
    let evaluator = Agent::load("agents/evaluator.agent").await.unwrap();

    // Initialize harness with agents
    let mut harness = Harness::new(vec![planner, generator, evaluator]);

    // Load initial context map
    let context_map = std::fs::read_to_string("AGENTS.md").unwrap();

    // Provide feature specification
    let feature_spec = std::fs::read_to_string("docs/specs/new_feature.md").unwrap();

    // Inject context with progressive disclosure
    let context = Context::new()
        .with_base_context(context_map)
        .with_input(feature_spec);

    // Run planner to get sprint contracts
    let sprint_contracts = harness.run_agent("planner", context.clone()).await.unwrap();

    // Run generator to produce code artifacts
    let code_output = harness.run_agent("generator", sprint_contracts).await.unwrap();

    // Run evaluator to validate code
    let evaluation = harness.run_agent("evaluator", code_output).await.unwrap();

    println!("Evaluation Result: {:?}", evaluation);
}

2. Python: Planner-Generator-Evaluator Loop

import asyncio

from openai_agents_sdk import Agent, Harness


async def main():
    # Load the three role agents from their .agent definitions
    planner = await Agent.load('agents/planner.agent')
    generator = await Agent.load('agents/generator.agent')
    evaluator = await Agent.load('agents/evaluator.agent')

    harness = Harness([planner, generator, evaluator])

    # Seed context: the AGENTS.md map plus the feature spec
    with open('AGENTS.md') as f:
        base_context = f.read()

    with open('docs/specs/feature_spec.md') as f:
        feature_spec = f.read()

    # Run planner
    sprint_contracts = await harness.run_agent('planner', base_context + '\n' + feature_spec)

    # Run generator
    code_output = await harness.run_agent('generator', sprint_contracts)

    # Run evaluator
    eval_result = await harness.run_agent('evaluator', code_output)

    print('Evaluation:', eval_result)


if __name__ == '__main__':
    asyncio.run(main())

3. TypeScript: Playwright MCP Integration for Evaluator Agent

import { PlaywrightMCP } from 'openai-agents-sdk';

export async function evaluateCode(testSuitePath: string) {
    const mcp = new PlaywrightMCP();

    const result = await mcp.runTest(testSuitePath);

    if (!result.passed) {
        return {
            status: 'fail',
            errorLogs: result.logs,
            screenshot: result.screenshotPath,
            remediation: 'Check failing assertions and fix implementation',
        };
    }

    return { status: 'pass' };
}

Pro Tips & Edge Cases

1. Managing Context Inflation

Problem: Agents receive too much context → increased token usage and hallucinations
Solution: Use AGENTS.md as a context map, enable progressive disclosure to feed only relevant chunks
Tip: Implement context pruning policies to discard stale or irrelevant data periodically.
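A pruning policy of this kind might look like the sketch below, which drops entries that have not been used recently and then keeps the most recent ones that fit a token budget. `Entry`, `prune`, the step-based age, and the word-count token estimate are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """A piece of context plus the harness step at which it was last used."""
    text: str
    last_used_step: int


def prune(entries: list[Entry], current_step: int,
          max_age: int, max_tokens: int) -> list[Entry]:
    """Discard stale entries, then keep the most recent ones within budget."""
    fresh = [e for e in entries if current_step - e.last_used_step <= max_age]
    fresh.sort(key=lambda e: e.last_used_step, reverse=True)  # most recent first
    kept: list[Entry] = []
    used = 0
    for entry in fresh:
        cost = len(entry.text.split())  # crude word-count token estimate
        if used + cost > max_tokens:
            break  # everything after this point is older, so stop here
        kept.append(entry)
        used += cost
    return kept


entries = [
    Entry("old design notes", last_used_step=1),
    Entry("current sprint contract text", last_used_step=9),
    Entry("latest failing test log", last_used_step=10),
]
kept = prune(entries, current_step=10, max_age=5, max_tokens=8)
print([e.text for e in kept])
# ['latest failing test log', 'current sprint contract text']
```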

2. Handling Agent Hallucinations & Verification Loops

Problem: Agents generate plausible but incorrect code or specs
Solution: Use the Evaluator with Playwright MCP to run real tests and validate outputs
Tip: Automate bug reproduction → fix → validate → PR creation cycle to close the loop.

3. Efficient Entropy Management

Problem: Too high randomness causes flaky agents; too low causes repetitive outputs
Solution: Tune temperature and top_p per agent role: low for Generator/Evaluator, moderate for Planner
Tip: Use multi-pass generation with different seeds for exploratory planning phases.
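Multi-pass generation with different seeds can be sketched as a best-of-N selection. In this illustration `propose_plan` is a stand-in that shuffles a fixed step list with a seeded random generator instead of calling a model, and `score` is a made-up heuristic; both are assumptions, not a real planner or ranking method.

```python
import random


def propose_plan(spec: str, seed: int) -> list[str]:
    """Stand-in for one sampled planner call; a real one would query the model."""
    rng = random.Random(seed)  # separate generator per seed keeps runs repeatable
    steps = ["design schema", "write handlers", "add tests", "write docs"]
    rng.shuffle(steps)
    return steps[: rng.randint(2, 4)]


def score(plan: list[str]) -> int:
    # Hypothetical heuristic: reward plans that include tests, then longer plans.
    return (10 if "add tests" in plan else 0) + len(plan)


def best_of(spec: str, seeds: list[int]) -> list[str]:
    """Generate one candidate per seed and keep the highest-scoring plan."""
    candidates = [propose_plan(spec, seed) for seed in seeds]
    return max(candidates, key=score)


best = best_of("Build a user login system", seeds=[1, 2, 3])
print(best, score(best))
```

Fixing the seeds makes the exploratory phase reproducible, so a surprising plan can be regenerated and inspected later.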

By mastering Agent Harness Engineering, you can apply GPT-5.3-Codex to complex, autonomous software projects, turning AI from a writing assistant into a full software engineering partner.

Markos Symeonides
