The Complete Guide to OpenAI Codex Modes: Plan, Execute, and Review — Choosing the Right Mode for Every Task

Table of Contents

  1. Introduction: Why Codex Modes Matter
  2. How Modes Work Across Desktop App, IDE Extensions, and CLI
  3. Plan Mode Deep Dive: From Objectives to a Concrete Task Graph
  4. Execute Mode Deep Dive: Safe, Autonomous Implementation
  5. Review Mode Deep Dive: Precise, Actionable Code Review
  6. Switching Between Modes Efficiently
  7. Best Practices for Each Mode
  8. Real-World Workflows and End-to-End Examples
  9. Integration Patterns and Automation
  10. Governance, Compliance, and Safety
  11. Troubleshooting and Optimization
  12. Capability Matrix and Decision Guide
  13. Conclusion and Next Steps

Introduction: Why Codex Modes Matter

Software development happens in loops: you plan what to do, you implement it, and you review the result. Codex Modes make that loop explicit and tool-supported. Plan mode helps you shape a precise objective and task decomposition. Execute mode performs the change safely and deterministically, with guardrails and observability. Review mode inspects diffs, comments on risks, and proposes patches to raise quality. Together, they provide a shared vocabulary and an operational system that scales from quick scripts to organization-wide initiatives.

Three principles underpin Codex Modes:

  • Clarity over cleverness: plan before you build, and make decisions inspectable.
  • Automation with accountability: autonomous execution must be observable, reversible, and policy-aware.
  • Quality by default: every change is reviewed against explicit rubrics and can be auto-remediated where safe.

This guide explains how to use Plan, Execute, and Review across the desktop app, popular IDE extensions, and the CLI. It goes beyond feature lists to pragmatic workflows, policy patterns, and code examples you can adapt. If you’re building a team-wide rollout strategy, pairing a senior developer with Codex as a co-implementer, or integrating AI review into CI, you’ll find concrete guidance here. For deeper context on eliciting high-signal instructions, see our guide 30 ChatGPT-5.5 Prompts for Product Managers, which covers product management prompts for roadmap planning and PRDs.

How Modes Work Across Desktop App, IDE Extensions, and CLI

Codex Modes are consistent across three primary surfaces so teams can switch context without rewriting mental models:

  • Desktop app: a workspace for multi-file reasoning, long-form plans, and rich artifact previews.
  • IDE extensions: tight-in-editor operations, inline diffs, and code-lens controls that minimize context switching.
  • CLI: automation-first control plane for CI/CD, ChatOps, and headless batch flows.

At a glance:

  • Plan mode is optimized for gathering context, writing objectives, enumerating risks, and producing a machine- and human-readable task graph.
  • Execute mode is optimized for applying minimally-scoped changes in a branch or sandbox, running tests, and producing granular commits.
  • Review mode is optimized for structured, rubric-based review of staged diffs or pull requests, with explainable suggestions and auto-fixes.

Conceptually, think of Plan as producing a “contract” that Execute fulfills and Review validates. The same artifacts (objective, constraints, acceptance criteria, task graph) flow through all surfaces.

Desktop App

The desktop app exposes modes via a mode switcher along the top of the workspace. Switching modes preserves context (your open repository, current branch, and working documents). Plan mode exposes a document canvas and a task graph panel; Execute mode shows a change queue and test dashboard; Review mode shows diff panes and rule insights. Keyboard-first workflows are supported with global shortcuts (e.g., Cmd/Ctrl+1 for Plan, +2 for Execute, +3 for Review) and command palette entries.

IDE Extensions

IDE extensions present modes through a status bar toggle and command palette actions. Plan mode allows drafting objectives as code fences or .plan files; Execute mode applies edits via code actions and a preview panel; Review mode attaches to your VCS to annotate diffs and PRs inline. Mode transitions are designed to be frictionless: accept a plan bullet to generate a local change, or promote a selected diff into a review thread with one command.

CLI

The CLI subcommands map 1:1 to modes and can be combined into pipelines. Use the CLI for reproducibility, automation, and auditability. Below are typical commands you’ll use.

# Initialize Codex in a repo (creates .codex/ and default config)
codex init

# Plan mode: create or update a plan artifact
codex plan --objective "Add passwordless login with magic links" \
           --context src/auth,tests/auth \
           --acceptance "Login via email within 60s; lockout after 5 failures" \
           --risks "Phishing, token replay" \
           --out .codex/plan.yaml

# Execute mode: apply the plan in an isolated branch and run tests
codex exec --plan .codex/plan.yaml \
           --branch feature/auth-magic-links \
           --sandbox docker \
           --run-tests

# Review mode: review staged diffs or a PR URL with explicit rules
codex review --source diff --ruleset .codex/rules/security.yml \
             --out .codex/review-report.md

# Switch modes in an interactive shell
codex switch execute

All three surfaces honor a shared project configuration (e.g., .codex/config.yaml), and each action emits machine-readable logs for traceability.

Plan Mode Deep Dive: From Objectives to a Concrete Task Graph

Plan mode transforms a fuzzy idea into a precise, testable change request. It captures objectives, scope boundaries, constraints, dependencies, non-functional requirements, risks, and acceptance tests. The output is a task graph: prioritized, minimally-coupled tasks with clear completion criteria.

Core Inputs

  • Objective: a crisp statement of what to achieve, not how.
  • Context: source directories, docs, architectural decisions, and external dependencies to consider.
  • Constraints: coding standards, security policies, performance minima, and tech stack limitations.
  • Acceptance criteria: observable, testable conditions that demonstrate success.
  • Risks and mitigations: known pitfalls, attack vectors, migration impacts, and recovery plans.
  • Dependencies: services, libraries, or teams that influence design or delivery order.

Plan Artifact Structure

Plans should be both human-friendly and machine-parseable. YAML is a good fit for readability, with optional JSON for automated pipelines. Below is a representative plan.

# .codex/plan.yaml
version: 1
objective: "Introduce passwordless login via email magic links"
context:
  repo: "."
  include:
    - "src/auth"
    - "src/email"
    - "tests/auth"
  exclude:
    - "legacy/"
constraints:
  language: ["TypeScript"]
  frameworks: ["Express", "Jest"]
  security:
    - "JWT tokens must be signed with rotating keys"
    - "Magic links expire within 60 seconds"
    - "Rate-limit requests by IP and email"
non_functional:
  performance:
    p50_latency_ms: 150
    p95_latency_ms: 300
  observability:
    - "Emit audit events for token issuance and verification"
acceptance:
  - "User can request a magic link from /auth/magic"
  - "Link is delivered via SES sandbox in dev; provider in prod"
  - "Clicking link logs the user in and sets a secure, HttpOnly cookie"
  - "After 5 failed verifications per hour, requests are blocked for 1 hour"
risks:
  - id: "R1"
    description: "Token replay within validity window"
    mitigation: "Single-use nonce stored server-side; invalidate on use"
  - id: "R2"
    description: "Email deliverability in dev/test"
    mitigation: "Fallback to console link for local; SES sandbox in staging"
tasks:
  - id: "T1"
    title: "Design token format and storage"
    type: "design"
    deps: []
    done_when:
      - "Spec reviewed"
      - "Test cases enumerated"
  - id: "T2"
    title: "Implement /auth/magic request endpoint"
    type: "code"
    deps: ["T1"]
    done_when:
      - "Endpoint issues token and sends email"
      - "Unit tests pass"
  - id: "T3"
    title: "Implement /auth/magic/verify endpoint"
    type: "code"
    deps: ["T1"]
    done_when:
      - "Single-use verification; sets cookie"
      - "Unit and integration tests pass"
  - id: "T4"
    title: "Add rate limiting"
    type: "code"
    deps: ["T2","T3"]
    done_when:
      - "Exceeded attempts blocked"
      - "Tests cover edge cases"
  - id: "T5"
    title: "Security review and fuzz tests"
    type: "review"
    deps: ["T2","T3","T4"]
    done_when:
      - "All high-risk findings addressed"
deliverables:
  - "plan.yaml"
  - "api-contract.md"
  - "threat-model.md"
branching:
  strategy: "feature-branch"
  branch_name: "feature/auth-magic-links"
test_plan:
  unit: true
  integration: true
  fuzz: ["token parser", "email payload parser"]
budget:
  timebox_hours: 6
  compute_budget_tokens: 200000

Techniques for High-Signal Plans

  • Set constraints early: specify languages, frameworks, and policies up front to bound the search space.
  • Write acceptance tests as you would for a PR description; they double as automated test outlines.
  • Prefer a DAG of tasks: codify deps (“T3 depends on T1”) so Execute can parallelize safely.
  • Explicit non-functional requirements: latency, footprint, and observability guide implementation choices.
  • List risks with IDs to anchor discussion and to bind mitigations to review checks.
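
To illustrate why a DAG allows safe parallelism, the sketch below groups tasks into "waves": every task in a wave depends only on tasks from earlier waves, so the tasks within one wave could run side by side. The `Task` shape mirrors the `id` and `deps` fields of the plan above; `executionWaves` is a hypothetical helper written for this article, not part of any Codex API.

```typescript
// Hypothetical helper: group plan tasks into waves that can run in parallel.
// The Task shape mirrors the `id` and `deps` fields used in plan.yaml.
interface Task {
  id: string;
  deps: string[];
}

function executionWaves(tasks: Task[]): string[][] {
  const known = new Set(tasks.map((t) => t.id));
  for (const t of tasks) {
    for (const d of t.deps) {
      if (!known.has(d)) {
        throw new Error(`Task ${t.id} depends on unknown task ${d}`);
      }
    }
  }

  const done = new Set<string>();
  const waves: string[][] = [];
  let remaining = tasks.slice();

  while (remaining.length > 0) {
    // A task is ready once every dependency finished in an earlier wave.
    const ready = remaining.filter((t) => t.deps.every((d) => done.has(d)));
    if (ready.length === 0) {
      throw new Error("Dependency cycle detected; the plan is not a DAG");
    }
    waves.push(ready.map((t) => t.id));
    for (const t of ready) done.add(t.id);
    remaining = remaining.filter((t) => !done.has(t.id));
  }
  return waves;
}

const planTasks: Task[] = [
  { id: "T1", deps: [] },
  { id: "T2", deps: ["T1"] },
  { id: "T3", deps: ["T1"] },
  { id: "T4", deps: ["T2", "T3"] },
  { id: "T5", deps: ["T2", "T3", "T4"] },
];

// T2 and T3 land in the same wave because both depend only on T1.
console.log(executionWaves(planTasks));
```

A plain ordered list would force T3 to wait for T2 even though neither needs the other; declaring `deps` makes that independence visible.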

Plan Mode in Practice

In the desktop app, begin with a plan canvas. Paste relevant context (file paths, API docs), draft objectives, and acceptance criteria. Use the built-in graph panel to validate your task DAG. In an IDE, create .codex/plan.yaml or a plan.md scaffold and let the extension propose a DAG given headings and checkboxes. In CLI-first workflows, you can seed a plan from a prompt and refine iteratively.

# Seed a plan from an objective and files
codex plan --objective "Migrate image processing to async jobs" \
           --context src/jobs,src/api \
           --acceptance "No change in API response schema; p95 latency < 300ms" \
           --out .codex/plan.yaml

# Refine with additional constraints and risks
codex plan --in .codex/plan.yaml \
           --constraints security:cis_level=1,perf:p95=300 \
           --risk "R3: job queue backpressure" \
           --write

Reusable Templates

Teams benefit from templated plans for common change types: feature additions, bugfixes, refactors, security patches, and migrations. Store templates under .codex/templates/ and reference them by name.

# .codex/templates/security-patch.yaml
version: 1
objective: ""
context: { repo: "." }
constraints:
  security:
    - "No use of deprecated crypto"
    - "Input validation at trust boundaries"
acceptance:
  - "Reproducer test fails before, passes after"
risks:
  - { id: "SEC-1", description: "Regression in auth flow" }
tasks:
  - { id: "H1", title: "Localize and reproduce vulnerability", type: "analysis" }
  - { id: "H2", title: "Implement minimal-risk patch", type: "code", deps: ["H1"] }
  - { id: "H3", title: "Add regression tests", type: "test", deps: ["H2"] }
  - { id: "H4", title: "Security review", type: "review", deps: ["H2", "H3"] }

# Instantiate a template
codex plan --template security-patch \
           --objective "Fix SSRF in image proxy" \
           --context src/proxy,tests/proxy \
           --out .codex/plan.yaml

Measuring Plan Quality

  • Coverage: percentage of changed files referenced in context.
  • Completeness: tasks have done_when criteria and dependency closure.
  • Testability: acceptance criteria are executable as tests.
  • Risk alignment: each risk has a mitigation and a corresponding review check.
  • Budget fit: estimated effort aligns with timebox and compute budget.
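
Some of these measures can be checked mechanically. The sketch below covers completeness (non-empty `done_when`, dependency closure) and the mitigation half of risk alignment for a plan that has already been parsed into an object. `lintPlan` and the interface names are hypothetical; only the field names come from the plan format shown earlier.

```typescript
// Hypothetical lint for a parsed plan; field names follow plan.yaml above.
interface PlanTask {
  id: string;
  deps: string[];
  done_when: string[];
}
interface PlanRisk {
  id: string;
  mitigation?: string;
}
interface Plan {
  tasks: PlanTask[];
  risks: PlanRisk[];
}

function lintPlan(plan: Plan): string[] {
  const problems: string[] = [];
  const ids = new Set(plan.tasks.map((t) => t.id));

  for (const task of plan.tasks) {
    // Completeness: each task needs at least one completion criterion.
    if (task.done_when.length === 0) {
      problems.push(`${task.id}: missing done_when criteria`);
    }
    // Dependency closure: every dependency must be a task in this plan.
    for (const dep of task.deps) {
      if (!ids.has(dep)) {
        problems.push(`${task.id}: unknown dependency ${dep}`);
      }
    }
  }

  // Risk alignment (partial): each risk needs a non-empty mitigation.
  for (const risk of plan.risks) {
    if (!risk.mitigation || risk.mitigation.trim() === "") {
      problems.push(`${risk.id}: no mitigation recorded`);
    }
  }
  return problems;
}

const draftPlan: Plan = {
  tasks: [
    { id: "T1", deps: [], done_when: ["Spec reviewed"] },
    { id: "T2", deps: ["T1", "T9"], done_when: [] },
  ],
  risks: [
    { id: "R1", mitigation: "Single-use nonce stored server-side" },
    { id: "R3" },
  ],
};

// Reports the empty done_when on T2, the unknown T9, and the unmitigated R3.
console.log(lintPlan(draftPlan));
```

Coverage, testability, and budget fit need information outside the plan file (changed files, test results, estimates), so they are left out of this sketch.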

A good plan cuts cycle time in Execute and reduces rework in Review. Investing an extra 10 minutes to tighten acceptance criteria can save hours later. For broader patterns, see our guide How to Use OpenAI Codex for Automated Code Review, which explores automated code review with Codex agents.

Execute Mode Deep Dive: Safe, Autonomous Implementation

Execute mode is where code changes happen. The goal is autonomy without surprises: changes are isolated, minimal, well-tested, and reversible. Execute consumes a plan, opens a sandbox or branch, applies changes task-by-task, and emits progress as commits and logs.

Execution Scope and Safety

  • Branch isolation: all edits occur in a feature branch unless configured otherwise.
  • Sandboxing: filesystem and process access are constrained to approved tools and directories; prefer containerized sandboxes.
  • Tool allowlisting: only declared tools (e.g., git, npm, pytest) are allowed; others are denied by policy.
  • Secrets hygiene: credentials are sourced from a secure store, never written to disk or logs.
  • Budget enforcement: time and token budgets prevent runaway sessions.

Execution Configuration

Define policies in .codex/config.yaml and .codex/policy.yaml. The config declares defaults; the policy enforces guardrails.

# .codex/config.yaml
project:
  name: "web-auth"
  default_branch: "main"
  default_mode: "plan"
exec:
  sandbox: "docker"
  branch_prefix: "feature/"
  allowed_tools:
    - "git"
    - "node"
    - "npm"
    - "jest"
    - "tsc"
  test:
    run: ["npm test", "npm run lint"]
    require_green: true
  commit:
    granularity: "task"
    message_template: "feat(auth): {task_title} ({task_id})"
budgets:
  time_minutes: 45
  tokens: 150000

# .codex/policy.yaml
policies:
  fs_access:
    allow:
      - "src/**"
      - "tests/**"
      - "package.json"
      - "tsconfig.json"
    deny:
      - "**/*.pem"
      - "**/.env*"
  net_access:
    allow: ["registry.npmjs.org"]
    deny: ["*"]
  commands:
    allow: ["git *", "npm *", "node *", "jest *", "tsc *"]
    deny: ["curl *", "wget *", "ssh *"]
  secrets:
    sources: ["env:AWS_SES_KEY", "env:JWT_SECRET"]
    never_log: true
  review_gates:
    required:
      - "tests_green"
      - "security_scan"
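
To make the `fs_access` block concrete, here is a minimal sketch of how such rules might be evaluated. It assumes deny patterns take precedence over allow patterns and that anything unmatched is rejected; the source does not state the precedence, and the `net_access` block above (an allow entry plus a catch-all deny) would need allow-first evaluation instead. `globToRegExp` and `isPathAllowed` are hypothetical helpers, and the glob support is deliberately minimal.

```typescript
// Hypothetical evaluator for fs_access rules. Assumption: a path is permitted
// only if it matches an allow pattern and no deny pattern.
// Minimal globs: "**/" spans zero or more directories, "*" stays in one segment.
function globToRegExp(glob: string): RegExp {
  let source = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith("**/", i)) {
      source += "(?:.*/)?";
      i += 3;
    } else if (glob.startsWith("**", i)) {
      source += ".*";
      i += 2;
    } else if (glob.charAt(i) === "*") {
      source += "[^/]*";
      i += 1;
    } else {
      // Escape regex metacharacters so the "." in "package.json" is literal.
      source += glob.charAt(i).replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp("^" + source + "$");
}

interface FsPolicy {
  allow: string[];
  deny: string[];
}

function isPathAllowed(path: string, policy: FsPolicy): boolean {
  const matches = (globs: string[]) =>
    globs.some((g) => globToRegExp(g).test(path));
  // Deny wins, so "src/secrets/dev.pem" is rejected despite "src/**".
  return !matches(policy.deny) && matches(policy.allow);
}

const fsAccess: FsPolicy = {
  allow: ["src/**", "tests/**", "package.json", "tsconfig.json"],
  deny: ["**/*.pem", "**/.env*"],
};

console.log(isPathAllowed("src/auth/token.ts", fsAccess)); // true
console.log(isPathAllowed("src/secrets/dev.pem", fsAccess)); // false
console.log(isPathAllowed(".env.local", fsAccess)); // false
console.log(isPathAllowed("README.md", fsAccess)); // false
```

Rejecting unmatched paths by default keeps the policy safe when a new directory appears that nobody has classified yet.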

Execution Flow

  1. Resolve the plan and validate dependencies.
  2. Create or switch to an isolated branch or sandbox.
  3. For each task, apply minimal edits, update or add tests, and run the declared test suite.
  4. If all checks pass, commit changes with a task-scoped message; if not, remediate or pause for human input.
  5. When the plan is complete, open a PR with a summary report and link to artifacts.

# Execute with a plan, containerized, and commit per task
codex exec --plan .codex/plan.yaml \
           --branch feature/auth-magic-links \
           --sandbox docker \
           --commit-per task \
           --run-tests \
           --open-pr
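
The per-task part of this flow (steps 3 and 4) can be modeled as a small loop. The sketch below is an illustration only: `TaskRunner`, `runPlan`, and the retry count are hypothetical, the tasks are assumed to arrive already sorted in dependency order, and the fake runner stands in for real edits, test runs, and commits.

```typescript
// Hypothetical model of the per-task loop: apply, test, then commit or pause.
interface ExecTask {
  id: string;
  title: string;
}
interface TaskRunner {
  apply(task: ExecTask): void; // make the minimal edits for this task
  runTests(): boolean; // true when the declared suite is green
  commit(message: string): void; // record a task-scoped commit
}
type Outcome = { status: "complete" } | { status: "paused"; taskId: string };

function runPlan(tasks: ExecTask[], runner: TaskRunner, maxAttempts = 2): Outcome {
  for (const task of tasks) {
    let green = false;
    // Retry the task a bounded number of times before giving up.
    for (let attempt = 1; attempt <= maxAttempts && !green; attempt++) {
      runner.apply(task);
      green = runner.runTests();
    }
    if (!green) {
      // Remediation failed: stop here and wait for human input.
      return { status: "paused", taskId: task.id };
    }
    runner.commit(`feat(auth): ${task.title} (${task.id})`);
  }
  return { status: "complete" };
}

// Fake runner that simulates a failing suite on T3.
const commits: string[] = [];
let currentTask = "";
const fakeRunner: TaskRunner = {
  apply: (task) => {
    currentTask = task.id;
  },
  runTests: () => currentTask !== "T3",
  commit: (message) => {
    commits.push(message);
  },
};

const outcome = runPlan(
  [
    { id: "T2", title: "Implement /auth/magic request endpoint" },
    { id: "T3", title: "Implement /auth/magic/verify endpoint" },
  ],
  fakeRunner
);

// T2 is committed; the run pauses at T3 without committing it.
console.log(outcome, commits);
```

Committing only after a green run is what keeps every commit in the branch individually revertible.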

Observability and Logs

Every action emits structured logs that help you audit and debug. Use codex logs and codex trace to inspect execution.

# Tail live logs
codex logs --follow

# Show a detailed trace for a specific task
codex trace --task T3 --format json | jq '.events[] | {ts, action, file, outcome}'

Commit Granularity and Messages

Commit granularity should match task boundaries, and messages should follow your conventional commit standard.

# Example commits produced by Execute
feat(auth): Design token format and storage (T1)

feat(auth): Implement /auth/magic request endpoint (T2)

feat(auth): Implement /auth/magic/verify endpoint (T3)

feat(auth): Add rate limiting (T4)
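
Messages like these follow from the `message_template` setting in the config shown earlier. As a rough illustration of how such a template could be expanded, the hypothetical `renderCommitMessage` below substitutes `{name}` placeholders and fails loudly when a value is missing; the placeholder syntax is taken from the config example, the function itself is an assumption.

```typescript
// Hypothetical renderer for the message_template setting; placeholders use {name}.
function renderCommitMessage(
  template: string,
  values: { [key: string]: string }
): string {
  return template.replace(/\{(\w+)\}/g, (placeholder: string, key: string) => {
    const value = values[key];
    if (typeof value !== "string") {
      // Refuse to emit a commit message with an unfilled placeholder.
      throw new Error(`No value for placeholder ${placeholder}`);
    }
    return value;
  });
}

const commitMessage = renderCommitMessage("feat(auth): {task_title} ({task_id})", {
  task_title: "Add rate limiting",
  task_id: "T4",
});

console.log(commitMessage); // feat(auth): Add rate limiting (T4)
```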

Test-First and Test-Always

Execute mode adds tests if they’re missing and updates them when APIs evolve. It also runs formatters and linters where configured. Treat failing tests as a hard stop: they signal either an implementation issue or a gap in the plan that requires revision.

# Configure test commands and thresholds
codex exec --plan .codex/plan.yaml \
           --test-cmd "npm test" \
           --coverage-threshold 80 \
           --lint-cmd "npm run lint" \
           --format-cmd "npm run format"

Policy Gates and Approvals

Policies define checks that must pass before changes can exit Execute or merge. You can include human approval gates for high-risk changes.

# Require a human approval for high-risk tasks
codex exec --plan .codex/plan.yaml \
           --require-approval R1 \
           --approver "[email protected]"

Sandboxing Strategies

  • Local container: fastest iteration using Docker or Podman with a bind-mounted repo.
  • Remote ephemeral VM: consistent environments with pre-baked caches for toolchains.
  • Hermetic build: Nix- or Bazel-backed sandboxes for bit-for-bit reproducibility.

# Use a remote sandbox profile
codex exec --plan .codex/plan.yaml \
           --sandbox remote:linux-amd64-20gb \
           --cache restore

Rollback and Recovery

Every change is reversible. Execute mode supports an undo stack and can revert the last task or the entire plan application.

# Revert the last task's changes
codex exec --revert --task T3

# Abort entire plan execution
codex exec --abort --soft # keep working tree for inspection

Collaborative Execution

For large efforts, distribute tasks across contributors and Execute instances. Each instance claims a task, applies changes, and pushes a commit that references the task ID. The plan’s DAG prevents conflicts, and the branch strategy aggregates work into a coordinated PR.

Review Mode Deep Dive: Precise, Actionable Code Review

Review mode is the quality gate. It reasons over diffs, inspects code against rules and risks, and produces actionable comments with suggested patches. It integrates with your VCS and CI to run pre-merge checks and continuous sweeps on critical areas (e.g., authentication, cryptography, billing).

Review Sources

  • Local diff: uncommitted or staged changes in your working tree.
  • Commits: a commit range (e.g., HEAD~3..HEAD) or a single commit.
  • Pull requests: remote PR URLs for GitHub, GitLab, or Bitbucket.
  • Directory sweep: targeted reviews of certain paths (e.g., src/security/**) on a schedule.

# Review staged diffs with a security ruleset
codex review --source diff \
             --ruleset .codex/rules/security.yml \
             --out .codex/review-report.md

# Review a GitHub PR
codex review --source pr --url https://github.com/org/repo/pull/123 \
             --ruleset .codex/rules/quality.yml \
             --comment --label "codex-review"

Rules, Rubrics, and Risk Ties

Rulesets describe what “good” looks like: security, performance, style, documentation, and test coverage. Tie rules to plan risks to guarantee that mitigations were implemented.

# .codex/rules/security.yml
rules:
  - id: "SEC-JWT-001"
    title: "JWT signing and verification"
    severity: "high"
    checks:
      - "No use of none algorithm"
      - "Verify issuer and audience"
  - id: "SEC-TOKEN-EXPIRY"
    title: "Magic link expiry within 60 seconds"
    severity: "high"
    checks:
      - "Token TTL <= 60 seconds"
tie_risks:
  - { risk_id: "R1", enforce: ["SEC-TOKEN-EXPIRY"] }
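
A tie is only useful if it is complete. The sketch below shows one way to find plan risks that no known rule enforces, assuming the plan's risk IDs, the ruleset's rule IDs, and the `tie_risks` entries have already been loaded. `uncoveredRisks` is a hypothetical helper, not a Codex command.

```typescript
// Hypothetical check that every plan risk is enforced by at least one known rule.
interface RiskTie {
  risk_id: string;
  enforce: string[];
}

function uncoveredRisks(
  riskIds: string[],
  ruleIds: string[],
  ties: RiskTie[]
): string[] {
  const rules = new Set(ruleIds);
  return riskIds.filter(
    (riskId) =>
      !ties.some(
        (tie) =>
          tie.risk_id === riskId &&
          // A tie counts only if it names a rule that actually exists.
          tie.enforce.some((ruleId) => rules.has(ruleId))
      )
  );
}

const missingTies = uncoveredRisks(
  ["R1", "R2"],
  ["SEC-JWT-001", "SEC-TOKEN-EXPIRY"],
  [{ risk_id: "R1", enforce: ["SEC-TOKEN-EXPIRY"] }]
);

// R2 (email deliverability) has no rule tied to it yet.
console.log(missingTies);
```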

Outputs You Can Act On

Review mode generates a concise summary, line-level comments, and suggested patches for safe auto-fixes. Reports are rendered as Markdown, SARIF for security scanners, or inline PR comments.

# .codex/review-report.md (excerpt)
Summary
- 2 high, 1 medium findings
- 3 suggested patches auto-applicable

Findings
1) SEC-TOKEN-EXPIRY (high)
   src/auth/token.ts:42
   Issue: Token expiry set to 120 seconds (exceeds 60-second policy)
   Suggestion: Reduce to 60 seconds and add regression test

Patch
--- a/src/auth/token.ts
+++ b/src/auth/token.ts
@@ -40,7 +40,7 @@ export function issueMagicToken(email: string): string {
-  const ttl = 120 * 1000;
+  const ttl = 60 * 1000; // Policy: 60-second expiry
   ...
}

Integrating with CI and PR Workflows

Hook Review into CI to block merges on high-severity findings and to post comments automatically. Use labels or check runs to drive triage workflows.

# .github/workflows/codex-review.yml
name: Codex Review
on:
  pull_request:
    types: [opened, synchronize, reopened]
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Codex CLI
        run: |
          curl -sSL https://example.com/install-codex.sh | bash
      - name: Run Codex Review
        run: |
          codex review --source pr --url ${{ github.event.pull_request.html_url }} \
                       --ruleset .codex/rules/security.yml \
                       --sarif out.sarif \
                       --comment --label "codex-review"
      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: out.sarif

Reducing Noise, Increasing Signal

  • Deduplicate comments across commits; prefer one thread per issue.
  • Suppress findings outside changed hunks unless they amplify a risk tied to the plan.
  • Calibrate severities using your incident history: not all “security” checks are equal in practice.
  • Favor patches over prose: where safe, propose code; where risky, explain trade-offs and invite human input.
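
The first two points lend themselves to a simple filter. The sketch below drops findings outside the changed hunks unless they are tied to a plan risk, then removes repeats of the same rule at the same location. The `Finding` and `Hunk` shapes and the `tiedToPlanRisk` flag are assumptions made for illustration.

```typescript
// Hypothetical post-processing of review findings before they are posted.
interface Finding {
  ruleId: string;
  file: string;
  line: number;
  tiedToPlanRisk: boolean;
}
interface Hunk {
  file: string;
  startLine: number;
  endLine: number;
}

function filterFindings(findings: Finding[], changed: Hunk[]): Finding[] {
  const seen = new Set<string>();
  return findings.filter((f) => {
    const inChangedHunk = changed.some(
      (h) => h.file === f.file && f.line >= h.startLine && f.line <= h.endLine
    );
    // Keep out-of-hunk findings only when they relate to a risk named in the plan.
    if (!inChangedHunk && !f.tiedToPlanRisk) return false;
    // One thread per issue: drop repeats of the same rule at the same location.
    const key = `${f.ruleId}|${f.file}|${f.line}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const changedHunks: Hunk[] = [{ file: "src/auth/token.ts", startLine: 40, endLine: 46 }];
const rawFindings: Finding[] = [
  { ruleId: "SEC-TOKEN-EXPIRY", file: "src/auth/token.ts", line: 42, tiedToPlanRisk: true },
  { ruleId: "SEC-TOKEN-EXPIRY", file: "src/auth/token.ts", line: 42, tiedToPlanRisk: true },
  { ruleId: "STYLE-001", file: "src/util.ts", line: 10, tiedToPlanRisk: false },
  { ruleId: "SEC-JWT-001", file: "src/auth/jwt.ts", line: 7, tiedToPlanRisk: true },
];

// Keeps the token.ts finding once and the risk-tied jwt.ts finding.
console.log(filterFindings(rawFindings, changedHunks));
```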

Human-in-the-Loop Patterns

Reviews aren’t about gatekeeping; they’re about elevating the change. Encourage reviewers to accept safe patches, discuss riskier alterations, and record decisions for future tooling to learn from.

# Example PR comment by Review mode
"Finding SEC-TOKEN-EXPIRY: expiry exceeds policy (60s).
Rationale: Short TTL reduces replay risk. Proposed patch reduces TTL to 60s and adds a unit test.
If you accept, I'll update the integration tests accordingly."

Switching Between Modes Efficiently

Mode switching should be rapid and context-preserving. The system is designed so you can evolve a plan into execution or review with minimal friction.

Desktop App

  • Mode switcher: select Plan, Execute, or Review from the top bar; context (repo, branch, open panes) persists.
  • Command palette: type “Plan: Create Task Graph,” “Execute: Run Next Task,” or “Review: Annotate Diff.”
  • Quick promote: highlight a plan task and choose “Execute This Task,” or pick a diff hunk and choose “Review This Change.”
  • Keyboard shortcuts: Cmd/Ctrl+1 (Plan), +2 (Execute), +3 (Review).

IDE Extensions

  • Status bar toggle: shows current mode; click to switch.
  • CodeLens: above plan headings or task list items, “Execute Task T3” appears; above diffs, “Open in Review.”
  • Context handoff: accepted plan checkboxes become Execute tasks; applied edits become Review subjects.

CLI

In CLI and automation, mode switching is explicit via subcommands. You can chain them in one invocation or use an interactive shell.

# Chain modes: plan -> execute -> review in one pipeline
codex plan --objective "Add feature flags" --out .codex/plan.yaml && \
codex exec --plan .codex/plan.yaml --branch feature/flags --run-tests && \
codex review --source diff --ruleset .codex/rules/quality.yml --out review.md

# Interactive shell with mode context
codex shell
> mode plan
> open .codex/plan.yaml
> mode execute
> run --task T2
> mode review
> review --source diff

Defaults and Contextual Modes

Set a project default mode and let heuristics promote you as needed. For example, opening a .plan.yaml suggests Plan mode; staging changes suggests Review mode. You can override heuristics anytime.

# Set default mode
codex config set project.default_mode execute

# Override for a single session
codex switch plan
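
The heuristics described above amount to a short decision function. The sketch below is one possible ordering (explicit override first, then an opened plan file, then staged changes, then the project default); the source names the signals but not their priority, so the ordering and the `suggestMode` helper are assumptions.

```typescript
// Hypothetical mode heuristic; the priority order is an assumption.
type Mode = "plan" | "execute" | "review";

interface Signals {
  openedFile?: string;
  hasStagedChanges: boolean;
  override?: Mode;
}

function suggestMode(signals: Signals, projectDefault: Mode): Mode {
  // An explicit choice always wins over heuristics.
  if (signals.override) return signals.override;
  // Opening a plan file (plan.yaml or plan.yml) suggests Plan mode.
  if (signals.openedFile && /plan\.ya?ml$/.test(signals.openedFile)) return "plan";
  // Staged changes suggest Review mode.
  if (signals.hasStagedChanges) return "review";
  return projectDefault;
}

console.log(suggestMode({ openedFile: ".codex/plan.yaml", hasStagedChanges: false }, "execute")); // plan
console.log(suggestMode({ hasStagedChanges: true }, "execute")); // review
console.log(suggestMode({ hasStagedChanges: true, override: "plan" }, "execute")); // plan
console.log(suggestMode({ hasStagedChanges: false }, "execute")); // execute
```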

Best Practices for Each Mode

Plan Mode Best Practices

  • Write unambiguous objectives: one sentence, one outcome.
  • Enumerate constraints early: language, frameworks, security, and performance.
  • Describe acceptance criteria as test assertions; link to existing tests if any.
  • Create a DAG, not a list: declare dependencies to enable safe parallelism.
  • Quantify non-functional goals: latency, throughput, memory.
  • Attach risks to rules: every risk must map to at least one review rule.
  • Set budgets: timebox and token caps, so execution remains predictable.

// Acceptance criteria as Jest test outlines (bodies still to be written)
describe("Magic link login", () => {
  it.todo("expires tokens after 60s");
  it.todo("sets Secure, HttpOnly cookie");
  it.todo("rate-limits after 5 failures/hour");
});

Execute Mode Best Practices

  • One task, one commit: preserves traceability and simplifies rollbacks.
  • Test-first changes for public APIs: write or update tests before refactors.
  • Small, iterative edits: avoid sweeping changes unless your test coverage is high.
  • Prefer auto-generated scaffolds but verify security- and perf-critical code manually.
  • Use sandboxes for risky tasks: crypto, parsing, or migrations benefit from hermetic runs.
  • Respect budgets: if you hit a budget cap, pause and refine the plan.

# Example: running only impacted tests for speed
codex exec --plan .codex/plan.yaml \
           --impacted-tests only \
           --test-cmd "jest --runTestsByPath $(codex impacted-tests list)"

Review Mode Best Practices

  • Start with scope: does the change match the plan’s objective and constraints?
  • Apply the rubric: evaluate security, correctness, performance, and maintainability.
  • Prefer patches to prose: where safe, propose code suggestions.
  • Reduce noise: group similar comments and avoid restating linter output.
  • Elevate severe issues: block on high-risk findings and suggest mitigations.

# Simple review rubric (YAML)
rubric:
  scope_alignment: ["Plan objective satisfied", "No unrelated changes"]
  correctness: ["All tests green", "Edge cases covered"]
  security: ["Secrets safe", "Inputs validated", "Authz enforced"]
  performance: ["No hot-path regressions", "Memory bounded"]
  maintainability: ["Docs updated", "Clear commit messages"]

Real-World Workflows and End-to-End Examples

Workflow 1: Deliver a New Feature with Guardrails

Scenario: Add passwordless login using magic links to an Express + TypeScript app. You’ll see the full Plan → Execute → Review loop.

Plan

codex plan --objective "Introduce passwordless login via email magic links" \
           --context src/auth,src/email,tests/auth \
           --acceptance "Login within 60s; secure HttpOnly cookie; rate limit" \
           --risks "Replay within TTL; spam; deliverability" \
           --out .codex/plan.yaml

Refine non-functional requirements and tie risks to rules. Ensure done_when criteria are concrete. Freeze the task graph before execution.

Execute

codex exec --plan .codex/plan.yaml \
           --branch feature/auth-magic-links \
           --sandbox docker \
           --run-tests --commit-per task

During T2, Execute scaffolds a route, generates a token issuer, and stubs SES integration in dev. For T3, it adds verification, sets a cookie, and updates tests. For T4, it wires rate limiting using a shared middleware. Tests are run after each task; failing coverage triggers remediation.
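
For a sense of what the T3 verification logic might look like, here is a minimal in-memory sketch of a single-use, 60-second nonce store, matching the plan's R1 mitigation ("invalidate on use"). `NonceStore` is hypothetical; a real implementation would generate nonces with a cryptographically secure random source and keep them in shared storage rather than process memory. The clock is passed in so expiry can be exercised without waiting.

```typescript
// Hypothetical in-memory store for single-use magic-link nonces.
const TTL_MS = 60 * 1000; // 60-second expiry from the plan's constraints

class NonceStore {
  private issuedAt = new Map<string, number>();

  issue(nonce: string, nowMs: number): void {
    this.issuedAt.set(nonce, nowMs);
  }

  // Returns true at most once per nonce, and only within the TTL.
  consume(nonce: string, nowMs: number): boolean {
    const issued = this.issuedAt.get(nonce);
    if (issued === undefined) return false;
    // Invalidate on first use, whether the attempt is valid or expired.
    this.issuedAt.delete(nonce);
    return nowMs - issued <= TTL_MS;
  }
}

const nonceStore = new NonceStore();
nonceStore.issue("abc", 0);
console.log(nonceStore.consume("abc", 30000)); // true: first use within 60s
console.log(nonceStore.consume("abc", 31000)); // false: replay is rejected
nonceStore.issue("def", 0);
console.log(nonceStore.consume("def", 61000)); // false: link has expired
```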

Review

codex review --source diff \
             --ruleset .codex/rules/security.yml \
             --out .codex/review-report.md \
             --apply-safe-patches

Findings include a TTL exceeding policy. Review proposes a patch; you accept it. The final PR includes the plan, execution logs, and the review report.

Workflow 2: Safe Refactor of a Core Module

Scenario: Refactor a JSON schema validator for performance without changing public APIs.

Plan

codex plan --objective "Refactor JSON validator to reduce p95 latency to < 100ms" \
           --context src/validator,tests \
           --constraints "No change to API surface" \
           --acceptance "p95 < 100ms; zero failing tests; memory delta < 10%" \
           --out .codex/plan.yaml

Execute

codex exec --plan .codex/plan.yaml \
           --branch refactor/validator-perf \
           --run-tests --perf-bench "npm run bench:validator -- --json out.json"

Execute measures baseline perf, applies changes like memoization and streaming parse, and benchmarks again. It logs deltas and halts if targets aren’t met.
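
The "halt if targets aren't met" decision reduces to comparing percentiles. The sketch below computes p95 with the nearest-rank method and checks it against the 100 ms target from the plan. The function names and sample latencies are made up for illustration; how a real benchmark run reports its samples is not specified in the source.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("No samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[Math.min(rank, sorted.length) - 1];
}

// Hypothetical gate: compare baseline and candidate p95 against a target in ms.
function perfGate(baseline: number[], candidate: number[], targetMs: number) {
  const baselineP95 = percentile(baseline, 95);
  const candidateP95 = percentile(candidate, 95);
  return {
    baselineP95,
    candidateP95,
    deltaMs: candidateP95 - baselineP95,
    pass: candidateP95 < targetMs, // the plan requires p95 < 100ms
  };
}

// Illustrative latencies in milliseconds (invented for this example).
const baselineMs = [80, 90, 95, 100, 110, 120, 130, 140, 150, 160];
const candidateMs = [40, 45, 50, 55, 60, 65, 70, 75, 80, 90];

// baselineP95 is 160, candidateP95 is 90, so the gate passes with a delta of -70.
console.log(perfGate(baselineMs, candidateMs, 100));
```

With only ten samples the p95 is simply the slowest run, which is why real benchmarks collect far more samples before gating on a tail percentile.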

Review

codex review --source diff --ruleset .codex/rules/performance.yml \
             --out .codex/review-perf.md

Review checks for algorithmic regressions and hot-path allocations, and ensures that comments document trade-offs.

Workflow 3: Library Migration Across a Monorepo

Scenario: Migrate a deprecated HTTP client to a supported library across multiple packages.

Plan

codex plan --objective "Migrate request to axios across monorepo" \
           --context packages/**/src \
           --acceptance "No API regressions; tests green; perf stable" \
           --risks "Edge case differences; proxy settings" \
           --out .codex/plan.yaml

Execute

codex exec --plan .codex/plan.yaml \
           --branch chore/migrate-axios \
           --run-tests --parallel 4 --shard-by package

Execute shards tasks by package, updates imports, adapts APIs, and fixes tests. Commits reference package names and tasks. Failures in a shard do not block others.

Review

codex review --source diff --ruleset .codex/rules/compatibility.yml \
             --out .codex/review-migration.md

Review flags subtle behavior changes (timeouts, error objects) and recommends compatibility shims where needed.

Workflow 4: Emergency Security Patch

Scenario: Patch an SSRF vulnerability in an image proxy path with minimal risk.

Plan

codex plan --template security-patch \
           --objective "Mitigate SSRF in image proxy by enforcing allowlist and URL parser" \
           --context src/proxy,tests/proxy \
           --out .codex/plan.yaml

Execute

codex exec --plan .codex/plan.yaml \
           --branch hotfix/ssrf-proxy \
           --sandbox docker --run-tests --require-approval SEC-1

Review

codex review --source diff --ruleset .codex/rules/security.yml \
             --out .codex/review-ssrf.md --block-on high

The review enforces strict rules and blocks the merge until a senior engineer approves the mitigation details.

Workflow 5: Data Pipeline Update with Backfill

Scenario: Modify a Spark job to include a new attribute and backfill 30 days of data.

Plan

codex plan --objective "Add country_code to user events and backfill 30 days" \
           --context jobs/spark,schemas,tests \
           --acceptance "Schema versioned; idempotent backfill; zero data loss" \
           --risks "Skew, OOM, cost overruns" \
           --out .codex/plan.yaml

Execute

codex exec --plan .codex/plan.yaml \
           --branch feature/events-country-code \
           --sandbox remote:spark-3.4 \
           --dry-run --emit-spark-plan

Execute performs a dry run on a sample and surfaces cost estimates. Once approved, it triggers a controlled backfill with checkpoints.

Review

codex review --source diff --ruleset .codex/rules/data.yml \
             --out .codex/review-data.md

Review confirms schema evolution best practices, partitioning, and lineage docs.

Integration Patterns and Automation

CI Pipelines with Modes

Automate the loop so every change follows a predictable path. A typical pattern: pre-merge Review on PRs, ephemeral Execute for preview environments, and scheduled Review sweeps on sensitive directories.

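# .gitlab-ci.yml (excerpt)
stages: [test, review, preview]
review:
  stage: review
  image: alpine:3.19
  rules:
    # Merge request variables exist only in merge request pipelines.
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
  script:
    # The alpine base image does not ship curl.
    - apk add --no-cache curl
    - curl -sSL https://example.com/install-codex.sh | sh
    # Folded scalar: the lines below are joined into a single command.
    - >-
      codex review --source mr
      --url "$CI_MERGE_REQUEST_PROJECT_URL/-/merge_requests/$CI_MERGE_REQUEST_IID"
      --ruleset .codex/rules/quality.yml
      --sarif out.sarif --comment
  artifacts:
    paths: [out.sarif]
preview:
  stage: preview
  script:
    - codex exec --plan .codex/plan.yaml --branch "$CI_COMMIT_REF_NAME" --run-tests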

ChatOps

Expose safe commands in chat for rapid iteration. For example, in Slack: “/codex review this PR with security rules,” or “/codex execute T4 from plan.yaml.” Ensure RBAC prevents dangerous operations.

# Pseudo-handler for ChatOps
onCommand("/codex review <url>") {
  ensureRole("reviewer")
  run("codex review --source pr --url <url> --ruleset .codex/rules/security.yml --comment")
}
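One way the pseudo-handler could look in practice is sketched below in Python. It assumes the chat framework hands you the URL and the caller's roles. The function names are hypothetical, and the `codex review` flags are simply the ones used in this article. The main point is that the URL travels as a single argument in an argv list, so it is never interpreted by a shell.

```python
import subprocess
from urllib.parse import urlsplit

RULESET = ".codex/rules/security.yml"


def build_review_command(url: str, user_roles: set[str]) -> list[str]:
    """Check the caller's role and the URL shape, then return an argv list."""
    if "reviewer" not in user_roles:
        raise PermissionError("reviewer role required")
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError("expected an https PR URL")
    return ["codex", "review", "--source", "pr", "--url", url,
            "--ruleset", RULESET, "--comment"]


def handle_review(url: str, user_roles: set[str]) -> subprocess.CompletedProcess:
    """Run the review without a shell so chat input cannot inject commands."""
    return subprocess.run(build_review_command(url, user_roles), check=True)
```

Keeping command construction separate from execution makes the RBAC and validation logic easy to unit test without invoking the CLI.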

Pre-commit Hooks

Catch obvious issues before they reach CI by running lightweight Review checks locally.

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: codex-review
        name: Codex Quick Review
        entry: codex review --source diff --ruleset .codex/rules/quick.yml --out .codex/quick-review.md
        language: system
        pass_filenames: false

Make Targets

Standardize developer ergonomics with Make or npm scripts.

# Makefile (excerpt)
# Usage: make plan OBJ="Add rate limiting"; make exec BR=feature/rate-limit
.PHONY: plan exec review

plan:
	@codex plan --objective "$$OBJ" --context src,tests --out .codex/plan.yaml

exec:
	@codex exec --plan .codex/plan.yaml --branch "$$BR" --run-tests

review:
	@codex review --source diff --ruleset .codex/rules/quality.yml --out .codex/review.md

Governance, Compliance, and Safety

Policy as Code

Treat execution and review policies as versioned code. Store them in .codex/policy.yaml and .codex/rules/*.yml, require approvals for changes, and audit modifications. Policies travel with the repo and apply consistently across surfaces.

RBAC and Approvals

Not every operation should be universally available. Define roles and map them to capabilities: e.g., Engineers can Plan and Execute in sandboxes; Maintainers can approve policy exceptions; Security can approve high-risk tasks. Approvals attach to tasks or PRs and are recorded in logs.

# .codex/access.yaml
roles:
  engineer:
    allow: ["plan:*", "exec:sandbox", "review:local"]
  maintainer:
    allow: ["plan:*", "exec:*", "review:*", "policy:approve"]
  security:
    allow: ["review:security", "exec:require-approval", "policy:edit"]
users:
  - id: "alice"
    roles: ["engineer"]
  - id: "bob"
    roles: ["maintainer"]
  - id: "secops"
    roles: ["security"]
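How such a file gets evaluated is not specified here, so the following is only a sketch under two assumptions: access is denied by default, and `*` in a capability pattern is a glob-style wildcard. The role and user tables mirror the YAML above, and `is_allowed` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Mirrors the roles and users in .codex/access.yaml above.
ROLES = {
    "engineer": ["plan:*", "exec:sandbox", "review:local"],
    "maintainer": ["plan:*", "exec:*", "review:*", "policy:approve"],
    "security": ["review:security", "exec:require-approval", "policy:edit"],
}
USERS = {"alice": ["engineer"], "bob": ["maintainer"], "secops": ["security"]}


def is_allowed(user: str, capability: str) -> bool:
    """Deny by default; allow when any pattern on any of the user's roles matches."""
    return any(
        fnmatchcase(capability, pattern)
        for role in USERS.get(user, [])
        for pattern in ROLES.get(role, [])
    )
```

Under these assumptions, `is_allowed("alice", "exec:sandbox")` is true, while `is_allowed("alice", "exec:remote")` and any lookup for an unknown user are false.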

Secrets and Data Handling

  • Never echo secrets: mark env vars as sensitive and mask them in logs.
  • Restrict network access: define allowlists for package registries; block outbound calls by default.
  • Minimize PII exposure: provide synthetic or redacted datasets for local runs where feasible.
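For the first point, log masking can be approximated with a standard `logging.Filter`. This is a minimal sketch that assumes you can enumerate the secret values up front, for example by reading the environment variables you have marked as sensitive. The class name is hypothetical.

```python
import logging


class SecretMaskingFilter(logging.Filter):
    """Replace known secret values with *** before a record is emitted."""

    def __init__(self, secrets):
        super().__init__()
        # Drop empty values so that "" does not match everywhere.
        self._secrets = [s for s in secrets if s]

    def filter(self, record):
        # Merge the format arguments first so secrets passed as args are caught.
        message = record.getMessage()
        for secret in self._secrets:
            message = message.replace(secret, "***")
        record.msg, record.args = message, ()
        return True  # keep the record, now redacted
```

Attach it to a handler with `handler.addFilter(SecretMaskingFilter([...]))` so every record passing through that handler is redacted. Exact-match replacement will not catch encoded or truncated copies of a secret, so treat it as one layer rather than the whole control.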

Audit Trails

All mode transitions, commands, and outputs should be logged with timestamps, user IDs, and checksums. Export logs to your SIEM and retain them per your compliance policy. Use deterministic reports (e.g., SARIF, JSON) for cross-tool analysis.
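An audit entry with those fields might be produced along these lines. The field names and the JSON Lines output are assumptions chosen for illustration, not a documented Codex log format.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(user_id, mode, command, output, now=None):
    """Build one audit entry with a UTC timestamp and a checksum of the output."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        "timestamp": timestamp,
        "user_id": user_id,
        "mode": mode,
        "command": command,
        "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
    }


def to_json_line(record):
    """Serialize with sorted keys so identical records yield identical bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"))
```

Storing a checksum instead of the raw output keeps the log compact and lets an auditor verify later that an archived report has not been altered.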

Troubleshooting and Optimization

When Plans Under-Specify Work

Symptoms: Execute stalls, adds speculative changes, or fails tests unexpectedly. Fix by refining acceptance criteria, splitting tasks, and clarifying constraints. A good heuristic: if a task can’t be completed with two or fewer commits, it’s probably too big or under-specified.

# Refine an under-specified plan
codex plan --in .codex/plan.yaml \
           --split T2 --into T2a,T2b \
           --acceptance "Include explicit test cases for errors" \
           --write

When Execute Hits Environmental Issues

Symptoms: tool not found, dependency mismatches, flaky tests. Fix by pinning toolchain versions, using hermetic sandboxes, and caching dependencies. Prefer remote sandboxes for consistency across contributors.

# Diagnose environment with a codex-provided container
# (:latest is shown for brevity; pin a version tag or digest for reproducible runs)
codex exec --plan .codex/plan.yaml \
           --sandbox docker:ghcr.io/org/codex-node-20:latest \
           --diag

When Review is Too Noisy

Symptoms: excessive comments, low signal-to-noise, repetitive findings. Fix by scoping to changed hunks, tuning severities, and consolidating findings by rule. Integrate with your linter to avoid duplication.

# Reduce noise with scoped review and tuned severities
codex review --source diff --ruleset .codex/rules/quality.yml \
             --only-changed --min-severity medium --out .codex/review.md
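If you post-process review output yourself, the same two ideas (a severity floor and consolidation by rule) fit in a few lines. The finding shape used here, a dict with `rule`, `severity`, `file`, and `line` keys, is a hypothetical format chosen for illustration.

```python
from collections import defaultdict

SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2}


def consolidate(findings, min_severity="medium"):
    """Drop findings below the threshold and group the rest by rule."""
    threshold = SEVERITY_ORDER[min_severity]
    grouped = defaultdict(list)
    for finding in findings:
        if SEVERITY_ORDER[finding["severity"]] >= threshold:
            grouped[finding["rule"]].append(f"{finding['file']}:{finding['line']}")
    # One entry per rule, with every affected location listed beneath it.
    return {rule: sorted(locations) for rule, locations in grouped.items()}
```

Posting one comment per rule with a list of locations, rather than one comment per occurrence, usually cuts the volume of repetitive findings considerably.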

Performance and Cost Optimization

  • Cache indexes: pre-index the repo and reuse embeddings for code navigation.
  • Shard large plans: parallelize tasks across independent directories or packages.
  • Run impacted tests only: use change analysis to shorten feedback loops.
  • Tune budgets: lower token/time caps for routine changes; raise them for complex refactors.

# Pre-index repository for faster context
codex index --paths src,tests,docs --out .codex/index.db

# Use index during execution
codex exec --plan .codex/plan.yaml --use-index .codex/index.db
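"Run impacted tests only" can start as a simple naming convention before you invest in real dependency analysis. The sketch below assumes a hypothetical layout in which `src/pkg/mod.py` is covered by `tests/pkg/test_mod.py`. Adjust the mapping to your repository.

```python
from pathlib import PurePosixPath


def impacted_tests(changed_files, existing_tests):
    """Map changed source files to test files using a naming convention."""
    selected = set()
    for name in changed_files:
        path = PurePosixPath(name)
        if path.parts[:1] == ("tests",):
            selected.add(name)  # a changed test file always runs
        elif path.parts[:1] == ("src",) and path.suffix == ".py":
            # src/pkg/mod.py -> tests/pkg/test_mod.py
            candidate = PurePosixPath("tests", *path.parts[1:-1], f"test_{path.name}")
            if str(candidate) in existing_tests:
                selected.add(str(candidate))
    return sorted(selected)
```

A convention-based mapping misses indirect dependencies, so it is best paired with a full test run before merge.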

Capability Matrix and Decision Guide

At-a-Glance Matrix

  • Plan mode excels at: objective clarity, risk capture, and task DAG creation.
  • Execute mode excels at: minimal, reversible changes with tests and policy guardrails.
  • Review mode excels at: rubric-based evaluation, diffs, and actionable patches.

Decision Tree

  1. If you don’t have a crisp objective or acceptance criteria, start in Plan.
  2. If the change is small, local, and tests are ready, go directly to Execute with a lightweight plan.
  3. If you have a diff or a PR, use Review to validate quality and compliance.
  4. For high-risk areas (auth, payments, parsing), always do all three: Plan → Execute → Review.
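The decision tree above can be written down as a small function, which is handy if you want a bot or a PR template to suggest a starting mode. The precedence is an assumption: high risk is checked first because rule 4 says "always", the remaining rules follow the list order, and the full loop is used as a fallback. The parameter names are hypothetical.

```python
def choose_modes(has_clear_objective, is_small_and_tested, has_diff, is_high_risk):
    """Return the suggested sequence of modes for a change."""
    if is_high_risk:
        return ["plan", "execute", "review"]  # rule 4: always run the full loop
    if not has_clear_objective:
        return ["plan"]  # rule 1: clarify the objective first
    if is_small_and_tested:
        return ["execute"]  # rule 2: small, local change with tests ready
    if has_diff:
        return ["review"]  # rule 3: validate an existing diff or PR
    # Assumed fallback when no rule matches: run the full loop.
    return ["plan", "execute", "review"]
```

For example, a small, well-tested change to an authentication module still returns all three modes, because the high-risk check takes precedence.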

Mode Combinations

  • Plan + Execute: for routine features where review can be lightweight (e.g., docs or UI text changes).
  • Execute + Review: for emergency fixes with templated plans.
  • Plan + Review: for audits or design reviews where implementation is deferred.

Conclusion and Next Steps

Codex Modes formalize what great engineers already do: clarify objectives, implement safely, and review rigorously. By treating plans as contracts, execution as an auditable process, and reviews as structured quality gates, teams gain speed without sacrificing trust. Start by templatizing common plans, enforcing minimal policies in Execute, and integrating Review into PRs. Expand with sandbox profiles, richer rulesets, and ChatOps as your practices mature. For guidance on building measurement into your rollout, see

Adopt the smallest set of practices that deliver value immediately: a plan template for your top three change types, Execute in a sandboxed branch with commit-per-task, and Review that posts concise, actionable comments. Iterate from there. Within a few weeks, you should start to see fewer surprises, faster merges, and higher confidence across engineering and leadership.
