Goal-Oriented Action Planning (GOAP) for AI Agents

Concept Introduction

Simple explanation: GOAP is like having a GPS for decisions. You tell the agent where you want to end up (the goal), and it figures out the series of actions to get there, automatically finding alternatives if one path is blocked.

Technical detail: GOAP is a real-time planning architecture where an agent dynamically constructs action sequences to satisfy goal conditions. Unlike finite state machines or behavior trees that predefine transitions, GOAP agents search through possible actions at runtime, selecting those whose effects satisfy preconditions of subsequent actions until the goal is reached.
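The usual GOAP encoding represents world state and action preconditions/effects as flat dictionaries of boolean facts. A minimal sketch (the fact names here are illustrative, not from any particular engine):

```python
# World state and an action as dictionaries of boolean facts.
state = {"has_ammo": False, "weapon_drawn": True}

action = {
    "name": "reload",
    "preconditions": {"weapon_drawn": True},
    "effects": {"has_ammo": True},
}

def applicable(state, action):
    """An action can run when every precondition holds in the state."""
    return all(state.get(k) == v for k, v in action["preconditions"].items())

def apply(state, action):
    """Applying an action overlays its effects onto the state."""
    return {**state, **action["effects"]}

if applicable(state, action):
    state = apply(state, action)
print(state)  # {'has_ammo': True, 'weapon_drawn': True}
```

The planner's job is then to chain `apply` calls so that the goal conditions end up true.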

Historical & Theoretical Context

GOAP was pioneered by Jeff Orkin for the 2005 game F.E.A.R., revolutionizing game AI by allowing NPCs to exhibit believable, adaptive behavior. The architecture draws from STRIPS (Stanford Research Institute Problem Solver) from the 1970s, which formalized planning as searching through states defined by predicates.

GOAP connects to classical AI planning, but optimizes for real-time performance by limiting plan horizon, using precomputed heuristics, and caching common plans. It represents a middle ground between fully reactive systems and heavyweight planners.
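One of those real-time optimizations, plan caching, can be sketched in a few lines: identical (state, goal) pairs reuse a previously computed plan instead of re-running the search. The `toy_planner` below is a stand-in for a real planner, included only to make the example runnable.

```python
# Cache plans keyed by the (state, goal) pair; dict items become
# frozensets so the key is hashable.
plan_cache = {}

def cached_plan(current, goal, plan_fn):
    """plan_fn is any planner taking (current, goal) and returning a list."""
    key = (frozenset(current.items()), frozenset(goal.items()))
    if key not in plan_cache:
        plan_cache[key] = plan_fn(current, goal)
    return plan_cache[key]

calls = []
def toy_planner(current, goal):
    calls.append(1)  # count how often the expensive search actually runs
    return ["do_thing"]

cached_plan({"a": False}, {"a": True}, toy_planner)
cached_plan({"a": False}, {"a": True}, toy_planner)
print(len(calls))  # the planner ran only once: 1
```

In a live agent the cache would also need invalidation when the action set or world model changes.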

Algorithms & Math

GOAP uses backward-chaining search from goal to current state:

Algorithm: GOAP Planner
Input: current_state, goal_state, available_actions
Output: action_sequence

1. Create open_set with initial node (goal_state, cost=0, actions=[])
2. While open_set not empty:
   a. Pop node with lowest cost
   b. If node.unsatisfied_conditions ⊆ current_state:
      return node.actions (plan found)
   c. For each action in available_actions:
      If action.effects ∩ node.unsatisfied_conditions ≠ ∅:
        - new_conditions = (node.unsatisfied_conditions - action.effects) ∪ action.preconditions
        - new_cost = node.cost + action.cost
        - Add new node to open_set
3. Return failure (no plan found)

The search is essentially A* over sets of conditions: each node's priority combines the accumulated action cost with a heuristic estimating how many more actions are needed to satisfy the remaining conditions (with a zero heuristic, as in the pseudocode above, it reduces to uniform-cost search).
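A common heuristic simply counts the conditions the current state does not already provide. This is admissible under the assumption that each unit of action cost can satisfy at most one condition, which depends on how the actions are modeled:

```python
def heuristic(unsatisfied, current):
    """Count conditions the current state does not yet satisfy.
    Admissible only if no action satisfies more conditions than its cost."""
    return sum(1 for k, v in unsatisfied.items() if current.get(k) != v)

# Used as the heap priority instead of the bare cost:
# priority = cost + heuristic(new_unsatisfied, current)
print(heuristic({"a": True, "b": True}, {"a": True}))  # 1
```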

Design Patterns & Architectures

GOAP fits into the Planner-Executor pattern: the planner produces an action sequence separately from execution, and the executor carries it out step by step, triggering a replan when an action fails or the world state changes underneath it.

In LLM agents, GOAP integrates as a structured reasoning layer:

User Query → Goal Extraction → GOAP Planner → Action Sequence → Tool Execution
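The goal-extraction step maps a free-form query to goal conditions in the planner's vocabulary. In practice an LLM would do this; the rule-based function below is only a toy stand-in to make the pipeline concrete (the condition names match the research-agent example later in this section):

```python
def extract_goal(query: str) -> dict:
    """Toy stand-in for LLM-based goal extraction: a real agent would
    prompt the LLM to emit goal conditions as structured output."""
    if "?" in query or query.lower().startswith(("what", "how", "why")):
        return {"has_answer": True}
    return {"task_done": True}

print(extract_goal("What's the revenue in report.pdf?"))  # {'has_answer': True}
```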

Practical Application

Here’s a Python implementation for an LLM agent using GOAP:

from dataclasses import dataclass
from typing import Dict, List
import heapq
import itertools

@dataclass
class Action:
    name: str
    preconditions: Dict[str, bool]
    effects: Dict[str, bool]
    cost: float = 1.0

class GOAPPlanner:
    def plan(self, current: Dict[str, bool], goal: Dict[str, bool],
             actions: List[Action]) -> List[str]:

        # Tie-breaking counter: without it, equal-cost nodes make heapq
        # compare the dicts in the tuple, which raises TypeError.
        counter = itertools.count()

        # Priority queue: (cost, tiebreak, unsatisfied, action_list)
        open_set = [(0.0, next(counter), goal.copy(), [])]

        while open_set:
            cost, _, unsatisfied, plan = heapq.heappop(open_set)

            # Plan found once the current state satisfies every remaining
            # condition (absent facts are treated as False: closed world).
            if all(current.get(k, False) == v for k, v in unsatisfied.items()):
                return plan

            for action in actions:
                # Does action help satisfy any unsatisfied condition?
                helps = any(
                    action.effects.get(k) == v
                    for k, v in unsatisfied.items()
                )

                if helps:
                    # Regress: remove satisfied conditions, add preconditions.
                    # (Production planners also track visited condition sets
                    # to avoid cycles.)
                    new_unsatisfied = {
                        k: v for k, v in unsatisfied.items()
                        if action.effects.get(k) != v
                    }
                    new_unsatisfied.update(action.preconditions)

                    heapq.heappush(open_set, (
                        cost + action.cost,
                        next(counter),
                        new_unsatisfied,
                        [action.name] + plan
                    ))

        return []  # No plan found

# Example: Research agent
actions = [
    Action("search_web", {"has_query": True}, {"has_sources": True}),
    Action("read_sources", {"has_sources": True}, {"has_content": True}),
    Action("synthesize", {"has_content": True}, {"has_answer": True}),
    Action("extract_query", {}, {"has_query": True}),
]

planner = GOAPPlanner()
current_state = {"has_query": False}
goal_state = {"has_answer": True}

plan = planner.plan(current_state, goal_state, actions)
print(f"Plan: {' → '.join(plan)}")
# Output: Plan: extract_query → search_web → read_sources → synthesize

Comparisons & Tradeoffs

| Approach | Pros | Cons |
| --- | --- | --- |
| GOAP | Flexible, emergent behavior, handles novel situations | Planning overhead, requires good action modeling |
| Behavior Trees | Predictable, easy to debug, visual editing | Rigid, hard to handle unexpected situations |
| FSM | Simple, fast, minimal overhead | Combinatorial explosion for complex behaviors |
| ReAct | Uses LLM reasoning directly | No explicit planning, can meander |

GOAP scales well when action spaces are moderate (<100 actions) and goals are clearly definable. It struggles with continuous action spaces and uncertain effects.

Latest Developments & Research

LLM-Enhanced GOAP (2024-2025):

Open problems:

Cross-Disciplinary Insight

GOAP mirrors means-end analysis from cognitive psychology—how humans solve problems by identifying differences between current and goal states, then finding operations to reduce those differences. Herbert Simon and Allen Newell’s work on human problem-solving directly influenced AI planning systems.

In economics, GOAP resembles backward induction in game theory, where optimal strategies are determined by reasoning backward from desired outcomes. This connection suggests opportunities to incorporate game-theoretic considerations into multi-agent GOAP systems.

Daily Challenge

Implement a GOAP-based LLM agent that can plan how to answer questions requiring multiple tool calls:

  1. Define actions for: web_search, read_file, calculate, write_response
  2. Each action should have realistic preconditions (e.g., read_file requires knowing filename)
  3. Test with query: “What’s 15% of the revenue mentioned in report.pdf?”
  4. Bonus: Add action costs based on estimated token usage

References & Further Reading