Self-Reflection and Critique in AI Agents
Concept Introduction
Simple Explanation
Imagine you’re writing an essay. You write a first draft, then read it over critically: “This argument is weak,” “This paragraph is unclear,” “I need more evidence here.” You revise based on your own critique, producing a better second draft. You might repeat this process several times.
Self-reflection brings this same iterative self-improvement process to AI agents. Instead of generating a single response and moving on, the agent generates an output, critiques its own work, and revises based on that critique—potentially repeating until the output meets quality standards.
Technical Detail
Self-reflection in AI agents is a meta-cognitive pattern where the agent evaluates and improves its own outputs through iterative critique-and-revision cycles. Introduced prominently in the Reflexion framework (Shinn et al., 2023) and further developed in systems like Self-Refine and CRITIC, this approach addresses a fundamental limitation of single-pass LLM generation: the lack of quality control and self-correction.
Core components:
- Generator: Produces an initial output (response, plan, code, etc.)
- Critic: Evaluates the output, identifying flaws, errors, or areas for improvement
- Refiner: Revises the output based on critique
- Termination condition: Determines when output quality is sufficient or max iterations reached
The key insight: LLMs that can generate solutions can also evaluate those solutions and suggest improvements. By making this explicit through prompting or architectural design, we transform single-shot generation into iterative refinement.
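A minimal sketch of the pattern, with `generate`, `critique`, and `refine` as placeholder callables for whatever prompts or model calls an application actually uses (none of these names refer to a specific library):

```python
def reflect(task: str, generate, critique, refine, max_iterations: int = 3) -> str:
    """Generic generate-critique-refine loop; the three callables are supplied by the caller."""
    output = generate(task)                      # initial attempt
    for _ in range(max_iterations):
        feedback, satisfactory = critique(task, output)
        if satisfactory:                         # critic judged the output good enough
            break
        output = refine(task, output, feedback)  # revise using the critique
    return output
```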
Historical & Theoretical Context
Origin
Self-reflection in AI agents draws from multiple intellectual traditions:
From cognitive science: Metacognition—thinking about one’s own thinking—is a hallmark of human intelligence. Developmental psychology shows that self-monitoring and self-correction distinguish expert problem-solvers from novices. The agent architecture mirrors this metacognitive loop.
From software engineering: Code review, testing, and refactoring are formalized self-critique processes. Agile methodologies emphasize iterative improvement. Self-reflective agents apply these principles to AI generation.
From classical AI: Generate-and-test is one of the oldest AI paradigms (GPS - General Problem Solver, 1959). Modern self-reflection is generate-and-test augmented with learned evaluation functions (the LLM’s judgment) rather than hand-coded heuristics.
From modern LLMs: Constitutional AI (Anthropic, 2022) showed that LLMs can critique and revise outputs according to specified principles. Chain-of-Thought showed LLMs can reason step-by-step. Self-reflection combines these: use reasoning to critique, then use that critique to improve.
Theoretical Foundation
Self-reflection can be formalized as an iterative optimization process:
Let f(x) be the generator function producing output y = f(x) from input x.
Let g(y) be the critic function evaluating output y, producing critique c = g(y).
Let h(y, c) be the refiner function producing improved output y’ = h(y, c).
The self-reflection loop:
```
y₀ = f(x)                        // Initial generation
for i in 1..N:
    cᵢ = g(yᵢ₋₁)                 // Critique current output
    yᵢ = h(yᵢ₋₁, cᵢ)             // Refine based on critique
    if termination_condition(yᵢ, cᵢ): break
return yᵢ
```
This resembles iterative refinement algorithms in optimization, but with the “gradient” provided by natural language critique rather than numerical derivatives.
Algorithms & Math
Core Algorithm: Reflexion Pattern
```
function SelfReflexion(task, max_iterations=3):
    # Initial attempt
    output = Generator(task)
    memory = []

    for iteration in 1..max_iterations:
        # Evaluate current output
        critique = Critic(task, output, memory)

        # Check termination
        if critique.is_satisfactory():
            return output

        # Store critique in memory
        memory.append({
            'iteration': iteration,
            'output': output,
            'critique': critique,
            'issues': critique.identified_problems()
        })

        # Generate improved output using critique
        output = Refiner(task, output, critique, memory)

    return output  # Return best effort after max iterations


function Critic(task, output, memory):
    prompt = f"""
    Task: {task}
    Current output: {output}
    Previous attempts and issues: {memory}

    Evaluate this output:
    1. Does it correctly solve the task?
    2. What are its weaknesses or errors?
    3. How could it be improved?
    4. Rate quality 0-10.

    Provide specific, actionable critique.
    """
    return LLM(prompt)


function Refiner(task, output, critique, memory):
    prompt = f"""
    Task: {task}
    Previous output: {output}
    Critique: {critique}
    Previous attempts: {memory}

    Generate an improved output that addresses the critique.
    Learn from previous mistakes: {memory.common_issues()}
    """
    return LLM(prompt)
```
Mathematical Formulation: Expected Quality Improvement
Assume critic quality score Q(y) ∈ [0, 10] for output y.
Expected quality improvement per iteration:
ΔQ = E[Q(yᵢ) - Q(yᵢ₋₁)]
In practice, quality tends to improve logarithmically with diminishing returns:
Q(yᵢ) ≈ Q_max - (Q_max - Q₀) * e^(-λi)
Where:
- Q_max: maximum achievable quality
- Q₀: initial quality
- λ: improvement rate constant
- i: iteration number
This suggests optimal iteration count is typically 2-4—enough for substantial improvement but before diminishing returns dominate.
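A quick numeric sketch of this curve makes the point concrete; the parameter values below are purely illustrative, not measured:

```python
import math

def expected_quality(i: int, q0: float = 4.0, q_max: float = 9.0, lam: float = 0.8) -> float:
    """Q(y_i) ≈ Q_max − (Q_max − Q₀)·e^(−λi), with illustrative parameters."""
    return q_max - (q_max - q0) * math.exp(-lam * i)

for i in range(5):
    print(i, round(expected_quality(i), 2))
# 0 4.0 → 1 6.75 → 2 7.99 → 3 8.55 → 4 8.8: most of the gain arrives by iteration 2-3
```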
Variant: Self-Consistency with Critique
Instead of refining a single output, generate multiple candidates and use critique to select the best:
```
function SelfConsistencyWithCritique(task, num_samples=5, quality_threshold=8):
    candidates = [Generator(task) for _ in range(num_samples)]

    critiques = []
    for candidate in candidates:
        critique = Critic(task, candidate)
        critiques.append((candidate, critique.quality_score()))

    # Select highest-quality candidate
    best_candidate, best_score = max(critiques, key=lambda x: x[1])

    # Optionally refine the best candidate if it still falls short
    if best_score < quality_threshold:
        return Refiner(task, best_candidate, Critic(task, best_candidate))

    return best_candidate
```
This combines exploration (diverse generation) with exploitation (critique-guided selection).
Design Patterns & Architectures
Integration with Agent Architectures
Self-reflection typically augments the Executor or Planner components:
```
┌──────────────────────────────────────┐
│          Agent Architecture          │
├──────────────────────────────────────┤
│  Perception                          │
│      ↓                               │
│  Planner                             │
│      ↓                               │
│  [Self-Reflection Wrapper]           │
│    ├─ Generate Action                │
│    ├─ Critique Action                │
│    └─ Refine Action                  │
│      ↓                               │
│  Executor                            │
│      ↓                               │
│  Memory                              │
└──────────────────────────────────────┘
```
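One way to realize the wrapper shown in the diagram is a thin layer that intercepts each action the planner proposes, critiques it, and only hands a (possibly revised) action to the executor. The `planner`, `critic`, and `refiner` callables below are placeholders for whatever components an agent already has:

```python
class SelfReflectionWrapper:
    """Critiques (and possibly revises) each planned action before it reaches the executor."""

    def __init__(self, planner, critic, refiner, max_revisions: int = 2):
        self.planner = planner          # state -> candidate action
        self.critic = critic            # (state, action) -> (feedback, ok)
        self.refiner = refiner          # (state, action, feedback) -> revised action
        self.max_revisions = max_revisions

    def next_action(self, state):
        action = self.planner(state)
        for _ in range(self.max_revisions):
            feedback, ok = self.critic(state, action)
            if ok:
                break
            action = self.refiner(state, action, feedback)
        return action  # vetted action, passed on to the executor
```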
Common Patterns
1. Critique-Driven Debugging Pattern
Use self-reflection specifically for error correction in code generation or problem-solving:
```python
def debug_with_reflection(code, test_cases, max_attempts=3):
    for attempt in range(max_attempts):
        # Execute code against test cases
        results = execute_tests(code, test_cases)

        if all(r.passed for r in results):
            return code  # Success

        # Critique based on test failures
        critique = f"""
        Code failed these tests:
        {format_failures(results)}

        Identify the bug and explain the fix needed.
        """
        diagnosis = LLM(critique)

        # Refine code based on diagnosis
        code = LLM(f"""
        Original code:
        {code}

        Bug diagnosis:
        {diagnosis}

        Write corrected code:
        """)

    return code  # Best effort
```
2. Multi-Aspect Critique Pattern
Evaluate outputs on multiple dimensions (correctness, efficiency, readability, safety):
```python
def multi_aspect_reflection(code, task):
    critique_aspects = {
        'correctness': "Does this correctly solve the task?",
        'efficiency': "Are there performance issues or inefficiencies?",
        'readability': "Is the code clear and well-structured?",
        'safety': "Are there security vulnerabilities or edge cases?"
    }

    critiques = {}
    for aspect, question in critique_aspects.items():
        critique = LLM(f"{question}\n\nCode:\n{code}")
        critiques[aspect] = critique

    # Refine addressing all aspects
    improved = LLM(f"""
    Task: {task}
    Code: {code}

    Critiques:
    {format_critiques(critiques)}

    Rewrite code addressing all critique points:
    """)
    return improved
```
3. External Feedback Integration Pattern
Combine self-critique with external feedback (user input, tool execution, test results):
```python
def reflection_with_external_feedback(task, user_feedback=None):
    output = Generator(task)

    # Internal critique
    internal_critique = Critic(task, output)

    # Combine with external feedback if available
    if user_feedback:
        combined_critique = f"""
        Internal assessment: {internal_critique}
        User feedback: {user_feedback}

        Synthesize a comprehensive improvement plan:
        """
        critique = LLM(combined_critique)
    else:
        critique = internal_critique

    # Refine based on combined critique
    return Refiner(task, output, critique)
```
Practical Application
Real-World Example: Self-Reflective Code Generator
```python
import json

import openai  # note: uses the legacy (pre-1.0) openai SDK interface
from typing import Dict, List


class SelfReflectiveCodeAgent:
    def __init__(self, model="gpt-4", max_iterations=3):
        self.model = model
        self.max_iterations = max_iterations
        self.memory = []

    def generate_code(self, task: str, test_cases: List[Dict] = None) -> str:
        """Generate code with self-reflection and iterative improvement."""
        print(f"Task: {task}\n")

        # Initial generation
        code = self._generate(task)
        print(f"Initial attempt:\n{code}\n")

        for iteration in range(1, self.max_iterations + 1):
            # Critique current code
            critique = self._critique(task, code, test_cases)
            print(f"Iteration {iteration} critique:\n{critique['feedback']}\n")

            # Check if satisfactory
            if critique['quality_score'] >= 8:
                print(f"Quality threshold met (score: {critique['quality_score']}/10)")
                break

            # Store in memory
            self.memory.append({
                'iteration': iteration,
                'code': code,
                'critique': critique,
                'quality': critique['quality_score']
            })

            # Refine based on critique
            code = self._refine(task, code, critique)
            print(f"Iteration {iteration} refined code:\n{code}\n")

        return code
    def _generate(self, task: str) -> str:
        """Generate initial code attempt."""
        prompt = f"""Write Python code to solve this task:

{task}

Provide clean, well-commented code:"""
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7
        )
        return self._extract_code(response.choices[0].message.content)
    def _critique(self, task: str, code: str, test_cases: List[Dict] = None) -> Dict:
        """Critique current code implementation."""
        # Execute tests if provided
        test_results = ""
        if test_cases:
            test_results = self._run_tests(code, test_cases)

        prompt = f"""Critique this Python code:

Task: {task}

Code:
{code}

{test_results}

Evaluate on these dimensions:
- Correctness: Does it solve the task correctly?
- Edge cases: Does it handle edge cases?
- Efficiency: Is it reasonably efficient?
- Code quality: Is it readable and well-structured?

Provide:
- Specific issues found
- Suggestions for improvement
- Quality score (0-10)

Format as JSON: {{"issues": ["issue1", "issue2", ...], "suggestions": ["suggestion1", "suggestion2", ...], "quality_score": 7, "feedback": "detailed critique..."}}"""
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3
        )

        try:
            critique = json.loads(response.choices[0].message.content)
        except json.JSONDecodeError:
            # Fallback if JSON parsing fails
            critique = {
                "issues": ["Unable to parse critique"],
                "suggestions": [],
                "quality_score": 5,
                "feedback": response.choices[0].message.content
            }
        return critique
    def _refine(self, task: str, code: str, critique: Dict) -> str:
        """Refine code based on critique."""
        # Include memory of the last two attempts
        previous_attempts = "\n".join(
            f"Attempt {m['iteration']}: Score {m['quality']}/10 - {m['critique']['issues']}"
            for m in self.memory[-2:]
        )
        issues = "\n".join(f"- {issue}" for issue in critique['issues'])
        suggestions = "\n".join(f"- {suggestion}" for suggestion in critique['suggestions'])

        prompt = f"""Improve this Python code based on critique:

Task: {task}

Current code:
{code}

Critique: {critique['feedback']}

Specific issues to fix:
{issues}

Improvement suggestions:
{suggestions}

Previous attempts: {previous_attempts if previous_attempts else "First iteration"}

Write improved code that addresses all critique points:"""
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7
        )
        return self._extract_code(response.choices[0].message.content)
    def _extract_code(self, response: str) -> str:
        """Extract code from markdown code blocks."""
        if "```python" in response:
            code = response.split("```python")[1].split("```")[0]
        elif "```" in response:
            code = response.split("```")[1].split("```")[0]
        else:
            code = response
        return code.strip()
    def _run_tests(self, code: str, test_cases: List[Dict]) -> str:
        """Execute code against test cases and format results."""
        results = []
        for i, test in enumerate(test_cases):
            try:
                # Create isolated namespace for code execution
                namespace = {}
                exec(code, namespace)

                # Run test
                output = namespace[test['function']](*test['input'])
                passed = output == test['expected']
                results.append(f"Test {i+1}: {'PASS' if passed else 'FAIL'}")
                if not passed:
                    results.append(f"  Input: {test['input']}")
                    results.append(f"  Expected: {test['expected']}")
                    results.append(f"  Got: {output}")
            except Exception as e:
                results.append(f"Test {i+1}: ERROR - {str(e)}")
        return "\nTest Results:\n" + "\n".join(results)
```
Example usage

```python
if __name__ == "__main__":
    agent = SelfReflectiveCodeAgent(max_iterations=3)

    task = """
    Write a function that finds the longest palindromic substring in a given string.
    Function signature: def longest_palindrome(s: str) -> str
    """

    test_cases = [
        # Note: 'aba' would also be a valid longest palindrome for 'babad'.
        {'function': 'longest_palindrome', 'input': ['babad'], 'expected': 'bab'},
        {'function': 'longest_palindrome', 'input': ['cbbd'], 'expected': 'bb'},
        {'function': 'longest_palindrome', 'input': ['a'], 'expected': 'a'},
        {'function': 'longest_palindrome', 'input': ['racecar'], 'expected': 'racecar'},
    ]

    final_code = agent.generate_code(task, test_cases)
    print("=" * 50)
    print("FINAL CODE:")
    print(final_code)
```
Using Self-Reflection with LangGraph
```python
from langgraph.graph import StateGraph, END
from typing import TypedDict


class ReflectionState(TypedDict):
    task: str
    output: str
    critique: str
    iteration: int
    quality_score: float
    max_iterations: int


def generate_output(state: ReflectionState) -> ReflectionState:
    """Generate or regenerate output."""
    # `llm` is assumed to be an application-level helper wrapping the
    # underlying model calls (generate / refine / critique).
    if state["iteration"] == 0:
        # Initial generation
        output = llm.generate(state["task"])
    else:
        # Refinement based on critique
        output = llm.refine(
            task=state["task"],
            previous_output=state["output"],
            critique=state["critique"]
        )
    return {
        "output": output,
        "iteration": state["iteration"] + 1
    }


def critique_output(state: ReflectionState) -> ReflectionState:
    """Evaluate current output quality."""
    critique_result = llm.critique(
        task=state["task"],
        output=state["output"]
    )
    return {
        "critique": critique_result["feedback"],
        "quality_score": critique_result["score"]
    }


def should_continue(state: ReflectionState) -> str:
    """Decide whether to continue iterating."""
    if state["quality_score"] >= 8:
        return "satisfied"
    if state["iteration"] >= state["max_iterations"]:
        return "max_iterations"
    return "continue"


# Build the self-reflection graph
workflow = StateGraph(ReflectionState)
workflow.add_node("generate", generate_output)
workflow.add_node("critique", critique_output)

workflow.set_entry_point("generate")
workflow.add_edge("generate", "critique")
workflow.add_conditional_edges(
    "critique",
    should_continue,
    {
        "continue": "generate",
        "satisfied": END,
        "max_iterations": END
    }
)

reflection_agent = workflow.compile()

# Use the agent
result = reflection_agent.invoke({
    "task": "Write a function to merge two sorted lists",
    "output": "",
    "critique": "",
    "iteration": 0,
    "quality_score": 0,
    "max_iterations": 3
})
```
Comparisons & Tradeoffs
Self-Reflection vs Single-Pass Generation
| Aspect | Single-Pass | Self-Reflection |
|---|---|---|
| Quality | Variable, one attempt | Higher, iteratively improved |
| Cost | Low (1 LLM call) | High (3-6 LLM calls) |
| Latency | Fast | Slower (sequential iterations) |
| Best for | Simple tasks, cost-sensitive apps | Complex tasks requiring high quality |
When to use single-pass: Straightforward tasks with high base success rates, real-time applications, cost-constrained scenarios.
When to use self-reflection: Code generation, creative writing, complex problem-solving, any task where output quality is critical and cost/latency are acceptable.
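In practice this often reduces to a simple dispatch on task criticality and budget. A rough sketch, assuming `single_pass` and `reflect` wrap the two generation paths (both names and the thresholds are illustrative):

```python
def answer(task: str, quality_critical: bool, latency_budget_s: float) -> str:
    """Route requests: use reflection only when quality matters and the budget allows it."""
    if quality_critical and latency_budget_s > 10:
        return reflect(task, max_iterations=3)  # iterative critique-and-revision
    return single_pass(task)                    # one model call: cheapest, fastest
```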
Self-Reflection vs Tool-Based Verification
Tool-based verification: Execute code, run tests, check against formal specifications.
Self-reflection: LLM evaluates its own output based on learned quality criteria.
Tool-based verification is more reliable when available (tests don’t lie), but requires executable environments and formal specifications. Self-reflection works on any output type (text, plans, ideas) and catches issues tools can’t (readability, user experience, semantic correctness).
Best practice: Combine both—use tools for objective verification, self-reflection for subjective quality improvement.
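A sketch of that combination for code generation, reusing the kind of `run_tests` harness and `LLM` call used earlier in this section (the "no issues" check is a deliberately crude stand-in for parsing the critique):

```python
def verify_then_reflect(task, code, test_cases, max_rounds=3):
    """Objective signal first (tests), subjective signal second (self-critique); refine on either."""
    for _ in range(max_rounds):
        failures = run_tests(code, test_cases)  # tool-based, objective verification
        review = LLM(f"Review this code for readability and unhandled edge cases:\n{code}")
        if not failures and "no issues" in review.lower():  # crude acceptance heuristic
            return code
        code = LLM(
            f"Task: {task}\nCode:\n{code}\n"
            f"Test failures: {failures}\nReview comments: {review}\n"
            "Rewrite the code to fix the failures and address the review:"
        )
    return code
```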
Limitations
Critic reliability: Quality depends on the LLM’s ability to accurately evaluate its own outputs. If the model can’t recognize errors, reflection won’t help.
Diminishing returns: Quality improves logarithmically. Going from 3 to 4 iterations often adds cost without meaningful improvement.
Hallucinated critique: The critic might identify “problems” that don’t exist or suggest “improvements” that make things worse.
Cost multiplier: Self-reflection typically costs 3-5x as much as single-pass generation.
Local optima: Iterative refinement may get stuck in local quality maxima rather than finding fundamentally better approaches.
Latest Developments & Research
Recent Papers (2023-2025)
“Reflexion: Language Agents with Verbal Reinforcement Learning” (Shinn et al., 2023)
Introduced the Reflexion framework where agents learn from task failures through verbal feedback stored in memory, improving over time on sequential decision tasks.
“Self-Refine: Iterative Refinement with Self-Feedback” (Madaan et al., 2023)
Showed that single-model iterative refinement improves outputs across diverse tasks (code, math, dialogue) without external feedback or tools.
“CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing” (Gou et al., 2024)
Combined self-reflection with external tool verification, using tools to ground critiques in objective reality and prevent hallucinated feedback.
“Learning to Self-Correct via Reinforcement Learning” (Kumar et al., 2024)
Fine-tuned models specifically for self-correction using RL, showing that explicit training on critique-and-revision improves both critic and refiner quality.
“Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding” (Reynolds et al., 2024)
Used meta-prompts to orchestrate multiple LLM instances—one generator, one critic, one integrator—showing improved performance over single-instance reflection.
Current Benchmarks
Self-reflection shows consistent improvements across benchmarks:
- HumanEval (code generation): +12% pass rate with 2-iteration reflection
- GSM8K (math word problems): +15% accuracy with critique-based refinement
- Creative writing: Human evaluators rate self-refined outputs 30% higher on coherence and quality
- Agent decision tasks: Reflexion improves success rate on ALFWorld and WebShop benchmarks by 20-30%
Open Problems
Optimal iteration count: How do we automatically determine when to stop iterating? Can we learn termination conditions?
Critic training: Can we fine-tune specialized critic models that outperform general-purpose LLMs at evaluation?
Multi-agent reflection: Should generator and critic be separate model instances, or should a single model self-reflect?
Critique hallucination detection: How do we detect when critique is unreliable or hallucinatory?
Scaling to long-horizon tasks: How does self-reflection compose with multi-step agent tasks that span hundreds of actions?
Cross-Disciplinary Insight: Software Engineering
Self-reflection in AI agents mirrors test-driven development (TDD) and continuous integration practices in software engineering:
TDD cycle:
- Write test (specification)
- Write code (generation)
- Run test (critique)
- Refactor (refinement)
Self-reflection cycle:
- Receive task (specification)
- Generate output (generation)
- Critique output (critique)
- Refine output (refinement)
The parallel runs deep: both recognize that first attempts rarely achieve quality standards and that explicit evaluation-improvement loops produce better results than single-pass efforts.
Code review is another analog. In human teams, someone other than the author reviews code. In self-reflective agents, the same model reviews its work—analogous to “reviewing your own code” before submission. While less robust than external review, it catches many issues.
Continuous improvement cultures in software engineering (Kaizen, retrospectives, postmortems) emphasize learning from mistakes and iterating toward excellence. Self-reflection brings this same philosophy to AI generation: outputs are not final until evaluated and refined.
This connection suggests organizational strategies: just as companies invest in testing infrastructure and code review processes, deploying self-reflective agents may require investment in critique prompt engineering, evaluation metrics, and iteration budgets.
Daily Challenge
Problem: Implement a self-reflective agent for mathematical problem-solving that:
- Generates a solution to a math problem
- Verifies its own answer by checking the solution
- If verification fails, critiques its approach and retries
- Repeats for up to 3 iterations
Test problem:
A farmer has 17 sheep, and all but 9 die. How many sheep are left?
(This is a trick question—common mistake is to calculate 17 - 9 = 8, but “all but 9” means 9 remain.)
Your task:
- Implement the three functions: generate_solution(), verify_solution(), critique_and_refine()
- Test on the sheep problem and other math problems
- Track whether self-reflection catches the error and produces the correct answer
Bonus challenges:
- Add a fourth function meta_critique() that evaluates whether the critique itself is helpful
- Implement termination logic that stops early if verification passes
- Test on more complex math problems (algebra, geometry) to see if the pattern generalizes
Hint: The verification step can use symbolic reasoning: “If 9 sheep remain, then 17 - 9 = 8 should have died. But the problem states ‘all but 9 die,’ implying 9 are alive. This is consistent.”
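One possible starting scaffold for the challenge, with `LLM` as a placeholder for whatever model call you use; the prompts are deliberately minimal and meant to be improved:

```python
def generate_solution(problem: str) -> str:
    """First attempt at the problem."""
    return LLM(f"Solve this problem step by step:\n{problem}")

def verify_solution(problem: str, solution: str) -> tuple[bool, str]:
    """Independently re-check the solution; return (passed, explanation)."""
    check = LLM(
        f"Problem: {problem}\nProposed solution: {solution}\n"
        "Re-derive the answer independently and compare. "
        "Reply 'CONSISTENT' or 'INCONSISTENT' with a short explanation."
    )
    return "INCONSISTENT" not in check.upper(), check

def critique_and_refine(problem: str, solution: str, explanation: str) -> str:
    """Produce a revised solution that addresses the verification failure."""
    return LLM(
        f"Problem: {problem}\nPrevious solution: {solution}\n"
        f"Why it failed verification: {explanation}\nGive a corrected solution:"
    )

def solve_with_reflection(problem: str, max_iterations: int = 3) -> str:
    solution = generate_solution(problem)
    for _ in range(max_iterations):
        passed, explanation = verify_solution(problem, solution)
        if passed:  # early termination once verification succeeds
            break
        solution = critique_and_refine(problem, solution, explanation)
    return solution
```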
References & Further Reading
Foundational Papers
“Reflexion: Language Agents with Verbal Reinforcement Learning”
Shinn et al., 2023
https://arxiv.org/abs/2303.11366
“Self-Refine: Iterative Refinement with Self-Feedback”
Madaan et al., 2023
https://arxiv.org/abs/2303.17651
“CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing”
Gou et al., 2024
https://arxiv.org/abs/2305.11738
Advanced Research
“Constitutional AI: Harmlessness from AI Feedback”
Bai et al., 2022
https://arxiv.org/abs/2212.08073
“Learning to Self-Correct via Reinforcement Learning”
Kumar et al., 2024
[ArXiv preprint, check latest]
Implementation Resources
LangChain Self-Critique Guide
https://python.langchain.com/docs/use_cases/agents/self_critique
Reflexion GitHub Repository
https://github.com/noahshinn024/reflexion
LangGraph Multi-Agent Patterns
https://langchain-ai.github.io/langgraph/
Related Concepts
“Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”
Wei et al., 2022
https://arxiv.org/abs/2201.11903
“Tree of Thoughts: Deliberate Problem Solving with Large Language Models”
Yao et al., 2023
https://arxiv.org/abs/2305.10601