Self-Reflection and Critique in AI Agents
Concept Introduction
Simple Explanation
Imagine you’re writing an essay. You write a first draft, then read it over critically: “This argument is weak,” “This paragraph is unclear,” “I need more evidence here.” You revise based on your own critique, producing a better second draft. You might repeat this process several times.
Self-reflection brings this same iterative self-improvement process to AI agents. Instead of generating a single response and moving on, the agent generates an output, critiques its own work, and revises based on that critique—potentially repeating until the output meets quality standards.
Technical Detail
Self-reflection in AI agents is a meta-cognitive pattern where the agent evaluates and improves its own outputs through iterative critique-and-revision cycles. Introduced prominently in the Reflexion framework (Shinn et al., 2023) and further developed in systems like Self-Refine and CRITIC, this approach addresses a fundamental limitation of single-pass LLM generation: the lack of quality control and self-correction.
Core components:
- Generator: Produces an initial output (response, plan, code, etc.)
- Critic: Evaluates the output, identifying flaws, errors, or areas for improvement
- Refiner: Revises the output based on critique
- Termination condition: Determines when output quality is sufficient or max iterations reached
The key insight: LLMs that can generate solutions can also evaluate those solutions and suggest improvements. By making this explicit through prompting or architectural design, we transform single-shot generation into iterative refinement.
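A minimal sketch of the pattern, with `generate`, `critique`, and `refine` as placeholder callables for whatever prompts or model calls an application actually uses (none of these names refer to a specific library):

```python
def reflect(task: str, generate, critique, refine, max_iterations: int = 3) -> str:
    """Generic generate-critique-refine loop; the three callables are supplied by the caller."""
    output = generate(task)                      # initial attempt
    for _ in range(max_iterations):
        feedback, satisfactory = critique(task, output)
        if satisfactory:                         # critic judged the output good enough
            break
        output = refine(task, output, feedback)  # revise using the critique
    return output
```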
Historical & Theoretical Context
Origin
Self-reflection in AI agents draws from multiple intellectual traditions:
From cognitive science: Metacognition—thinking about one’s own thinking—is a hallmark of human intelligence. Developmental psychology shows that self-monitoring and self-correction distinguish expert problem-solvers from novices. The agent architecture mirrors this metacognitive loop.
From software engineering: Code review, testing, and refactoring are formalized self-critique processes. Agile methodologies emphasize iterative improvement. Self-reflective agents apply these principles to AI generation.
From classical AI: Generate-and-test is one of the oldest AI paradigms (GPS - General Problem Solver, 1959). Modern self-reflection is generate-and-test augmented with learned evaluation functions (the LLM’s judgment) rather than hand-coded heuristics.
From modern LLMs: Constitutional AI (Anthropic, 2022) showed that LLMs can critique and revise outputs according to specified principles. Chain-of-Thought showed LLMs can reason step-by-step. Self-reflection combines these: use reasoning to critique, then use that critique to improve.
Theoretical Foundation
Self-reflection can be formalized as an iterative optimization process:
Let f(x) be the generator function producing output y = f(x) from input x.
Let g(y) be the critic function evaluating output y, producing critique c = g(y).
Let h(y, c) be the refiner function producing improved output y’ = h(y, c).
The self-reflection loop:
```
y₀ = f(x)                        // Initial generation
for i in 1..N:
    cᵢ = g(yᵢ₋₁)                 // Critique current output
    yᵢ = h(yᵢ₋₁, cᵢ)             // Refine based on critique
    if termination_condition(yᵢ, cᵢ): break
return yᵢ
```
This resembles iterative refinement algorithms in optimization, but with the “gradient” provided by natural language critique rather than numerical derivatives.
Algorithms & Math
Core Algorithm: Reflexion Pattern
```
function SelfReflexion(task, max_iterations=3):
    # Initial attempt
    output = Generator(task)
    memory = []

    for iteration in 1..max_iterations:
        # Evaluate current output
        critique = Critic(task, output, memory)

        # Check termination
        if critique.is_satisfactory():
            return output

        # Store critique in memory
        memory.append({
            'iteration': iteration,
            'output': output,
            'critique': critique,
            'issues': critique.identified_problems()
        })

        # Generate improved output using critique
        output = Refiner(task, output, critique, memory)

    return output  # Return best effort after max iterations


function Critic(task, output, memory):
    prompt = f"""
    Task: {task}
    Current output: {output}
    Previous attempts and issues: {memory}

    Evaluate this output:
    1. Does it correctly solve the task?
    2. What are its weaknesses or errors?
    3. How could it be improved?
    4. Rate quality 0-10.

    Provide specific, actionable critique.
    """
    return LLM(prompt)


function Refiner(task, output, critique, memory):
    prompt = f"""
    Task: {task}
    Previous output: {output}
    Critique: {critique}
    Previous attempts: {memory}

    Generate an improved output that addresses the critique.
    Learn from previous mistakes: {memory.common_issues()}
    """
    return LLM(prompt)
```
Mathematical Formulation: Expected Quality Improvement
Assume critic quality score Q(y) ∈ [0, 10] for output y.
Expected quality improvement per iteration:
ΔQ = E[Q(yᵢ) - Q(yᵢ₋₁)]
In practice, quality tends to improve logarithmically with diminishing returns:
Q(yᵢ) ≈ Q_max - (Q_max - Q₀) * e^(-λi)
Where:
- Q_max: maximum achievable quality
- Q₀: initial quality
- λ: improvement rate constant
- i: iteration number
This suggests optimal iteration count is typically 2-4—enough for substantial improvement but before diminishing returns dominate.
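A quick numeric sketch of this curve makes the point concrete; the parameter values below are purely illustrative, not measured:

```python
import math

def expected_quality(i: int, q0: float = 4.0, q_max: float = 9.0, lam: float = 0.8) -> float:
    """Q(y_i) ≈ Q_max − (Q_max − Q₀)·e^(−λi), with illustrative parameters."""
    return q_max - (q_max - q0) * math.exp(-lam * i)

for i in range(5):
    print(i, round(expected_quality(i), 2))
# 0 4.0 → 1 6.75 → 2 7.99 → 3 8.55 → 4 8.8: most of the gain arrives by iteration 2-3
```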
Variant: Self-Consistency with Critique
Instead of refining a single output, generate multiple candidates and use critique to select the best:
```
function SelfConsistencyWithCritique(task, num_samples=5, quality_threshold=8):
    candidates = [Generator(task) for _ in range(num_samples)]

    critiques = []
    for candidate in candidates:
        critique = Critic(task, candidate)
        critiques.append((candidate, critique.quality_score()))

    # Select highest-quality candidate
    best_candidate, best_score = max(critiques, key=lambda x: x[1])

    # Optionally refine the best candidate if it still falls short
    if best_score < quality_threshold:
        return Refiner(task, best_candidate, Critic(task, best_candidate))

    return best_candidate
```
This combines exploration (diverse generation) with exploitation (critique-guided selection).
Design Patterns & Architectures
Integration with Agent Architectures
Self-reflection typically augments the Executor or Planner components:
```
┌──────────────────────────────────────┐
│          Agent Architecture          │
├──────────────────────────────────────┤
│  Perception                          │
│      ↓                               │
│  Planner                             │
│      ↓                               │
│  [Self-Reflection Wrapper]           │
│    ├─ Generate Action                │
│    ├─ Critique Action                │
│    └─ Refine Action                  │
│      ↓                               │
│  Executor                            │
│      ↓                               │
│  Memory                              │
└──────────────────────────────────────┘
```
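One way to realize the wrapper shown in the diagram is a thin layer that intercepts each action the planner proposes, critiques it, and only hands a (possibly revised) action to the executor. The `planner`, `critic`, and `refiner` callables below are placeholders for whatever components an agent already has:

```python
class SelfReflectionWrapper:
    """Critiques (and possibly revises) each planned action before it reaches the executor."""

    def __init__(self, planner, critic, refiner, max_revisions: int = 2):
        self.planner = planner          # state -> candidate action
        self.critic = critic            # (state, action) -> (feedback, ok)
        self.refiner = refiner          # (state, action, feedback) -> revised action
        self.max_revisions = max_revisions

    def next_action(self, state):
        action = self.planner(state)
        for _ in range(self.max_revisions):
            feedback, ok = self.critic(state, action)
            if ok:
                break
            action = self.refiner(state, action, feedback)
        return action  # vetted action, passed on to the executor
```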
Common Patterns
1. Critique-Driven Debugging Pattern
Use self-reflection specifically for error correction in code generation or problem-solving:
```python
def debug_with_reflection(code, test_cases, max_attempts=3):
    for attempt in range(max_attempts):
        # Execute code against test cases
        results = execute_tests(code, test_cases)

        if all(r.passed for r in results):
            return code  # Success

        # Critique based on test failures
        critique = f"""
        Code failed these tests:
        {format_failures(results)}

        Identify the bug and explain the fix needed.
        """
        diagnosis = LLM(critique)

        # Refine code based on diagnosis
        code = LLM(f"""
        Original code:
        {code}

        Bug diagnosis:
        {diagnosis}

        Write corrected code:
        """)

    return code  # Best effort
```
2. Multi-Aspect Critique Pattern
Evaluate outputs on multiple dimensions (correctness, efficiency, readability, safety):
```python
def multi_aspect_reflection(code, task):
    critique_aspects = {
        'correctness': "Does this correctly solve the task?",
        'efficiency': "Are there performance issues or inefficiencies?",
        'readability': "Is the code clear and well-structured?",
        'safety': "Are there security vulnerabilities or edge cases?"
    }

    critiques = {}
    for aspect, question in critique_aspects.items():
        critique = LLM(f"{question}\n\nCode:\n{code}")
        critiques[aspect] = critique

    # Refine addressing all aspects
    improved = LLM(f"""
    Task: {task}
    Code: {code}

    Critiques:
    {format_critiques(critiques)}

    Rewrite code addressing all critique points:
    """)
    return improved
```
3. External Feedback Integration Pattern
Combine self-critique with external feedback (user input, tool execution, test results):
```python
def reflection_with_external_feedback(task, user_feedback=None):
    output = Generator(task)

    # Internal critique
    internal_critique = Critic(task, output)

    # Combine with external feedback if available
    if user_feedback:
        combined_critique = f"""
        Internal assessment: {internal_critique}
        User feedback: {user_feedback}

        Synthesize a comprehensive improvement plan:
        """
        critique = LLM(combined_critique)
    else:
        critique = internal_critique

    # Refine based on combined critique
    return Refiner(task, output, critique)
```
Practical Application
Real-World Example: Self-Reflective Code Generator
```python
import json

import openai  # note: uses the legacy (pre-1.0) openai SDK interface
from typing import Dict, List


class SelfReflectiveCodeAgent:
    def __init__(self, model="gpt-4", max_iterations=3):
        self.model = model
        self.max_iterations = max_iterations
        self.memory = []

    def generate_code(self, task: str, test_cases: List[Dict] = None) -> str:
        """Generate code with self-reflection and iterative improvement."""
        print(f"Task: {task}\n")

        # Initial generation
        code = self._generate(task)
        print(f"Initial attempt:\n{code}\n")

        for iteration in range(1, self.max_iterations + 1):
            # Critique current code
            critique = self._critique(task, code, test_cases)
            print(f"Iteration {iteration} critique:\n{critique['feedback']}\n")

            # Check if satisfactory
            if critique['quality_score'] >= 8:
                print(f"Quality threshold met (score: {critique['quality_score']}/10)")
                break

            # Store in memory
            self.memory.append({
                'iteration': iteration,
                'code': code,
                'critique': critique,
                'quality': critique['quality_score']
            })

            # Refine based on critique
            code = self._refine(task, code, critique)
            print(f"Iteration {iteration} refined code:\n{code}\n")

        return code
    def _generate(self, task: str) -> str:
        """Generate initial code attempt."""
        prompt = f"""Write Python code to solve this task:

{task}

Provide clean, well-commented code:"""
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7
        )
        return self._extract_code(response.choices[0].message.content)
    def _critique(self, task: str, code: str, test_cases: List[Dict] = None) -> Dict:
        """Critique current code implementation."""
        # Execute tests if provided
        test_results = ""
        if test_cases:
            test_results = self._run_tests(code, test_cases)

        prompt = f"""Critique this Python code:

Task: {task}

Code:
{code}

{test_results}

Evaluate on these dimensions:
- Correctness: Does it solve the task correctly?
- Edge cases: Does it handle edge cases?
- Efficiency: Is it reasonably efficient?
- Code quality: Is it readable and well-structured?

Provide:
- Specific issues found
- Suggestions for improvement
- Quality score (0-10)

Format as JSON: {{"issues": ["issue1", "issue2", ...], "suggestions": ["suggestion1", "suggestion2", ...], "quality_score": 7, "feedback": "detailed critique..."}}"""
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3
        )

        try:
            critique = json.loads(response.choices[0].message.content)
        except json.JSONDecodeError:
            # Fallback if JSON parsing fails
            critique = {
                "issues": ["Unable to parse critique"],
                "suggestions": [],
                "quality_score": 5,
                "feedback": response.choices[0].message.content
            }
        return critique
    def _refine(self, task: str, code: str, critique: Dict) -> str:
        """Refine code based on critique."""
        # Include memory of the last two attempts
        previous_attempts = "\n".join(
            f"Attempt {m['iteration']}: Score {m['quality']}/10 - {m['critique']['issues']}"
            for m in self.memory[-2:]
        )
        issues = "\n".join(f"- {issue}" for issue in critique['issues'])
        suggestions = "\n".join(f"- {suggestion}" for suggestion in critique['suggestions'])

        prompt = f"""Improve this Python code based on critique:

Task: {task}

Current code:
{code}

Critique: {critique['feedback']}

Specific issues to fix:
{issues}

Improvement suggestions:
{suggestions}

Previous attempts: {previous_attempts if previous_attempts else "First iteration"}

Write improved code that addresses all critique points:"""
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7
        )
        return self._extract_code(response.choices[0].message.content)
    def _extract_code(self, response: str) -> str:
        """Extract code from markdown code blocks."""
        if "```python" in response:
            code = response.split("```python")[1].split("```")[0]
        elif "```" in response:
            code = response.split("```")[1].split("```")[0]
        else:
            code = response
        return code.strip()
    def _run_tests(self, code: str, test_cases: List[Dict]) -> str:
        """Execute code against test cases and format results."""
        results = []
        for i, test in enumerate(test_cases):
            try:
                # Create isolated namespace for code execution
                namespace = {}
                exec(code, namespace)

                # Run test
                output = namespace[test['function']](*test['input'])
                passed = output == test['expected']
                results.append(f"Test {i+1}: {'PASS' if passed else 'FAIL'}")
                if not passed:
                    results.append(f"  Input: {test['input']}")
                    results.append(f"  Expected: {test['expected']}")
                    results.append(f"  Got: {output}")
            except Exception as e:
                results.append(f"Test {i+1}: ERROR - {str(e)}")
        return "\nTest Results:\n" + "\n".join(results)
```
Example usage

```python
if __name__ == "__main__":
    agent = SelfReflectiveCodeAgent(max_iterations=3)

    task = """
    Write a function that finds the longest palindromic substring in a given string.
    Function signature: def longest_palindrome(s: str) -> str
    """

    test_cases = [
        # Note: 'aba' would also be a valid longest palindrome for 'babad'.
        {'function': 'longest_palindrome', 'input': ['babad'], 'expected': 'bab'},
        {'function': 'longest_palindrome', 'input': ['cbbd'], 'expected': 'bb'},
        {'function': 'longest_palindrome', 'input': ['a'], 'expected': 'a'},
        {'function': 'longest_palindrome', 'input': ['racecar'], 'expected': 'racecar'},
    ]

    final_code = agent.generate_code(task, test_cases)
    print("=" * 50)
    print("FINAL CODE:")
    print(final_code)
```
Using Self-Reflection with LangGraph
```python
from langgraph.graph import StateGraph, END
from typing import TypedDict


class ReflectionState(TypedDict):
    task: str
    output: str
    critique: str
    iteration: int
    quality_score: float
    max_iterations: int


def generate_output(state: ReflectionState) -> ReflectionState:
    """Generate or regenerate output."""
    # `llm` is assumed to be an application-level helper wrapping the
    # underlying model calls (generate / refine / critique).
    if state["iteration"] == 0:
        # Initial generation
        output = llm.generate(state["task"])
    else:
        # Refinement based on critique
        output = llm.refine(
            task=state["task"],
            previous_output=state["output"],
            critique=state["critique"]
        )
    return {
        "output": output,
        "iteration": state["iteration"] + 1
    }


def critique_output(state: ReflectionState) -> ReflectionState:
    """Evaluate current output quality."""
    critique_result = llm.critique(
        task=state["task"],
        output=state["output"]
    )
    return {
        "critique": critique_result["feedback"],
        "quality_score": critique_result["score"]
    }


def should_continue(state: ReflectionState) -> str:
    """Decide whether to continue iterating."""
    if state["quality_score"] >= 8:
        return "satisfied"
    if state["iteration"] >= state["max_iterations"]:
        return "max_iterations"
    return "continue"


# Build the self-reflection graph
workflow = StateGraph(ReflectionState)
workflow.add_node("generate", generate_output)
workflow.add_node("critique", critique_output)

workflow.set_entry_point("generate")
workflow.add_edge("generate", "critique")
workflow.add_conditional_edges(
    "critique",
    should_continue,
    {
        "continue": "generate",
        "satisfied": END,
        "max_iterations": END
    }
)

reflection_agent = workflow.compile()

# Use the agent
result = reflection_agent.invoke({
    "task": "Write a function to merge two sorted lists",
    "output": "",
    "critique": "",
    "iteration": 0,
    "quality_score": 0,
    "max_iterations": 3
})
```
Comparisons & Tradeoffs
Self-Reflection vs Single-Pass Generation
| Aspect | Single-Pass | Self-Reflection |
|---|---|---|
| Quality | Variable, one attempt | Higher, iteratively improved |
| Cost | Low (1 LLM call) | High (3-6 LLM calls) |
| Latency | Fast | Slower (sequential iterations) |
| Best for | Simple tasks, cost-sensitive apps | Complex tasks requiring high quality |
When to use single-pass: Straightforward tasks with high base success rates, real-time applications, cost-constrained scenarios.
When to use self-reflection: Code generation, creative writing, complex problem-solving, any task where output quality is critical and cost/latency are acceptable.
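In practice this often reduces to a simple dispatch on task criticality and budget. A rough sketch, assuming `single_pass` and `reflect` wrap the two generation paths (both names and the thresholds are illustrative):

```python
def answer(task: str, quality_critical: bool, latency_budget_s: float) -> str:
    """Route requests: use reflection only when quality matters and the budget allows it."""
    if quality_critical and latency_budget_s > 10:
        return reflect(task, max_iterations=3)  # iterative critique-and-revision
    return single_pass(task)                    # one model call: cheapest, fastest
```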
Self-Reflection vs Tool-Based Verification
Tool-based verification: Execute code, run tests, check against formal specifications.
Self-reflection: LLM evaluates its own output based on learned quality criteria.
Tool-based verification is more reliable when available (tests don’t lie), but requires executable environments and formal specifications. Self-reflection works on any output type (text, plans, ideas) and catches issues tools can’t (readability, user experience, semantic correctness).
Best practice: Combine both—use tools for objective verification, self-reflection for subjective quality improvement.
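A sketch of that combination for code generation, reusing the kind of `run_tests` harness and `LLM` call used earlier in this section (the "no issues" check is a deliberately crude stand-in for parsing the critique):

```python
def verify_then_reflect(task, code, test_cases, max_rounds=3):
    """Objective signal first (tests), subjective signal second (self-critique); refine on either."""
    for _ in range(max_rounds):
        failures = run_tests(code, test_cases)  # tool-based, objective verification
        review = LLM(f"Review this code for readability and unhandled edge cases:\n{code}")
        if not failures and "no issues" in review.lower():  # crude acceptance heuristic
            return code
        code = LLM(
            f"Task: {task}\nCode:\n{code}\n"
            f"Test failures: {failures}\nReview comments: {review}\n"
            "Rewrite the code to fix the failures and address the review:"
        )
    return code
```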
Limitations
Critic reliability: Quality depends on the LLM’s ability to accurately evaluate its own outputs. If the model can’t recognize errors, reflection won’t help.
Diminishing returns: Quality improves logarithmically. Going from 3 to 4 iterations often adds cost without meaningful improvement.
Hallucinated critique: The critic might identify “problems” that don’t exist or suggest “improvements” that make things worse.
Cost multiplier: Self-reflection typically costs 3-5x as much as single-pass generation.
Local optima: Iterative refinement may get stuck in local quality maxima rather than finding fundamentally better approaches.
Latest Developments & Research
Recent Papers (2023-2025)
“Reflexion: Language Agents with Verbal Reinforcement Learning” (Shinn et al., 2023)
Introduced the Reflexion framework where agents learn from task failures through verbal feedback stored in memory, improving over time on sequential decision tasks.
“Self-Refine: Iterative Refinement with Self-Feedback” (Madaan et al., 2023)
Showed that single-model iterative refinement improves outputs across diverse tasks (code, math, dialogue) without external feedback or tools.
“CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing” (Gou et al., 2024)
Combined self-reflection with external tool verification, using tools to ground critiques in objective reality and prevent hallucinated feedback.
“Learning to Self-Correct via Reinforcement Learning” (Kumar et al., 2024)
Fine-tuned models specifically for self-correction using RL, showing that explicit training on critique-and-revision improves both critic and refiner quality.
“Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding” (Reynolds et al., 2024)
Used meta-prompts to orchestrate multiple LLM instances—one generator, one critic, one integrator—showing improved performance over single-instance reflection.
Current Benchmarks
Self-reflection shows consistent improvements across benchmarks:
- HumanEval (code generation): +12% pass rate with 2-iteration reflection
- GSM8K (math word problems): +15% accuracy with critique-based refinement
- Creative writing: Human evaluators rate self-refined outputs 30% higher on coherence and quality
- Agent decision tasks: Reflexion improves success rate on ALFWorld and WebShop benchmarks by 20-30%
Open Problems
Optimal iteration count: How do we automatically determine when to stop iterating? Can we learn termination conditions?
Critic training: Can we fine-tune specialized critic models that outperform general-purpose LLMs at evaluation?
Multi-agent reflection: Should generator and critic be separate model instances, or should a single model self-reflect?
Critique hallucination detection: How do we detect when critique is unreliable or hallucinatory?
Scaling to long-horizon tasks: How does self-reflection compose with multi-step agent tasks that span hundreds of actions?
Cross-Disciplinary Insight: Software Engineering
Self-reflection in AI agents mirrors test-driven development (TDD) and continuous integration practices in software engineering:
TDD cycle:
- Write test (specification)
- Write code (generation)
- Run test (critique)
- Refactor (refinement)
Self-reflection cycle:
- Receive task (specification)
- Generate output (generation)
- Critique output (critique)
- Refine output (refinement)
The parallel runs deep: both recognize that first attempts rarely achieve quality standards and that explicit evaluation-improvement loops produce better results than single-pass efforts.
Code review is another analog. In human teams, someone other than the author reviews code. In self-reflective agents, the same model reviews its work—analogous to “reviewing your own code” before submission. While less robust than external review, it catches many issues.
Continuous improvement cultures in software engineering (Kaizen, retrospectives, postmortems) emphasize learning from mistakes and iterating toward excellence. Self-reflection brings this same philosophy to AI generation: outputs are not final until evaluated and refined.
This connection suggests organizational strategies: just as companies invest in testing infrastructure and code review processes, deploying self-reflective agents may require investment in critique prompt engineering, evaluation metrics, and iteration budgets.
Daily Challenge
Problem: Implement a self-reflective agent for mathematical problem-solving that:
- Generates a solution to a math problem
- Verifies its own answer by checking the solution
- If verification fails, critiques its approach and retries
- Repeats for up to 3 iterations
Test problem:
A farmer has 17 sheep, and all but 9 die. How many sheep are left?
(This is a trick question—common mistake is to calculate 17 - 9 = 8, but “all but 9” means 9 remain.)
Your task:
- Implement the three functions: generate_solution(), verify_solution(), critique_and_refine()
- Test on the sheep problem and other math problems
- Track whether self-reflection catches the error and produces the correct answer
Bonus challenges:
- Add a fourth function meta_critique() that evaluates whether the critique itself is helpful
- Implement termination logic that stops early if verification passes
- Test on more complex math problems (algebra, geometry) to see if the pattern generalizes
Hint: The verification step can use symbolic reasoning: “If 9 sheep remain, then 17 - 9 = 8 should have died. But the problem states ‘all but 9 die,’ implying 9 are alive. This is consistent.”
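One possible starting scaffold for the challenge, with `LLM` as a placeholder for whatever model call you use; the prompts are deliberately minimal and meant to be improved:

```python
def generate_solution(problem: str) -> str:
    """First attempt at the problem."""
    return LLM(f"Solve this problem step by step:\n{problem}")

def verify_solution(problem: str, solution: str) -> tuple[bool, str]:
    """Independently re-check the solution; return (passed, explanation)."""
    check = LLM(
        f"Problem: {problem}\nProposed solution: {solution}\n"
        "Re-derive the answer independently and compare. "
        "Reply 'CONSISTENT' or 'INCONSISTENT' with a short explanation."
    )
    return "INCONSISTENT" not in check.upper(), check

def critique_and_refine(problem: str, solution: str, explanation: str) -> str:
    """Produce a revised solution that addresses the verification failure."""
    return LLM(
        f"Problem: {problem}\nPrevious solution: {solution}\n"
        f"Why it failed verification: {explanation}\nGive a corrected solution:"
    )

def solve_with_reflection(problem: str, max_iterations: int = 3) -> str:
    solution = generate_solution(problem)
    for _ in range(max_iterations):
        passed, explanation = verify_solution(problem, solution)
        if passed:  # early termination once verification succeeds
            break
        solution = critique_and_refine(problem, solution, explanation)
    return solution
```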
References & Further Reading
Foundational Papers
“Reflexion: Language Agents with Verbal Reinforcement Learning”
Shinn et al., 2023
https://arxiv.org/abs/2303.11366
“Self-Refine: Iterative Refinement with Self-Feedback”
Madaan et al., 2023
https://arxiv.org/abs/2303.17651
“CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing”
Gou et al., 2024
https://arxiv.org/abs/2305.11738
Advanced Research
“Constitutional AI: Harmlessness from AI Feedback”
Bai et al., 2022
https://arxiv.org/abs/2212.08073
“Learning to Self-Correct via Reinforcement Learning”
Kumar et al., 2024
[ArXiv preprint, check latest]
Implementation Resources
LangChain Self-Critique Guide
https://python.langchain.com/docs/use_cases/agents/self_critique
Reflexion GitHub Repository
https://github.com/noahshinn024/reflexion
LangGraph Multi-Agent Patterns
https://langchain-ai.github.io/langgraph/
Related Concepts
“Chain-of-Thought Prompting Elicits Reasoning in Large Language Models”
Wei et al., 2022
https://arxiv.org/abs/2201.11903
“Tree of Thoughts: Deliberate Problem Solving with Large Language Models”
Yao et al., 2023
https://arxiv.org/abs/2305.10601