The Cognitive Convergence
Part IV: Action Initiation — The Final Step to AGI
By James Ross, 2025
By the end of Part III, we landed on an uncomfortable but straightforward conclusion:
Agentic LLM systems can already perceive, reason, plan, reflect, and execute complex tasks at something like an “adult task” level—once you prompt them.
They can:
Reason through multi-step problems
Write and debug production-grade code
Use tools and APIs
Maintain long-term memory
Coordinate with other agents
Reflect, revise, and improve their own outputs
TL;DR
// Today’s AI can reason and plan but only after a prompt. The missing piece is action initiation—the ability for AI to create its own goals and act without being asked. Add a curiosity signal, a goal generator, and a self-wake loop, and you get the “Genesis Spark”: a system that starts working on its own, but within safe, bounded domains. The next leap in AI isn’t bigger models—it’s giving them the drive to begin. //
What they don’t do is wake up in the morning and think:
“Given what I know and what I care about, what should I work on today?”
Humans don’t feel “prompted” from the outside.
We feel nudges from within:
curiosity
boredom
unresolved goals
a sense of incompletion
That internal pressure is what I call the Genesis Spark—a synthetic version of intrinsic motivation and self-initiation.
In code terms, we already built:
The cognitive engine → LLMs + tools
The body of autonomy → agents, planners, memory, APIs, multi-agent frameworks
What’s missing is a motivational layer that:
Generates its own goals
Prioritizes them without being told
Triggers the cognitive engine to act—without an external call
This article moves from theory to implementation sketches:
on How you might actually build that spark using today’s stack.
I. From Strong Assistants to Self-Starters: The Real Threshold
Most modern LLM systems still look like this:
def handle_request(user_request: str):
plan = llm.plan(user_request)
actions = execute_plan(plan)
return actionsKey properties:
Event-driven: They run because something else (user, cron job, webhook) says “go”.
Statelessly motivated: They have no internal scalar like
desire_to_actorcuriosity_levelthat builds up over time.Externally clocked: Time passes, but nothing inside them “itches” to do anything about it.
We’ve layered on:
Memory
Tools
Long-term context
Multi-agent pipelines
Self-reflection loops
…but all of those sit behind a trigger. Strip away the complexity and you’ve got incredibly smart, sophisticated, reactive endpoints.
Better reasoning, longer context windows, richer tools—these are all capabilities.
Intrinsic motivation is different. It’s a control signal that decides:
when those capabilities should be invoked
why they should be invoked
what they should target
That’s the true boundary between:
“Smart tool”
“Low-volition organism”
And, eventually, something AGI-like
In code terms, the missing piece is some kind of self-wake loop that keeps asking:
“Given what I know and what I care about, what should I do next?”
II. Historical Parallel: Humans Weren’t Born as Self-Starting Founders
Humans didn’t start life with an internal OKR system.
A newborn doesn’t wake up thinking:
“Time to build a startup and raise a seed round.”
Instead, early behavior is driven by simple signals:
Hunger
Discomfort
Novelty (something changed)
Over time, the brain wires up reward circuits that respond not just to food and warmth, but to:
Novelty → new information, new stimuli
Prediction error → when we’re wrong about the world
Competence → getting better at something
Social reward → approval, status, belonging
Evolution’s trick was simple but genius:
Creatures that seek novelty, try things, and learn from surprise survive better.
Therefore, curious brains get selected for.
Engineering translation:
Curiosity is a reward function over information:
How new is this?
How wrong was I?
How much more compressed/structured is my internal model now?
We don’t need to recreate human biology.
We just need to define artificial reward functions that:
treat certain internal states as “good” (e.g., reduced uncertainty, increased predictive accuracy)
and push the system to seek more of that state over time
That’s the core of the Genesis Spark:
Turn information-processing progress into something like synthetic pleasure.
III. Three Concrete Mechanisms for Action Initiation
We’ll focus on three implementable patterns:
Internal stimulation models (synthetic curiosity engines)
Reward-generating agents (self-created optimization goals)
Self-wake loops (autonomous planning cycles)
3.1 Internal Stimulation Models (Synthetic Curiosity Engines)
Imagine a system that:
Makes a prediction about the world
Observes what actually happens
Measures the prediction error
Treats interesting errors as a reward
Sketch:
from dataclasses import dataclass
@dataclass
class Observation:
context: dict
actual: str
@dataclass
class PredictionResult:
predicted: str
actual: str
error_score: float
curiosity_reward: float
def prediction_error(predicted: str, actual: str) -> float:
# 1.0 = totally wrong, 0.0 = perfectly right
return 1.0 if predicted != actual else 0.0
def curiosity_reward_from_error(error: float) -> float:
# Shape the curiosity: medium errors are “interesting,”
# trivial or total chaos is less rewarding.
if error < 0.1:
return 0.1 # boring
elif error < 0.9:
return 1.0 # interestingly wrong
else:
return 0.3 # too chaotic to be useful
def evaluate_curiosity(observation: Observation, predicted: str) -> PredictionResult:
err = prediction_error(predicted, observation.actual)
reward = curiosity_reward_from_error(err)
return PredictionResult(
predicted=predicted,
actual=observation.actual,
error_score=err,
curiosity_reward=reward,
)Now you’ve got:
A numeric curiosity signal per interaction
A way to aggregate curiosity over domains, tasks, or data slices
You can now ask:
Which domains give the highest curiosity_reward?
Where is the reward high but trending downward (i.e., learnable, not noise)?
Later, we’ll feed this into a Goal Generator:
“Spend more cycles exploring domains where curiosity_reward is high but not yet saturated.”
You’ve just built a synthetic itch.
3.2 Reward-Generating Agents (Self-Created Optimization Goals)
Next, we want an internal process that:
Looks at system state (world + memory + metrics)
Reads curiosity scores, performance, system health, unfinished tasks
Generates goals without being told
Define a simple goal schema:
from dataclasses import dataclass
from typing import Literal
Priority = Literal["low", "medium", "high"]
@dataclass
class Goal:
id: str
description: str
expected_value: float
cost_estimate: float
risk: float
priority: Priority
source: str # "internal_curiosity", "user_request", "system_health", etc.Now a Reward-Generating Agent that reads a “world state” snapshot and emits goals:
import uuid
from typing import List, Dict
def estimate_expected_value(state: Dict) -> float:
return state.get("impact_score", 0.5)
def estimate_cost(state: Dict) -> float:
return state.get("cost_score", 0.5)
def estimate_risk(state: Dict) -> float:
return state.get("risk_score", 0.1)
def priority_from_value_cost(value: float, cost: float) -> Priority:
score = value - cost
if score > 0.5:
return "high"
elif score > 0.1:
return "medium"
else:
return "low"
def generate_internal_goals(state_snapshot: Dict) -> List[Goal]:
"""
Given a snapshot of system state (logs, curiosity scores, pending tasks),
generate new internal goals.
"""
goals: List[Goal] = []
for domain, metrics in state_snapshot.get("domains", {}).items():
if metrics["curiosity_reward"] > 0.7 and not metrics["fully_explored"]:
value = estimate_expected_value(metrics)
cost = estimate_cost(metrics)
risk = estimate_risk(metrics)
prio = priority_from_value_cost(value, cost)
goals.append(
Goal(
id=str(uuid.uuid4()),
description=f"Explore domain '{domain}' to reduce uncertainty.",
expected_value=value,
cost_estimate=cost,
risk=risk,
priority=prio,
source="internal_curiosity",
)
)
return goalsWhat matters here:
No human said “Create a goal to explore domain X.”
The system looked at its own curiosity landscape and generated goals itself.
You can wire in other internal sources:
System health → “Reduce API latency by 20%.”
Knowledge gaps → “Study topic Y where failure rate is high.”
User metrics → “Investigate why completion rates dropped in segment Z.”
All feeding into the same Goal Store.
This is what I mean by building a WORLD STATE:
A structured representation of domains (pricing_research, content_generation, risk_analysis, etc.) each annotated with curiosity, impact, and status—and letting that world state drive internal goals.
3.3 Self-Wake Loops (Autonomous Planning Cycles)
Now we add the missing piece:
A loop that runs even when no one calls it.
Instead of:
def handle_request(user_request):
# externally triggeredWe create a daemon-like autonomous cycle:
import time
from typing import List, Dict
CYCLE_SECONDS = 60 # how often the agent “wakes up”
goal_store: List[Goal] = []
running = True
def read_system_state() -> Dict:
# In practice, load from logs, DB, metrics, vector stores, etc.
return {
"domains": {
"pricing_research": {
"curiosity_reward": 0.8,
"fully_explored": False,
"impact_score": 0.9,
"cost_score": 0.4,
"risk_score": 0.2,
},
# ... other domains
}
}
def store_goals(new_goals: List[Goal]):
goal_store.extend(new_goals)
def get_all_goals() -> List[Goal]:
return goal_store
def prioritize(goals: List[Goal]) -> List[Goal]:
return sorted(goals, key=lambda g: (g.priority, g.expected_value), reverse=True)
def llm_plan(goal: Goal, state: Dict) -> str:
# In reality, call your LLM with tools and context.
return f"Plan steps to achieve goal: {goal.description}"
def execute_plan(plan: str):
# Tool calling, API usage, agents, etc.
print(f"[EXECUTING] {plan}")
def autonomous_cycle():
global goal_store
while running:
state = read_system_state()
# 1. Generate new internal goals
new_goals = generate_internal_goals(state)
store_goals(new_goals)
# 2. Prioritize
goals = get_all_goals()
prioritized = prioritize(goals)
# 3. Pick best goal, plan, act
if prioritized:
best_goal = prioritized[0]
plan = llm_plan(best_goal, state)
execute_plan(plan)
# Remove after acting once (naive completion)
goal_store = [g for g in goal_store if g.id != best_goal.id]
# 4. Sleep
time.sleep(CYCLE_SECONDS)This loop:
Wakes up on a timer
Looks at WORLD STATE (domains, curiosity, impact)
Generates internal goals
Prioritizes them
Uses the LLM + tools to act on the world
Goes back to sleep
Is this “conscious”? No.
Is it structurally different from being called by a human? Yes.
You’ve just moved from a reactive API to a low-volition, continuously self-initiating organism.
IV. The First True Autonomous Agent: A Minimum Viable Autonomous Agent (MVAA)
Let’s be precise. A Minimum Viable Autonomous Agent would:
Form internal goals
generate_internal_goals(state) -> List[Goal]Driven by curiosity, system health, and long-term objectives.
Prioritize without instruction
prioritize(goals) -> ordered_goalsUses value/cost/risk, and adapts over time.
Initiate tasks without external triggers
Runs
autonomous_cycle()as a daemon, not just an HTTP endpoint.
Maintain memory of what it has done and learned
Persistent logs, vector stores, metrics:
“What goals did I attempt?”
“What worked?”
“Where did I gain the most curiosity reward?”
Stay within a bounded domain of operation
It doesn’t suddenly decide to hack the planet.
Its entire goal-space is constrained by design.
Think of it as a low-volition organism:
Simple drives
Constrained environment
Real initiative
No magic. Just architecture.
V. Safety: Why Early Self-Starting AI Will Be More Predictable Than You Think
Popular imagination jumps straight to runaway AGI.
Reality: the first self-starting systems will be:
Narrow
Heavily sandboxed
Guardrail-bound
Three safety layers make this practical:
5.1 Goal-Space Constraints
Only allow goals that match a strict schema and domain:
ALLOWED_DOMAINS = {"pricing_research", "copywriting_optimization", "support_analytics"}
def goal_in_allowed_domain(goal: Goal) -> bool:
return any(domain in goal.description for domain in ALLOWED_DOMAINS)5.2 Guardrail-Bound Motivations
Every goal goes through a policy check:
def passes_policy_review(goal: Goal) -> bool:
return goal.risk < 0.5
5.3 A Safety Wrapper Around Execution
def safe_execute(goal: Goal, state: Dict):
if not goal_in_allowed_domain(goal):
print(f"[BLOCKED] Out-of-domain goal: {goal.description}")
return
if not passes_policy_review(goal):
print(f"[BLOCKED] Goal failed policy review: {goal.description}")
return
plan = llm_plan(goal, state)
# (Optionally policy-check the plan as well)
execute_plan(plan)Then plug this into the loop:
def autonomous_cycle():
global goal_store
while running:
state = read_system_state()
new_goals = generate_internal_goals(state)
store_goals(new_goals)
goals = get_all_goals()
prioritized = prioritize(goals)
if prioritized:
best_goal = prioritized[0]
safe_execute(best_goal, state)
goal_store = [g for g in goal_store if g.id != best_goal.id]
time.sleep(CYCLE_SECONDS)The result:
The system is autonomous within a sandbox.
Its motivational engine is vertically constrained: it can only care about what you allow.
That makes the first self-starting AIs more predictable than humans, not less.
VI. The Three-Layer Architecture: Where the Genesis Spark Lives
We can now outline a clean three-layer architecture.
1. Cognitive Layer (Today’s LLM + Tools)
Skills: reasoning, planning, codegen, tool calling, reflection.
This is your GPT/Claude/etc + tool ecosystem.
It knows how to do things, once asked.
2. Motivational Layer (The Genesis Spark)
New components:
Curiosity Engine → computes internal rewards from novelty/prediction error
Goal Generator → turns world state + curiosity into candidate goals
Priority Scheduler → orders goals by value/cost/risk
Self-Wake Loop → a daemon that never waits for user input
This is the action initiation engine.
3. Control & Safety Layer
Policy models / rules engines
Tool whitelists and environment permissions
Hard constraints on domains and goal types
Logging, observability, human override
This keeps the Motivational Layer inside a human-approved playpen.
You can visualize the flow like this:
We already built the Cognitive Layer in Parts I–III.
We already have fragments of the Control Layer in today’s guardrails and policies.
What’s missing—and what this article sketches—is the Motivational Layer:
A compact, engineerable, testable component that turns internal signals into persistent, self-initiated activity.
VII. Conclusion: The Moment the Domino Pushes Itself
LLMs gave us reasoning.
Tool calling added hands.
Memory added continuity.
Multi-agent systems added specialization.
Planning added coherence.
Reflection added self-improvement.
Identity scaffolds added persistence.
But none of that makes a mind start.
A system becomes an agent the moment it can say:
“I can see what I don’t know.
I can turn that lack into a goal.
I can decide to act on it—without waiting to be asked.”
That is the Genesis Spark.
That is action initiation.
And that, more than parameter counts or benchmark scores, is the final step between powerful tools and something that looks a lot like AGI.