Proc meshes & ProcAgent#
What the ProcAgent Is#
Every proc in a mesh runs a ProcAgent. It plays the same role on the proc side that the HostAgent plays on the host side: it implements the control-plane interface for “managing this proc as part of a mesh”.
The agent has several responsibilities, all of which will be documented on this page:
wiring the proc into the mesh router,
handling resource-style requests (CreateOrUpdate, Stop, GetState, GetRankStatus, …),
forwarding or recording supervision events,
tracking the lifecycle of actors created on the proc,
and supporting both the legacy v0 and the current v1 spawn APIs.
This chapter begins with the v1 “resource-style” spawn path: how a request of the form CreateOrUpdate<ActorSpec> results in an actual actor being constructed inside the proc using the Remote registry.
To anchor that discussion, here is the essential shape of the agent:
pub struct ProcAgent {
    proc: Proc, // local actor runtime
    remote: Remote, // registry of SpawnableActor entries (built from RemoteSpawn + remote!(...))
    state: State, // v0/v1 bootstrapping mode
    actor_states: HashMap<Name, ActorInstanceState>, // per-actor spawn results & metadata
    record_supervision_events: bool,
    supervision_events: HashMap<ActorId, Vec<ActorSupervisionEvent>>,
}
proc: Proc – the proc-local runtime into which new actors will be installed.
remote: Remote – a snapshot of the process-local registry of SpawnableActor entries (populated from RemoteSpawn impls via remote!(A)). This is the bridge between global type names and the actual constructors used by Remote::gspawn.
actor_states – the agent’s bookkeeping: for each actor name in the mesh, what happened when this proc tried to spawn it.
The sections that follow walk the spawn flow end-to-end. Additional responsibilities (status, supervision, teardown) will be documented after the spawn discussion.
The V1 Spawn Flow#
At a high level, the v1 path for creating an actor on every proc looks like this:
ProcMeshRef ──(CreateOrUpdate<ActorSpec>)──▶ ProcAgent mesh
ProcAgent ──(Remote::gspawn)──▶ Proc / Remote registry
(“ProcMeshRef turns spawn::<A> into a broadcast CreateOrUpdate<ActorSpec> to the ProcAgent mesh; each ProcAgent then calls Remote::gspawn into its local Proc using the Remote registry.”)
From the caller’s point of view it starts as:
proc_mesh.spawn::<A>(cx, "name", &params).await
which is just a thin wrapper over:
proc_mesh
    .spawn_with_name::<A>(cx, Name::new("name"), &params)
    .await
The rest of this section unpacks what that call actually does.
From spawn to ActorSpec#
The real work happens in spawn_with_name_inner:
impl ProcMeshRef {
    async fn spawn_with_name_inner<A: Actor + Referable>(
        &self,
        cx: &impl context::Actor,
        name: Name,
        params: &A::Params,
    ) -> v1::Result<ActorMesh<A>>
    where
        A::Params: RemoteMessage,
    {
        let remote = Remote::collect();
        // `RemoteSpawn` + `remote!(A)` ensure that `A` has a
        // `SpawnableActor` entry in this registry, so
        // `name_of::<A>()` can resolve its global type name.
        let actor_type = remote
            .name_of::<A>()
            .ok_or(Error::ActorTypeNotRegistered(type_name::<A>().to_string()))?
            .to_string();
        let serialized_params = bincode::serialize(params)?;
        let agent_mesh = self.agent_mesh();
        agent_mesh.cast(
            cx,
            resource::CreateOrUpdate::<mesh_agent::ActorSpec> {
                name: name.clone(),
                rank: Default::default(),
                spec: mesh_agent::ActorSpec {
                    actor_type: actor_type.clone(),
                    params_data: serialized_params.clone(),
                },
            },
        )?;
        // ... wait on GetRankStatus and build ActorMesh<A> ...
    }
}
What this does, step by step:
Resolve the Rust type A to a global type name
let remote = Remote::collect();
let actor_type = remote
    .name_of::<A>()
    .ok_or(Error::ActorTypeNotRegistered(type_name::<A>().to_string()))?
    .to_string();
This is the point where the type-level contract kicks in:
elsewhere, the user has written remote!(MyActor) for each A: RemoteSpawn,
that registration adds a SpawnableActor entry to the Remote registry,
Remote::name_of::<A>() looks up that entry and reads its global type name.
If A was never registered with remote!(A), this call fails with ActorTypeNotRegistered, and the spawn never leaves the caller’s process.
Serialize the spawn parameters
let serialized_params = bincode::serialize(params)?;
Spawn parameters travel as opaque bytes. The API only enforces that A::Params: RemoteMessage, meaning the caller’s side can serialize them. On the remote side there is no trait bound: the generated RemoteSpawn::gspawn simply attempts to deserialize the incoming byte payload into A::Params and returns an error if it cannot.
Broadcast a resource-style CreateOrUpdate<ActorSpec>
let agent_mesh = self.agent_mesh();
agent_mesh.cast(
    cx,
    resource::CreateOrUpdate::<mesh_agent::ActorSpec> {
        name: name.clone(),
        rank: Default::default(),
        spec: mesh_agent::ActorSpec {
            actor_type: actor_type.clone(),
            params_data: serialized_params.clone(),
        },
    },
)?;
This is where the proc mesh turns a local method call into a distributed control-plane request:
agent_mesh is an ActorMeshRef<ProcAgent> – one ProcAgent per proc,
cast sends the same CreateOrUpdate<ActorSpec> to every ProcAgent,
the name field is the mesh-level actor name (“this actor, on this mesh”),
actor_type is the global type name resolved via Remote,
params_data is the serialized A::Params.
At this point the proc mesh has done its part. It has told every proc in the mesh: “For mesh actor name, please ensure you have one local actor of type actor_type, constructed from params_data.”
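The three caller-side steps above can be pictured with a small, self-contained model. This is only a sketch under stated assumptions: ToyRegistry, build_request, cast, and the marker types Registered and Unregistered are hypothetical stand-ins, not the real Remote or ActorMeshRef APIs, and the parameters are passed in as already-serialized bytes rather than going through bincode.

```rust
use std::any::TypeId;
use std::collections::HashMap;

/// Hypothetical stand-in for the wire-level `ActorSpec`.
#[derive(Clone, Debug, PartialEq)]
struct ActorSpec {
    actor_type: String,
    params_data: Vec<u8>,
}

/// Hypothetical stand-in for `CreateOrUpdate<ActorSpec>` (rank omitted).
#[derive(Clone, Debug, PartialEq)]
struct CreateOrUpdate {
    name: String,
    spec: ActorSpec,
}

/// Toy registry mapping Rust types to global type names.
#[derive(Default)]
struct ToyRegistry {
    names: HashMap<TypeId, &'static str>,
}

impl ToyRegistry {
    /// Plays the role that `remote!(A)` plays in the text.
    fn register<A: 'static>(&mut self, global_name: &'static str) {
        self.names.insert(TypeId::of::<A>(), global_name);
    }

    /// Plays the role of `Remote::name_of::<A>()`.
    fn name_of<A: 'static>(&self) -> Option<&'static str> {
        self.names.get(&TypeId::of::<A>()).copied()
    }
}

/// Steps 1 and 2: resolve the type name, then wrap the parameter bytes.
/// An unregistered type fails here, before anything is sent.
fn build_request<A: 'static>(
    registry: &ToyRegistry,
    name: &str,
    params_data: Vec<u8>,
) -> Result<CreateOrUpdate, String> {
    let actor_type = registry
        .name_of::<A>()
        .ok_or_else(|| format!("actor type not registered: {}", std::any::type_name::<A>()))?;
    Ok(CreateOrUpdate {
        name: name.to_string(),
        spec: ActorSpec {
            actor_type: actor_type.to_string(),
            params_data,
        },
    })
}

/// Step 3: every agent inbox receives its own copy of the same request.
fn cast(inboxes: &mut [Vec<CreateOrUpdate>], request: &CreateOrUpdate) {
    for inbox in inboxes.iter_mut() {
        inbox.push(request.clone());
    }
}

/// Hypothetical marker types used only to exercise the registry.
struct Registered;
struct Unregistered;
```

Registering Registered and calling build_request::<Registered> yields a request that cast copies into each inbox; build_request::<Unregistered> returns an error without touching any inbox, mirroring the ActorTypeNotRegistered case.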
How ProcAgent handles CreateOrUpdate<ActorSpec>#
Once the ProcMeshRef has broadcast a CreateOrUpdate<ActorSpec> to every proc, each proc’s ProcAgent receives that message and attempts to construct the actor locally.
The entry point is:
#[async_trait]
impl Handler<resource::CreateOrUpdate<ActorSpec>> for ProcAgent {
    async fn handle(
        &mut self,
        _cx: &Context<Self>,
        create_or_update: resource::CreateOrUpdate<ActorSpec>,
    ) -> anyhow::Result<()> {
        ...
    }
}
This handler performs four steps:
Idempotence: only the first CreateOrUpdate matters
if self.actor_states.contains_key(&create_or_update.name) {
    // There is no update.
    return Ok(());
}
The CreateOrUpdate resource verb supports “update” in principle, but actor meshes never update an existing actor by name. They only create a fresh actor mesh.
So the agent simply ignores subsequent requests for the same name.
Safety check: reject spawn if the proc has supervision errors
if !self.supervision_events.is_empty() {
    self.actor_states.insert(
        create_or_update.name.clone(),
        ActorInstanceState {
            spawn: Err(anyhow::anyhow!(
                "Cannot spawn new actors on mesh with supervision events"
            )),
            create_rank,
            stopped: false,
        },
    );
    return Ok(());
}
If this proc previously recorded any supervision events for any actor, the proc is considered “poisoned”: it may be in a bad state, and spawning new actors would be unsafe.
The agent records the failure in actor_states and stops.
Later, when ProcMeshRef::spawn_with_name_inner calls GetRankStatus::wait to aggregate per-rank results, this proc will contribute a Failed status for that actor name instead of ever reporting it as Running.
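The two gates described so far (idempotence, then the supervision check) can be modelled in isolation. The following is a minimal sketch, assuming a hypothetical ToyAgent that stores actor ids as plain strings and tracks only a count of supervision events; it is not the real ProcAgent, and the spawn itself is faked.

```rust
use std::collections::HashMap;

/// Simplified stand-in for `ActorInstanceState`; the actor id is a plain string.
#[derive(Debug, PartialEq)]
struct InstanceState {
    spawn: Result<String, String>,
    stopped: bool,
}

/// Hypothetical toy agent, not the real `ProcAgent`.
#[derive(Default)]
struct ToyAgent {
    actor_states: HashMap<String, InstanceState>,
    /// Number of supervision events seen so far (the real agent keeps a map).
    supervision_event_count: usize,
    /// How many times a spawn was actually attempted.
    spawn_attempts: usize,
}

impl ToyAgent {
    fn handle_create(&mut self, name: &str) {
        // Gate 1: idempotence. A second request for the same name is ignored.
        if self.actor_states.contains_key(name) {
            return;
        }
        // Gate 2: a proc with recorded supervision events refuses new spawns,
        // but still records the failure so status queries can report it.
        if self.supervision_event_count > 0 {
            self.actor_states.insert(
                name.to_string(),
                InstanceState {
                    spawn: Err(
                        "Cannot spawn new actors on mesh with supervision events".to_string(),
                    ),
                    stopped: false,
                },
            );
            return;
        }
        // Otherwise attempt the spawn (faked here with a made-up id).
        self.spawn_attempts += 1;
        self.actor_states.insert(
            name.to_string(),
            InstanceState {
                spawn: Ok(format!("{}[0]", name)),
                stopped: false,
            },
        );
    }
}
```

Note that both early returns leave the handler itself successful; only the entry in actor_states distinguishes a refused spawn from an accepted one.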
Unpack ActorSpec and call remote.gspawn
let ActorSpec {
    actor_type,
    params_data,
} = create_or_update.spec;
self.actor_states.insert(
    create_or_update.name.clone(),
    ActorInstanceState {
        create_rank,
        spawn: self
            .remote
            .gspawn(
                &self.proc,
                &actor_type,
                &create_or_update.name.to_string(),
                params_data,
            )
            .await,
        stopped: false,
    },
);
This is the core of v1 spawning. The agent:
unpacks the ActorSpec (type name + parameter bytes), and
passes those pieces into remote.gspawn(...) to construct the local actor.
actor_type: String – the logical type name registered by remote!(A), computed in ProcMeshRef::spawn_with_name_inner via remote.name_of::<A>(), and used by Remote::gspawn on each proc to find the right constructor.
params_data: Data – a raw byte buffer containing serialized A::Params (via bincode::serialize).
self.remote.gspawn(...) – this method looks up the SpawnableActor entry for actor_type in the local Remote registry, then invokes:
SpawnableActor::spawn(proc, name, params_data)
Internally this calls the actor’s RemoteSpawn::new(params).await constructor, registers it under the given name in the proc’s runtime, and returns an ActorId.
The result – success or failure – is recorded in:
ActorInstanceState {
    create_rank, // this proc's rank in the mesh
    spawn: Result<ActorId, anyhow::Error>,
    stopped: false,
}
The actor_states map is later queried by GetRankStatus and GetState.
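The lookup-and-construct step can be thought of as a map from type name to a constructor that accepts raw bytes. The sketch below illustrates that idea only: ToyRemote, the Constructor alias, and spawn_counter are hypothetical, the constructor is synchronous, and the parameters are decoded by hand as a little-endian u32 instead of going through bincode.

```rust
use std::collections::HashMap;

/// Hypothetical constructor signature: (actor name, raw parameter bytes) -> actor id.
type Constructor = fn(&str, &[u8]) -> Result<String, String>;

/// Toy registry keyed by global type name; not the real `Remote`.
#[derive(Default)]
struct ToyRemote {
    constructors: HashMap<String, Constructor>,
}

impl ToyRemote {
    fn register(&mut self, actor_type: &str, constructor: Constructor) {
        self.constructors.insert(actor_type.to_string(), constructor);
    }

    /// Find the constructor for `actor_type` and run it on the raw bytes.
    /// Both an unknown type name and undecodable parameters surface as `Err`.
    fn gspawn(&self, actor_type: &str, name: &str, params_data: &[u8]) -> Result<String, String> {
        let constructor = self
            .constructors
            .get(actor_type)
            .copied()
            .ok_or_else(|| format!("actor type {} not registered", actor_type))?;
        constructor(name, params_data)
    }
}

/// Example constructor: expects exactly 4 bytes encoding a little-endian u32.
fn spawn_counter(name: &str, params_data: &[u8]) -> Result<String, String> {
    if params_data.len() != 4 {
        return Err(format!("cannot decode params for {}", name));
    }
    let initial = u32::from_le_bytes([
        params_data[0],
        params_data[1],
        params_data[2],
        params_data[3],
    ]);
    Ok(format!("{}(initial={})", name, initial))
}
```

The design point this illustrates is that the receiving side never sees A::Params as a type: it only sees a string key and a byte buffer, and decoding errors are reported through the same Result that the agent stores in actor_states.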
Return success locally (no direct reply)
Once the agent has updated actor_states, the handler simply returns:
Ok(())
There is no direct reply back to the caller for CreateOrUpdate<ActorSpec>.
From the agent’s point of view, the work for the message is:
decide whether to attempt a spawn (idempotence + supervision gate),
call remote.gspawn(...) into the local Proc,
record the outcome in actor_states[name] as ActorInstanceState.
That’s it. The handler does not try to decide whether the mesh-level spawn “succeeded” or “failed” - it just persists the per-proc result.
Those per-proc results are later read by the resource query handlers (Handler<GetRankStatus>, Handler<GetState<ActorState>> on ProcAgent).
Completing the Spawn: How GetRankStatus Decides Success#
Once every ProcAgent has received the CreateOrUpdate<ActorSpec> message and updated its local actor_states, the caller still does not know:
Did every proc spawn the actor successfully?
Did any proc report a supervision failure?
Are all actors running, or did one terminate immediately?
To answer these questions, the ProcMeshRef performs a second distributed query using the resource verb:
resource::GetRankStatus{ name, reply }
This message is broadcast to the same ProcAgent mesh. Each agent replies with a small “overlay” describing its result for that actor name:
no entry yet -> NotExist,
spawn failed -> Failed(error),
spawned and running -> Running,
terminated -> Stopped/Failed,
supervision events present -> Failed.
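The mapping above can be written as a single function from an agent's local record to a status. This is a sketch under assumptions: the Status enum and rank_status are hypothetical simplifications, the order in which the cases are checked is a guess rather than something the text specifies, and a terminated actor is reported as Stopped unless a supervision error is present.

```rust
/// Hypothetical, simplified per-rank status.
#[derive(Debug, Clone, PartialEq)]
enum Status {
    NotExist,
    Running,
    Stopped,
    Failed(String),
}

/// Simplified stand-in for `ActorInstanceState`.
struct InstanceState {
    spawn: Result<String, String>,
    stopped: bool,
}

/// Derive the status one rank would report for one actor name.
/// `supervision_error` is `Some` when supervision events were recorded.
fn rank_status(state: Option<&InstanceState>, supervision_error: Option<&str>) -> Status {
    let state = match state {
        // No CreateOrUpdate has been processed for this name yet.
        None => return Status::NotExist,
        Some(state) => state,
    };
    match (&state.spawn, supervision_error) {
        // The spawn itself failed: report that error.
        (Err(e), _) => Status::Failed(e.clone()),
        // Spawned, but supervision events are present.
        (Ok(_), Some(event)) => Status::Failed(event.to_string()),
        // Spawned and later stopped cleanly.
        (Ok(_), None) if state.stopped => Status::Stopped,
        // Spawned and still running.
        (Ok(_), None) => Status::Running,
    }
}
```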
The reply port used to collect all GetRankStatus responses is opened via:
let (port, rx) = cx.mailbox().open_accum_port_opts(
    StatusMesh::from_single(region.clone(), Status::NotExist),
    Some(ReducerOpts { max_update_interval: Some(Duration::from_millis(50)) }),
);
Here, cx is the caller’s context. In tests this is typically testing::instance(), a tiny driver actor (Instance<()>), so the accumulation port (port/rx), and thus all collected replies, live in that test instance’s mailbox.
An accumulation port is just a mailbox port that keeps a running aggregate value. Each GetRankStatus reply is an overlay, and the mailbox’s reducer merges those overlays into a single StatusMesh, with one final status per proc/rank.
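As a rough illustration of the accumulate-and-merge idea, the sketch below keeps one status per rank, starts every rank at NotExist, and applies overlays as they arrive. StatusAccumulator and its last-write-wins merge are assumptions made for this example; the real StatusMesh reducer may order or combine updates differently.

```rust
/// Hypothetical, simplified per-rank status.
#[derive(Debug, Clone, PartialEq)]
enum Status {
    NotExist,
    Running,
    Stopped,
    Failed(String),
}

/// Running aggregate: one status per rank, indexed by rank.
struct StatusAccumulator {
    statuses: Vec<Status>,
}

impl StatusAccumulator {
    /// Start with every rank at `NotExist`, like the initial value of the port.
    fn new(num_ranks: usize) -> Self {
        Self {
            statuses: vec![Status::NotExist; num_ranks],
        }
    }

    /// Merge one overlay: a list of (rank, status) updates.
    /// Assumption: the latest update for a rank simply replaces the previous
    /// one, and updates for ranks outside the mesh are ignored.
    fn merge(&mut self, overlay: &[(usize, Status)]) {
        for (rank, status) in overlay {
            if let Some(slot) = self.statuses.get_mut(*rank) {
                *slot = status.clone();
            }
        }
    }

    /// True once every rank has reported `Running`.
    fn all_running(&self) -> bool {
        self.statuses.iter().all(|s| *s == Status::Running)
    }
}
```

A caller in this model would keep merging overlays until all_running returns true or some rank reports Failed, which is the decision the text attributes to the GetRankStatus wait.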